The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Kotenseki is a collection of classical and ancient Japanese literature. It is comprised of image books that express Japanese stories by using comic drawings of different characters, such as humans, nature, and animals. To effectively store them for posterity, a search system is important. We propose an efficient CBIR system to assist the users in easily accessing the information and have an enjoyable...
Affordance learning in general, is to identify the purpose, use, and ways to interact with an object, based on information gained from observing the object. Most of the existing affordance learning approaches assume the object target has been cropped individually from images. However, the object could not be easily separated from others due to occlusion or noise. Actually, two or more neighboring...
In this paper, we investigate a weakly-supervised object detection framework. Most existing frameworks focus on using static images to learn object detectors. However, these detectors often fail to generalize to videos because of the existing domain shift. Therefore, we investigate learning these detectors directly from boring videos of daily activities. Instead of using bounding boxes, we explore...
This paper proposes an innovative method for reducing the visually induced motion sickness (MS) occurred in a 3D immersive virtual environment (VE) by utilizing a flexible dynamic scene smoothing approach based on saliency analysis. A saliency model based on fully convolutional network (FCN) is first trained to establish the saliency map, then the probability maps representing the salient information...
In this paper, we propose a system to measure the crowdedness of a specific area using CCTV images. In the proposed system, objects are distinguished by using the Visual Background Extractor (ViBe) algorithm. The background that has not been removed by the ViBe is excluded from the object area using the opening technique. Then, the object area assigned to different weights based on the pixel positions...
Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations and other context information. In this...
Learning visual representations with self-supervised learning has become popular in computer vision. The idea is to design auxiliary tasks where labels are free to obtain. Most of these tasks end up providing data to learn specific kinds of invariance useful for recognition. In this paper, we propose to exploit different self-supervised approaches to learn representations invariant to (i) inter-instance...
Salient object detection aims to correctly highlight the most salient object(s) in an image. Combining fine-grained contrast prior with rough-grained object consistency, this paper proposes a Focusness Guided Salient object detection (FGS) algorithm. To obtain clean and precise contrast map, FGS uses the focusness prior to guide the contrast map. Combing different saliency priors, FGS utilizes a unified...
In this paper, a temporally iterative Gaussian Mixture Model (GMM) of Dynamic Texture (DT) for target detection using a moving PTZ camera, is proposed. Camera movement in a PTZ sensor causes motion-based target detection techniques to fail for the periods affected by the scene change. This is because the whole scene is considered a representation of the target motion. When the camera is in motion,...
Given an image containing an object of interest, the object can be detected by comparing the feature points in the given image with those in a reference object image. In a case where the given image contains a large number of similar objects, the object of interest is difficult to be detected. It is because the feature points in the given image are concentrated in some regions where each region has...
This paper focuses on temporal localization of actions in untrimmed videos. Existing methods typically train classifiers for a pre-defined list of actions and apply them in a sliding window fashion. However, activities in the wild consist of a wide combination of actors, actions and objects; it is difficult to design a proper activity list that meets users’ needs. We propose to localize activities...
We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect “subject-predicate-object” relations in an image with object relation groundtruths available only at the image level. This is motivated by the fact that it is extremely expensive to label the combinatorial relations between objects at the instance level. Compared to the extensively studied problem,...
Detecting logo frequency and duration in sports videos provides sponsors an effective way to evaluate their advertising efforts. However, general-purposed object detection methods cannot address all the challenges in sports videos. In this paper, we propose a mutual-enhanced approach that can improve the detection of a logo through the information obtained from other simultaneously occurred logos...
A major impediment in rapidly deploying object detection models for instance detection is the lack of large annotated datasets. For example, finding a large labeled dataset containing instances in a particular kitchen is unlikely. Each new environment with new instances requires expensive data collection and annotation. In this paper, we propose a simple approach to generate large annotated instance...
In order to realize autonomous landing of the unmanned aerial vehicle (UAV) in power patrolling, a visual method vision based on Faster Regions with Convolutional Neural Network (Faster R-CNN) for UAVs is studied. In this paper, we design the landing sign of the combination of concentric circles and pentagon, and propose the Faster R-CNN recognition algorithm which can be used to identify the target...
The region-based Convolutional Neural Network (CNN) detectors such as Faster R-CNN or R-FCN have already shown promising results for object detection by combining the region proposal subnetwork and the classification subnetwork together. Although R-FCN has achieved higher detection speed while keeping the detection performance, the global structure information is ignored by the position-sensitive...
A first-person camera, placed at a person's head, captures, which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from the manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state such as his intentions and attention, and thus, only the person...
Fully convolutional neural networks (FCNs) have shown outstanding performance in many dense labeling problems. One key pillar of these successes is mining relevant information from features in convolutional layers. However, how to better aggregate multi-level convolutional feature maps for salient object detection is underexplored. In this work, we present Amulet, a generic aggregating multi-level...
In this paper, we propose a multi-modal search engine for interior design that combines visual and textual queries. The goal of our engine is to retrieve interior objects, e.g. furniture or wall clocks, that share visual and aesthetic similarities with the query. Our search engine allows the user to take a photo of a room and retrieve with a high recall a list of items identical or visually similar...
This paper presents the evaluation of visual features for the proposed two eye detection method applied to thermal images. The use of two eye region is due to its distinctive pattern and to overcome the issue of blurred and noisy characteristic in the thermal image. Comparative performance analysis on three different features which includes Haar, Histogram of Oriented Gradients (HoG) and Local Binary...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.