This paper presents a two-pass clustering technique for orientation-invariant text line clustering in a language-independent text localization problem based on the connected component analysis (CCA) approach. Instead of clustering in a single pass in the conventional way, the proposed technique first explores nearby objects around the candidate components. By setting up the global constraints with...
Understanding where people look in images is an important problem in computer vision. Despite significant research, it remains unclear to what extent human fixations can be predicted by low-level (contrast) compared to high-level (presence of objects) image features. Here we address this problem by introducing two novel models that use different feature spaces but the same readout architecture. The...
Person re-identification is an important task in video surveillance systems. It can be formally defined as establishing the correspondence between images of a person taken from different cameras at different times. In this paper, we present a two stream convolutional neural network where each stream is a Siamese network. This architecture can learn spatial and temporal information separately. We also...
We report on the results of the first visual search and rating study (N = 60) evaluating human gaze when assessing the realism of image composites. The effects of object identity knowledge and mismatched feature type on observers' gaze and subjective realism scores are studied. Gaze metrics used include: fixation count, fixation duration, time and duration of first fixation on target object, as well...
In this paper, we propose a computational strategy to enhance the performance of Image Quality Metrics (IQM) by using content-specific features of an image. We do this by creating a Visual Error Importance (VEI) map that is applied to the error maps computed by the IQM. A global optimization can be used to compute the VEI map that is optimal for any given IQM. We demonstrate this concept by categorizing...
We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse the input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust...
Visualization of the micro video big data refers to the intuitive display of the obtained data on micro videos across the Internet for the purpose of helping users to understand the message in the data. This paper describes the implementation of a micro video big data visualization system in detail, which has four steps: determine the visualization objective, choose data based on the objective, display...
Person re-identification has received considerable attention in the image processing, computer vision and pattern recognition communities because of its huge potential for video-based surveillance applications and the challenges it presents due to illumination, pose and viewpoint changes among non-overlapping cameras. Unlike the widely used low-level descriptors, visual attributes (e...
Estimating the initial background of a scene is a key prerequisite for several applications in video analytics. In this paper, we present a simple approach that takes into account spatio-temporal motion intensities while estimating the true background. We tested the algorithm on real video sequences from the Scene Background Initialization (SBI) benchmark dataset, and the results show that the algorithm...
Depth map estimation forms an integral part of many applications such as 2D-to-3D creation. There exist various methods in the literature for depth map estimation using different cues and structure. Usually, depth information is decoded from these cues at the edges and matting is applied to spread it over neighboring regions. Defocus is one such cue due to its natural existence and does not require any...
Superpixel segmentation of 2D images has been widely used in many computer vision tasks. However, limited by the Gaussian imaging principle, there is no thorough segmentation solution to the ambiguity in defocus and occlusion boundary areas. In this paper, we consider the essential element of an image pixel, i.e., rays in the light space and propose light field superpixel (LFSP) segmentation to eliminate...
Re-identification of people in surveillance footage must cope with drastic variations in color, background, viewing angle and a person's pose. Supervised techniques are often the most effective, but require extensive annotation which is infeasible for large camera networks. Unlike previous supervised learning approaches that require hundreds of annotated subjects, we learn a metric using a novel one-shot...
We present a novel, purely affinity-based natural image matting algorithm. Our method relies on carefully defined pixel-to-pixel connections that enable effective use of information available in the image and the trimap. We control the information flow from the known-opacity regions into the unknown region, as well as within the unknown region itself, by utilizing multiple definitions of pixel affinities...
In this paper, we propose a consistent-aware deep learning (CADL) framework for person re-identification in a camera network. Unlike most existing person re-identification methods which identify whether two body images are from the same person, our approach aims to obtain the maximal correct matches for the whole camera network. Different from recently proposed camera network based re-identification...
In this paper, a novel saliency map generation approach based on the saccade target theory is proposed. A probabilistic model of transsaccadic integration is built based on four cues that influence human visual attention: fovea-periphery resolution discrepancy, visual memory, oculomotor bias and inhibition of return (IOR), where visual memory is formulated as a combination of the visual short-term memory...
Facial attractiveness computation is a challenging task because of the lack of labeled data and discriminative features. In this paper, an end-to-end label distribution learning (LDL) framework with a deep convolutional neural network (CNN) and geometric features is proposed to meet these two challenges. Unlike previous work, we recast this task as an LDL problem. Compared with the single...
Segmentation of 3D colored point clouds is a research field with renewed interest thanks to the recent availability of inexpensive consumer RGB-D cameras and its importance as an unavoidable low-level step in many robotic applications. However, the nature of 3D data makes the task challenging and, thus, many different techniques are being proposed, all of which incur high computational costs. This paper...
We explore a quantitative assessment for a Microsoft Kinect-based stroke rehabilitation virtual reality (VR) video game, Mystic Isle, by evaluating three assessment metrics of player hand movement: maximum range (extension), peak velocity and mean velocity. We also analyze the left-right hand symmetry by visualizing trajectories of both hands throughout the game. Assessment metrics obtained by the...
Fine-grained visual recognition aims to capture discriminative characteristics amongst visually similar categories. State-of-the-art research has significantly improved fine-grained recognition performance through deep metric learning using a triplet network. However, the impact of intra-category variance on the performance of recognition and robust feature representation has not been well studied...
Multiview video plus depth (MVD) is the most popular 3D video format where the texture images contain the color information and the depth maps represent the geometry of the scene. The depth maps are exploited to obtain intermediate views to enable 3D-TV and free-viewpoint applications using the depth image based rendering (DIBR) techniques. DIBR is used to get an estimate of the intermediate views...