The power of modern image matching approaches is still fundamentally limited by the abrupt scale changes in images. In this paper, we propose a scale-invariant image matching approach to tackling the very large scale variation of views. Drawing inspiration from the scale space theory, we start with encoding the image’s scale space into a compact multi-scale representation. Then, rather than trying...
Person re-identification is best known as the problem of associating a single person that is observed from one or more disjoint cameras. The existing literature has mainly addressed such an issue, neglecting the fact that people usually move in groups, like in crowded scenarios. We believe that the additional information carried by neighboring individuals provides a relevant visual context that can...
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and...
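Product quantization, which the abstract names as the basis of these structured quantizers, splits a vector into sub-vectors and replaces each sub-vector with the index of its nearest centroid in a per-subspace codebook. The sketch below shows plain, unsupervised encoding and decoding. It assumes tiny hand-picked codebooks (`books` is a hypothetical input, not learned here) and does not reproduce the supervised, end-to-end method the paper works toward.

```python
from typing import List, Sequence

def pq_encode(vec: Sequence[float], codebooks: List[List[List[float]]]) -> List[int]:
    """Encode `vec` as one centroid index per subspace (plain product quantization)."""
    m = len(codebooks)                 # number of subspaces
    d = len(vec) // m                  # sub-vector length (assumes len(vec) % m == 0)
    codes = []
    for j, book in enumerate(codebooks):
        sub = vec[j * d:(j + 1) * d]
        # nearest centroid by squared Euclidean distance
        best = min(range(len(book)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(sub, book[i])))
        codes.append(best)
    return codes

def pq_decode(codes: List[int], codebooks: List[List[List[float]]]) -> List[float]:
    """Approximate reconstruction: concatenate the selected centroids."""
    out: List[float] = []
    for j, c in enumerate(codes):
        out.extend(codebooks[j][c])
    return out

# Two subspaces of dimension 2, two centroids each (hypothetical values).
books = [[[0.0, 0.0], [1.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
codes = pq_encode([0.9, 1.1, 0.1, 0.8], books)
print(codes)                    # [1, 0]
print(pq_decode(codes, books))  # [1.0, 1.0, 0.0, 1.0]
```

A 4-dimensional float vector is thus stored as two small integers, which is where the compression comes from.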
Artificial intelligence is widely used in image processing. Neural networks (NNs) have been successfully used to solve complicated problems owing to their capacity for generalization and learning from examples. In this paper, some aspects of image compression using artificial neural networks are discussed. The network is used in the feedback loop of the visual servoing system, which aims to control a wheeled...
A node in a wireless sensor network has limited battery capacity, so energy efficiency is crucial to prolong the lifetime of sensor devices. In this paper, we propose a simple and energy-efficient image compression method (SEIC) based on the Discrete Cosine Transform (DCT), in addition to our previously proposed method based on the Discrete Wavelet Transform (SEIC-DWT). SEIC (DCT or DWT) consists...
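The DCT stage that a DCT-based compressor like SEIC builds on can be illustrated with an orthonormal 1-D DCT-II and its inverse. This is a generic sketch, not the SEIC pipeline: it assumes a single 8-sample row and a hypothetical `keep` parameter that zeroes the high-frequency coefficients to mimic lossy compression.

```python
import math
from typing import List

def dct2(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    out = []
    for k in range(n):
        s = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(s * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)))
    return out

def idct2(c: List[float]) -> List[float]:
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            s = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += s * c[k] * math.cos(math.pi / n * (i + 0.5) * k)
        out.append(total)
    return out

def compress_row(row: List[float], keep: int) -> List[float]:
    """Keep only the first `keep` (low-frequency) coefficients, then reconstruct."""
    coeffs = dct2(row)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct2(truncated)

row = [52, 55, 61, 66, 70, 61, 64, 73]   # one 8-pixel row (example values)
print([round(v, 1) for v in compress_row(row, keep=3)])
```

Fewer retained coefficients means fewer values to transmit, which is the kind of saving that matters for a battery-limited sensor node; with `keep=8` the row is reconstructed exactly (up to floating-point error).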
WRGB-OLED displays with larger sizes and higher resolutions can bring more colorful and better visual experiences. However, they also make the OLED display system suffer from a serious memory-bandwidth bottleneck. In this paper, the lossless pixel-gradient EC algorithm is proposed to overcome this bottleneck. It consists of two core techniques: Finer-Gradient-Based Prediction (FGBP) and Gradient-Based Golomb-Rice...
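Golomb-Rice coding, named in the abstract's second technique, stores a non-negative integer as a unary quotient followed by `k` fixed remainder bits, which suits the small residuals left after prediction. The sketch below shows plain Rice coding only. It assumes a fixed parameter `k` and a zigzag mapping for signed residuals; the paper's gradient-based selection of the parameter is not reproduced.

```python
def zigzag(r: int) -> int:
    """Map signed residuals to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * r if r >= 0 else -2 * r - 1

def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(n: int, k: int) -> str:
    """Rice code of n >= 0: quotient in unary (q ones, then a zero), then k remainder bits."""
    q, r = n >> k, n & ((1 << k) - 1)
    tail = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + tail

def rice_decode(bits: str, k: int) -> int:
    """Decode a single Rice codeword produced by rice_encode."""
    q = bits.index("0")                      # number of leading ones
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r

print(rice_encode(9, 2))               # 11001
print(rice_encode(zigzag(-2), 1))      # 101
```

Small residuals get short codewords, so the better the predictor, the fewer bits cross the memory bus.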
Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the...
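The conventional spatial attention described in the abstract can be sketched as a softmax over the H×W positions of the last conv-layer feature map, whose probabilities re-weight and sum the per-position feature vectors. This minimal sketch assumes the attention scores are already given; in real captioning or question-answering models they come from a learned layer conditioned on the decoder state. The channel-wise alternative the paper argues for is not shown.

```python
import math
from typing import List, Tuple

def spatial_attention(features: List[List[float]],
                      scores: List[float]) -> Tuple[List[float], List[float]]:
    """features: one C-dim vector per spatial position (H*W of them);
    scores: one unnormalized logit per position."""
    m = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]                    # spatial probabilities, sum to 1
    c = len(features[0])
    context = [sum(a * f[ch] for a, f in zip(alpha, features)) for ch in range(c)]
    return alpha, context

# 2x2 feature map flattened to 4 positions, C = 2 channels (toy values)
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
alpha, ctx = spatial_attention(feats, [0.0, 0.0, 0.0, 0.0])
print(ctx)  # [0.5, 0.5]
```

With equal scores the context vector is just the spatial mean; a large score on one position makes the context collapse onto that position's features.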
Traffic scene recognition is an important and challenging issue in Intelligent Transportation Systems (ITS). Recently, Convolutional Neural Network (CNN) models have achieved great success in many applications, including scene classification. The remarkable representation learning capability of CNNs remains to be further explored for solving real-world problems. Vector of Locally Aggregated Descriptors...
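VLAD, which the abstract goes on to introduce, aggregates local descriptors by summing each descriptor's residual to its nearest codebook center and then normalizing the concatenated result. This is a minimal sketch of the standard encoding, assuming a small pre-computed codebook `centers` (hypothetical; in practice obtained by k-means on CNN or hand-crafted descriptors) and plain L2 normalization; how the paper combines VLAD with CNN features is not reproduced.

```python
import math
from typing import List

def vlad(descriptors: List[List[float]], centers: List[List[float]]) -> List[float]:
    k, d = len(centers), len(centers[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # hard-assign the descriptor to its nearest center
        j = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
        for t in range(d):
            acc[j][t] += x[t] - centers[j][t]        # accumulate the residual
    v = [val for row in acc for val in row]          # concatenate: length k * d
    norm = math.sqrt(sum(val * val for val in v))
    return [val / norm for val in v] if norm > 0 else v

cents = [[0.0, 0.0], [10.0, 10.0]]
desc = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
print(vlad(desc, cents))
```

The output length is fixed at k·d regardless of how many descriptors an image yields, which is what makes VLAD convenient as a global image representation.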
360° video, which can present views consistent with the rotation of the viewer's head along three axes (roll, pitch, yaw), is the current approach to creating immersive video experiences. Nevertheless, a more fully natural, photorealistic experience, with support for visual cues that facilitate coherent psycho-visual sensory fusion without the side effect of cyber-sickness, is...
Convolutional neural networks (CNNs) have been widely used in the image processing community. Image deblocking is a post-processing strategy that aims to reduce the visually annoying blocking artifacts caused by block-based transform coding at low bit rates. In recent years, CNN-based methods have been proposed to solve this classic image processing problem. In this paper, we present an efficient...
This paper investigates the usability of the Halftoning-based Block Truncation Coding (HBTC) feature for image retrieval. It assumes that all images in the database are stored in a scrambled/encrypted format. First, an image feature descriptor is derived from the scrambled/encrypted image. This image feature is subsequently converted into a binary representation to achieve fast similarity measurement. The...
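Why a binary representation speeds up similarity measurement can be illustrated generically: threshold each feature dimension, pack the bits into an integer, and compare codes by Hamming distance (XOR followed by a bit count). This sketch assumes a simple per-feature mean threshold and hypothetical helper names; it is not the HBTC descriptor or the paper's binarization.

```python
from typing import List

def binarize(feature: List[float]) -> int:
    """Pack one bit per dimension: 1 if the value exceeds the feature's mean."""
    mean = sum(feature) / len(feature)
    code = 0
    for v in feature:
        code = (code << 1) | (1 if v > mean else 0)
    return code

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def rank(query: List[float], database: List[List[float]]) -> List[int]:
    """Return database indices ordered from most to least similar."""
    q = binarize(query)
    codes = [binarize(f) for f in database]
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))

db = [[0.9, 0.1, 0.8, 0.2], [0.0, 1.0, 0.1, 0.7]]
print(rank([0.1, 0.9, 0.2, 0.8], db))  # [1, 0]
```

Comparing two codes costs one XOR and one bit count instead of a floating-point distance over every dimension.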
The deployment of deep neural network models on mobile or embedded devices has been hindered by their large number of weights. In this work, we develop a deep neural network (DNN) model compression service termed MicroBrain to reduce resource usage for energy-efficient visual inference. By automatically analyzing the trained DNN models, we propose a high-performance DNN model compression...
The recent advances in light field imaging are changing the way in which visual content is captured, processed and consumed. Storage and delivery systems for light field images rely on efficient compression algorithms. Such algorithms must additionally take into account the feature-rich rendering for light field content. Therefore, a proper evaluation of visual quality is essential to design and improve...
Robotic agents, when not equipped with traditional means to capture information about their surroundings, must autonomously learn to extract this information from a very complex environment. In the context of developmental robotics, we use unsupervised representation learning, and more specifically deep autoencoders, in order to capture visual representations. These generic visual representations...
The Bag-of-Visual-Words (BoVW) approach has attracted some attention in the field of keyword spotting. However, the BoVW approach discards the spatial relations of the visual words. Therefore, in this study a visual language model is integrated into the BoVW framework to add spatial information. To accomplish the process of keyword spotting, two well-known retrieval schemes, including...
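The basic BoVW representation, and the reason it loses spatial relations, fits in a few lines: each local descriptor is assigned to its nearest visual word, and only the word counts are kept. This minimal sketch assumes a given `vocabulary` of visual words (hypothetical; usually learned by clustering). The visual language model that the study adds is not shown.

```python
from typing import List

def bovw_histogram(descriptors: List[List[float]], vocabulary: List[List[float]]) -> List[int]:
    hist = [0] * len(vocabulary)
    for x in descriptors:
        # nearest visual word by squared Euclidean distance
        word = min(range(len(vocabulary)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, vocabulary[i])))
        hist[word] += 1          # only counts survive; positions are discarded
    return hist

vocab = [[0.0, 0.0], [1.0, 1.0]]
a = bovw_histogram([[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]], vocab)
b = bovw_histogram([[0.9, 1.0], [1.0, 0.8], [0.1, 0.0]], vocab)  # same descriptors, different order
print(a, b)  # [1, 2] [1, 2]
```

Two word images whose descriptors appear in a different spatial order produce identical histograms, which is the gap a spatial language model is meant to close.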
In this work, a clustering approach to obtain compact topological models of an environment is developed and evaluated. The usefulness of these models is then tested by using them to solve the robot localization problem. Omnidirectional visual information and global appearance descriptors are used both to create and compress the models and to estimate the position of the robot....
Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found...
In this work, a novel bit allocation method based on visual attention and distortion sensitivity is developed for JPEG2000. Although a visual attention map for an image can be estimated using well-known saliency map methods, the true visual attention map can be obtained by conducting experiments to determine fixation points and their durations. A perception model might turn these fixation durations...
This study investigated a novel method of evaluating visually lossy images based on saccadic eye movements. In each trial, participants with normal vision were asked to indicate any visible changes in the image while their gaze positions were being monitored. The original image was replaced with compressed or blurred versions of the same image 150 ms after the onset of each eye fixation, and the parameters...
Omnidirectional images and videos have gained popularity thanks to the availability of capture and display devices for this type of content. Recent studies have assessed the performance of objective metrics in predicting the visual quality of omnidirectional content. These metrics, however, have not been rigorously validated by comparing their predictions with ground-truth subjective scores. In this paper,...