The recognition of places using visual information in underwater environments is important when performing autonomous robotic exploration of the same area at different periods of time. It helps the robot to know its location and make decisions accordingly. However, vision-based recognition of underwater places can be a very challenging task due to the inherent properties of these environments...
Since news videos are valuable sources of multimedia information on real-world events, there is a demand for viewing them efficiently. However, summarization methods based on auditory content do not take the visual content into account. In the case of news videos, due to their presentation style, audio content and visual content do not necessarily come from the same...
Blockly is an open-source library that makes it easy to add block-based visual programming to an app. It is designed to be flexible and supports a large set of features for different applications. It has been used for programming animated characters on a screen, creating story scripts, controlling robots, and even generating legal documents. But Blockly is not itself a language; developers who use...
Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets. In this paper, we propose a new task that aims at parsing scenes with a large and open vocabulary, and we explore several evaluation metrics for this problem. Our approach is a joint embedding framework over image pixels and word concepts, where word concepts...
Among studies on information recommendation tailored to individuals, many consider personal preference, but few consider the user's degree of specialization. Therefore, in this study, we aim to propose an information recommendation method suited to an individual's degree of specialization. In this paper, as a first stage, we clarify the difference between the viewpoints of experts...
The semantic gap, i.e. the difference between low-level image features and their high-level semantics, has attracted great interest over the last two decades. This paper deals with this problem and proposes a hybrid approach to learn image semantic concepts for modeling visual features in the discriminative learning stage. It combines the advantages of human-in-the-loop and discriminative...
With the tremendous advances made by Convolutional Neural Networks (ConvNets) on object recognition, we can now easily obtain adequately reliable machine-labeled annotations from predictions by off-the-shelf ConvNets. In this work, we present an abstraction-memory-based framework for few-shot learning, building upon machine-labeled image annotations. Our method takes large-scale machine-annotated...
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences. The proposed method is trained without explicit annotation of fine-grained sentence to video region-sequence correspondence, but is only based on weak video-level sentence annotations. It differs from existing...
The role of semantics in zero-shot learning is considered. The effectiveness of previous approaches is analyzed according to the form of supervision provided. While some learn semantics independently, others only supervise the semantic subspace explained by training classes. Thus, the former is able to constrain the whole space but lacks the ability to model semantic correlations. The latter addresses...
Attribute-based recognition models, due to their impressive performance and their ability to generalize well to novel categories, have been widely adopted for many computer vision applications. However, usually both the attribute vocabulary and the class-attribute associations have to be provided manually by domain experts or a large number of annotators. This is very costly and not necessarily optimal...
We study the problem of answering questions about images in the harder setting where the test questions and corresponding images contain novel objects that were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of visual categories, some objects would not be annotated in the training set. We show...
Recognizing humans' emotions may be crucial in certain applications involving, e.g., human-computer interaction, monitoring of the elderly, or understanding the affective state of learners during a course. To this end, and depending on the application and the environment, one may use physiological parameters (e.g., heart rate, brain activity), which are typically obtrusive, or analyze other modalities...
The sense of touch is probably the most complex human sense, because it involves a very large number of sensory receptors spread over the whole body, and takes at the same time full advantage of the human nervous system complexity and power. Although this complexity enables us to perceive the world around us and interact with it, it is also a great source of variability when it comes to controlling...
Our work builds upon Visual Teach & Repeat 2 (VT&R2): a vision-in-the-loop autonomous navigation system that enables the rapid construction of route networks, safely built through operator-controlled driving. Added routes can be followed autonomously using visual localization. To enable long-term operation that is robust to appearance change, its Multi-Experience Localization (MEL) leverages...
Image caption generation has become a rising topic in computer vision and artificial intelligence. To address the problem of stiff descriptions, we intend to extract richer features using a convolutional neural network (CNN). We consequently propose a neural and probabilistic framework that combines a CNN with a special form of recurrent neural network (RNN) to produce an end-to-end image...
We present an illumination-robust visual localization algorithm for Astrobee, a free-flying robot designed to autonomously navigate on the International Space Station (ISS). Astrobee localizes with a monocular camera and a pre-built sparse map composed of natural visual features. Astrobee must perform tasks not only during the day, but also at night when the ISS lights are dimmed. However, the localization...
The shape and color of visual identity are the most important factors in the visualization of trademarks, which exert a far-reaching influence on the establishment of corporate image and brand image. Under the impact of globalization, brands have become a major factor in keeping a foothold in the consumer market, and sales are no longer limited to certain regions. Today's ideological trend of design...
Motivated by the great performance of recurrent neural networks applied to machine translation, researchers have begun to pay attention to image description with related deep learning methods. A recurrent neural network cannot remember long-term information, but Long Short-Term Memory (LSTM) handles this well. However, the LSTM applied to image description to predict sentences in previous literature [1] can...
This paper presents experimental results on loop closure detection in mobile robots through spectral description of image sets and data dimensionality reduction. Both the spectral description and the low-dimensional representation depend heavily on the concept of the dominant eigenvector. Integration between Matlab and the ROS interface was exploited to perform our experiments. In addition, two environments were...
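The dominant eigenvector that the abstract above relies on is conventionally approximated by power iteration. The following is a generic, self-contained sketch of that standard technique, not the paper's actual implementation:

```python
# Power iteration: approximate the dominant eigenvector of a square
# matrix, the quantity on which spectral image descriptions and
# low-dimensional representations are built. Generic sketch only.

def power_iteration(A, iters=200):
    """Return the unit-norm dominant eigenvector of A and its eigenvalue."""
    n = len(A)
    v = [1.0 / n ** 0.5] * n  # arbitrary non-zero start vector
    for _ in range(iters):
        # w = A v, then renormalize
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the corresponding eigenvalue
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))
    return v, lam
```

Convergence is geometric in the ratio of the two largest eigenvalue magnitudes, which is why a few hundred iterations typically suffice for well-separated spectra.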
In the task of product image search, the database consists of clean versions of product images, while the query photos are often captured with mobile phone cameras under uncontrolled conditions. Conventional methods usually adopt a SIFT-based bag-of-words (BoW) representation of the whole query image, which suffers from interference from background noise. To address the problem, we extract multiple...