The recognition of places by using visual information in underwater environments is important when performing autonomous robotic exploration of the same area at different periods of time. It helps the robot know its location and make decisions accordingly. However, vision-based recognition of underwater places can be a very challenging task due to the inherent properties of such environments...
The task of visual relationship recognition (VRR) is recognizing multiple objects and their relationships in an image. A fundamental difficulty of this task is class-number scalability, since the number of possible relationships we need to consider grows combinatorially. Another difficulty is avoiding the output of semantically redundant relationships. To overcome these...
A visually impaired person faces daily difficulties in learning to recognize or differentiate objects during everyday activities, such as walking the streets or identifying dollar bills of different denominations; as a result, people with this type of disability cannot easily adapt to society. Visually impaired persons usually depend on someone...
We present an application of a Multiple Instance Learning (MIL) approach to image classification. In particular we focus on a recent MIL method for binary classification where the objective is to discriminate between positive and negative sets of points. Such sets are called bags and the points inside the bags are called instances. In the case of two classes of instances (positive and negative), a...
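The bag/instance structure described above can be made concrete with a small sketch. This illustrates the standard MIL assumption (a bag is positive if at least one of its instances is positive), not necessarily the specific binary-classification method the paper builds on; the names `classify_bag` and `is_positive` are illustrative.

```python
# Minimal sketch of the Multiple Instance Learning (MIL) setting:
# data comes as labeled bags of unlabeled instances, and under the
# standard MIL assumption a bag is positive iff it contains at least
# one positive instance.

def classify_bag(bag, instance_classifier):
    """Label a bag 1 if any of its instances is classified positive, else 0."""
    return int(any(instance_classifier(x) for x in bag))

# Toy instance classifier: a 1-D feature is "positive" above a threshold.
is_positive = lambda x: x > 0.5

positive_bag = [0.1, 0.7, 0.2]   # contains one positive instance
negative_bag = [0.1, 0.3, 0.2]   # all instances negative

print(classify_bag(positive_bag, is_positive))  # 1
print(classify_bag(negative_bag, is_positive))  # 0
```

In image classification, each image plays the role of a bag and its regions or patches play the role of instances, so only image-level labels are needed.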
Many of the existing methods for learning a joint embedding of images and text use only supervised information from paired images and their textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that is able to extract more robust multi-modal representations across domains. The proposed method combines representation...
Since the beginning of early civilizations, the social relationships derived from each individual have fundamentally formed the basis of social structure in our daily life. In the computer vision literature, much progress has been made in scene understanding, such as object detection and scene parsing. Recent research focuses on the relationships between objects based on their functionality and geometric relations...
Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations and other context information. In this...
Engineering students conceptualize problems in diverse ways depending on how the problems are presented. In this study, we investigate how different representations of problems, such as images and sketches versus traditional word descriptions, allow students to recall information. Some students experience difficulties visualizing a concept when given a word problem while others do not...
The contribution of this paper is to bridge the gap in understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
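The layering idea described above can be sketched in plain Python. This is a generic illustration of the core CNN operations (convolution, nonlinearity, pooling) composed as layers, not the paper's actual minimal model; the kernel and input values are made up for the example.

```python
# Minimal layering of the core CNN operations: a "valid" 2-D convolution
# (really cross-correlation, as in most CNN libraries), a ReLU
# nonlinearity, and non-overlapping max pooling.

def conv2d(img, kernel):
    """Valid cross-correlation of a 2-D input with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(w)] for i in range(h)]

def relu(fm):
    """Element-wise rectified linear unit."""
    return [[max(0.0, x) for x in row] for row in fm]

def max_pool(fm, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fm[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fm[0]) - size + 1, size)]
            for i in range(0, len(fm) - size + 1, size)]

img = [[1, 2, 0, 1],
       [0, 1, 3, 1],
       [2, 0, 1, 0],
       [1, 1, 0, 2]]
edge = [[1, -1]]                       # tiny horizontal-difference kernel
out = max_pool(relu(conv2d(img, edge)))
print(out)  # [[2], [2]]
```

Stacking these three functions is exactly the "layering" view: each layer is a plain function from one feature map to the next, which keeps the mathematics of each step explicit.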
Breast cancer (BC) is a deadly disease, killing millions of people every year. Developing an automated malignant BC detection system applied to patient imagery can help deal with this problem more efficiently, making diagnosis more scalable and less prone to error. No less importantly, such research can be extended to other types of cancer, making an even greater impact in helping to save lives...
Deep learning has brought a series of breakthroughs in image processing. Specifically, there are significant improvements in the application of deep learning techniques to food image classification. However, very little work has studied the classification of food ingredients. Therefore, this paper proposes a new framework, called DeepFood, which not only extracts rich and effective features...
An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multitask learning. In particular, the task of visual recognition is aligned to the task of visual question answering...
Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user...
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection,...
Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component of security-centric computer vision systems. Although convolutional neural networks are remarkable at learning discriminative features from images, learning comprehensive features of pedestrians for fine-grained tasks remains an open problem. In this study, we propose a new attention-based...
In this work we propose a novel framework named Dual-Net aimed at learning more accurate representations for image recognition. Here, two parallel neural networks are coordinated to learn complementary features, and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., feature extractor and image classifier...
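The coordination of two parallel branches described above can be sketched as follows. The branches, fusion, and classifier here are toy stand-ins chosen only to show the structure (two extractors whose complementary features are concatenated before a single classifier), not the paper's actual architecture; all function names are hypothetical.

```python
# Hedged sketch of the Dual-Net structure: two parallel feature
# extractors produce complementary features that are fused
# (concatenated) before a shared classifier.

def extractor_a(x):
    """Stand-in for the first branch's feature extractor."""
    return [v * 2.0 for v in x]

def extractor_b(x):
    """Stand-in for the second, complementary branch."""
    return [v + 1.0 for v in x]

def classifier(features):
    """Stand-in image classifier over the fused features."""
    return 1 if sum(features) > 0 else 0

def dual_net(x):
    # Concatenating the two feature sets yields the "wider" network.
    fused = extractor_a(x) + extractor_b(x)
    return classifier(fused)

print(dual_net([0.5, -0.2]))  # 1
```

The point of the structure is that the shared classifier sees both branches' features at once, so training can push the two extractors toward complementary rather than redundant representations.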
Recognizing how objects interact with each other is a crucial task in visual recognition. If we define the context of the interaction to be the objects involved, then most current methods can be categorized as either: (i) training a single classifier on the combination of the interaction and its context; or (ii) aiming to recognize the interaction independently of its explicit context. Both methods...
The aim of this paper is to investigate the use of oculography signals for the recognition of experts in visual arts. We focused our attention on the number of gaze transitions between characteristic regions of interest (ROIs) in an image. In the experiments we used oculographic data recorded at the Department of Experimental Psychology at the Catholic University of Lublin for 29 images and 34 users. The EM method was...
We consider the problem of fine-grained physical object recognition and introduce PharmaPack, a dataset containing 1000 unique pharma packages enrolled in a controlled environment using consumer mobile phones, as well as several recognition sets representing various scenarios. For performance evaluation, we extract two types of recently proposed local feature descriptors and aggregate them using popular...
This paper presents a novel method using Accelerated-KAZE (AKAZE) and Gist features for context-based semantic classification and recognition of indoor scenes by a vision-based mobile robot. Our method represents spatial relations among categories by mapping neighborhood units onto category maps using counter propagation networks (CPNs), while maintaining the sequential information of labels generated from...