This paper demonstrates the use of the Kvazaar open-source HEVC intra encoder in 4K real-time video encoding. In this setup, a raw 4K video is shot by an action camera, captured by an HDMI capture card, encoded in real time by the Kvazaar ultrafast preset on a 22-core Intel Xeon processor, sent to a laptop, and decoded by the OpenHEVC decoder for playback. The encoding process is visualized on the fly by Kvazaar...
In this paper, we introduce a new hardware platform that mimics a compound eye of an insect and propose an algorithm to detect objects using it. The compound eye camera has a wide viewing angle and simulates a number of single eyes on its hemisphere. Each single eye is an elementary unit to acquire visual inputs. Visual information from single eyes is hierarchically merged to estimate objectness....
Person re-identification is best known as the problem of associating a single person that is observed from one or more disjoint cameras. The existing literature has mainly addressed such an issue, neglecting the fact that people usually move in groups, like in crowded scenarios. We believe that the additional information carried by neighboring individuals provides a relevant visual context that can...
Given a video and a description sentence with one missing word (the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. The contextual information of the sentence, as well as visual cues from the video, are important to infer the missing word accurately. Since the source sentence is broken into two fragments: the sentence’s left fragment (before the blank)...
The deep convolutional neural network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following some basic principles such as increasing network depth and constructing highway connections, researchers have manually designed a lot of fixed network architectures and verified their effectiveness. In this paper, we discuss the possibility of learning deep network structures...
Mid-level representations are used to map sets of local features into one global representation for a given media descriptor. In visual pattern recognition tasks, Bag-of-Words (BoW) is one popular strategy, among many methods available in the literature, mainly due to its simplicity in concept and implementation. Despite the overall good results achieved by BoW in many tasks, the method is unstable in...
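As a rough illustration of how a BoW mid-level representation maps a set of local features into one global vector, here is a minimal sketch. It assumes hard assignment to the nearest codeword and L1 normalization, which are common choices but are not taken from the abstract; the names `bow_histogram` and `squared_distance` and the toy codebook are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(features, codebook):
    """Map a set of local features to one global histogram.

    Assumption: each feature votes for its single nearest codeword
    (hard assignment), and the histogram is L1-normalized.
    """
    counts = [0.0] * len(codebook)
    for feature in features:
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(feature, codebook[i]),
        )
        counts[nearest] += 1.0
    total = sum(counts)
    # An empty feature set yields an all-zero histogram.
    return [c / total for c in counts] if total else counts


# Toy example: a hypothetical 2-word codebook and four 2-D local features.
codebook = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 0.0), (9.0, 9.0), (0.0, 1.0), (11.0, 10.0)]
histogram = bow_histogram(features, codebook)
```

In practice the codebook is usually learned from training features, for example with k-means; that step is omitted here.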
Peer code reviews are important for giving and receiving peer feedback, but the code review process is time consuming. Static analysis tools can help reduce reviewer effort by catching common mistakes prior to peer code review. Ideally, contributors would use static analysis tools prior to pull request submission so common mistakes could be addressed first, before invoking the reviewer. To explore...
In this study, we present our methodology and implementation strategy for Sketchnoting in Freshman Engineering and Technological Literacy classes. The objective is to improve students' learning, visualization, and communication proficiencies, as well as to foster advancement in knowledge retention and critical thinking. This study provides the motivation, supporting research background, design, and...
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and...
The success of fine-grained visual categorization (FGVC) relies heavily on the modeling of appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher-order. To address these issues, we...
Blockly is an open-source library that makes it easy to add block-based visual programming to an app. It is designed to be flexible and supports a large set of features for different applications. It has been used for programming animated characters on a screen; creating story scripts; controlling robots; and even generating legal documents. But Blockly is not itself a language; developers who use...
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to “α-pooling”, allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions...
This paper aims to develop an effective flower classification approach based on feature extraction. First, a fused descriptor based on Pyramid Histogram of Visual Words (PHOW) is used to extract the color, texture and contour information of the flower image. Secondly, Dictionary Learning and Locality-constrained Linear Coding (LLC) are applied to the PHOW features and then images...
Facial composite technologies are used to produce visual resemblances of an offender. However, resemblances may be poor, particularly when composites are constructed using traditional ‘feature’ composite systems deployed several days after the crime. In this case a witness may have forgotten important details about an offender's appearance. Engaging in early and repeated retrieval attempts could potentially...
In this paper, we introduce a digital edition of the Altan Tobchi, a Mongolian historical manuscript written in traditional Mongolian script. The Text Encoding Initiative guidelines were adopted to encode the named entities, commentaries, transcriptions, and interpretations of ancient Mongolian words. Named entities such as personal names and place names were extracted from digitized text by employing...
WRGB-OLED displays with larger resolutions offer more colorful and better visual experiences. However, the larger resolution also makes the OLED display system suffer from a serious memory bandwidth bottleneck. In this paper, the lossless pixel-gradient EC algorithm is proposed to overcome this bottleneck. It consists of two core techniques: Finer-Gradient-Based Prediction (FGBP) and Gradient-Based Golomb-Rice...
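For readers unfamiliar with Golomb-Rice coding, the following minimal sketch shows the plain textbook Rice code applied to a prediction residual. It is not the gradient-based variant proposed in the paper, whose details are truncated above; the function names, the zigzag mapping of signed residuals, and the fixed parameter `k` are all assumptions made for illustration.

```python
def zigzag(value: int) -> int:
    """Map a signed residual to a non-negative integer (assumed mapping):
    0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * value if value >= 0 else -2 * value - 1


def rice_encode(n: int, k: int) -> str:
    """Encode non-negative n with Rice parameter k as a bit string.

    The quotient n >> k is written in unary (ones closed by a zero),
    followed by the k low-order bits of n in binary.
    """
    quotient = n >> k
    bits = "1" * quotient + "0"
    if k > 0:
        bits += format(n & ((1 << k) - 1), f"0{k}b")
    return bits


def rice_decode(bits: str, k: int) -> int:
    """Decode a single Rice codeword produced by rice_encode."""
    quotient = bits.index("0")  # number of leading ones
    remainder = int(bits[quotient + 1 : quotient + 1 + k], 2) if k > 0 else 0
    return (quotient << k) | remainder


# Toy example: a residual of -5 between a pixel and its prediction.
residual = -5
codeword = rice_encode(zigzag(residual), k=2)
```

Small residuals produce short codewords, which is why this family of codes pairs well with a predictor that keeps residuals near zero.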
Modeling the activity of an ensemble of neurons can provide critical insights into the workings of the brain. In this work, we examine whether learning-based signal modeling can contribute to high-quality modeling of neuronal signal data. To that end, we employ sparse coding and dictionary learning schemes for capturing the behavior of neuronal responses into a small number of representative prototypical...
Depression is a cognitive impairment, which according to the World Health Organisation is the leading cause of disability worldwide. One key trait of depression is psychomotor retardation, which adversely affects both emotional and physical behaviour of an individual. In this paper we perform experiments on the Audio Visual Emotion recognition Challenge 2016 — Depression Classification sub-Challenge...
The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and leverage the hierarchical structure of the video. Unlike the classical encoder-decoder approach, in which a video is encoded...
We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encoding Layer integrated on top of convolutional layers, which ports the entire dictionary learning and encoding pipeline into a single model. Current methods build from distinct components, using standard encoders with separate off-the-shelf features such as SIFT descriptors or pre-trained CNN features for material recognition. Our...