Multimodal sentiment analysis is drawing an increasing amount of attention these days. It enables mining of opinions in video reviews which are now available aplenty on online platforms. However, multimodal sentiment analysis has only a few high-quality data sets annotated for training machine learning algorithms. These limited resources restrict the generalizability of models, where, for example,...
Politicians in the same political party often share the same views on social issues and legislative agendas. By mining patterns in TV news co-appearances and Twitter followers, in this paper we estimate political leanings (left / right) of unknown individuals, and detect outlier politicians who have views different from their colleagues in the same party, from a graph signal processing (GSP) perspective...
Learning-based face super-resolution approaches rely on a representative dictionary as a self-similarity prior from training samples to estimate the relationship between the low-resolution (LR) and high-resolution (HR) image patches. The most popular approaches learn a mapping function directly from LR patches to HR ones but neglect the multi-layered nature of the image degradation process (resolution down-sampling)...
We present a deep convolutional recurrent neural network for speech emotion recognition based on the log-Mel filterbank energies, where the convolutional layers are responsible for the discriminative feature learning. Based on the hypothesis that a better understanding of the internal configuration within an utterance would help reduce misclassification, we further propose a convolutional attention...
The Microsoft Kinect sensor has been widely used in many applications, but it suffers from the drawback of low depth accuracy. In this paper, we present a unified depth modification model to improve the Kinect depth accuracy by registering depth and color images in an iterative manner. Specifically, in each iteration, we first establish a coarse correspondence based on the feature descriptor of the...
The latest High Efficiency Video Coding (HEVC) standard has been increasingly used to generate video streams over the Internet. However, decoded HEVC video streams may incur severe quality degradation, especially at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, we propose in this paper a Decoder-side Scalable Convolutional Neural Network (DS-CNN)...
Predicting the interestingness of media content remains an important but challenging research subject. The difficulty comes first from the fact that, besides being a high-level semantic concept, interestingness is highly subjective and no global definition has yet been agreed upon. This paper presents the use of up-to-date deep learning techniques for solving the task. We perform experiments with both...
HTTP adaptive streaming (HAS) has become the de facto standard for video streaming to ensure continuous multimedia service delivery under irregularly changing network conditions. Many studies have already investigated the detrimental impact of various playback characteristics on the Quality of Experience of end users, such as initial loading, stalling or quality variations. However, dedicated studies tackling...
This paper aims at task-oriented action prediction, i.e., predicting a sequence of actions towards accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The main challenges lie in how to model task-specific knowledge and integrate it in the learning procedure. In this work, we propose to train a recurrent long short-term memory (LSTM) network for handling...
In this paper, a large-scale underwater image database for underwater salient object detection or saliency detection is presented in detail. This database is called the OUC-VISION underwater image database, which contains 4400 underwater images of 220 individual objects. Each object is captured with four pose variations (the frontal-, the opposite-, the left-, and the right-views of each underwater...
The High Efficiency Video Coding (HEVC) standard significantly saves coding bit-rate over the preceding H.264 standard, but at the expense of extremely high encoding complexity. In fact, the coding tree unit (CTU) partition consumes a large proportion of HEVC encoding complexity, due to the brute-force search for rate-distortion optimization (RDO). Therefore, we propose in this paper a complexity...
Low rank matrix approximation, in the presence of missing data and outliers, has previously shown its significance as a theoretic foundation in a wide spectrum of tabulated information processing applications. To fit low rank models, minimizing the nuclear norm of matrices is a popular scheme, the computational load of which, however, is heavy. While bilinear factorization can largely mitigate the...
We present a new method of estimating disparity maps from stereo videos for bokeh effect synthesis. In this work, we develop an improved method, based on total variation regularization with the robust L1 norm in the data fidelity term (TV-L1) [4], to estimate an edge-preserving disparity map without stereo rectification. The proposed algorithm improves the TV-L1 approach by incorporating structure edge detection,...
Dimensional emotion recognition from physiological signals is a highly challenging task. Common methods rely on hand-crafted features that do not yet provide the performance necessary for real-life application. In this work, we exploit a series of convolutional and recurrent neural networks to predict affect from physiological signals, such as electrocardiogram and electrodermal activity, directly...
The goal of image quality assessment (IQA) is to use computational models to measure the consistency between image quality and subjective evaluations. In recent years, convolutional neural networks (CNNs) have been widely used in the image processing community and have achieved large performance gains over non-CNN-based methods. In this work, we describe an accurate deep CNN model for no-reference IQA. Taking...
In this paper, we propose a novel multi-center convolutional neural network for unconstrained face alignment. To utilize structural correlations among different facial landmarks, we determine several clusters based on their spatial position. We pre-train our network to learn generic feature representations. We further fine-tune the pre-trained model to emphasize locating a certain cluster of landmarks...
The ever-widening application of virtual reality requires ultra-high-resolution omnidirectional videos (OVs) to be transmitted over the wired and wireless Internet at low cost (i.e., bitrate). Various solutions have been proposed to intelligently reduce the bitrate, e.g., adapting the spatial resolution of the video for different directions of the panorama with regard to the current direction that the...
The deployment of deep neural network models on mobile or embedded devices is challenging for two main reasons: 1) the large model size for storage, and 2) the large memory bandwidth for inference. To address these issues, this paper develops a deep neural network compression framework to reduce the resource usage for efficient visual inference. By reviewing the trained deep model, we propose...
Visual tracking is a fundamental problem in computer vision. Recently, some deep-learning-based tracking algorithms have achieved record-breaking performance. However, due to the high complexity of deep learning, most deep trackers suffer from low tracking speed, and thus are impractical in many real-world applications. Some new deep trackers with smaller network structures achieve high efficiency...