Depression detection using the speech signal is becoming an attractive topic because it is fast, convenient and non-invasive. Many studies have aimed at improving depression classification performance. This study investigated the application of ensemble learners in depression detection and compared three speaking styles (interview, reading and picture description) in ensembles. A speech dataset collected from...
To improve the precision of speech emotion recognition, this paper proposes a novel speech emotion recognition approach based on the Gaussian Kernel Nonlinear Proximal Support Vector Machine (PSVM) to recognize four basic human emotions (anger, joy, sadness, surprise). First, the speech signal is preprocessed, including sampling, quantization, pre-emphasis, framing, windowing and endpoint...
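The preprocessing steps named in this abstract (pre-emphasis, framing, windowing) are standard and can be sketched as follows; the frame length, hop size and pre-emphasis coefficient below are common illustrative values, not ones taken from the paper.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasize a speech signal, then split it into
    overlapping Hamming-windowed frames."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Slice the signal into overlapping frames of frame_len samples
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    # Taper each frame with a Hamming window to reduce spectral leakage
    return frames * np.hamming(frame_len)

frames = preprocess(np.random.randn(16000))  # 1 s of audio at 16 kHz
print(frames.shape)  # (98, 400)
```

Endpoint detection (trimming leading/trailing silence) would typically follow, e.g. by thresholding per-frame energy.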
Steganographic systems are used for the transmission of hidden data in an original signal. The article describes an algorithm for hidden data transmission using the speech signal as a carrier. The echo method is used for data embedding. In order to improve the decoding efficiency of the embedded data, a voicing-correction procedure and an informed-coding mechanism were developed and implemented...
In this paper, we present a novel spectrum mapping method — Continuous Frequency Warping and Magnitude Scaling (CFWMS) for voice conversion under the Joint Density Gaussian Mixture Model (JDGMM) framework. JDGMM is a mature clustering technique that models the joint probability density of speech signals from paired speakers. The conventional JDGMM-based approaches morph the spectral features via least...
Speech emotion recognition is a challenging problem, with identifying efficient features being of particular concern. This paper has two components. First, it presents an empirical study that evaluated four feature reduction methods, chi-square, gain ratio, RELIEF-F, and kernel principal component analysis (KPCA), at the utterance level using a support vector machine (SVM) as a classifier. KPCA had the...
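Two of the feature reduction methods named above (chi-square selection and KPCA) can be compared in a few lines with scikit-learn; the data below is random placeholder input, and the dimensions are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 50))      # placeholder utterance-level features (non-negative, as chi2 requires)
y = rng.integers(0, 4, 200)    # placeholder 4-class emotion labels

# Reduce to 10 features, then classify with an RBF-kernel SVM
chi2_pipe = make_pipeline(SelectKBest(chi2, k=10), SVC(kernel="rbf"))
kpca_pipe = make_pipeline(KernelPCA(n_components=10, kernel="rbf"), SVC(kernel="rbf"))

for name, pipe in [("chi2", chi2_pipe), ("kpca", kpca_pipe)]:
    scores = cross_val_score(pipe, X, y, cv=5)
    print(name, scores.mean())
```

On real features the cross-validated accuracies would reveal which reduction method best preserves emotion-discriminative information.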
The kernel function plays an important role in the classification of support vector machines (SVM). In order to solve the problem that a single SVM kernel function cannot achieve optimal learning ability and generalization ability in recognition classification at the same time, here we present a new combined kernel function by analyzing and comparing the characteristics of various kernel functions...
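A common way to build such a combined kernel, which may or may not match this paper's construction, is a convex combination of a local kernel (RBF, strong learning ability) and a global kernel (polynomial, strong generalization); the weight and kernel parameters below are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def combined_kernel(X, Y, w=0.6, gamma=0.5, degree=3):
    """Convex combination of an RBF kernel (local fitting)
    and a polynomial kernel (global generalization).
    A convex combination of valid kernels is itself a valid kernel."""
    return (w * rbf_kernel(X, Y, gamma=gamma)
            + (1 - w) * polynomial_kernel(X, Y, degree=degree))

rng = np.random.default_rng(1)
X = rng.random((100, 8))
y = (X.sum(axis=1) > 4).astype(int)

# scikit-learn accepts a callable kernel that returns the Gram matrix
clf = SVC(kernel=combined_kernel).fit(X, y)
print(clf.score(X, y))
```

The mixing weight `w` trades off the two kernels and would itself be tuned, e.g. by cross-validation.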
Accurately recognizing speaker emotion and age/gender from speech can provide better user experience for many spoken dialogue systems. In this study, we propose to use deep neural networks (DNNs) to encode each utterance into a fixed-length vector by pooling the activations of the last hidden layer over time. The feature encoding process is designed to be jointly trained with the utterance-level classifier...
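The pooling step described here, collapsing variable-length sequences of last-hidden-layer activations into one fixed-length vector, can be sketched with simple mean pooling (the abstract does not specify which pooling function is used, so averaging is an assumption):

```python
import numpy as np

def encode_utterance(hidden_activations):
    """Pool frame-level hidden activations (T x D) over time into a
    fixed-length D-dimensional utterance vector by averaging."""
    return hidden_activations.mean(axis=0)

# Utterances of different lengths map to vectors of the same size,
# so a single utterance-level classifier can consume them
short = np.random.randn(120, 256)   # 120 frames, 256 hidden units
long_ = np.random.randn(700, 256)   # 700 frames, same hidden width
print(encode_utterance(short).shape, encode_utterance(long_).shape)  # (256,) (256,)
```

Because mean pooling is differentiable, gradients from the utterance-level classifier can flow back through it, which is what makes the joint training described in the abstract possible.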
Developing cross-corpus, cross-domain, and cross-language emotion recognition algorithms has become more prevalent recently, to ensure the wide applicability of robust emotion recognizers. In this work, we propose a computational framework for fusing multiple emotion perspectives by integrating cross-lingual emotion information. By assuming that each data point is ‘perceived’ not only by a main perspective...
This work presents a novel approach to audio event recognition. The approach develops a weighted kernel Fisher sparse analysis method based on multiple maps. The proposed method consists of map extraction and kernel-weighted Fisher sparse analysis. Two maps are first extracted from each audio file, i.e. a scale-frequency map and a damping-frequency map. The scale and frequency of the Gabor atoms are...
Classification of human behavior is a key step to developing closed-loop Deep Brain Stimulation (DBS) systems, which may decrease the power consumption and side effects of the existing systems. Recent studies have shown that the Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the brain can be used to recognize human behavior. Since the DBS leads implanted in each STN can...
Kernel Additive Modeling (KAM) is a recent, promising framework for the separation of underdetermined convolutive mixtures of audio signals. The principle of this method is to estimate the short-term Power Spectral Densities (PSDs) of the sources directly from the mixture by taking advantage of redundant features in the PSD of each source, such as periodicity or smoothness. The separation itself is...
The use of non-verbal vocal input (NVVI) as a hands-free trigger approach has proven to be valuable in previous work [7]. Nevertheless, BlowClick's original detection method is vulnerable to false positives and, thus, is limited in its potential use, e.g., together with acoustic feedback for the trigger. Therefore, we extend the existing approach by adding common machine learning methods. We found...
Parkinson's disease has become a serious problem among the elderly, and there is currently no precise method to diagnose it. Given the significance and difficulty of recognizing Parkinson's disease, the measurement of patients' voices is regarded as one of the best non-invasive ways to identify true patients. The Support Vector Machine is one of the most effective tools for classification in machine...
The i-vector space feature has recently been proven very efficient in the speaker recognition field. In this paper, we assess using the i-vector approach for emotional speaker recognition in order to boost the performance, which is deteriorated by emotions. The key idea of the i-vector algorithm is to represent each speaker by a fixed-length, low-dimensional feature vector. The concatenation of these...
In this paper, we focus on the classification of neutral and stressed speech. The parameters representing airflow patterns in the physiological system are obtained using a physical model. Speech features were modeled using Gaussian Mixture Models (GMM) and Support Vector Machines (SVM). A comparison is made of different classifiers to determine their performance in stressed speech classification. Results...
This paper proposes a speech/music classification system based on i-vectors. An analysis of two classification methods, namely the cosine distance score (CDS) and the support vector machine (SVM), is performed. Two session compensation methods, within-class covariance normalization (WCCN) and linear discriminant analysis (LDA), are also discussed. The proposed systems yield better results compared...
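The cosine distance score mentioned above is simply the cosine similarity between a test i-vector and each class model i-vector, with the class of the higher score winning. A minimal sketch, using tiny hand-made 3-dimensional vectors in place of real (typically 400-600 dimensional) i-vectors:

```python
import numpy as np

def cosine_distance_score(w_test, w_model):
    """Cosine similarity between a test i-vector and a class model
    i-vector; higher means the test is closer to that class."""
    return float(w_test @ w_model /
                 (np.linalg.norm(w_test) * np.linalg.norm(w_model)))

# Hypothetical class models and a test vector (illustrative values only)
speech_model = np.array([1.0, 0.2, 0.1])
music_model  = np.array([0.1, 1.0, 0.8])
test_ivec    = np.array([0.9, 0.3, 0.1])

# Decide speech vs. music by the higher score
scores = {"speech": cosine_distance_score(test_ivec, speech_model),
          "music":  cosine_distance_score(test_ivec, music_model)}
print(max(scores, key=scores.get))  # speech
```

In a full system, WCCN or LDA would be applied to the i-vectors before scoring to suppress session variability.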
Users interact with mobile apps with certain intents such as finding a restaurant. Some intents and their corresponding activities are complex and may involve multiple apps; for example, a restaurant app, a messenger app and a calendar app may be needed to plan a dinner with friends. However, activities may be quite personal and third-party developers would not be building apps to specifically handle...
Does a hearing-impaired individual's speech reflect his hearing loss, and if it does, can the nature of hearing loss be inferred from his speech? To investigate these questions, at least four hours of speech data were recorded from each of 37 adult individuals, both male and female, belonging to four classes: 7 normal, and 30 severely-to-profoundly hearing impaired with high, medium or low speech...
Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e. feature extraction and classification. Prosodic features and other features widely...
Speech emotion recognition is a challenging and significant task. On the one hand, the emotion features need to be robust enough to capture the emotion information, while on the other, machine learning algorithms need to be insensitive when modeling the utterance. In this paper, we present a novel framework of speech emotion recognition to address the two above-mentioned challenges. Relative Entropy...