The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper suggested a technique based on MFCC analysis for audio signals with speech classification application. The proposed work used multi-resolution (wavelet) analysis and spectral analysis based features for feature extraction. The proposed approach uses a no. of features like Mel Frequency Cepstral Coefficient (MFCC), and FFT Coefficients combined with wavelet based features. In addition, accuracy...
This paper aimed at introducing a completely automated Arabic phone recognition system based on Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) 15-point speech feature. The process of enhancing of WPBTE is provided by adding energy component to WPBTE, which is implemented in Matlab software and makes an enhancement of 65 % to recognizer accuracy which is the most contribution in this paper. EWPBTE...
Use of modern technological advances in real-time biomedical analysis is very crucial. Current work focuses on glottal pathology discrimination based on non-invasive speech analysis techniques. Primary set back in developing such method is irregular performance depreciation of several state of the art acoustic features. To excuse such problems, we have used glottal to noise excitation ratio, which...
This paper presents the experiments on feature selection for emotional speech classification. There are 152 features used in this experiment. The minimum redundancy maximum relevance (mRMR) feature selection is applied as the features selection. The experiments are constructed from two corpora; Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Emotional Tagged Corpus on Lakorn (EMOLA) which...
Speech recognition is widely applied to speech to text, speech to emotion, in order to make gadget and computer easier to use, or to help people with hearing disability. Feature extraction is one of significant step in the performance of speech recognition. Therefore, the proper selection is really needed. In this paper, we analyze feature extraction that can have good performance for Indonesian speech...
The most successful approach to speech and speaker recognition is to treat the speech signal as a stochastic pattern and to use a statistical pattern recognition technique for matching utterances. This paper attempts to study the performance of Text dependent speaker verification system using Delta-Delta Mel Frequency Cepstral Coefficients (MFCC-Δ-Δ) feature vector and Fuzzy C means (FCM) speaker...
The current benchmark speech-based depression detection techniques rely on acoustic speech parameters collected from large sets of representative speech recordings. This study for the first time investigates depression detection based on the higher order influence model (HOIM) coefficients and emotional transition parameters derived from a relatively small set of conversational speech recordings representing...
The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e. g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window and often, a complete utterance is chosen. Consequently, changes in the background...
For many years, filterbanks have been widely used as one step of frontend feature extraction for Automatic Speech Recognition (ASR). In this paper, we propose a unified framework for ASR frontends, by first moving the nonlinear amplitude scaling, and then combining the filterbank weights with the cosine basis vectors. As part of this framework, we also show that the delta terms used to encode feature...
Persuasive communication and interaction play an important and pervasive role in many aspects of our lives. With the rapid growth of social multimedia websites such as YouTube, it has become more important and useful to understand persuasiveness in the context of online social multimedia content. In this paper, we present our results of conducting various analyses of persuasiveness in speech with...
This paper studies the effectiveness of speech contents for detecting clinical depression in adolescents. We also evaluated the performances of acoustic features such as Mel frequency cepstral coefficients (MFCC), short time energy (Energy), zero crossing rate (ZCR) and Teager energy operator (TEO) using Gaussian mixture models for depression detection. A clinical data set of speech from 139 adolescents,...
A whispered speech resembles an unvoiced speech due to the lack of vocal fold vibration unlike the neutral speech. Since information about the gender of a speaker typically lies in the pitch resulted from the vocal fold vibration (or source signal), identifying gender from the whispered speech is more challenging compared to that from the neutral speech. In the absence of the pitch, we study the use...
This paper presents a speaker based Language Independent Isolated Speech Recognition System (LIISRS). The most popular feature extraction technique Mel Frequency Cepstral Coefficients (MFCC) is used for training the system. Representative specific features are identified using K-Means algorithm. Distortion measure is calculated using Euclidian distance function. Pitch contour characteristics are used...
This paper motivates the use of combination of mel frequency cepstral coefficients (MFCC) and its delta derivatives (DMFCC and DDMFCC) calculated using mel spaced Gaussian filter banks for text independent speaker recognition. MFCC modeled on the human auditory system shows robustness against noise and session changes and hence has become synonymous with speaker recognition. Our main aim is to test...
The goal of this work is to improve phone recognition accuracy using combination of source and system features. As speech is produced by exciting time varying vocal tract system with time varying excitation, we want to explore both source and system components of speech production system for phone recognition. The excitation source information is derived by processing linear prediction residual of...
Human computer interaction with the time has extended its branches to many different other fields like engineering, cognition, medical etc. Speech analysis has also become an important area of concern. People involved are using this mode for the interaction with the machines to bridge the gap between physical and digital world. Speech emotion recognition has become an integral subfield in the domain...
This paper presents the process of Quranic Accent Automatic Identification. Recent feature extraction technique that is used for Quranic verse rule identification/Tajweed include Mel Frequency Cepstral Coefficients (MFCC) which prone to additive noise and may reduce the classification result. Therefore, to improve the performance of MFCC with addition of Spectral Centroid features and is proposed...
This paper presents the work of acoustic analysis related to Modern Standard Arabic (MSA). The problem of classifying the consonant counterparts in MSA is tackled here. The study considers four phonemes: /dˤ, ðˤ/ and their non-emphatic counterparts /d, ð/ respectively. An accurate automatic classification for those phonemes is to be achieved. Artificial neural networks (ANNs) are used for that purpose...
This paper revealed the analysis of speaker independent isolated Pashto spoken numbers for determination of automatic speech recognition. Initially the database was developed, the database encompasses isolated Pashto numbers from sefer (0) to sul (100). Fifty speakers (25 male, 25 females with different ages) that can frequently speak yousafzai dialect were selected for recording. The recording has...
We present in this paper a new Direct Access Framework (DAF) for speaker identification system, to identify a speaker based on original characteristics of the human voice. Direct access method is a process to identify an object based on parts of the object itself, the parts called original characteristics. The proposed framework consists of two parts, the enrolment process and the identification process...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.