This paper proposes a technique based on MFCC analysis of audio signals, with application to speech classification. The proposed work uses multi-resolution (wavelet) analysis and spectral-analysis-based features for feature extraction. The approach uses a number of features, such as Mel Frequency Cepstral Coefficients (MFCC) and FFT coefficients, combined with wavelet-based features. In addition, accuracy...
Speech recognition is widely applied to speech-to-text and speech-to-emotion tasks, to make gadgets and computers easier to use, or to help people with hearing disabilities. Feature extraction is one of the significant steps affecting the performance of speech recognition, so proper selection of the features is needed. In this paper, we analyze feature extraction that can have good performance for Indonesian speech...
The most successful approach to speech and speaker recognition is to treat the speech signal as a stochastic pattern and to use a statistical pattern recognition technique for matching utterances. This paper studies the performance of a text-dependent speaker verification system using the Delta-Delta Mel Frequency Cepstral Coefficients (MFCC-Δ-Δ) feature vector and Fuzzy C-means (FCM) speaker...
This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text-independent speaker recognition. MFCC, modeled on the human auditory system, shows robustness against noise and session changes and hence has become synonymous with speaker recognition. Our main aim is to test...
Human listeners are capable of recognizing speech in noisy environments, while most traditional speech recognition methods do not perform well in the presence of noise. Unlike the traditional Mel-frequency cepstral coefficient (MFCC)-based method, this study proposes a phoneme classification technique using the neural responses of a physiologically based computational model of the auditory periphery...
This paper presents an analysis of speaker-independent isolated Pashto spoken numbers for automatic speech recognition. First, a database was developed that encompasses isolated Pashto numbers from sefer (0) to sul (100). Fifty speakers (25 male and 25 female, of different ages) who frequently speak the Yousafzai dialect were selected for recording. The recording has...
In this paper we present a new Direct Access Framework (DAF) for speaker identification systems, which identifies a speaker based on original characteristics of the human voice. The direct access method is a process that identifies an object based on parts of the object itself; these parts are called original characteristics. The proposed framework consists of two parts, the enrolment process and the identification process...
The use of Electroencephalography (EEG) in the domain of Brain Computer Interfaces is now commonplace. EEG for imagined speech reproduction and observation of brain response to audio stimuli are active areas of research. In this paper, we consider the case of imagined and mouthed non-audible speech recorded with EEG electrodes. We analyze different feature extraction techniques such as Mel Frequency...
This paper proposes an emotion recognition system which allows a person's emotional state to be recognized from a speech signal. The aim of the proposed solution is to improve the interaction between humans and computers. The emotion recognition system must be capable of recognizing at least six basic emotions (happiness, anger, surprise, disgust, fear, sadness) and the neutral state. The proposed system...
The article presents the development of a speaker identification system as one part of the multimodal interface for the HBB-NEXT project. A short introduction to the speaker identification problem in the context of the HBB-NEXT project is given. Then we focus on the design, optimization and method selection process in order to realize a real-time, text-independent speaker identification application, namely:...
India is a vast country with a large number of languages. Some of these languages descend from a single mother language, giving rise to a language family. The major official languages in India fall under two language families, namely Indo-European and Dravidian. In this paper, we discuss a system which takes a speech file as input and identifies the language family to which it belongs...
In certain situations, speech might be shifted in the frequency domain amid the presence of noise. To be able to compensate for the spectral shift, it is important to know the amount of frequency shift present. A method based on Mel-frequency cepstral coefficients (MFCC) and a Gaussian Mixture Model (GMM) supervector is proposed for detecting frequency shifts in speech. MFCC or LFCC is extracted to...
In this paper we propose Fourier-Bessel cepstral coefficients (FBCC) features for robust speech recognition. The Fourier-Bessel representation of the speech signal is obtained using Bessel functions as a basis set. The FBCC are extracted from zeroth-order Bessel coefficients, taking into account the perceptual characteristics of the human auditory system. Recognition accuracy is measured using the CMU...
Gaussian Mixture Models (GMMs) have been proven effective in modeling speech and other acoustic signals. In this study, we have used GMMs to model different noise sources, viz. subway, babble, car and exhibition. The expectation maximization algorithm has been implemented to fit the model. Further, we present the 'threshold' method, which uses the energy coefficient of the Mel-Frequency Cepstral Coefficients...
This paper proposes a new feature extraction method called multi-directional local feature (MDLF) for use in an automatic speaker recognition system. To obtain MDLF, a linear regression is applied to the FFT signal in four different directions: horizontal (time axis), vertical (frequency axis), diagonal 45 degrees (time-frequency) and diagonal 135 degrees (time-frequency). In the experiments,...
We present a new web-based application designed for human-computer interfaces that currently supports a speaker identification module. It is based on Java EE and the Spring Framework and is designed to be invoked by users through their Internet browsers. Due to a flexible design, various feature extraction methods, signal processing and classification algorithms can be easily implemented and used in different...
This paper presents the performance of feature extraction techniques for speech recognition, for the classification of speech represented by a particular continuous sentence model. The goal of this study is to present independent as well as comparative performances of popular appearance-based feature extraction techniques, i.e. Linear Discriminant Analysis and Mel Frequency Cepstral Coefficients. Mel...
The paper describes a novel method for discrete speech recognition, based on spoken Arabic digit recognition by means of a wavelet neural network in which the Morlet wavelet is introduced to the hidden layer. Features are extracted from the speech signal by means of Mel Frequency Cepstral Coefficients (MFCCs), followed by vector quantization (VQ). The experimental results obtained on a spoken Arabic digit dataset proved...
This article presents the task of speaker identification in a closed group. It discusses the main steps of the identification process, ranging from the proper speech features to the classification methods and statistical signal processing. However, its main focus is on tuning the final system using the KNN classification method by setting the number of neighbors and reducing the feature vector dimension...
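The tuning described in this abstract can be sketched roughly as follows. This is only an illustration under assumptions, not the article's method: the function names, the leave-one-out scoring, the squared Euclidean distance, and the use of simple truncation (keeping the first `dims` coefficients) as the dimension reduction are all hypothetical choices, since the abstract does not say how either step is done.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Squared Euclidean distance; sorting by it gives the same order as
    # sorting by the true distance.
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    # On a tie, the label met first among the nearest neighbours wins.
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k, dims=None):
    """Score one (k, dims) setting by leave-one-out classification.

    `dims` keeps only the first `dims` entries of each feature vector,
    an assumed stand-in for whatever reduction the article uses.
    """
    if dims is not None:
        xs = [x[:dims] for x in xs]
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data (made up): two speakers, third coefficient is pure noise.
toy_x = [[0, 0, 9], [0, 1, -9], [1, 0, 9], [5, 5, -9], [5, 6, 9], [6, 5, -9]]
toy_y = ["A", "A", "A", "B", "B", "B"]
```

Calling `leave_one_out_accuracy(toy_x, toy_y, k, dims)` over a small grid of `k` and `dims` values and keeping the best-scoring pair is one simple way to carry out such tuning on held-in data.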
Mel Frequency Cepstral Coefficients (MFCC) are widely used in speech recognition and speaker identification. MFCC features are usually pre-processed before being used for recognition. One such pre-processing step is creating delta and delta-delta coefficients and appending them to the MFCCs to create the feature vector. Another is mean normalization of the coefficients. In this paper, the effect of these...
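The two pre-processing steps named in this abstract can be sketched as below. This is a minimal illustration under assumptions, not the paper's implementation: the function names, the regression window `width=2`, the edge handling (repeating the first and last frame), and the choice to normalize before taking deltas are all hypothetical. The delta uses the common regression formula d_t = Σ n·(c_{t+n} − c_{t−n}) / (2·Σ n²), with n running from 1 to `width`.

```python
def cepstral_mean_normalize(frames):
    """Subtract each coefficient's mean over all frames."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def delta(frames, width=2):
    """Regression-based time derivative of each coefficient track."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so edge frames reuse the first/last frame.
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def build_feature_vectors(frames, width=2):
    """Append delta and delta-delta coefficients to mean-normalized frames."""
    static = cepstral_mean_normalize(frames)
    d1 = delta(static, width)
    d2 = delta(d1, width)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Toy input (made up): 6 frames, 2 coefficients; the first rises linearly.
toy_frames = [[float(t), 2.0] for t in range(6)]
toy_features = build_feature_vectors(toy_frames)
```

Each output vector is three times as long as the input frame: the normalized static coefficients, followed by their deltas and delta-deltas.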