In this paper we present a high-precision, speaker-independent vowel/non-vowel classifier based on a simple feed-forward MLP (Multi-Layer Perceptron) and several rules. RASTA-PLP analysis of the speech signal, resulting in mel-cepstral coefficients, and a formant tracking method are used to provide the feature vectors for the MLP. To train and test the system we used a part of the TIMIT database...
The most successful approach to speech and speaker recognition is to treat the speech signal as a stochastic pattern and to use a statistical pattern recognition technique for matching utterances. This paper attempts to study the performance of a text-dependent speaker verification system using Delta-Delta Mel Frequency Cepstral Coefficients (MFCC-Δ-Δ) feature vector and Fuzzy C-means (FCM) speaker...
This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text-independent speaker recognition. MFCC, modeled on the human auditory system, shows robustness against noise and session changes and hence has become synonymous with speaker recognition. Our main aim is to test...
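As an illustration of the delta derivatives mentioned in the abstract above, the following is a minimal sketch in plain Python of one commonly used regression formula for delta features. It is not taken from the paper: the function names `delta` and `stack_with_deltas`, the window `width=2`, and the edge-frame padding are all assumptions, and the mel-spaced Gaussian filter bank itself is not reproduced here.

```python
def delta(frames, width=2):
    """Regression-based delta of a sequence of feature vectors.

    frames: list of equal-length lists, one cepstral vector per frame.
    width:  neighbouring frames used on each side (assumed value).
    """
    if not frames:
        return []
    # Normaliser of the regression formula: 2 * sum of n^2 for n = 1..width.
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, width + 1):
                # Edge frames are repeated when the window leaves the signal.
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(mfcc, width=2):
    """Concatenate static, delta and delta-delta coefficients per frame."""
    d = delta(mfcc, width)
    dd = delta(d, width)  # delta-delta is the delta of the delta
    return [c + v + a for c, v, a in zip(mfcc, d, dd)]
```

Applied to frames of 13 static coefficients, `stack_with_deltas` would return 39-dimensional vectors per frame; the frame count is unchanged.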
The human voice can serve as a password/key for access to various services. In a speaker verification system, the voice is used to verify the speaker based on features extracted from the voice signal. In automated speaker verification the speaker's voice signal is processed to extract speaker-specific information, which is used to generate a voiceprint, also known as a template, that cannot be replicated...
In this paper, in order to properly evaluate the relative importance of priors and observed data in the Bayesian framework, we propose an extended Gaussian mixture model (EGMM) and design the corresponding learning inference algorithms. First, we define the likelihood function of the EGMM and then propose the variational learning algorithm for this EGMM. Moreover, the proposed model and approach are...
Speaker recognition is a biometric identification method that uses different features of an individual's voice to automatically identify a speaker among a population. Two different feature sets are considered for text-dependent speaker recognition. A comparison is performed between Linear Predictive Coefficients (LPC) and Prosodic Features (F0, F1, F2, and F3) along with Radial Basis Function Network (RBFN) for...
This paper deals with the recognition process of Bangla speech. The database used consists of two sets of data: a training set containing 3824 utterances of Bangla digit sequences from 25 male and 25 female speakers, and a test set containing 1985 utterances from 26 male and 26 female speakers. The test set is subdivided into four groups such as clean1, clean2, clean3 and clean4...
For spoken language processing applications like speaker recognition/verification, silence segments not only contribute no speaker-specific information but also dilute the information content already available in the speech segments of the audio data. It has been experimentally studied that removing silence segments with the help of a voice activity detector (VAD) from the utterance...
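To make the idea of silence removal concrete, here is a minimal sketch of a frame-energy voice activity detector in plain Python. It is only an illustration and not the detector used in the paper: the function name `drop_silence`, the frame length of 160 samples, and the relative threshold of 0.1 are all assumed values.

```python
def drop_silence(samples, frame_len=160, threshold_ratio=0.1):
    """Keep only frames whose mean energy exceeds a relative threshold.

    samples:         list of audio samples (floats).
    frame_len:       samples per non-overlapping frame (assumed value).
    threshold_ratio: fraction of the loudest frame's energy below which
                     a frame counts as silence (assumed value).
    Trailing samples that do not fill a whole frame are discarded.
    """
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        return []
    # Mean squared amplitude of each frame.
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    threshold = threshold_ratio * max(energies)
    kept = []
    for frame, energy in zip(frames, energies):
        if energy > threshold:
            kept.extend(frame)
    return kept
```

A fixed relative threshold like this is fragile in noisy recordings; it is chosen here only because it keeps the sketch short.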
In text-independent speaker identification, there are a large number of likelihood computations, especially for large populations. To speed up recognition, we propose a lightweight algorithm called CBF (Codebook Filtering). CBF provides two phases of speaker pruning to accelerate speaker recognition. To enable CBF to process large populations, this paper implements CBF on Map-Reduce framework...
In speaker recognition tasks, one of the reasons for reduced accuracy is the presence of closely resembling speakers in the acoustic space. To increase the discriminative power of the classifier, the system must be able to use only the features of a given speaker that are unique with respect to his/her acoustically resembling speaker. This paper proposes a technique to reduce the confusion errors by finding...
This paper introduces the use of two new features for speaker identification, Residual Phase Cepstrum Coefficients (RPCC) and Glottal Flow Cepstrum Coefficients (GLFCC), to capture speaker-specific characteristics from their vocal excitation patterns. Results on a cross-lingual speaker identification task taken from the NIST 2004 SRE demonstrate that these RPCC and GLFCC features are significantly...
Recently, speaker recognition systems have attracted strong interest from researchers for both software and hardware solutions. Different technologies have been adopted to implement speaker recognition systems that combine optimal response time with acceptable accuracy. Research is ongoing to provide a highly durable and precise recognition system that can be embedded into critical implementation...
The article presents the development of a speaker identification system as one part of the multimodal interface for the HBB-NEXT project. A short introduction to the speaker identification problem in the context of the HBB-NEXT project is given. Then we focus on the design, optimization and method selection process in order to realize a real-time, text-independent speaker identification application, namely:...
Given single-channel recordings of simultaneous speakers, we may need to identify the individual speakers for separating their voices. In this paper, we consider the problem of identifying two simultaneous speakers based on single-channel data, i.e., speaker-pair identification. We model the problem as identifying speakers using noisy speech with partial temporal corruption, which corresponds to the...
This paper proposes a new feature extraction method called multi-directional local feature (MDLF) for use in an automatic speaker recognition system. To obtain MDLF, a linear regression is applied to the FFT signal in four different directions: horizontal (time axis), vertical (frequency axis), 45-degree diagonal (time-frequency) and 135-degree diagonal (time-frequency). In the experiments,...
The goal of this paper is to describe the voice command system that is part of the multimodal user interface for a residential application project demonstrated at CES 2012. The application is a 3D TV panel which can be controlled through face recognition, gesture, and speech. The speech interface is invoked using an activation keyword, and terminated in similar fashion with a de-activation keyword. Speaker recognition...
This paper presents the reliability of MLP in speaker identification using characteristics extracted from speakers' voices. Classification accuracy depends on the speaking condition and varies by up to 23% depending on the selected speaking condition. Results of a simulation experiment show that MLP is effective in speaker identification, especially in the case of retelling and synchronous speech where we achieved...
SVM is a novel type of statistical learning method that has been successfully used in speaker recognition. However, training an SVM on all training examples requires long computing time and large storage space. This paper proposes an improved sparse least-squares support vector machine (LS-SVM) for speaker identification. First, KPCA is exploited to reduce the dimension of input vectors and to denoise...
This article presents an attempt to link the uploaders of videos based on the audio track of the videos. Using a subset of the MediaEval [10] Placing Task's Flickr video set, which is labeled with the uploader's name, we conducted an experiment with a setup similar to that of a typical NIST speaker recognition evaluation run. Based on the assumption that the audio might be matched in various ways (speaker,...
This paper introduces some simplifications to i-vector speaker recognition systems. I-vector extraction, as well as training of the i-vector extractor, can be an expensive task in terms of both memory and speed. Under certain assumptions, the formulas for i-vector extraction (also used in i-vector extractor training) can be simplified, leading to faster and more memory-efficient code. The first...