Human listeners are capable of recognizing speech in noisy environments, while most traditional speech recognition methods do not perform well in the presence of noise. Unlike the traditional Mel-frequency cepstral coefficient (MFCC)-based method, this study proposes a phoneme classification technique using the neural responses of a physiologically-based computational model of the auditory periphery...
This paper investigates the contribution of frequency bands to automatic voice pathology detection. First, the input voice signal is passed through a number of time-domain band-pass filters whose center frequencies are spaced on an octave scale. Each filter output is then divided into overlapping frames. An autocorrelation function is applied to each block to find the first largest peak, in areas other...
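The framing and autocorrelation steps named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the band-pass filtering stage is omitted, the function names (`frame_signal`, `autocorrelation`, `largest_peak_lag`) are hypothetical, and the peak rule (largest local maximum away from lag zero) is an assumption, since the abstract is truncated before it states the exact search region.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a sequence into overlapping frames (hop < frame_len gives overlap)."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def autocorrelation(frame):
    """Unnormalized autocorrelation for lags 0 .. len(frame) - 1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(n)]

def largest_peak_lag(acf, min_lag=1):
    """Lag of the largest local maximum at or beyond min_lag (assumed rule)."""
    best_lag, best_val = None, float("-inf")
    for lag in range(max(min_lag, 1), len(acf) - 1):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > best_val:
            best_lag, best_val = lag, acf[lag]
    return best_lag

# Synthetic stand-in for one filter output: a 100 Hz sine sampled at 8 kHz,
# so one period spans 80 samples.
fs, f0 = 8000, 100
signal = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]

frames = frame_signal(signal, frame_len=400, hop=200)  # 50% overlap
peak_lags = [largest_peak_lag(autocorrelation(f)) for f in frames]
```

On this synthetic input each frame's peak lag equals the period in samples, so `fs / peak_lag` recovers the 100 Hz fundamental. Real voice recordings would need the filter bank and a carefully chosen lag search range.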
The present work explores the significance of the consonant-vowel (CV) transition and steady vowel (SV) regions for the language identification (LID) task. Language-specific vocal tract information, represented by Mel-frequency cepstral coefficients (MFCCs), is extracted from the CV transition and steady vowel regions for the LID task. The durations of the CV transition and steady vowel regions are varied to analyze...
A Philippine LID system has not been previously created because of the limited amount of recorded speech data. This research initiates the LID research using the Philippine Language Database (PLD) collected by the Digital Signal Processing Laboratory of the University of the Philippines Diliman (DSP-UPD). Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Shifted Delta...
The article presents the development of a speaker identification system as one part of the multimodal interface for the HBB-NEXT project. A short introduction to the speaker identification problem in the context of the HBB-NEXT project is given. Then we focus on the design, optimization and method selection process in order to realize a real-time, text-independent speaker identification application, namely:...
India is a vast country with a large number of languages. Some of these languages descend from a single mother language, giving rise to a language family. The major official languages in India fall under two language families, namely Indo-European and Dravidian. In this paper, we discuss a system which takes a speech file as input and identifies the language family to which it belongs...
In certain situations, speech might be shifted in the frequency domain amid the presence of noise. To compensate for the spectral shift, it is important to know the amount of frequency shift present. A method based on Mel-frequency cepstral coefficients (MFCC) and Gaussian mixture model (GMM) supervectors is proposed for detecting frequency shifts in speech. MFCC or LFCC is extracted to...
Gaussian Mixture Models (GMMs) have been proven effective in modeling speech and other acoustic signals. In this study, we have used GMMs to model different noise sources, viz. subway, babble, car and exhibition. The expectation-maximization algorithm has been implemented to fit the model. Further, we present the ‘threshold’ method which uses the energy coefficient of the Mel-Frequency Cepstral Coefficients...
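As a rough illustration of fitting a GMM with expectation maximization, the sketch below runs EM on a one-dimensional, two-component mixture. It is not the study's implementation: the function names (`gaussian_pdf`, `fit_gmm_1d`), the toy data, the initial means, and the variance floor are all assumptions, and real noise modeling would use multi-dimensional MFCC vectors rather than scalars.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, means, n_iter=50, var_floor=1e-6):
    """Fit a 1-D GMM by EM, starting from the supplied initial means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k          # assumed initial variances
    weights = [1.0 / k] * k        # assumed uniform initial weights
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            likes = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # guard against collapse
    return weights, means, variances

# Toy data: two well-separated clusters standing in for two feature groups.
data = [-0.5, -0.2, 0.0, 0.1, 0.4, 4.6, 4.9, 5.0, 5.2, 5.5]
weights, means, variances = fit_gmm_1d(data, means=[1.0, 4.0])
```

With clusters this well separated, the fitted means settle close to each cluster's sample mean and the weights close to one half each. In a classification setting, one such model would be fitted per noise source and a test sample assigned to the model with the highest likelihood.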
This paper proposes a new feature extraction method called multi-directional local feature (MDLF) for use in an automatic speaker recognition system. To obtain MDLF, a linear regression is applied to the FFT signal in four different directions: horizontal (time axis), vertical (frequency axis), diagonal at 45 degrees (time-frequency) and diagonal at 135 degrees (time-frequency). In the experiments,...
Ambulatory devices can be used to detect heart diseases and save lives at critical times. These devices are based on sound classification that usually adopts a suitable data mining algorithm. This paper investigates the performance of Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers in classifying sound samples. The SVM classifier makes use of a linear separating hyperplane to...
This paper compares the feature sets extracted using a time-frequency analysis approach and a frequency-time analysis approach for text-independent speaker identification. The Mel-frequency cepstral coefficient (MFCC) feature set and the Inverted Mel-frequency cepstral coefficient (IMFCC) feature set are extracted using the time-frequency analysis approach. Temporal energy subband cepstral coefficient (TESBCC) feature...
The Gaussian Mixture Model (GMM) is a widely used, simple and effective modeling approach for spoken language identification. Traditionally, the EM algorithm is used to train this model. In this paper we propose a new method named WA-GMM (Weight Adapted GMM) for estimating the weights of GMM Gaussian components using bag-of-unigram and Support Vector Machine (SVM): SVM weights which are trained on bag-of-unigram...
In this paper, a hierarchical structure is proposed for automatic gender identification (AGI). Two clustering techniques are used in this structure. The first is divisive clustering, which divides the speakers of each gender into classes of speakers. The second is agglomerative clustering, which creates the hierarchical structure. Feature reduction is done by SOAP feature...
This paper introduces a new method of feature extraction based on a frequency-time analysis approach for text-independent speaker identification. The impetus for this new feature extraction technique comes from the filter bank summation method of the STFT using a Nyquist filter bank. The focus of this work is on applications which yield higher identification accuracy without increasing the computational...
Soccer highlight detection has been an active research topic in recent years. In this paper, we present our effort to detect an important audio keyword, excited commentator speech, which contributes to a state-of-the-art soccer highlight extraction system. We propose an approach using a statistical classifier based on Gaussian mixture models (GMMs) with unsupervised model adaptation. The excited speech...
This paper discusses tone pronunciation scoring for Mandarin multi-syllable words in Computer Assisted Language Learning (CALL) systems. A commonly used tone evaluation method uses GMMs to model various pitch sequences. Because the pattern of a pitch sequence changes considerably in a multi-syllable context, tone models trained on a mono-tone database will not perform well on multi-syllable speech...
Different kinds of features in the time domain, spectral domain and cepstral domain are used for musical genre classification. In this paper, through the fusion of short-term timbral features and a long-term rhythmic feature, we propose a novel method in which a musical genre vector is constructed using the likelihood ratio of a GMM (Gaussian Mixture Model) and a radar chart is applied to provide visualized style...
In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from two speakers is recorded by a single microphone. The goal is to identify both speakers from their mixed signal. High recognition accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper, we approach the problem without estimating...
This paper introduces the use of a new method of feature extraction for robust text-independent speaker identification. The focus of this work is on applications which yield higher identification accuracy without increasing the computational effort. The impetus for this new feature extraction technique comes from a new transformation which is based on the Nyquist filter bank. We have proposed this...
In this paper, an adaptive speaker identification method, combined with a human behavioral trait and based on the Gaussian mixture model (GMM), is constructed. The method automatically selects different lengths of speech for different speakers during identification according to the feedback probability estimation, so it can preserve identification accuracy while reducing the identification...