The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In low resource Automatic Speech Recognition (ASR), one usually resorts to the Statistical Machine Translation (SMT) technique to learn transform rules to refine grapheme lexicon. To do this, we face two challenges. One is to generate grapheme sequences from the training data as the targets, which is paired with the original transcripts to train SMT models; the other is to effectively prune the learned...
Speech recognition systems generally use delta and delta-delta (velocity and acceleration) coefficients to characterise the dynamics apparent in frame-based representations of speech. These coefficients can be thought of as the errors of simple predictors. This paper describes the use of error coefficients derived from more advanced (and accurate) forms of prediction and interpolation. Both overall...
Speaker identification (SID) in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a challenging problem. Previous studies address this problem in the anechoic environment under the Gaussian mixture model (GMM) framework. On the other hand, cochannel SID in reverberant conditions has not been addressed. This paper studies cochannel SID in both anechoic...
A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken...
Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...
The presence of Lombard Effect in speech is proven to have severe effects on the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...
Recent studies have demonstrated the potential of unsupervised feature learning for sound classification. In this paper we further explore the application of the spherical k-means algorithm for feature learning from audio signals, here in the domain of urban sound classification. Spherical k-means is a relatively simple technique that has recently been shown to be competitive with other more complex...
Speech recognition systems are either based on parametric approach or non-parametric approach. Parametric based systems such as HMMs have been the dominant technology for speech recognition in the past decade. Despite a lot of advancements and enhancements in the design of these systems: key problems such as long term temporal dependence, etc. Has not yet been solved. Recently due to availability...
This paper proposes a robust template based on the previously proposed ECWRT (enhanced cross word reference template) for template-based ASR, by using correlational weight adjusting method to improve robustness against elderly speech variation named CWCWRT. This work addresses two vital issues: such as outlier rejection in training set and elimination of unwanted utterances which usually happen by...
A recently proposed concept for training reverberation-robust acoustic models for automatic speech recognition using pairs of clean and reverberant data is extended from word models to tied-state triphone models in this paper. The key idea of the concept, termed ICEWIND, is to use the clean data for the temporal alignment and the reverberant data for the estimation of the emission densities. Experiments...
Many application areas that use supervised machine learning make use of multiple raters to collect target ratings for training data. Usage of multiple raters, however, inevitably introduces the risk that a proportion of them will be unreliable. The presence of unreliable raters can prolong the rating process, make it more expensive and lead to inaccurate ratings. The dominant, "static" approach...
In this article a text-independent speaker verification problem is considered. After the feature extraction, each conversation side has been represented as a vector in a fixed dimensional space. In order to reduce an influence of the lengths of utterances and also the channel properties, various vector normalization techniques have been selected from the literature, modified, and tested. Additionally,...
In this paper, we present a method to extract probabilistic acoustic features by using the Adaptive Boosting algorithm (AdaBoost). We build phoneme Gaussian mixture classifiers, and use AdaBoost to enhance the classification performance. The outputs from AdaBoost are the posterior probabilities for each frame given all phonemes. Those posterior features are then used to train a new acoustic model...
Setting out from the point of view that automatic speech recognition (ASR) ought to benefit from data in languages other than the target language, we propose a novel Kullback-Leibler (KL) divergence based method that is able to exploit multilingual information in the form of universal phoneme posterior probabilities conditioned on the acoustics. We formulate a means to train a recognizer on several...
In this paper, we propose an error correction interface for a voice word processor. This correction interface includes user intention understanding and automatic error region detection. For accurate correction, we include a confirmation process that includes an error region control command and a re-uttering command. We evaluate the performance of the user intention understanding first, and we evaluate...
This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). In our previous work, a weighting method using multiple kernel learning was proposed, but this method had a high computational cost. In this paper, we propose a novel and fast weighting method using an AdaBoost algorithm to find...
This paper considers a learning framework for speech emotion classification using a discriminant function based on Gaussian mixture models (GMMs). The GMM parameter set is estimated by margin scaling with a loss function to reduce the risk of predicting emotions with high loss. Here, the loss function is defined as a function of a distance metric using the Watson and Tellegen's emotion model. Margin...
We propose a recognition method based on statistics through analysis the grammatical and semantic characteristics of the Chinese organization name. This recognition method includes three elements: frequency, part of speech, word length. We use the data in mature collection as training data; separately calculate a candidate organization name's word frequency, part of speech and word length of the contribution...
In many criminal cases, evidence might be in the form of telephone conversations or tape recordings. Therefore, law enforcement agencies have been concerned about accurate methods to profile different characteristics of a speaker from recorded voice patterns, which facilitate the identification of a criminal. This paper proposes a new approach for speaker gender detection and age estimation, based...
In Myanmar to English language translation system, in order to provide meaningful sentence from one language to another is non-trivial task. POS tagging is used as an early stage of linguistic text analysis in many applications. POS tagging is a process of assigning correct syntactic categories to each word. Tagsets and word disambiguation rules are fundamental parts of any POS tagger. This paper...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.