The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In low resource Automatic Speech Recognition (ASR), one usually resorts to the Statistical Machine Translation (SMT) technique to learn transform rules to refine grapheme lexicon. To do this, we face two challenges. One is to generate grapheme sequences from the training data as the targets, which is paired with the original transcripts to train SMT models; the other is to effectively prune the learned...
The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e. g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window and often, a complete utterance is chosen. Consequently, changes in the background...
Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...
The goal of this article is to analyse how the length of utterances affects performance of an automatic speech recognizer (ASR). Benchmarks of an ASR system were performed for utterances of various lengths on English and Czech corpora. Then the observed phenomena are tried to be explained theoretically. Eventually, results are summarized and some conclusions drawn.
Voice activity detection (VAD) plays a crucial role in speech processing, especially in automatic speech recognition (ASR). It identifies the boundaries of the speech to be recognized and the boundary accuracies may significantly affect the recognition performance. Conventional VAD evaluation criteria are mostly based on frame-level accuracy of speech/non-speech classification, which may result in...
The performance of Automatic Speech recognition system (ASR) built using close talk microphones degrades in noisy environments. AS R built using Throat Microphone (TM) speech shows relatively better performance under such adverse situations. However, some of the sounds are not well captured in TM. In this work we explore the combined use of Normal Microphone (NM) and TM features to improve the recognition...
Interactive Voice Response (IVR) technology makes customer service 24 × 7 and cost effective and has been used by most customer facing enterprises. While effective, IVRs requires inputs to be keyed in using a touch tone phone, constraining the type of information that can be input. For this reason customers in general choose to speak with a human agent directly, negating the cost effectiveness of...
The speech of cleft palate (CP) patients has typical characteristics. Hypernasality and low speech intelligibility are the primary characteristics for CP speech. In this work, an automatic evaluation of different levels of hypernasality and speech intelligibility algorithm for CP speech was proposed, in order to provide an objective tool for speech therapist. To identify different levels of hypernasality,...
This paper presents the development of a speech recognition system for automatically recognizing fluently spoken digit strings in Northern Sotho. The digit strings can be isolated or connected/continuous with known or unknown length. The digit recognition system has been trained with the aim of satisfying its potential end-users. Our main research focus was to enhance the robustness of a connected-digits...
A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment—the voice-input voice-output communication aid (VIVOCA)—is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified...
Much of the efficiency of any Automatic Speech Recognition (ASR) system depends on its speech corpus. This is even more so for recognizers designed for specific tasks. Naturally, an ASR for spelling recognition performs better if it is trained with a spelling speech corpus rather than a generic one. Although several speech corpora are available in Thai, we are still lack of Thai spelling speech corpora...
In this paper, we study the effect of the design parameters of a single-channel reverberation suppression algorithm on reverberation-robust speech recognition. At the same time, reverberation compensation at the speech recognizer is investigated. The analysis reveals that it is highly beneficial to attenuate only the reverberation tail after approximately 50 ms while coping with the early reflections...
In this work, we propose a novel method for vocabulary selection to automatically adapt automatic speech recognition systems to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the web and generates a lecture-specific vocabulary...
Hidden factor such as gender characteristic plays an important role on the performance of Bangla (widely used as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic-likelihood among categories resulted from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...
This paper presents a hybridization of Multilayer Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR) incorporating dynamic parameters. The method consists of four stages: at first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities. Phoneme probabilities from the first...
Speaker-specific characteristics play an important role on the performance of Bangla (widely used as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic-likelihood among categories...
This paper presents a Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: at first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, where the second stage computes velocity (?) coefficients from the phoneme probabilities by using...
Building an automatic speech recognizer (ASR) means that one has to provide the acoustic model, language model and lexicon for the intended language, which is also applied for Indonesian ASR. Unfortunately, providing acoustic model for a certain language is quite expensive, unlike the language model and the lexicon. This is because one has to record many utterances from several speakers to build a...
In this paper, we propose a novel framework to integrate articulatory features (AFs) into HMM- based ASR system. This is achieved by using posterior probabilities of different AFs (estimated by multilayer perceptrons) directly as observation features in Kullback-Leibler divergence based HMM (KL-HMM) system. On the TIMIT phoneme recognition task, the proposed framework yields a phoneme recognition...
This paper describes several Sound-Packet segmentation techniques, which will facilitate Automatic Speech Recognition (ASR) for Bangla speech signal. The approximate duration of a sound-packet has been determined and an envelope-detection method has been presented to determine the end-points of sound-packets. The 1st difference method, based on moving average of 1st difference of the signal, is then...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.