The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e.g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window, often a complete utterance. Consequently, changes in the background...
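As an illustration of the utterance-level normalization described above, the sketch below applies CMN and CMVN to a matrix of cepstral frames (one list per frame, one entry per coefficient). The function names and the small `eps` guard against zero variance are choices of this sketch, not taken from the paper.

```python
from statistics import fmean, pstdev


def cmn(frames):
    """Cepstral mean normalization: subtract the per-dimension mean
    computed over all frames of the utterance (frames x dims)."""
    dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def cmvn(frames, eps=1e-8):
    """Cepstral mean and variance normalization: additionally divide each
    dimension by its utterance-level (population) standard deviation."""
    dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(dims)]
    stds = [pstdev([f[d] for f in frames]) for d in range(dims)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
            for f in frames]
```

Because the means and deviations are computed once for the whole utterance, every frame is shifted by the same amount, which is why a background change partway through the utterance is not tracked.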
Spectral subtraction is one of the earliest and longest-standing popular approaches to noise compensation and speech enhancement. A literature search reveals an abundance of recent research papers that report the successful application of spectral subtraction to noise-robust automatic speech recognition (ASR). However, as with many alternative approaches, the benefits lessen as noise levels in the...
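A minimal sketch of basic power spectral subtraction on a single frame, assuming the noise magnitude spectrum is estimated by averaging frames known to contain noise only. The over-subtraction factor `alpha` and spectral floor `beta` are common textbook parameters used here for illustration; they are not taken from the papers surveyed in the abstract.

```python
import math


def estimate_noise(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only
    (e.g. leading frames before speech starts; an assumption of this sketch)."""
    n = len(noise_frames)
    return [sum(col) / n for col in zip(*noise_frames)]


def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=0.01):
    """Power spectral subtraction for one frame.

    alpha: over-subtraction factor; beta: spectral floor, as a fraction of
    the noisy power, that replaces negative or very small results.
    Returns the enhanced magnitude spectrum; the noisy phase would be reused
    unchanged when resynthesizing the waveform.
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - alpha * n * n
        enhanced.append(math.sqrt(max(power, beta * y * y)))
    return enhanced
```

The floor keeps bins where the noise estimate exceeds the observed power from going negative; as noise levels rise, more bins hit that floor, which is one reason the benefit shrinks.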
Microblogging sites such as Twitter and Weibo are increasingly being used to enhance situational awareness during various natural and man-made disaster events such as floods, earthquakes, and bomb blasts. During any such event, thousands of microblogs (tweets) are posted in short intervals of time. Typically, only a small fraction of these tweets contribute to situational awareness, while the majority...
An electrolarynx is a device that artificially generates excitation sounds to produce electrolaryngeal (EL) speech. Although proficient laryngectomees can produce intelligible EL speech by using this device, it sounds quite unnatural due to the mechanical excitation. To address this issue, we have proposed several EL speech enhancement methods using statistical voice conversion and showed that statistical...
To address the poor robustness of pitch detection in noisy speech, this paper proposes an improved pitch detection method combined with speech enhancement. First, to reduce background noise and obtain relatively clean speech, we apply multi-band spectral subtraction together with the masking properties of the human auditory system to the noisy speech, and next use the energy and zero-crossing...
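Short-time energy and zero-crossing rate are commonly used to separate silence, unvoiced, and voiced frames, so that pitch is estimated only where it exists. The excerpt is cut off before saying exactly how the paper uses them, so the sketch below shows only the generic features; `label_frame` and its thresholds are hypothetical and would need tuning.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frame(frame, energy_thr, zcr_thr):
    """Hypothetical three-way decision with illustrative thresholds:
    low energy -> silence, high ZCR -> unvoiced, otherwise voiced."""
    if short_time_energy(frame) < energy_thr:
        return "silence"
    return "unvoiced" if zero_crossing_rate(frame) > zcr_thr else "voiced"
```

Voiced speech tends to combine high energy with a low crossing rate, while fricatives cross zero often; a pitch tracker would then run only on frames labelled `"voiced"`.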
Enhancement of speech distorted by reverberation is a topical issue. The problem has been actively studied in the last decade. However, it is still extremely difficult to find clear recommendations on the choice of the boundary between early reflections and late reverberation that is optimal in the sense of criteria such as speech recognition accuracy and speech quality. Another problem is obtaining a simple pre-processor...
This paper analyzes the impact of various preprocessing modules on the performance of an automatic speech recognition (ASR) system in noisy environments. After choosing state-of-the-art algorithms designed in the signal domain and the feature domain, we thoroughly evaluate their performance in various noise conditions. Since the enhancement is applied directly to the features that are actually...
We propose Cross-Channel Spectral Subtraction (CCSS), a source separation method for recognizing meeting speech where one microphone is provided for each speaker. The method quickly adapts to changes in transfer functions and uses spectral subtraction to suppress the speech of other speakers. Compared with conventional source separation methods based on independent component analysis (ICA) or that...
We present a speech pre-processing scheme (SPPS) for robust speech recognition in the moving motorcycle environment. The SPPS is dynamically adapted during the run-time operation of the speech front-end, depending on short-time characteristics of the acoustic environment. In detail, the fast-varying acoustic environment is modeled by GMM clusters based on which a selection function determines the...
In this study, we evaluate our proposed methods for enhancing alaryngeal speech based on statistical voice conversion techniques. Voice conversion based on a Gaussian mixture model has been applied to the conversion of alaryngeal speech into normal speech (AL-to-Speech). Moreover, one-to-many eigenvoice conversion (EVC) has also been applied to AL-to-Speech to enable the recovery of the original voice...
Many people have great difficulty understanding speech in background noise. Speech enhancement plays a vital role in such situations. The background noise has to be removed from the noisy speech signal to increase intelligibility and reduce listener fatigue. In this paper, a novel approach is used to enhance the perceived quality of the speech signal when the additive noise...
Burst onset landmarks in the speech signal are transient segments with low energy, and their accurate detection is important in applications involving landmark-based speech modification, estimation of the place of closure for speech training aids, and phoneme recognition. Rate-of-change measures of energy parameters from spectral bands with fixed boundaries are generally used for landmark detection. The...
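The conventional approach named above, rate-of-change measures on energy in fixed spectral bands, can be sketched as follows; the paper's own alternative is cut off in the excerpt. The band edges, the frame step, and the 9 dB threshold are illustrative assumptions of this sketch.

```python
import math


def band_energy_db(power_spectrum, band, eps=1e-12):
    """Energy in dB of one fixed frequency band, given as a (start, stop) bin range."""
    lo, hi = band
    return 10.0 * math.log10(sum(power_spectrum[lo:hi]) + eps)


def rate_of_rise(energies_db, step=1):
    """Difference between the band energy now and `step` frames earlier."""
    return [energies_db[t] - energies_db[t - step]
            for t in range(step, len(energies_db))]


def detect_onsets(energies_db, step=1, threshold_db=9.0):
    """Frame indices whose rate of rise meets the (illustrative) threshold."""
    ror = rate_of_rise(energies_db, step)
    return [i + step for i, r in enumerate(ror) if r >= threshold_db]
```

Because the band edges are fixed in advance, a weak burst whose energy falls mostly outside the chosen band produces only a small rise, which is the kind of miss that makes low-energy burst onsets hard to detect.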
Speech segmentation into covariance-stationary regions is of interest, for example, in subspace-based speech enhancement. However, as the true covariance matrices of speech segments are unknown, it is usual to use their sample estimates. To check whether two sample covariance matrices have been drawn from the same distribution or not, we have used a test statistic previously proposed for image segmentation...
This paper presents a new approach to enhancing noisy (white Gaussian noise) speech signals for robust speech recognition. It is based on the minimization of an estimate of denoising MSE (known as SURE) and does not require any hypotheses on the original signal. The enhanced signal is obtained by thresholding coefficients in the DCT domain, with the parameters in the thresholding functions being specified...
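The excerpt does not give the form of the paper's thresholding functions, so the sketch below substitutes the simplest SURE-based scheme: one soft threshold chosen by minimizing Stein's unbiased risk estimate over orthonormal DCT coefficients (SureShrink-style). The noise standard deviation `sigma` is assumed known here; in practice it would have to be estimated.

```python
import math


def dct(x):
    """Orthonormal DCT-II. Orthonormality keeps white Gaussian noise white,
    with the same variance, in the transform domain."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(c[k] * math.sqrt((1.0 if k == 0 else 2.0) / n)
            * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def soft(c, t):
    """Soft thresholding: shrink the magnitude by t, zeroing anything below t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def sure_threshold(coeffs, sigma):
    """Threshold minimizing Stein's unbiased risk estimate for soft thresholding:
    SURE(t) = n*sigma^2 - 2*sigma^2*#{|c| <= t} + sum(min(|c|, t)^2)."""
    mags = [abs(c) for c in coeffs]
    n, var = len(mags), sigma * sigma
    best_t, best_risk = 0.0, float("inf")
    for t in [0.0] + sorted(mags):
        risk = (n * var
                - 2 * var * sum(1 for m in mags if m <= t)
                + sum(min(m, t) ** 2 for m in mags))
        if risk < best_risk:
            best_t, best_risk = t, risk
    return best_t


def denoise(signal, sigma):
    """DCT, SURE-optimal soft thresholding, inverse DCT."""
    coeffs = dct(signal)
    t = sure_threshold(coeffs, sigma)
    return idct([soft(c, t) for c in coeffs])
```

SURE only needs the noisy coefficients and `sigma`, which matches the abstract's point that no hypotheses on the clean signal are required.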
Automatic recognition of emotional states via speech signal has attracted increasing attention in recent years. A number of techniques have been proposed which are capable of providing reasonably high accuracy for controlled studio settings. However, their performance is considerably degraded when the speech signal is contaminated by noise. In this paper, we present a framework with adaptive noise...
To improve the robustness of automatic speech recognition (ASR) systems in adverse environments, speech enhancement preprocessors have recently been widely used to reduce the impact of noise. In this paper, an improved noise estimation approach is proposed for the enhancement preprocessor to maintain enhancement performance with low distortion and complexity. First, the noisy speech is transformed into Bark...
In this paper, an effective compensation scheme for the spectra of speech signals is proposed in order to improve their noise robustness. In this compensation scheme, named magnitude spectrum enhancement (MSE), a voice activity detection (VAD) process is first applied to the frame sequence of the utterance, and then the magnitude spectra of non-speech frames are set to be small while those of speech...
This paper proposes a new statistical model-based likelihood ratio test (LRT) VAD to obtain reliable speech / non-speech decisions. In the proposed method, the likelihood ratio (LR) is calculated differently for voiced frames, as opposed to unvoiced frames: only DFT bins containing harmonic spectral peaks are selected for LR computation. To evaluate the new VAD's effectiveness in improving the noise-robustness...
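A sketch of the general statistical-model LRT decision, with the option of restricting the average to selected DFT bins as the abstract describes for voiced frames. The per-bin log likelihood ratio follows the standard complex-Gaussian model; approximating the a priori SNR by its maximum-likelihood estimate, the `harmonic_bins` helper, and the threshold are simplifying assumptions of this sketch, not the paper's procedure.

```python
import math


def log_likelihood_ratio(power, noise_power, bins=None):
    """Mean per-bin log LR under complex-Gaussian speech and noise models:
    log L_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k).

    gamma_k = |X_k|^2 / noise_power_k is the a posteriori SNR; the a priori
    SNR xi_k is approximated by the ML estimate max(gamma_k - 1, 0).
    `bins` restricts the average to a subset of DFT bins.
    """
    idx = list(range(len(power))) if bins is None else list(bins)
    total = 0.0
    for k in idx:
        gamma = power[k] / noise_power[k]
        xi = max(gamma - 1.0, 0.0)
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(idx)


def harmonic_bins(f0_bin, n_bins):
    """Hypothetical helper: bins nearest to integer multiples of f0 (f0_bin > 0)."""
    if f0_bin <= 0:
        raise ValueError("f0_bin must be positive")
    bins, h = [], 1
    while round(h * f0_bin) < n_bins:
        bins.append(round(h * f0_bin))
        h += 1
    return bins


def is_speech(power, noise_power, threshold, f0_bin=None):
    """Voiced frames (f0_bin given) use only harmonic bins; others use all bins."""
    bins = None if f0_bin is None else harmonic_bins(f0_bin, len(power))
    return log_likelihood_ratio(power, noise_power, bins or None) > threshold
```

Averaging only over harmonic peaks keeps the many noise-only bins between harmonics from diluting the statistic, so a voiced frame can pass the threshold even when its full-band average would not.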
In this paper, we propose a noise-robust isolated word speech recognition system which can be applied in various noise environments. In this method, a Kalman filter is used to remove the background noise and to enhance the speech signal. The enhanced signal is integrated into the front end of Dynamic Time Warping (DTW) isolated word recognition in order to guarantee high performance and robust recognition...
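The recognition stage described above, DTW template matching for isolated words, can be sketched as follows. Feature extraction from the Kalman-enhanced signal is assumed to happen elsewhere; the Euclidean local cost and the basic three-way step pattern are common defaults chosen for this sketch.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences
    (lists of equal-length vectors), with Euclidean local cost and the
    standard (i-1, j), (i, j-1), (i-1, j-1) step pattern."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(features, templates):
    """Return the vocabulary word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

For example, with `templates = {"yes": [[1], [2], [3]], "no": [[3], [2], [1]]}`, the slowly spoken input `[[1], [1], [2], [3]]` aligns to `"yes"` at zero cost because the warping path absorbs the repeated first frame.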
In this paper, a fuzzy retrieval algorithm is designed to work with LVCSR in a speech navigation system. Inverted indexing as well as other search techniques are used to speed up searching while maintaining performance. Several cell levels are tried instead of using words alone. Easily reaching 90% sentence accuracy on a normal database, this framework can also handle very large databases,...
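As a rough illustration of how an inverted index supports error-tolerant retrieval of recognized text, the sketch below indexes database entries by sub-word units and ranks entries by the number of units they share with the query. Character bigrams stand in for the abstract's "cell levels" purely as an assumption; `units`, `build_index`, and `fuzzy_search` are hypothetical names, not the paper's algorithm.

```python
from collections import Counter, defaultdict


def units(text, n=2):
    """Hypothetical indexing unit: character n-grams with spaces removed,
    standing in for whichever cell level is chosen."""
    s = text.replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def build_index(entries, n=2):
    """Inverted index: unit -> set of entry ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(entries):
        for u in units(text, n):
            index[u].add(doc_id)
    return index


def fuzzy_search(index, query, n=2):
    """Score entries by the number of shared units, so a query with a few
    misrecognized characters can still retrieve the right entry."""
    scores = Counter()
    for u in units(query, n):
        for doc_id in index.get(u, ()):
            scores[doc_id] += 1
    return scores.most_common(1)[0][0] if scores else None
```

Only entries that share at least one unit with the query are ever touched, which is what keeps lookup fast as the database grows; a misrecognized character costs only the few units that contain it.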