Items from 1 to 20 out of 54 results

chapter

On statistical machine translation method for lexicon refinement in speech recognition

Haihua Xu, Xiong Xiao, Eng-Siong Chng, Haizhou Li

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 25 - 29

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)

In low resource Automatic Speech Recognition (ASR), one usually resorts to the Statistical Machine Translation (SMT) technique to learn transform rules to refine grapheme lexicon. To do this, we face two challenges. One is to generate grapheme sequences from the training data as the targets, which is paired with the original transcripts to train SMT models; the other is to effectively prune the learned...

chapter

Cepstral noise subtraction for robust automatic speech recognition

Robert Rehr, Timo Gerkmann

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 375 - 378

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The robustness of speech recognizers towards noise can be increased by normalizing the statistical moments of the Mel-frequency cepstral coefficients (MFCCs), e.g. by using cepstral mean normalization (CMN) or cepstral mean and variance normalization (CMVN). The necessary statistics are estimated over a long time window and often, a complete utterance is chosen. Consequently, changes in the background...

chapter

ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks

Atsunori Ogawa, Takaaki Hori

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4370 - 4374

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...

chapter

On the impact of sentence length on recognition accuracy

Tomas Valenta, Lubos Smidl

2014 12th International Conference on Signal Processing (ICSP) > 500 - 504

2014 12th International Conference on Signal Processing (ICSP 2014)

The goal of this article is to analyse how the length of utterances affects the performance of an automatic speech recognizer (ASR). Benchmarks of an ASR system were performed for utterances of various lengths on English and Czech corpora. An attempt is then made to explain the observed phenomena theoretically. Finally, the results are summarized and some conclusions are drawn.

chapter

Evaluating VAD for automatic speech recognition

Sibo Tong, Nanxin Chen, Yanmin Qian, Kai Yu

2014 12th International Conference on Signal Processing (ICSP) > 2308 - 2314

2014 12th International Conference on Signal Processing (ICSP 2014)

Voice activity detection (VAD) plays a crucial role in speech processing, especially in automatic speech recognition (ASR). It identifies the boundaries of the speech to be recognized and the boundary accuracies may significantly affect the recognition performance. Conventional VAD evaluation criteria are mostly based on frame-level accuracy of speech/non-speech classification, which may result in...

chapter

Improving recognition of syllabic units of Hindi language using combined features of Throat Microphone and Normal Microphone speech

N. Radha, A. Shahina, G. Vinoth, A. Nayeemulla Khan

2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT) > 1343 - 1348

2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)

The performance of an Automatic Speech Recognition (ASR) system built using close-talk microphones degrades in noisy environments. An ASR system built using Throat Microphone (TM) speech shows relatively better performance under such adverse situations. However, some of the sounds are not well captured in TM. In this work we explore the combined use of Normal Microphone (NM) and TM features to improve the recognition...

chapter

Deploying usable speech enabled IVR systems for mass use

Chitralekha Bhat, B S Mithun, Vikram Saxena, Vrushali Kulkarni, et al.

2013 International Conference on Human Computer Interactions (ICHCI) > 1 - 5

2013 International Conference on Human Computer Interactions (ICHCI)

Interactive Voice Response (IVR) technology makes customer service 24 × 7 and cost effective and has been used by most customer facing enterprises. While effective, IVRs require inputs to be keyed in using a touch tone phone, constraining the type of information that can be input. For this reason customers in general choose to speak with a human agent directly, negating the cost effectiveness of...

chapter

Automatic evaluation of hypernasality and speech intelligibility for children with cleft palate

Ling He, Jing Zhang, Qi Liu, Heng Yin, et al.

2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA) > 220 - 223

2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA)

The speech of cleft palate (CP) patients has typical characteristics. Hypernasality and low speech intelligibility are the primary characteristics of CP speech. In this work, an algorithm for automatically evaluating different levels of hypernasality and speech intelligibility in CP speech is proposed, in order to provide an objective tool for speech therapists. To identify different levels of hypernasality,...

chapter

Connected-digits recognition for an under-resourced language using Hidden Markov Models

Mabu Johannes Manaileng, Madimetja Jonas Manamela

Proceedings ELMAR-2013 > 211 - 214

2013 55th International Symposium ELMAR

This paper presents the development of a speech recognition system for automatically recognizing fluently spoken digit strings in Northern Sotho. The digit strings can be isolated or connected/continuous with known or unknown length. The digit recognition system has been trained with the aim of satisfying its potential end-users. Our main research focus was to enhance the robustness of a connected-digits...

article

A Voice-Input Voice-Output Communication Aid for People With Severe Speech Impairment

Mark S. Hawley, Stuart P. Cunningham, Phil D. Green, Pam Enderby, et al.

IEEE Transactions on Neural Systems and Rehabilitation Engineering > 2013 > 21 > 1 > 23 - 31

A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment—the voice-input voice-output communication aid (VIVOCA)—is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified...

chapter

The CU-MFEC corpus for Thai and English spelling speech recognition

Natthawut Kertkeidkachorn, Supadaech Chanjaradwichai, Teera Suri, Krerksak Likitsupin, et al.

2012 International Conference on Speech Database and Assessments > 18 - 23

2012 Oriental COCOSDA 2012 - International Conference on Speech Database and Assessments

Much of the efficiency of any Automatic Speech Recognition (ASR) system depends on its speech corpus. This is even more so for recognizers designed for specific tasks. Naturally, an ASR for spelling recognition performs better if it is trained with a spelling speech corpus rather than a generic one. Although several speech corpora are available in Thai, we still lack Thai spelling speech corpora...

chapter

On the application of reverberation suppression to robust speech recognition

Roland Maas, Emanuel A.P. Habets, Armin Sehr, Walter Kellermann

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 297 - 300

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we study the effect of the design parameters of a single-channel reverberation suppression algorithm on reverberation-robust speech recognition. At the same time, reverberation compensation at the speech recognizer is investigated. The analysis reveals that it is highly beneficial to attenuate only the reverberation tail after approximately 50 ms while coping with the early reflections...

chapter

Unsupervised vocabulary selection for real-time speech recognition of lectures

Paul Maergner, Alex Waibel, Ian Lane

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4417 - 4420

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this work, we propose a novel method for vocabulary selection to automatically adapt automatic speech recognition systems to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the web and generates a lecture-specific vocabulary...

chapter

Bangla ASR design by suppressing gender factor with gender-independent and gender-based HMM classifiers

Foyzul Hassan, Mohammed Rokibul Alam Kotwal, Mohammad Nurul Huda

2011 World Congress on Information and Communication Technologies > 1276 - 1281

2011 World Congress on Information and Communication Technologies (WICT)

Hidden factors such as gender characteristics play an important role in the performance of Bangla (widely used as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic-likelihood among categories resulting from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...

chapter

Hybridization of two stage Multilayer Neural Networks based Bangla ASR incorporating dynamic parameters

Mohammed Rokibul Alam Kotwal, Md. Abdur Razzaque, Arif Hossen, Mohammad Nurul Huda

2011 11th International Conference on Hybrid Intelligent Systems (HIS) > 167 - 172

2011 11th International Conference on Hybrid Intelligent Systems (HIS 2011)

This paper presents a hybridization of a Multilayer Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR) incorporating dynamic parameters. The method consists of four stages: in the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities. Phoneme probabilities from the first...

chapter

Gender Effects Suppression in Bangla ASR by Designing Multiple HMM-Based Classifiers

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Md. Shafiul Alam, Shakib Ibn Daud, et al.

2011 International Conference on Computational Intelligence and Communication Networks > 390 - 394

2011 International Conference on Computational Intelligence and Communication Networks (CICN)

Speaker-specific characteristics play an important role in the performance of Bangla (widely used as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic-likelihood among categories...

chapter

Hybrid Features for Neural Network-Based Bangla ASR Incorporating Velocity Coefficients (Δ)

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Shakib Ibn Daud, Md. Shafiul Alam, et al.

2011 International Conference on Computational Intelligence and Communication Networks > 416 - 420

2011 International Conference on Computational Intelligence and Communication Networks (CICN)

This paper presents a Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: in the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, while the second stage computes velocity (Δ) coefficients from the phoneme probabilities by using...

chapter

Indonesian automatic speech recognition system using English-based acoustic model

Veri Ferdiansyah, Ayu Purwarianti

Proceedings of the 2011 International Conference on Electrical Engineering and Informatics > 1 - 4

2011 International Conference on Electrical Engineering and Informatics (ICEEI)

Building an automatic speech recognizer (ASR) means that one has to provide the acoustic model, language model and lexicon for the intended language, which also applies to Indonesian ASR. Unfortunately, providing an acoustic model for a certain language is quite expensive, unlike the language model and the lexicon. This is because one has to record many utterances from several speakers to build a...

chapter

Integrating articulatory features using Kullback-Leibler divergence based acoustic model for phoneme recognition

Ramya Rasipuram, Magimai.-Doss Mathew

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5192 - 5195

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we propose a novel framework to integrate articulatory features (AFs) into an HMM-based ASR system. This is achieved by using posterior probabilities of different AFs (estimated by multilayer perceptrons) directly as observation features in a Kullback-Leibler divergence based HMM (KL-HMM) system. On the TIMIT phoneme recognition task, the proposed framework yields a phoneme recognition...

chapter

A novel segmentation method of Sound-Packets for Bangla speech signal

M A N R Rahaman, A Das, M Z Nayen, M S Rahman

International Conference on Electrical & Computer Engineering (ICECE 2010) > 510 - 513

2010 6th International Conference on Electrical & Computer Engineering (ICECE 2010)

This paper describes several Sound-Packet segmentation techniques, which will facilitate Automatic Speech Recognition (ASR) for Bangla speech signal. The approximate duration of a sound-packet has been determined and an envelope-detection method has been presented to determine the end-points of sound-packets. The 1st difference method, based on moving average of 1st difference of the signal, is then...

Keywords:
ACCURACY
SPEECH
AUTOMATIC SPEECH RECOGNITION

INFONA - science communication portal
