Items from 1 to 20 out of 38 results

chapter

On statistical machine translation method for lexicon refinement in speech recognition

Haihua Xu, Xiong Xiao, Eng-Siong Chng, Haizhou Li

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 25 - 29

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)

In low-resource Automatic Speech Recognition (ASR), one usually resorts to the Statistical Machine Translation (SMT) technique to learn transform rules to refine the grapheme lexicon. To do this, we face two challenges. One is to generate grapheme sequences from the training data as the targets, which are paired with the original transcripts to train SMT models; the other is to effectively prune the learned...

chapter

Recognition of phonemes from estimation errors

L Baghai-Ravary, S W Beet

1996 8th European Signal Processing Conference (EUSIPCO 1996) > 1 - 4

1996 8th European Signal Processing Conference (EUSIPCO 1996)

Speech recognition systems generally use delta and delta-delta (velocity and acceleration) coefficients to characterise the dynamics apparent in frame-based representations of speech. These coefficients can be thought of as the errors of simple predictors. This paper describes the use of error coefficients derived from more advanced (and accurate) forms of prediction and interpolation. Both overall...

chapter

Deep neural networks for cochannel speaker identification

Xiaojia Zhao, Yuxuan Wang, DeLiang Wang

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4824 - 4828

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speaker identification (SID) in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a challenging problem. Previous studies address this problem in the anechoic environment under the Gaussian mixture model (GMM) framework. On the other hand, cochannel SID in reverberant conditions has not been addressed. This paper studies cochannel SID in both anechoic...

chapter

Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek Memorial Workshop

Hynek Hermansky, Lukas Burget, Jordan Cohen, Emmanuel Dupoux, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5009 - 5013

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken...

chapter

ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks

Atsunori Ogawa, Takaaki Hori

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4370 - 4374

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...

chapter

Weighted training for speech under Lombard Effect for speaker recognition

Muhammad Muneeb Saleem, Gang Liu, John H.L. Hansen

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4350 - 4354

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The presence of Lombard Effect in speech is proven to have severe effects on the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...

chapter

Unsupervised feature learning for urban sound classification

Justin Salamon, Juan Pablo Bello

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 171 - 175

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recent studies have demonstrated the potential of unsupervised feature learning for sound classification. In this paper we further explore the application of the spherical k-means algorithm for feature learning from audio signals, here in the domain of urban sound classification. Spherical k-means is a relatively simple technique that has recently been shown to be competitive with other more complex...

chapter

Using k-Nearest Neighbor and Speaker Ranking for Phoneme Prediction

Muhammad Rizwan, David V. Anderson

2014 13th International Conference on Machine Learning and Applications > 383 - 387

2014 13th International Conference on Machine Learning and Applications (ICMLA)

Speech recognition systems are based on either a parametric or a non-parametric approach. Parametric systems such as HMMs have been the dominant technology for speech recognition in the past decade. Despite many advancements and enhancements in the design of these systems, key problems such as long-term temporal dependence have not yet been solved. Recently due to availability...

chapter

A high-accuracy ASR technique based on correlational weight analysis for elderly users

Chih-Hung Chou, Ta-Wen Kuan, Po-Chuan Lin, Jhing-Fa Wang, et al.

2014 International Conference on Orange Technologies > 189 - 192

2014 IEEE International Conference on Orange Technologies (ICOT)

This paper proposes a robust template based on the previously proposed ECWRT (enhanced cross word reference template) for template-based ASR, by using a correlational weight adjusting method to improve robustness against elderly speech variation, named CWCWRT. This work addresses two vital issues: outlier rejection in the training set and elimination of unwanted utterances which usually happen by...

chapter

Efficient training of acoustic models for reverberation-robust medium-vocabulary automatic speech recognition

Armin Sehr, Hendrik Barfuss, Christian Hofmann, Roland Maas, et al.

2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA) > 177 - 181

2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA)

A recently proposed concept for training reverberation-robust acoustic models for automatic speech recognition using pairs of clean and reverberant data is extended from word models to tied-state triphone models in this paper. The key idea of the concept, termed ICEWIND, is to use the clean data for the temporal alignment and the reverberant data for the estimation of the emission densities. Experiments...

chapter

Dynamic Estimation of Rater Reliability in Subjective Tasks Using Multi-armed Bandits

Alexey Tarasov, Sarah Jane Delany, Brian Mac Namee

2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing > 979 - 980

2012 International Conference on Privacy, Security, Risk and Trust (PASSAT)

Many application areas that use supervised machine learning make use of multiple raters to collect target ratings for training data. Usage of multiple raters, however, inevitably introduces the risk that a proportion of them will be unreliable. The presence of unreliable raters can prolong the rating process, make it more expensive and lead to inaccurate ratings. The dominant, "static" approach...

chapter

Comparison of vector normalization methods in multi-level speaker verification

Szymon Drgas, Adam Dabrowski

2012 International Conference on Signals and Electronic Systems (ICSES) > 1 - 6

2012 International Conference on Signals and Electronic Systems (ICSES 2012)

In this article a text-independent speaker verification problem is considered. After the feature extraction, each conversation side has been represented as a vector in a fixed dimensional space. In order to reduce an influence of the lengths of utterances and also the channel properties, various vector normalization techniques have been selected from the literature, modified, and tested. Additionally,...

chapter

Adaptive boosting features for automatic speech recognition

Kham Nguyen, Tim Ng, Long Nguyen

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4733 - 4736

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we present a method to extract probabilistic acoustic features by using the Adaptive Boosting algorithm (AdaBoost). We build phoneme Gaussian mixture classifiers, and use AdaBoost to enhance the classification performance. The outputs from AdaBoost are the posterior probabilities for each frame given all phonemes. Those posterior features are then used to train a new acoustic model...

chapter

Using KL-divergence and multilingual information to improve ASR for under-resourced languages

David Imseng, Herve Bourlard, Philip N. Garner

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4869 - 4872

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Setting out from the point of view that automatic speech recognition (ASR) ought to benefit from data in languages other than the target language, we propose a novel Kullback-Leibler (KL) divergence based method that is able to exploit multilingual information in the form of universal phoneme posterior probabilities conditioned on the acoustics. We formulate a means to train a recognizer on several...

chapter

Seamless error correction interface for voice word processor

Junhwi Choi, Kyungduk Kim, Sungjin Lee, Seokhwan Kim, et al.

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4973 - 4976

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we propose an error correction interface for a voice word processor. This correction interface includes user intention understanding and automatic error region detection. For accurate correction, we include a confirmation process that includes an error region control command and a re-uttering command. We evaluate the performance of the user intention understanding first, and we evaluate...

chapter

An adaboost-based weighting method for localizing human brain magnetic activity

T. Takiguchi, R. Takashima, Y. Ariki, T. Imada, et al.

Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 4

2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). In our previous work, a weighting method using multiple kernel learning was proposed, but this method had a high computational cost. In this paper, we propose a novel and fast weighting method using an AdaBoost algorithm to find...

article

Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification

Sungrack Yun, Chang D. Yoo

IEEE Transactions on Audio, Speech, and Language Processing > 2012 > 20 > 2 > 585 - 598

This paper considers a learning framework for speech emotion classification using a discriminant function based on Gaussian mixture models (GMMs). The GMM parameter set is estimated by margin scaling with a loss function to reduce the risk of predicting emotions with high loss. Here, the loss function is defined as a function of a distance metric using the Watson and Tellegen's emotion model. Margin...

chapter

A method of Chinese organization named entities recognition based on statistical word frequency, part of speech and length

Xiying Yao

2011 4th IEEE International Conference on Broadband Network and Multimedia Technology > 637 - 641

2011 4th IEEE International Conference on Broadband Network & Multimedia Technology (IC-BNMT 2011)

We propose a recognition method based on statistics through analysis of the grammatical and semantic characteristics of the Chinese organization name. This recognition method includes three elements: frequency, part of speech, and word length. We use the data in mature collection as training data; separately calculate a candidate organization name's word frequency, part of speech and word length of the contribution...

chapter

Speaker age estimation and gender detection based on supervised Non-Negative Matrix Factorization

Mohamad Hasan Bahari, Hugo Van Hamme

2011 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS) > 1 - 6

2011 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS)

In many criminal cases, evidence might be in the form of telephone conversations or tape recordings. Therefore, law enforcement agencies have been concerned about accurate methods to profile different characteristics of a speaker from recorded voice patterns, which facilitate the identification of a criminal. This paper proposes a new approach for speaker gender detection and age estimation, based...

chapter

A Hybrid Approach for Part-of-Speech Tagging of Burmese Texts

Cynthia Myint

2011 International Conference on Computer and Management (CAMAN) > 1 - 4

2011 International Conference on Computer and Management (CAMAN 2011)

In a Myanmar-to-English language translation system, providing a meaningful sentence from one language to another is a non-trivial task. POS tagging is used as an early stage of linguistic text analysis in many applications. POS tagging is a process of assigning correct syntactic categories to each word. Tagsets and word disambiguation rules are fundamental parts of any POS tagger. This paper...

Keywords:
ACCURACY
SPEECH
TRAINING DATA

Publication type

book (35)
article (3)

Keywords

TRAINING (22)
SPEECH RECOGNITION (19)
HIDDEN MARKOV MODELS (12)
FEATURE EXTRACTION (10)
ACOUSTICS (7)
CLASSIFICATION ALGORITHMS (7)
DATABASES (7)
DATA MINING (6)
MACHINE LEARNING (6)
SIGNAL PROCESSING (6)
SPEAKER RECOGNITION (6)
TESTING (6)
ARTIFICIAL NEURAL NETWORKS (5)
COMPUTERS (5)
CONFERENCES (5)
NOISE (5)
ROBUSTNESS (5)
TEXT ANALYSIS (5)
DATA MODELS (4)
EQUATIONS (4)
ESTIMATION (4)
MEL FREQUENCY CEPSTRAL COEFFICIENT (4)
NATURAL LANGUAGE PROCESSING (4)
PATTERN RECOGNITION (4)
SPEECH PROCESSING (4)
SUPPORT VECTOR MACHINE CLASSIFICATION (4)
VECTORS (4)
ANALYTICAL MODELS (3)
COMPLEXITY THEORY (3)
COMPUTATIONAL MODELING (3)
CORRELATION (3)
DECISION TREES (3)
DETECTION ALGORITHMS (3)
DETECTORS (3)
EIGENVALUES AND EIGENFUNCTIONS (3)
ELECTRONIC MAIL (3)
FOURIER TRANSFORMS (3)
MATHEMATICAL MODEL (3)
PRESSES (3)
PRINCIPAL COMPONENT ANALYSIS (3)
SPEECH SYNTHESIS (3)
TRANSFORMS (3)
WAVELET TRANSFORMS (3)
WRITING (3)
ALGORITHM DESIGN AND ANALYSIS (2)
AUTOMATIC SPEECH RECOGNITION (2)
BOOSTING (2)
BRIGHTNESS (2)
CHINESE TO TAIWANESE TTS SYSTEM (2)
COMPUTATIONAL LINGUISTICS (2)
COVARIANCE MATRIX (2)
EDUCATIONAL INSTITUTIONS (2)
ENCODING (2)
FACE RECOGNITION (2)
GALLIUM NITRIDE (2)
GAUSSIAN MIXTURE MODEL (2)
GRAMMAR (2)
IMAGE RECOGNITION (2)
INFORMATION RETRIEVAL (2)
KNOWLEDGE BASED SYSTEMS (2)
LABORATORIES (2)
LEARNING (ARTIFICIAL INTELLIGENCE) (2)
LEARNING SYSTEMS (2)
MICROPHONES (2)
MULTILAYER PERCEPTRON (2)
MULTILAYER PERCEPTRONS (2)
NEURAL NETWORKS (2)
OPTIMIZATION (2)
ORAL COMMUNICATION (2)
PATTERN CLASSIFICATION (2)
PATTERN MATCHING (2)
PERIODIC STRUCTURES (2)
POLYSEMY PROBLEMS (2)
PREDICTION ALGORITHMS (2)
PROCEEDINGS OF THE IEEE (2)
RELIABILITY (2)
REVERBERATION (2)
SPEAKER IDENTIFICATION (2)
TIME DOMAIN ANALYSIS (2)
USA COUNCILS (2)
ACCELERATION (1)
ACOUSTIC MEASUREMENTS (1)
ACTIVE CONTOURS (1)
ADAPTIVE BOOSTING (1)
ADDITIVE NOISE (1)
AGE ESTIMATION (1)
ARTIFICIAL INTELLIGENCE (1)
ASSOCIATIVE MEMORY (1)
AUDIO LANGUAGE IDENTIFICATION (1)
AUDIO SIGNAL PROCESSING (1)
AUTOMATIC DETECTION (1)
AUTOMATIC SPEAKER RECOGNITION (1)
BAYESIAN METHODS (1)
BOOKS (1)
BRAIN (1)
BRIDGE CRACK DETECTION (1)
BRIDGES (1)

INFONA - science communication portal
