Items from 1 to 20 out of 33 results

chapter

HMM-based modelling of individual syllables for bird species recognition from audio field recordings

Peter Jancovic, Masoud Zakeri, Munevver Kokuer, Martin Russell

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 768 - 772

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents an automatic system for recognition of bird species from audio field recordings. The acoustic signal is first segmented into isolated time-frequency segments, each corresponding to an individual detected sinusoidal component. Each segment is represented by a temporal sequence of the frequency values of the detected sinusoid, referred to as frequency track. Hidden Markov models...

chapter

Two-stage phone recognition system using articulatory and spectral features

K E Manjunath, K. Sreenivasa Rao, M Gurunath Reddy

2015 International Conference on Signal Processing and Communication Engineering Systems > 107 - 111

2015 International Conference on Signal Processing And Communication Engineering Systems (SPACES)

In this paper, we propose a two-stage phone recognition system using articulatory and spectral features. In the first stage, articulatory features are predicted from spectral features using FeedForward Neural Networks (FFNNs). In the second stage, phone recognition is carried out using the predicted articulatory features and spectral features together. FFNNs and Hidden Markov Models are explored for...

chapter

Design of a POS tagger using conditional random fields for Malayalam

V. Krishnapriya, P. Sreesha, T. R. Harithalakshmi, T. C. Archana, et al.

2014 First International Conference on Computational Systems and Communications (ICCSC) > 370 - 373

2014 First International Conference on Computational Systems and Communications (ICCSC)

Part-of-speech tagging is the process of marking the words in a text as corresponding to a particular part of speech, based on their definition and context. A POS tagger plays an important role in natural language applications like speech recognition, natural language parsing, information retrieval and extraction. This paper discusses an architecture for designing a Part-Of-Speech (POS) tagger for Malayalam...

chapter

Improving recognition of syllabic units of Hindi language using combined features of Throat Microphone and Normal Microphone speech

N. Radha, A. Shahina, G. Vinoth, A. Nayeemulla Khan

2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT) > 1343 - 1348

2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)

The performance of Automatic Speech Recognition (ASR) systems built using close-talk microphones degrades in noisy environments. ASR built using Throat Microphone (TM) speech shows relatively better performance under such adverse situations. However, some of the sounds are not well captured in TM. In this work we explore the combined use of Normal Microphone (NM) and TM features to improve the recognition...

article

Simultaneous-Speaker Voice Activity Detection and Localization Using Mid-Fusion of SVM and HMMs

Vicente P. Minotto, Claudio R. Jung, Bowon Lee

IEEE Transactions on Multimedia > 2014 > 16 > 4 > 1032 - 1044

Humans can extract speech signals that they need to understand from a mixture of background noise, interfering sound sources, and reverberation for effective communication. Voice Activity Detection (VAD) and Sound Source Localization (SSL) are the key signal processing components that humans perform by processing sound signals received at both ears, sometimes with the help of visual cues by locating...

chapter

Cross-stream dependency modeling using continuous F0 model for HMM-based speech synthesis

Xin Wang, Zhen-Hua Ling, Li-Rong Dai

2012 8th International Symposium on Chinese Spoken Language Processing > 84 - 87

2012 8th International Symposium on Chinese Spoken Language Processing (ISCSLP 2012)

In our previous work, we have presented a cross-stream dependency modeling method for hidden Markov model (HMM) based parametric speech synthesis. In this method, multi-space probability distribution (MSD) was adopted for F0 modeling and the voicing decision error influenced the accuracy of generated spectral features severely. Therefore, a cross-stream dependency modeling method using continuous...

chapter

Bangla ASR design by suppressing gender factor with gender-independent and gender-based HMM classifiers

Foyzul Hassan, Mohammed Rokibul Alam Kotwal, Mohammad Nurul Huda

2011 World Congress on Information and Communication Technologies > 1276 - 1281

2011 World Congress on Information and Communication Technologies (WICT)

A hidden factor such as gender characteristics plays an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). If there is a suppression process that represses the decrease of differences in acoustic-likelihood among categories resulting from gender factors, a robust ASR system can be realized. In our previous paper, we proposed a technique of gender effects...

chapter

Hybridization of two stage Multilayer Neural Networks based Bangla ASR incorporating dynamic parameters

Mohammed Rokibul Alam Kotwal, Md. Abdur Razzaque, Arif Hossen, Mohammad Nurul Huda

2011 11th International Conference on Hybrid Intelligent Systems (HIS) > 167 - 172

2011 11th International Conference on Hybrid Intelligent Systems (HIS 2011)

This paper presents a hybridization of Multilayer Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR) incorporating dynamic parameters. The method consists of four stages: at first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities. Phoneme probabilities from the first...

chapter

Gender Effects Suppression in Bangla ASR by Designing Multiple HMM-Based Classifiers

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Md. Shafiul Alam, Shakib Ibn Daud, et al.

2011 International Conference on Computational Intelligence and Communication Networks > 390 - 394

2011 International Conference on Computational Intelligence and Communication Networks (CICN)

Speaker-specific characteristics play an important role in the performance of Bangla (widely known as Bengali) automatic speech recognition (ASR). It is difficult to recognize speech affected by gender factors, especially when an ASR system contains only a single acoustic model. If there exists any suppression process that represses the decrease of differences in acoustic-likelihood among categories...

chapter

Hybrid Features for Neural Network-Based Bangla ASR Incorporating Velocity Coefficients (Δ)

Mohammed Rokibul Alam Kotwal, Foyzul Hassan, Shakib Ibn Daud, Md. Shafiul Alam, et al.

2011 International Conference on Computational Intelligence and Communication Networks > 416 - 420

2011 International Conference on Computational Intelligence and Communication Networks (CICN)

This paper presents a Neural Network-based Bangla phoneme recognition method for Automatic Speech Recognition (ASR). The method consists of three stages: at the first stage, a multilayer neural network (MLN) converts acoustic features, mel frequency cepstral coefficients (MFCCs), into phoneme probabilities, where the second stage computes velocity (Δ) coefficients from the phoneme probabilities by using...

chapter

Performance evaluation of MLPC and MFCC for HMM based noisy speech recognition

M Rahman, M B I Islam

2010 13th International Conference on Computer and Information Technology (ICCIT) > 273 - 276

13th International Conference on Computer and Information Technology (ICCIT 2010)

In this paper, the auditory-like features MLPC and MFCC have been used as front-ends and their performance has been evaluated on the Aurora-2 database for Hidden Markov Model (HMM) based noisy speech recognition. The clean data set is used for training and test set A is used to examine the performance. It has been found that almost the same recognition performance has been obtained both for MLPC and MFCC and...

chapter

New robust speech recognition using DTW in noise

Zhang Yuxin, Y Miyanaga, C Siriteanu

2010 10th International Symposium on Communications and Information Technologies > 34 - 38

2010 10th International Symposium on Communications and Information Technologies (ISCIT 2010)

This paper proposes a new robust speech recognition method. Since the hidden Markov model (HMM) algorithm needs a lot of training calculation, the dynamic time warping (DTW) algorithm based on a median filter is used instead in our system. According to the short-term energy method, the non-speech segment can be removed. Recognition accuracy is thus improved. The cepstral mean subtraction (CMS), running...

chapter

Distinctive Phonetic Features (DPFs)-Based Isolated Word Recognition Using Multilayer Neural Networks

M N Huda, M M Hasan, S Ahmed, D F Rahman, et al.

2010 First International Conference on Integrated Intelligent Computing > 51 - 55

2010 First International Conference on Integrated Intelligent Computing (ICIIC 2010)

This paper describes an isolated word recognition method based on distinctive phonetic features (DPFs). The method comprises two multilayer neural networks (MLNs). The first MLN, MLNLF-DPF, maps local features (LFs) of an input speech signal into discrete DPFs and the second MLN, MLNDyn, restricts dynamics of outputted DPFs by the MLNLF-DPF. In the experiments on Tohokudai Isolated Spoken-Word Database...

chapter

Exploiting multimodal data fusion in robust speech recognition

Panikos Heracleous, Pierre Badin, Gérard Bailly, Norihiro Hagita

2010 IEEE International Conference on Multimedia and Expo > 568 - 572

2010 IEEE International Conference on Multimedia and Expo (ICME)

This article introduces automatic speech recognition based on Electro-Magnetic Articulography (EMA). Movements of the tongue, lips, and jaw are tracked by an EMA device, which are used as features to create Hidden Markov Models (HMM) and recognize speech only from articulation, that is, without any audio information. Also, automatic phoneme recognition experiments are conducted to examine the contribution...

chapter

Is Phoneme Level Better than Word Level for HMM Models in Limited Vocabulary ASR Systems?

Yousef Ajami Alotaibi

2010 Seventh International Conference on Information Technology: New Generations > 332 - 337

Seventh International Conference on Information Technology: New Generations (ITNG 2010)

In this paper, Arabic alphadigits were investigated from the speech recognition problem point of view. Limited vocabulary Arabic Automatic Speech Recognition Systems (ASRs) were designed, implemented, and tested by using isolated word utterances which consist of Arabic alphabets and/or digits. These systems were implemented separately by using phoneme level and word level based HMM models in distinct...

chapter

CDHMM parameters selection for speaker-independent phone recognition in continuous speech system

Zaineb Ben Messaoud, Ahmed Ben Hamida

MELECON 2010 - 2010 15th IEEE Mediterranean Electrotechnical Conference > 253 - 258

MELECON 2010 - 2010 15th IEEE Mediterranean Electrotechnical Conference

Pattern recognition has long been a topic of fundamental importance in a wide range of science and technology. Over the years there have been a range of several tasks developed for speech recognition. While in recent years speech recognizer evaluation has focused on LVCSR research, we believe that evaluating recognition at the phone level is important since the words are always represented by the...

chapter

Partial sequence matching using an Unbounded Dynamic Time Warping algorithm

Xavier Anguera, Robert Macrae, Nuria Oliver

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 3582 - 3585

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

Before the advent of Hidden Markov Model (HMM)-based speech recognition, many speech applications were built using pattern matching algorithms like the Dynamic Time Warping (DTW) algorithm, which are generally robust to noise and easy to implement. The standard DTW algorithm usually suffers from lack of flexibility on start-end matching points and has high computational costs. Although some DTW-based...

chapter

Decision level combination of multiple modalities for recognition and analysis of emotional expression

Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 2462 - 2465

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

Emotion is expressed and perceived through multiple modalities. In this work, we model face, voice and head movement cues for emotion recognition and we fuse classifiers using a Bayesian framework. The facial classifier is the best performing followed by the voice and head classifiers and the multiple modalities seem to carry complementary information, especially for happiness. Decision fusion significantly...

chapter

Speech recognition of Malayalam numbers

C. Kurian, K. Balakrishnan

2009 World Congress on Nature & Biologically Inspired Computing (NaBIC) > 1475 - 1479

2009 World Congress on Nature & Biologically Inspired Computing (NaBIC 2009)

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing, automated banking systems, etc. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficients (MFCC) as features for signal processing and a hidden Markov model (HMM) for recognition. The system is trained...

chapter

Research and implementation of part-of-speech tagging based on Hidden Markov Model

Zhang Youzhi

2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA) > 2 > 26 - 29

2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA 2009)

As separate problems of English, MA, POS, PDR can be considered independent of each other. In a practical research system, they are dependent: the solution of the prior one forms the base for processing the next one. Considering the different features of these problems, after a comprehensive study, a divide-and-conquer strategy is proposed that resolves them separately. First, a knowledge-based method...

Keywords:
ACCURACY
SPEECH
HIDDEN MARKOV MODEL

INFONA - science communication portal
