To identify the attended speaker from single-trial EEG recordings in an acoustic scenario with two competing speakers, an auditory attention decoding (AAD) method has recently been proposed. The AAD method requires the clean speech signals of both the attended and the unattended speaker as reference signals for decoding. However, in practice only the binaural signals, containing several undesired...
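The abstract is truncated, but the decision step common to correlation-based AAD methods can be illustrated. The sketch below assumes a pre-trained linear decoder that reconstructs the attended speech envelope from multichannel EEG; all names, shapes, and the two-speaker setup are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def decode_attention(eeg, decoder, env_a, env_b):
    """Correlation-based AAD decision (illustrative sketch):
    reconstruct the attended speech envelope from EEG with a linear
    decoder, then pick the speaker whose envelope correlates best
    with the reconstruction.

    eeg:     (T, C) EEG samples by channel
    decoder: (C,)   pre-trained linear decoder weights
    env_a/b: (T,)   speech envelopes of the two competing speakers
    """
    recon = eeg @ decoder                  # (T,) reconstructed envelope
    r_a = np.corrcoef(recon, env_a)[0, 1]  # Pearson correlation vs. speaker A
    r_b = np.corrcoef(recon, env_b)[0, 1]  # Pearson correlation vs. speaker B
    return "A" if r_a > r_b else "B"
```

In practice the decoder would be trained (e.g., by regularized least squares over time-lagged EEG), and the paper's point is precisely that the clean reference envelopes `env_a`/`env_b` are not available, only binaural mixtures.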
Steganographic systems are used to transmit hidden data inside an original signal. The article describes an algorithm for hidden data transmission using a speech signal as the carrier. The echo method is used for data embedding. To improve the decoding efficiency of the embedded data, a voicing-correction procedure and an informed-coding mechanism were developed and implemented...
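The echo method mentioned above has a standard textbook form: a faint delayed copy of the signal is added, with the delay encoding one bit, and the decoder reads the delay back from the real cepstrum. The sketch below shows that generic scheme, not the article's specific algorithm; the delays, echo gain, and function names are illustrative assumptions.

```python
import numpy as np

def embed_bit(frame, bit, d0=100, d1=150, alpha=0.5):
    """Echo hiding: add a faint echo whose delay (in samples) encodes one bit."""
    d = d1 if bit else d0
    echo = np.zeros_like(frame)
    echo[d:] = alpha * frame[:-d]          # delayed, attenuated copy
    return frame + echo

def decode_bit(frame, d0=100, d1=150):
    """Detect the echo delay via the real cepstrum and read the bit back:
    an echo at delay d produces a peak at quefrency d."""
    spec = np.fft.rfft(frame)
    cep = np.fft.irfft(np.log(np.abs(spec) + 1e-12))
    return int(cep[d1] > cep[d0])
```

Real systems process the signal frame by frame and, as the abstract notes, the raw scheme degrades on unvoiced frames, which is what voicing correction and informed coding are meant to address.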
The convolutive Non-Negative Matrix Factorization (NMF) model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a Recurrent Neural Network in the encoder...
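The convolutive NMF approximation the abstract refers to can be written as V ≈ Σ_t W_t · shift_t(H), where each W_t is a set of frequency templates and shift_t delays the activations by t frames. A minimal reconstruction sketch (shapes and names are assumptions for illustration):

```python
import numpy as np

def conv_nmf_reconstruct(W, H):
    """Convolutive NMF reconstruction: V_hat = sum_t W[t] @ shift(H, t).

    W: (T, F, K) frequency templates with temporal extent T
    H: (K, N)    per-component activations over N frames
    returns: (F, N) approximated spectrogram
    """
    T, F, K = W.shape
    _, N = H.shape
    V = np.zeros((F, N))
    for t in range(T):
        Ht = np.zeros_like(H)
        Ht[:, t:] = H[:, :N - t] if t else H   # shift activations right by t
        V += W[t] @ Ht
    return V
```

With T = 1 this reduces to standard NMF, V ≈ W·H; the auto-encoder in the paper replaces the multiplicative-update factorization with learned encoder/decoder convolutions.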
Research shows that speech dereverberation (SD) with Deep Neural Networks (DNN) achieves state-of-the-art results by learning a spectral mapping; however, this approach fails to characterize the local temporal-spectral structures (LTSS) of the speech signal and requires a large storage space that is impractical in real applications. In contrast, the Convolutional Neural Network (CNN) offers a...
This paper describes Transcriber, a tool that automatically transcribes interviews in Indonesian using speech-to-text and speaker diarization technology. The main feature of the software is generating interview transcriptions automatically, with an option to group utterances by speaker when required. Transcriber is designed to work in two modes that give users the freedom to provide...
Hybrid digital-analog (HDA) architectures have been widely developed for efficient digital transmission of analog speech, audio, or video data. By combining the advantages of digital and analog components, HDA systems achieve better performance than purely analog or purely digital schemes across a wide range of channel conditions. However, the HDA systems described in previous works are mostly designed for continuous-valued...
In this paper we describe a systematic procedure to implement a two-stage keyword spotting system (KWS). In the first stage, a phonetic decoding of continuous speech is obtained using a CD-DNN-HMM model built with the Kaldi toolkit. In the second stage, these phonetic transcriptions serve to construct a system that searches for keywords embedded in continuous speech using the classification...
In harsh channel conditions, the quality of synthetic speech at low bit rates is severely degraded. To improve the robustness of the vocoder and make it more resilient to errors on a random channel, unequal error protection (UEP) channel coding is usually adopted. However, in cases where the errors cannot be corrected, UEP channel coding will not improve the quality of the synthetic...
Equipped with selective auditory attention (SAA), people are able to rapidly shift their attention to auditory events of interest. Although abstract neuroimaging paradigms are fundamental for exploring the neural basis of SAA, whether those findings are valid in a more naturalistic condition and how the types of auditory stimuli affect SAA are largely unknown. Here we propose a brain decoding study...
Most traditional template-matching keyword recognition methods need no training data and rely only on frame matching; however, recognition is relatively slow, which prevents practical use. LVCSR-based methods must convert the speech signal into text before recognition, and this conversion has a major impact on the final recognition performance. In this paper, we propose a method...
With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword...
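The ATWV metric named above has a fixed definition from the NIST spoken term detection evaluations: one minus the keyword-averaged sum of the miss probability and a heavily weighted false-alarm probability. A minimal sketch of that computation (the per-keyword count format and the speech-duration parameter are illustrative assumptions):

```python
def atwv(stats, beta=999.9, t_speech=36000.0):
    """Actual term-weighted value, per the NIST STD definition:
    ATWV = 1 - mean_k [ P_miss(k) + beta * P_fa(k) ].

    stats:    list of (n_true, n_miss, n_fa) tuples, one per keyword
    beta:     false-alarm weight (999.9 in the NIST evaluations)
    t_speech: total speech duration in seconds (possible FA trials)
    """
    total = 0.0
    for n_true, n_miss, n_fa in stats:
        p_miss = n_miss / n_true                 # missed true occurrences
        p_fa = n_fa / (t_speech - n_true)        # false alarms per trial
        total += p_miss + beta * p_fa
    return 1.0 - total / len(stats)
```

Because every keyword is weighted equally regardless of frequency, a single rare keyword that is entirely missed costs the same as a common one, which is one reason ATWV behaves so differently across languages and keyword lists.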
A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy which performs on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron...
Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the use of sub-word systems. We experiment with different language-independent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect...
Current diarization algorithms are commonly applied to the outputs of single non-moving microphones. They do not explicitly identify the content of overlapped segments from multiple speakers or acoustic events. This paper presents an acoustic environment aware child-adult diarization applied to the audio recorded by a single microphone attached to moving targets under realistic high noise conditions...
Patients with locked-in syndrome (fully paralyzed but aware) struggle with daily life and communication. Providing a level of communication offers these patients a chance to resume a meaningful life. Current brain-computer interface (BCI) communication requires users to build words from single letters selected on a screen, which is extremely inefficient. Faster approaches for their speech communication...
This article addresses the problem of continuous speech recognition from visual information only, without exploiting any audio signal. Our approach combines a video camera and an ultrasound imaging system for monitoring simultaneously the speaker's lips and the movement of the tongue. We investigate the use of convolutional neural networks (CNN) to extract visual features directly from the raw ultrasound...
This paper presents a novel far-field voice trigger algorithm utilizing DNN with the objective function of state-level minimum Bayes risk for training, customizing the decoding network to absorb the ambient noise and background speech. We adopt a two-stage classification strategy to integrate the phonetic knowledge and model-based classification into detecting wake-up words. Experimental results of...
In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...
In this paper we aim to enhance keyword search for conversational telephone speech under low-resourced conditions. Two techniques to improve the detection of out-of-vocabulary keywords are assessed in this study: using extra text resources to augment the lexicon and language model, and via subword units for keyword search. Two approaches for data augmentation are explored to extend the limited amount...
Adapting acoustic models to speakers has been shown to greatly improve performance on many tasks. Among adaptation approaches, exploiting auxiliary features that characterize speakers or environments has received great attention because it allows rapid adaptation, i.e., adaptation with a limited amount of speech data such as a single utterance. However, the auxiliary features are usually computed in batch...