In this paper we describe the utilisation of the diversity of different feature representations for speaker, emotion and language verification. The underlying principle behind the method is that some features are better at discriminating some classes and other features for other classes. Studies are done on four features and their combinations. An information theoretic procedure is described which...
In this paper we present a self-learning speech controlled system comprising speech recognition, speaker identification and speaker adaptation for a small number of users, e.g. five recurring speakers. A compact representation of speech and speaker characteristics is discussed. It is combined with a technique for efficient information retrieval to capture individual speech characteristics allowing...
Applications of automatic speech recognition (ASR) have been extended to a variety of tasks and domains, including spontaneous human-human speech. We have developed an ASR system for the Japanese Parliament (Diet), which is deployed this year. By exploiting official records made by human stenographers, we have realized an efficient training scheme of acoustic and language models, which does not require...
Soccer highlight detection has been an active research topic in recent years. In this paper, we present our effort to detect an important audio keyword, excited commentator speech, which contributes to a state-of-the-art soccer highlight extraction system. We propose an approach that uses a statistical classifier based on Gaussian mixture models (GMMs) with unsupervised model adaptation. The excited speech...
Spoken dialog systems have been proposed that use a simple database consisting of example sentences and their corresponding reply sentences. However, it is costly to prepare this database manually. In the present study, we propose a framework in which both the example and reply sentences are automatically generated from a database description table that describes minimum information for describing...
In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from two speakers is recorded by a single microphone. The goal is to identify both speakers from their mixed signal. High recognition accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper, we approach the problem without estimating...
In this paper we present a technique for fast adaptation of speech and speaker related information. Fast learning is particularly useful for automatic personalization of speech-controlled devices. Such a personalization of human-computer interfaces to be used in intelligent environments represents an important research issue. Speech recognition is enhanced by speaker specific profiles which are continuously...
We present a new discriminative method of acoustic model adaptation that deals with a task-dependent speaking style. We have focused on differences of expressions or speaking styles between tasks and set the objective of this method as improving the recognition accuracy of indistinctly pronounced phrases dependent on a speaking style. The adaptation appends subword models for frequently observable...
This paper studies the influence of n-gram language models in the recognition of sung phonemes and words. We train uni-, bi-, and trigram language models for phonemes and bi- and trigrams for words. The word-level language model is estimated from a textual lyrics database. In the recognition we use a hidden Markov model based phonetic recognizer adapted to singing voice. The models were tested on...
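As a rough illustration of what uni-, bi-, and trigram models over phonemes involve, the sketch below counts n-grams and computes an unsmoothed maximum-likelihood bigram probability. It is not the paper's implementation: the function names `ngram_counts` and `bigram_prob`, the `<s>`/`</s>` boundary markers, and the toy phoneme sequences are all assumptions made for this example, and a practical system would add smoothing.

```python
from collections import Counter


def ngram_counts(sequences, n):
    """Count n-grams over sequences padded with start/end markers."""
    counts = Counter()
    for seq in sequences:
        # n-1 start markers give the first symbol a full history.
        padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def bigram_prob(sequences, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev), without smoothing."""
    bigrams = ngram_counts(sequences, 2)
    # Total count of bigrams whose first symbol is `prev`.
    history = sum(c for (p, _), c in bigrams.items() if p == prev)
    return bigrams[(prev, cur)] / history if history else 0.0


# Toy phoneme sequences (hypothetical data, not taken from the paper).
toy = [["l", "a", "l", "a"], ["l", "o"]]
```

With the toy data, `bigram_prob(toy, "l", "a")` is 2/3, because "l" is followed by "a" twice and by "o" once. An unseen history returns 0.0, which is exactly the case smoothing is meant to handle.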
In this paper, an adaptive speaker identification method based on the Gaussian mixture model (GMM) and combined with a human behavioral trait is constructed. The method automatically selects a different length of speech for each speaker during identification according to the feedback probability estimation, so it can preserve identification accuracy while reducing the identification...
In this paper we present an approach for automatic alignment of long audio data with varied acoustic conditions to their corresponding transcripts in an effective manner. Accurate time-aligned transcripts provide easier access to audio materials by aiding applications such as the indexing, summarizing and retrieving of audio segments. Accurate time alignments are also necessary for labeling the training...
Speaker identification (SI) is the task of determining which of the speakers known to the system best matches the unknown voice sample. SI requires multiple decision alternatives, so implementing an SI system with SVM techniques requires a multi-class SVM classifier. In this paper, speaker model clustering is implemented on an SVM-based SI system. Here, instead of clustering the speakers,...
In this paper, we implemented a multistage recognizer output voting error reduction (ROVER) method for better automatic speech recognition (ASR). The first stage ROVER is conducted by combining three recognizers, which are respectively trained with maximum likelihood estimation (MLE), minimum phone error (MPE) and recently proposed boosted maximum mutual information (BMMI) criteria. After that the...
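The voting step at the core of a ROVER-style combination can be sketched as follows. This is a deliberately simplified illustration, not the multistage method of the paper: it assumes the recognizer outputs have already been aligned into slots of equal length (real ROVER builds that alignment by dynamic programming), it uses plain majority voting without confidence scores, and the function name `vote` and the empty-string null marker are assumptions made for this example.

```python
from collections import Counter


def vote(aligned_hypotheses):
    """Pick the most frequent word in each pre-aligned slot.

    An empty string marks a null (no word) in a slot. Ties go to the
    word that appears first in the order the hypotheses are listed.
    """
    if len({len(h) for h in aligned_hypotheses}) > 1:
        raise ValueError("hypotheses must be aligned to the same length")
    result = []
    for slot in zip(*aligned_hypotheses):
        word, _ = Counter(slot).most_common(1)[0]
        if word != "":  # drop slots where the null wins the vote
            result.append(word)
    return result


# Three hypothetical recognizer outputs for the same utterance.
hyps = [
    ["the", "cat", "sat"],
    ["the", "hat", "sat"],
    ["a", "cat", "sat"],
]
combined = vote(hyps)
```

Here `combined` is `["the", "cat", "sat"]`: each system makes a different single error, and the vote removes all of them.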
Users require rapid and highly accurate speech recognition systems. Accuracy could be improved by unsupervised adaptation as provided by CMLLR (Constrained Maximum Likelihood Linear Regression). CMLLR-based batch-type unsupervised adaptation estimates a single global transformation matrix by utilizing unsupervised labeling; unfortunately, it needs prior labeling and so is not rapid. Our proposed technique...
In recent years, the field of automatic speaker identification has begun to exploit high-level sources of speaker-discriminative information, in addition to traditional models of spectral shape. These sources include pronunciation models, prosodic dynamics, pitch, pause, and duration features, phone streams, and conversational interaction. As part of this broader thrust, we explore a new frame-level...
Gaussian mixture model - universal background model (GMM-UBM) is a standard reference classifier in speaker verification. We have proposed a simplified model using vector quantization (VQ-UBM). In this study, we extensively compare these two classifiers on NIST 2005, 2006 and 2008 SRE corpora, while having a standard discriminative classifier (GLDS-SVM) as a reference point. We focus on parameter...
This paper applies two dynamic Bayes networks that include theoretical and measured kinematic features of the vocal tract, respectively, to the task of labeling phoneme sequences in unsegmented dysarthric speech. Speaker dependent and adaptive versions of these models are compared against two acoustic-only baselines, namely a hidden Markov model and a latent dynamic conditional random field. Both...
The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of un-transcribed speech for a particular speaker. This paper investigates how to best exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model adaptation are presented. Using one hour of automatically transcribed...
Speaker recognition systems tend to degrade if the training and testing conditions differ significantly. Such situations may arise due to the use of different microphones, telephone and mobile handsets or different acoustic conditions. Recently, the effect of the room acoustics on speaker identification (SI) has been investigated and it has been shown that a loss in accuracy results when using clean...
This paper studies the performance of data selection with a combined task analysis method for task adaptation in Mandarin isolated word recognition. The proposed task analysis method combines coverage-unit-balanced task analysis with confusability-based analysis. The performance is evaluated with several experiments.