Although the Vector Space Model used for document representation in information retrieval systems integrates only a small quantity of knowledge, it continues to be used because of its low computational cost, execution speed, and simplicity. We try to improve this document representation by adding some syntactic information such as the parts of speech. In this paper, we have evaluated three different tagging algorithms...
Recurrent neural networks (RNNs) have recently been applied as the classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. confidence estimation, out-of-vocabulary word detection...
In this paper, we propose a method for avoiding digressions in discussion by detecting unnecessary utterances and having a dialogue system intervene. The detector is based on the features using word frequency and topic shifts. The performance (i.e. accuracy, recall, precision, and F-measure) of the unnecessary utterance detector is evaluated through leave-one-dialogue-out cross-validation. In the...
Computer assisted language learning (CALL) and, more specifically, computer assisted pronunciation training (CAPT) have received considerable attention in recent years. CAPT allows continuous feedback to the learner without requiring the sole attention of the teacher; it facilitates self study and encourages interactive use of the language in preference to rote learning. One of the important processes...
In the field of prosody event detection, many local acoustic features have been proposed for representing the prosodic characteristics of speech units. The context information that represents possible regularities underlying neighboring prosody events, however, has not been used effectively. The main difficulty in utilizing prosodic context is that it is hard to capture the long-distance sequential dependency...
This work aims at investigating the automatic recognition of speaker role in meeting conversations from the AMI corpus. Two types of roles are considered: formal roles, fixed over the meeting duration and recognized at recording level, and social roles related to the way participants interact between themselves, recognized at speaker turn level. Various structural, lexical and prosodic features as...
This paper addresses the issue of error region detection and characterization in LVCSR transcriptions. It is a well-known phenomenon that errors are not independent and tend to co-occur in automatic transcriptions. We are interested in automatically detecting these so-called error regions. Additionally, in the context of information extraction in TVBN shows, being able to automatically characterize...
Speaker role recognition in TV Broadcast News shows is addressed in this paper with a particular focus on speaker turn role labeling. A mixed approach combining speaker clustering and analysis of Automatic Speech Recognition output is proposed for assigning speaker turns a role among: anchor, reporter and other. 86% classification accuracy is obtained for automatically segmented speaker turns on a...
Most previous work on meeting summarization focused on extractive approaches; however, directly concatenating the extracted spoken utterances may not form a good summary. In this paper, we investigate if it is feasible to compress the transcribed spoken utterances and if using the compressed utterances benefits meeting summarization. We model the utterance compression task as a sequence labeling problem,...
A key challenge in rapidly building Tibetan language speech recognition applications is minimizing the manual effort required in transcribing and labeling speech data. Accurate labeling of Tibetan speech utterances is extremely time consuming and requires trained linguists. To alleviate this problem, we present an approach that aims at reducing the amount of manually transcribed speech data required...
In the process of building speech recognition models, accurate labeling of speech utterances is extremely time consuming and requires trained linguists. To build speech recognition models quickly in some industrial applications, we present a novel sample selection strategy that can use very few labeled speech utterances to construct an effective recognition model. The experimental results show...
In research on Tibetan language speech recognition, accurate labeling of Tibetan speech utterances is extremely time consuming and requires trained linguists. To alleviate this problem, we present an approach that can use few labeled Tibetan speech utterances to construct an effective recognition model. The experimental results show that our approach has better performance than traditional...
The paper presents support vector machine (SVM) based Part-Of-Speech (POS) tagging on a Chinese database, which is part of our speech synthesis system. The model is an SVM and uses many sequential features to predict the POS tag. The text database was downloaded from the internet and contains 1,280,000 words and 33 parts of speech. The total accuracy in our experiments is 99.31%.
We present an approach to unsupervised speaker role labeling in talk show data that makes use of two complementary sets of features: structural features that encode the participation patterns of speakers, and lexical features, which capture characteristic phrases. Techniques for using multiple clusterings are explored, leading to more robust results. Experiments on English and Mandarin talk shows...
In a spoken dialog system, the example-based response generation method generates a response by searching a dialog example database for the example question most similar to an input user utterance. That method has the advantage of ease of system expansion. It requires, however, a number of utterance examples whose correct responses are labeled. In this paper, we propose an approach to reducing the...
Users require rapid and highly accurate speech recognition systems. Accuracy could be improved by unsupervised adaptation as provided by CMLLR (Constrained Maximum Likelihood Linear Regression). CMLLR-based batch-type unsupervised adaptation estimates a single global transformation matrix by utilizing unsupervised labeling; unfortunately, it needs prior labeling and so is not rapid. Our proposed technique...
In this paper, we propose to study prosody with context-dependent acoustic models (CDM). We find that we can achieve better resolution on a specific aspect by training a CDM with a certain focus. For the tone recognition task, a CDM with a focus on tones should be used, and it achieves a 15.2% relative error reduction compared with the traditional tri-phone models. For detecting prosody boundaries, CDM with...
A new method for image feature extraction and segmentation is proposed in this paper. Abundant contour feature information of the image is expressed by the contourlet transform, while the texture features of the image are described by the wavelet transform and the Gray Level Co-occurrence Matrix (GLCM). The three types of feature information compose the feature matrix. The presented method describes different image information...