A new method for classifying a speaker’s gender based on cumulant coefficients is proposed. The effect of additive noise and of measurement error in the classification features on classification accuracy is analyzed. The advantage of building an adaptive classification system that accounts for masking of the speech signal by noise is shown. Comparison of the proposed method of...
This paper tackles the Romanian syllabification and stress assignment problems, and proposes an efficient machine learning based solution. We show that by designing the appropriate feature sets for each specific problem, learning algorithms achieve satisfactory accuracy rates for both problems (∼92% for syllabification, ∼85% for stress assignment), even for relatively small training set sizes. We...
Recent studies have demonstrated the potential of unsupervised feature learning for sound classification. In this paper we further explore the application of the spherical k-means algorithm for feature learning from audio signals, here in the domain of urban sound classification. Spherical k-means is a relatively simple technique that has recently been shown to be competitive with other more complex...
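For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of spherical k-means, not the paper's implementation. It assumes the usual formulation: inputs and centroids are scaled to unit L2 norm, points are assigned by largest dot product (cosine similarity), and each centroid is updated to the normalized sum of its members. The function names (`normalize`, `dot`, `spherical_kmeans`) and the toy data are hypothetical, and the audio-specific steps a real pipeline would need are omitted.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, k, iterations=20, seed=0):
    """Cluster points by direction. Returns (centroids, labels).

    labels come from the last assignment step of the loop.
    """
    rng = random.Random(seed)
    data = [normalize(p) for p in points]
    # Initialise centroids from k distinct data points (an assumption;
    # other initialisation schemes are possible).
    centroids = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: pick the centroid with the largest dot product.
        labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
        # Update step: normalized sum of the members of each cluster.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                summed = [sum(col) for col in zip(*members)]
                centroids[j] = normalize(summed)
    return centroids, labels


# Toy data: two points near the x-axis direction, two near the y-axis direction.
toy_points = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
toy_centroids, toy_labels = spherical_kmeans(toy_points, k=2)
```

Because every vector is normalized first, the points `[1.0, 0.0]` and `[2.0, 0.1]` fall into the same cluster despite their different magnitudes; only direction matters.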
In this paper, we present a model for Turkish speech recognition. The model is syllable-based, where the recognition is performed through syllables as speech recognition units. The main goal of the model is to recognize as much as possible of a given continuous speech by identifying only a small set of syllables in the language. For that purpose, only the syllable types with a higher frequency are...
Many application areas that use supervised machine learning make use of multiple raters to collect target ratings for training data. Usage of multiple raters, however, inevitably introduces the risk that a proportion of them will be unreliable. The presence of unreliable raters can prolong the rating process, make it more expensive and lead to inaccurate ratings. The dominant, "static" approach...
This paper presents a classifier combination for the telegraphese restoration problem. When more than one classifier is used, each classifier can support the others, which improves overall performance. Using the supplied development, training, and testing data, the best model achieved a score of F = 79%.
Cortical recordings with high temporal resolution enable the tracking of neuronal excitation in response to stimuli. Here, intra- and extracranial recordings are analyzed from experiments presenting varied speech and language stimuli to human subjects. These studies demonstrate that information about speech and language is widely distributed across the brain, both spatially and temporally. Analyses...
We present a speech pre-processing scheme (SPPS) for robust speech recognition in the moving motorcycle environment. The SPPS is dynamically adapted during run-time operation of the speech front-end, depending on short-time characteristics of the acoustic environment. Specifically, the fast-varying acoustic environment is modeled by GMM clusters, based on which a selection function determines the...
This work is part of research to build a system to combine facial and prosodic information to recognize commonly occurring user states such as delight and frustration. We create two experimental situations to elicit two emotional states: the first involves recalling situations while expressing either delight or frustration; the second experiment tries to elicit these states directly through a frustrating...
A rule-based Local Word Grouper (LWG), or chunker, has been developed and applied to Bangla. The chunker was evaluated with the PARSEVAL method. The paper describes the implementation of the chunker as well as the evaluation method in detail. The evaluation shows a precision of 95.05%, a recall of 94.33%, and an F-score of 94.62, which are up to the standard. The results also show a substantial...
A novel approach was developed to recognize vowels from continuous tongue and lip movements. Vowels were classified based on movement patterns (rather than on derived articulatory features, e.g., lip opening) using a machine learning approach. Recognition accuracy on a single-speaker dataset was 94.02% with a very short latency. Recognition accuracy was better for high vowels than for low vowels....
Most existing research in the area of emotion recognition has focused on short segments or utterances of speech. In this paper we propose a machine learning system for classifying the overall sentiment of long conversations as Positive or Negative. Our system has three main phases: first, it divides a call into short segments; second, it applies machine learning to recognize the emotion for each...
We collected the locations of eye fixations of native Chinese speakers as they read four Chinese articles, and analyzed how contextual linguistic and personal information influences the landing positions within the landing sites. In addition, we employed machine learning techniques to build models for predicting the landing positions. The models performed well for the closed...
Temporal information extraction is a popular and interesting research field in the area of Natural Language Processing (NLP). The main tasks involve the identification of event-time, event-document creation time and event-event relations in a text. In this paper, we take up Task C that involves identification of relations between the events in adjacent sentences under the TimeML framework. We use...
Accelerated growth of the Internet has enabled users worldwide to share their feelings and experiences. User-generated content (UGC) websites are the most abundant sources of user reviews. Accurately identifying sentiment phrases is essential to understand the expressed opinions in user reviews. To achieve this, part-of-speech (POS) patterns of phrases are useful. However, previous studies for Chinese...
Prediction of prosodic phrase boundaries strongly influences the performance of speech recognition and voice synthesis systems. We propose a statistical approach using efficient learning features for the natural prediction of Korean prosodic phrase boundaries. These new features reflect factors that affect the generation of the prosodic phrase boundary better than existing learning features...
Clinical diagnosis of voice disorders is based on examination of the oscillating vocal folds during phonation with state-of-the-art endoscopic high-speed cameras. Commonly, the offline analysis is performed in a subjective and time-consuming manner via slow-motion playback. In this study, an objective method based on phonovibrogram (PVG) images is presented to overcome this drawback. For a...
Part-of-speech tagging is an important pre-processing step in many natural language processing applications, such as text summarization, question answering, and information retrieval systems. Morphosyntactic disambiguation (part-of-speech tagging) is the process of assigning every word in a given context its appropriate part of speech. In this paper, we first review all the supervised machine...
Textual entailment recognition (RTE) is one of the fundamental problems in many natural language processing applications. This paper proposes a new lexical entailment measure that exploits the information in WordNet glosses. Further, we perform textual entailment recognition based on this measure and cast the RTE problem as a classification problem. The experimental...
According to a number of music cognition experiments, moods or emotions in music can be treated as categorical. Since mood classifications are commonly used to structure the large collections of music available on the Web, automatic discrimination between the mood taxonomies of Chinese traditional music and Western classical music would be a valuable addition to music information retrieval (MIR) systems. In this...