We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is holistically determined...
In this paper, we explore the retrieval of perceptually similar audio, i.e. finding sounds according to human perception. Such retrieval is more “human-centered” [1] than previous audio retrieval approaches, which aim to find homologous sounds. We make comprehensive use of various acoustic features to measure perceptual similarity. Since some acoustic features may be redundant or even adverse...
Pedagogically, feedback in CAPT systems can be improved by focusing on the most critical errors rather than presenting all errors to users at once. This paper presents our work on the use of crowdsourcing to collect gradations of word-level mispronunciations in non-native English speech. Quality control procedures based on the proposed WorkerRank algorithm (adapted from well-known...
This paper proposes an automated speech-gesture generation system for service/entertainment robots. The system can automatically generate robot beat gestures to accompany speech during interaction with humans. Beat gestures carry no specific semantic meaning, but they are commonly believed to be an essential factor in natural communication. We extracted basic gesture patterns...
Speech dialogue systems, such as Apple's “Siri,” have gradually become more widespread, and in the near future a greater number of general users will have the opportunity to communicate with such systems. To facilitate this, the system needs to communicate more naturally with users, and to realize this, both verbal and nonverbal information must be taken into consideration. Therefore,...
Expressing emotion to others and recognizing the emotional state of a counterpart are not difficult for humans. A person's emotional state may be recognized from facial expression, voice, and/or gesture. Speech emotion recognition research has gained considerable attention in recent years. One of the important subjects in speech emotion recognition research is feature selection. The speech features used...
This paper presents a method of automatic lexical stress assessment for L2 English speech. Syllable stress can be labeled at three levels: primary (P), secondary (S), and no stress (N); however, secondary stress may vary among word pronunciations within and across accents and present difficulties for human perception. Hence, evaluation of lexical stress based on all three levels (i.e., the P-S-N criterion...
Synthesis of natural-sounding speech is the greatest challenge for a Text-to-Speech Synthesis (TTS) system. In natural speech, duration, intensity, and pitch vary dynamically, which is manifested as the rhythm or prosody of speech. If these variations are not recreated, the synthesized speech will sound robotic. Synthesis of good-quality speech depends on how well the duration and intonation patterns...
The spectral subtraction method is a classical approach to the enhancement of degraded speech. Its basic principle is to estimate the short-time spectral magnitude of speech by subtracting an estimated noise spectrum from the noisy speech spectrum, and to combine the result with the phase of the noisy speech. Besides reducing the noise, however, this method generates an unnatural and unpleasant noise, called remnant...
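The basic magnitude-subtraction step described in this abstract can be sketched as follows; the over-subtraction factor `alpha` and the spectral floor `beta` are illustrative parameters added here (flooring is one common way to limit the remnant "musical" noise), not details taken from the paper:

```python
import numpy as np

def spectral_subtraction(noisy_frame, noise_mag, alpha=1.0, beta=0.01):
    """Magnitude spectral subtraction on one complex STFT frame.

    noisy_frame: complex spectrum of the noisy speech frame
    noise_mag:   estimated noise magnitude spectrum
    alpha, beta: illustrative over-subtraction factor and spectral floor
    """
    mag = np.abs(noisy_frame)
    phase = np.angle(noisy_frame)
    # subtract the estimated noise magnitude from the noisy magnitude
    clean_mag = mag - alpha * noise_mag
    # floor negative values, which would otherwise produce musical noise
    clean_mag = np.maximum(clean_mag, beta * mag)
    # recombine the enhanced magnitude with the phase of the noisy speech
    return clean_mag * np.exp(1j * phase)
```

In a full enhancer this would run frame by frame over an STFT, with the noise magnitude estimated from speech-free segments.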
Emotions play a key role in human-computer interaction. They are generally expressed in several ways (e.g. facial expressions, speech, body postures, and gestures). In this paper, we present a multimodal approach to emotion recognition that integrates information coming from different cues and modalities. It is based on a formal multidimensional model using an algebraic representation...
Singing and speaking are important and natural ways for humans to communicate nonlinguistic and linguistic information. It seems that the majority of people can correctly perform and imitate factors such as pitch and melody in the same way as professional singers, just as they can correctly vocalize all the factors involved in speaking. There is no absolute answer as to...
For machines to converse with humans, they must at times resolve ambiguities. We are developing a conversational robot which is able to gather information about its world through sensory actions such as touch and active shifts of visual attention. The robot is also able to gain new information linguistically by asking its human partner questions. Each kind of action, sensing and speech, has associated...
The goal of this study was to explore middle school students' preferences for an animated engineering tutor and to investigate the rationales for their choices. Seventy-seven middle school students participated in the study and provided their preferences and rationales on various dimensions of an animated engineering tutor, such as gender, age, personality, and clothing. Results showed that for teaching engineering...
This paper presents an audio monitoring system for detecting and identifying people engaged in a conversation. The proposed method is hands-free, as it uses a microphone array to acquire the sound. A particularity of the approach is the use of a human tracker based on a laser range finder. The human tracker monitors people's locations, and local steered response power is then used to detect the people...
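The steered-response-power idea mentioned above can be sketched as a simple delay-and-sum search over candidate steering directions; the integer sample delays and the two-microphone setup below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def steered_response_power(signals, delays_per_direction):
    """Delay-and-sum SRP over a set of candidate directions.

    signals: list of 1-D microphone signals (equal length)
    delays_per_direction: for each candidate direction, the integer sample
        delays that would align the microphones if the source were there
    Returns the output power per candidate direction; the true source
    direction should yield coherent summation and hence the largest power.
    """
    powers = []
    for delays in delays_per_direction:
        # advance each mic signal by its hypothesized delay to align them
        aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays)]
        summed = np.sum(aligned, axis=0)
        powers.append(float(np.sum(summed ** 2)))
    return np.array(powers)
```

With a tracker providing candidate people locations, only the delays corresponding to those locations need to be evaluated, which is the "local" search the abstract refers to.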
We present an approach, based on non-negative matrix factorization, for learning to recognize parallel combinations of initially unknown human motion primitives, associated with ambiguous sets of linguistic labels during training. In the training phase, the learner observes a human producing complex motions which are parallel combinations of initially unknown motion primitives. Each time the human...
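The non-negative matrix factorization at the core of this approach can be sketched with plain multiplicative updates (the classic Lee–Seung squared-error rules). The rank, iteration count, and random initialization below are illustrative, and the paper's association of motion primitives with linguistic labels is not shown:

```python
import numpy as np

def nmf(V, rank, iters=500, seed=0):
    """Factor a non-negative matrix V (n x m) as V ~= W @ H.

    Uses multiplicative updates minimizing squared Euclidean error;
    columns of W can be read as learned primitives, rows of H as their
    activations. rank/iters/seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3   # small offset keeps entries positive
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-9                          # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because all factors stay non-negative, a complex motion can only be explained as an additive (parallel) combination of primitives, which is what makes NMF a natural fit for this decomposition task.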
Emotion plays a critical role in human interaction. Listeners can perceive the emotional state of speakers from their facial expressions, gestures, and/or speech. In this paper, we investigate the relationship between the emotion intended by speakers and the emotion perceived by listeners of a newly recorded corpus. We investigate the consistency of the emotion expressed by speakers and the emotion...
Previous studies using functional magnetic resonance imaging (fMRI) have demonstrated that the left hemisphere is specialized for language function. On the other hand, some studies have revealed that the right hemisphere is also related to language function. The hypotheses of this study were that (1) the regions related to language function form a bilateral functional network and (2) the level of...
The English alphabet is employed for inputting English and other Western languages into the computer. It is also used for the same purpose for many other languages, such as Chinese and Japanese. Since the alphabet is a phonetic system, it can be said that all computer input is made through phonetics. Although the script systems of these languages commonly use the alphabet, the way of reading them...
Interlocutors express not only information in the form of spoken words but also their feelings and commitments with regard to what is being said. In face-to-face communication, participants interact in such a way that they react to one another's multimodal positioning in the conversation. Often this means that they take a "stance". The goal of this paper is to explore the notion of stance...
The purpose of this paper is to present our original and multidisciplinary approach to study multimodal social-emotional behaviors in children with autism spectrum disorders. Our goal is to conduct fundamental and applied research regarding the reception and production of social signals involved in human interactions. To achieve this aim, we try to understand and model cognitive and multimodal emotional...