We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is holistically determined...
In this paper, we explore the retrieval of perceptually similar audio, i.e. finding sounds according to human perception. Such retrieval is more “human-centered” [1] than previous audio retrieval approaches, which aim to find homologous sounds. We make comprehensive use of various acoustic features to measure perceptual similarity. Since some acoustic features may be redundant or even adverse...
Pedagogically, feedback in CAPT systems can be improved by focusing on the most critical errors rather than presenting all errors to users at once. This paper presents our work on the use of crowdsourcing to collect gradations of word-level mispronunciations in non-native English speech. Quality control procedures based on the proposed WorkerRank algorithm (adapted from well-known...
This paper proposes an automated speech-gesture generation system for service/entertainment robots. The system can automatically generate robot beat gestures to accompany speech during interaction with humans. Beat gestures carry no specific semantic meaning, but they are commonly believed to be an essential factor in natural communication. We extracted basic gesture patterns...
Speech dialogue systems, such as Apple's “Siri,” have gradually become more widespread, and in the near future a greater number of general users will have the opportunity to communicate with such systems. To facilitate this, the system needs to communicate more naturally with users, and to realize this, both verbal and nonverbal information must be taken into consideration. Therefore,...
Expressing emotion to others and recognizing the emotional state of a counterpart are not difficult for humans. A person's emotional state may be recognized from facial expression, voice, and/or gesture. Speech emotion recognition research has gained considerable attention in recent years. One of the important subjects in speech emotion recognition research is feature selection. The speech features used...
This paper presents a method of automatic lexical stress assessment for L2 English speech. Syllable stress can be labeled at three levels: primary (P), secondary (S), and no stress (N); however, secondary stress may vary among word pronunciations within and across accents and present difficulties for human perception. Hence, evaluation of lexical stress based on all three levels (i.e., the P-S-N criterion...
Synthesis of natural-sounding speech is the greatest challenge for a Text-to-Speech Synthesis (TTS) system. In natural speech, duration, intensity, and pitch vary dynamically, which is manifested as the rhythm or prosody of speech. If these variations are not recreated, the synthesized speech will sound robotic. Synthesis of good-quality speech depends on how well the duration and intonation patterns...
The spectral subtraction method is a classical approach to the enhancement of degraded speech. Its basic principle is to estimate the short-time spectral magnitude of speech by subtracting an estimated noise spectrum from the noisy speech spectrum, and to combine the result with the phase of the noisy speech. Besides reducing the noise, however, this method generates an unnatural and unpleasant noise, called remnant...
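The basic magnitude-subtraction step described in this abstract can be sketched as follows; the over-subtraction factor `alpha` and the spectral floor `beta` are illustrative parameters added here (flooring is one common way to limit the remnant "musical" noise), not details taken from the paper:

```python
import numpy as np

def spectral_subtraction(noisy_frame, noise_mag, alpha=1.0, beta=0.01):
    """Magnitude spectral subtraction on one complex STFT frame.

    noisy_frame: complex spectrum of the noisy speech frame
    noise_mag:   estimated noise magnitude spectrum
    alpha, beta: illustrative over-subtraction factor and spectral floor
    """
    mag = np.abs(noisy_frame)
    phase = np.angle(noisy_frame)
    # subtract the estimated noise magnitude from the noisy magnitude
    clean_mag = mag - alpha * noise_mag
    # floor negative values, which would otherwise produce musical noise
    clean_mag = np.maximum(clean_mag, beta * mag)
    # recombine the enhanced magnitude with the phase of the noisy speech
    return clean_mag * np.exp(1j * phase)
```

In a full enhancer this would run frame by frame over an STFT, with the noise magnitude estimated from speech-free segments.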
Emotions play a key role in human-computer interaction. They are generally expressed in several ways (e.g. facial expressions, speech, body postures, and gestures). In this paper, we present a multimodal approach to emotion recognition that integrates information coming from different cues and modalities. It is based on a formal multidimensional model using an algebraic representation...
Singing and speaking are important and natural ways for humans to communicate nonlinguistic and linguistic information. It seems that the majority of people can correctly perform and imitate factors such as pitch and melody in the same way as professional singers, just as they can correctly vocalize all the factors involved in speaking. There is no absolute answer as to...
For machines to converse with humans, they must at times resolve ambiguities. We are developing a conversational robot which is able to gather information about its world through sensory actions such as touch and active shifts of visual attention. The robot is also able to gain new information linguistically by asking its human partner questions. Each kind of action, sensing and speech, has associated...
The goal of this study was to explore middle school students' preferences for an animated engineering tutor and to investigate the rationales for their choices. Seventy-seven middle school students participated in the study and provided their preferences and rationales on various dimensions of an animated engineering tutor, such as gender, age, personality, and clothing. Results showed that for teaching engineering...
This paper presents an audio monitoring system for detecting and identifying people engaged in a conversation. The proposed method is hands-free, as it uses a microphone array to acquire the sound. A particularity of the approach is the use of a human tracker based on a laser range finder. The human tracker monitors people's locations, and local steered response power is then used to detect the people...
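The steered-response-power idea mentioned above can be sketched as a simple delay-and-sum search over candidate steering directions; the integer sample delays and the two-microphone setup below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def steered_response_power(signals, delays_per_direction):
    """Delay-and-sum SRP over a set of candidate directions.

    signals: list of 1-D microphone signals (equal length)
    delays_per_direction: for each candidate direction, the integer sample
        delays that would align the microphones if the source were there
    Returns the output power per candidate direction; the true source
    direction should yield coherent summation and hence the largest power.
    """
    powers = []
    for delays in delays_per_direction:
        # advance each mic signal by its hypothesized delay to align them
        aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays)]
        summed = np.sum(aligned, axis=0)
        powers.append(float(np.sum(summed ** 2)))
    return np.array(powers)
```

With a tracker providing candidate people locations, only the delays corresponding to those locations need to be evaluated, which is the "local" search the abstract refers to.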
We present an approach, based on non-negative matrix factorization, for learning to recognize parallel combinations of initially unknown human motion primitives, associated with ambiguous sets of linguistic labels during training. In the training phase, the learner observes a human producing complex motions which are parallel combinations of initially unknown motion primitives. Each time the human...
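The non-negative matrix factorization at the core of this approach can be sketched with plain multiplicative updates (the classic Lee–Seung squared-error rules). The rank, iteration count, and random initialization below are illustrative, and the paper's association of motion primitives with linguistic labels is not shown:

```python
import numpy as np

def nmf(V, rank, iters=500, seed=0):
    """Factor a non-negative matrix V (n x m) as V ~= W @ H.

    Uses multiplicative updates minimizing squared Euclidean error;
    columns of W can be read as learned primitives, rows of H as their
    activations. rank/iters/seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3   # small offset keeps entries positive
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-9                          # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because all factors stay non-negative, a complex motion can only be explained as an additive (parallel) combination of primitives, which is what makes NMF a natural fit for this decomposition task.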
Emotion plays a critical role in human interaction. Listeners can perceive the emotional state of speakers from their facial expressions, gestures, and/or speech. In this paper, we investigate the relationship between the emotion intended by speakers and the emotion perceived by listeners of a newly recorded corpus. We investigate the consistency of the emotion expressed by speakers and the emotion...
Previous studies using functional magnetic resonance imaging (fMRI) have demonstrated that the left hemisphere is specialized for language function. On the other hand, some studies have revealed that the right hemisphere is also related to language function. The hypotheses of this study were that (1) the regions related to language function form a bilateral functional network and (2) the level of...
The English alphabet is employed for inputting English and other Western languages into the computer. It is also used for the same purpose for many other languages, such as Chinese and Japanese. Since the alphabet is a phonetic system, it can be said that all computer input is made through phonetics. Although the script systems of these languages commonly use the alphabet, the way of reading them...
Interlocutors express not only information in the form of spoken words but also their feelings and commitments with regard to what is being said. In face-to-face communication, participants interact in such a way that they react to one another's multimodal positioning in the conversation. Often this means that they take a "stance". The goal of this paper is to explore the notion of stance...
The purpose of this paper is to present our original and multidisciplinary approach to study multimodal social-emotional behaviors in children with autism spectrum disorders. Our goal is to conduct fundamental and applied research regarding the reception and production of social signals involved in human interactions. To achieve this aim, we try to understand and model cognitive and multimodal emotional...