This paper describes an HMM-based speech synthesis system that accepts dimensional descriptions of emotion as input. A spontaneous dialogue speech corpus designed for studying paralinguistic phenomena in expressive social interactions was used to train the models, with its emotional-state descriptions serving as additional contextual factors. In the perceptual experiment, a very high correlation was...
We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is holistically determined...
A learner corpus is a very important resource for research on second language acquisition. In this paper, the design and development of a Chinese learner corpus are described. The corpus is based on compositions by learners from 59 countries. The development procedure is reported, including copying, proofreading, etc. Information such as type of writing, student number, time of writing,...
This paper presents a pioneering exploration of the phonetic manifestation of pronominal anaphora, and of the factors influencing it, in Chinese reading texts, taking the third-person pronoun “ta” as an example. The F0 and duration of different types of “ta” are compared, and the stress degrees of “ta” and its surrounding syllables are also examined. The results demonstrate that: i) syntactic position plays...
The paper is an acoustic analysis of the formant frequencies and durations of the two sets of Yongding Hakka vowels, [i a i] in CV syllables and [i? a? u? ε? c?] in CVS syllables, from 10 male and 10 female speakers. The formant data show that the Yongding Hakka [i] has the F-pattern of a mid central, rather than a high central, vowel, and the vowel [i?] has the F-pattern of [I?]. The durations of...
This paper presents an approach to speech corpus script design for a customized TTS engine applied to railway passenger service information broadcasting. Raw text material is collected according to the railway service information classification. A modified greedy algorithm is proposed to generate an optimal corpus script based on statistics of the prosodic properties of the raw corpus. A...
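The abstract mentions a modified greedy algorithm without giving details. As a rough illustration only, the sketch below shows the plain (unmodified) greedy set-cover selection that script-design methods of this kind commonly start from: at each step, pick the sentence that adds the most not-yet-covered units. The function name `greedy_script_selection`, the `max_sentences` budget, and the idea of representing each sentence as a set of prosodic-unit labels are hypothetical assumptions, not the authors' method.

```python
from typing import Dict, List, Optional, Set


def greedy_script_selection(
    candidates: Dict[str, Set[str]],
    max_sentences: Optional[int] = None,
) -> List[str]:
    """Select sentences greedily for unit coverage (plain greedy set cover).

    candidates maps a sentence ID to the set of units it contains; what a
    "unit" is (e.g. a prosodic-context label) is a hypothetical choice here.
    """
    # Every unit that appears anywhere in the raw text is a coverage target.
    uncovered: Set[str] = set().union(*candidates.values())
    remaining = dict(candidates)
    selected: List[str] = []

    while uncovered and remaining:
        if max_sentences is not None and len(selected) >= max_sentences:
            break
        # Pick the sentence adding the most not-yet-covered units; iterating
        # over sorted IDs makes max() break ties toward the smallest ID.
        best_id = max(sorted(remaining), key=lambda sid: len(remaining[sid] & uncovered))
        if not remaining[best_id] & uncovered:
            break  # no remaining sentence adds anything new
        selected.append(best_id)
        uncovered -= remaining.pop(best_id)

    return selected


if __name__ == "__main__":
    # Toy data: single letters stand in for hypothetical prosodic units.
    raw = {
        "s1": {"a", "b", "c"},
        "s2": {"c", "d"},
        "s3": {"d", "e", "f", "g"},
        "s4": {"a", "g"},
    }
    print(greedy_script_selection(raw))  # ['s3', 's1']
```

Plain greedy selection is a common baseline because each step is cheap and its result is within a logarithmic factor of the smallest possible covering set. A "modified" variant such as the one the abstract mentions would presumably change the scoring step (for example, by weighting units by their frequency in the raw corpus), but the abstract does not say how.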
We report the development of a Malay conversational speech corpus as part of our research on spontaneous conversational speech LVCSR. This corpus development effort is a collaboration between NTU and USM. The goal is to collect, transcribe, and annotate 50 hours of conversational Malay speech. The conversations are recorded from both close-talk and telephone channels, and both speakers' utterances...
A network-based multilingual speech translation service under the Universal Speech Translation Advanced Research (U-STAR) consortium requires a well-tuned Thai automatic speech recognition (ASR) service. This paper summarizes the development of the service by utilizing both Thai read-speech and telephone speech (LOTUS-CELL 2.0) corpora. Tuning is performed regarding different sets of acoustic unit...
Language is used for communication and communication facilitates social activities. If we want to capture this, linguistic investigation has to be carried out within a wider context. Examination of linguistic communication in a wider context shows that it is multimodal. In order to study naturalistic multimodal communication using a corpus, the corpus should contain a combination of recordings, documentation,...
Read speech of “the North Wind and the Sun” by fifty Japanese speakers, taken from the Japanese AESOP corpus, was analyzed by automatic alignment using the HTK toolkit with a modified TIMIT dictionary. The results revealed typical phonetic and phonological problems in Japanese speakers' English pronunciation that have often been discussed in EFL. The Japanese subjects' English fluency was evaluated by 8...
Letter-to-sound (LTS) conversion, which is used to compress the lexicon for embedded applications, has become an important part of text-to-speech (TTS) systems. In this paper, coupled hidden Markov models (CHMMs) are proposed for LTS conversion. In the preprocessing phase, many-to-many alignment is adopted for lexicon alignment instead of the one-to-one alignment commonly used in previous...
Concatenative speech synthesis (CSS) provides the highest naturalness. However, it requires a huge stored database, resulting in a large footprint. Reducing the size of the stored database while preserving the quality of CSS, i.e. improving the quality-to-size ratio (QSr), is still a challenge. In this paper, we propose a method of transforming fundamental frequency (F0) contours of lexical tones, developed...
In this paper, we introduce a bimodal speech recognition corpus recorded in real environments. In recent years, speech recognition technology has increasingly been used in noisy conditions, so achieving higher recognition accuracy in real environments has become necessary. One solution that is being actively studied is bimodal speech recognition, which uses both audio and non-audio information. However, there are few databases...
This study investigates grapheme-to-phoneme conversion approaches for minority languages. Instead of the isolated-word data used for major languages, sentence-form data is defined as the appropriate form of training data for minority languages. The joint-multigram model and the hidden Markov model were examined in this study. The “treat-sentence-as-word” training method and the forced-alignment...
This study first examines the differences in the gross features of the fundamental frequency contour (F0 contour) that discriminate utterances of three sentence types, namely declarative, imperative and interrogative, in Bangla. To realize these differences in speech synthesis, they are then interpreted in terms of differences in the parameters of the command-response...
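For reference, and assuming the truncated phrase refers to the widely used Fujisaki command-response model of F0 contour generation (an assumption, since the abstract is cut off), that model expresses the log F0 contour as a baseline plus the responses to phrase and accent commands:

```latex
\ln F_0(t) = \ln F_b
  + \sum_{i=1}^{I} A_{p,i}\, G_p(t - T_{0,i})
  + \sum_{j=1}^{J} A_{a,j}\,\bigl[G_a(t - T_{1,j}) - G_a(t - T_{2,j})\bigr]

G_p(t) = \begin{cases} \alpha^2 t\, e^{-\alpha t}, & t \ge 0 \\ 0, & t < 0 \end{cases}
\qquad
G_a(t) = \begin{cases} \min\bigl[1 - (1 + \beta t)\, e^{-\beta t},\ \gamma\bigr], & t \ge 0 \\ 0, & t < 0 \end{cases}
```

Here $F_b$ is the baseline frequency; $A_{p,i}$ and $T_{0,i}$ are the magnitude and timing of the $i$-th phrase command; $A_{a,j}$, $T_{1,j}$ and $T_{2,j}$ are the amplitude, onset and offset of the $j$-th accent command; $\alpha$ and $\beta$ are the natural angular frequencies of the phrase and accent control mechanisms; and $\gamma$ is the ceiling level of the accent component. Under this reading, differences between sentence types would surface as differences in these command parameters.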
Multilingualism in Indonesia is gradually approaching a catastrophic state. Although several cultural preservation projects have been initiated, technology that could support communication between elders and younger people within indigenous communities, as well as with people outside those communities, is still very rare in Indonesia. This paper presents the first step of long-term development...