In storyteller speech, pauses play a significant role in introducing suspense and climax. Pauses are used to emphasize keywords and emotion-salient words and to separate the phrases in an utterance. The objective of this work is to predict the position and duration of pauses in speech synthesized by a text-to-speech system. We analyzed the pause patterns in storyteller speech and classified...
Voice Translator is a speech-to-speech translation application for Android mobile phones that translates English speech to Hindi speech and vice versa. Voice Translator includes three modules: Voice Recognition, Machine Translation and Speech Synthesis. The Voice Recognition module captures the voice or speech from the mobile user through the speaker, identifies it, converts the speech into text and then...
Speech synthesis is one of the most significant applications in the linguistic communication process. A text-to-speech system accepts an input sentence and converts it into audible speech as output. Tamil is a syllable-based language. A syllable is a unit of language that may be spoken independently of the adjacent phones. It consists of an interrupted portion of sound,...
In our previous work, we presented a cross-stream dependency modeling method for hidden Markov model (HMM)-based parametric speech synthesis. In this method, a multi-space probability distribution (MSD) was adopted for F0 modeling, and voicing decision errors severely influenced the accuracy of the generated spectral features. Therefore, a cross-stream dependency modeling method using continuous...
In order to build web-based voicefonts, an unsupervised method is needed to automate the extraction of acoustic and linguistic properties of speech. This paper addresses the impact of automatic speech transcription on statistical parametric speech synthesis based on a single speaker's 100-hour speech corpus, focusing particularly on two factors affecting speech quality: transcript accuracy and...
Text-to-speech synthesis (TTS) is the final stage in the speech-to-speech (S2S) translation pipeline, producing an audible rendition of translated text in the target language. TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text. This is problematic when the target language is dialectal Arabic, because the statistical machine translation (SMT) system usually...
A fusion scheme of phone duration models (PDMs) is presented in this work. Specifically, a support vector regression (SVR)-fusion model is fed with the predictions of a group of independent PDMs operating in parallel. The American-English KED TIMIT and the Greek WCL-1 databases are used for evaluating the PDMs and the fusion scheme. The fusion scheme contributes to the accuracy improvement over the...
In this paper, we present first results on applying a personality assessment paradigm to speech input, and comparing human and automatic performance on this task. We cue a professional speaker to produce speech using different personality profiles and encode the resulting vocal personality impressions in terms of the Big Five NEO-FFI personality traits. We then have human raters, who do not know the...
This paper proposes a combined approach to the polysemy problems in a Chinese to Taiwanese text-to-speech (TTS) system. Polysemy means there are words with more than one meaning or pronunciation. For example, there are two kinds of pronunciation for the word (he) in Taiwanese. They are /yi7/ and /yin7/. The first pronunciation, /yi7/, can mean `he' or `him'; and the other one, /yin7/, means `his'...
We address the problem in conventional Gaussian mixture model (GMM)-based spectral conversion from the viewpoint of optimal conversion function selection. The proposed method is motivated by the observation that if the optimal conversion function based on the minimum mel-cepstral distortion (MMCD) criterion can be selected during the conversion stage, the conversion performance in terms of mel-cepstral distortion...
This study focuses on the perception of two synthesized Mandarin tones: the high level tone (Tone 1) and the high falling tone (Tone 4), which have been reported to be difficult for Cantonese learners of Mandarin. As the two tones are distinctive in F0 directions and also vary in F0 onsets, it is worth investigating why Cantonese listeners find them perceptually indistinguishable. We aim to find out what...
The recognition of prosodic structure is an important research aspect in the field of Text-to-Speech. It is essential to improving the naturalness of machine-synthesized speech. This paper proposes an approach to predicting and assigning prosodic structure automatically for Chinese sentences based on their tree structures. It presents the modeling of a statistical language model based on the simply...
Prosodic structure prediction plays a crucial role in the prosodic annotation of a speech synthesis corpus as well as in improving the naturalness of synthesized speech. The paper studies Tibetan prosodic structure with Tibetan speech characteristics. Having analyzed a variety of variables that have an impact on Tibetan prosodic boundaries, we obtain syllable boundary grammatical information, prosodic...
Speech production and speech phonetic features gradually improve in children by obtaining audio feedback after cochlear implantation or using a hearing aid. In this study, voice disorders in children with cochlear implants and hearing aids are classified. 30 Persian children participated in the study, including 6 children in levels 1 to 3 and 12 in level 4. Voice samples of 5 isolated Persian words...
Text-to-speech (TTS) synthesis technology enables a machine to convert text into audible speech and is used throughout the world to enhance the accessibility of information. An important component of any TTS synthesis system is its database of sounds. In this study, three types of sound units, i.e., phonemes, diphones and syllables, are concatenated to produce natural sound for good quality Sindhi...
The paper presents support vector machine (SVM)-based part-of-speech (POS) tagging on a Chinese database, which is part of our speech synthesis system. The model is an SVM model and uses many sequential features to predict the POS tag. The text database was downloaded from the internet and contains 1,280,000 words and 33 parts of speech. The total accuracy of our experiments is 99.31%.
In unit selection based text-to-speech (TTS) synthesis, the accurate position of the unit boundaries in the unit selection database is one of the factors that determine the quality of the synthesized speech. To ensure the accuracy of the boundary positions, developers often have to manually verify the speech boundaries that are generated by automatic speech recognition techniques. In order to reduce...
Indian languages such as Hindi are phonetic in nature. The text-to-speech (TTS) system for Hindi exploits this phonetic nature. The algorithm developed by us involves analysis of a sentence in terms of words and then symbols, using a technique that combines pure consonants and vowels. Wave files are merged as required to generate the modified consonants influenced by matras,...
In a Text-to-Speech system based on time-domain techniques that employ pitch-synchronous manipulation of the speech waveforms, one of the most important issues that affect the output quality is the way the analysis points of the speech signal are estimated and the actual points, i.e. the analysis pitchmarks. In this paper we present our methodology for calculating the pitchmarks of a speech waveform,...
In this study, the framework of a concatenative text-to-speech system for Turkish is built, and its evaluation techniques, namely MOS, DRT and CT, are considered. The naturalness and intelligibility of the Turkish TTS system are tested by MOS and CT-DRT, respectively. Although the system uses simple techniques, it provides promising results for Turkish TTS, since the selected concatenative method is...