This paper presents the work done in the Speech2Process project on a speech dialogue system for call centers, specifically in the banking domain. In our proposed solution, the client communicates with the system in natural-language sentences, which are automatically recognized and semantically analysed. The paper describes innovative features of the selected approach, which...
This paper introduces one of the largest Romanian speech datasets freely available for both academic and commercial use. The dataset comprises speech data recorded over the last year from 12 speakers, along with 5 other speakers previously recorded in a separate environment. The data was manually segmented at the utterance level and semi-automatically labelled at the phone level. The resulting corpus amounts...
The paper analyzes the frequency of Social Media (SM) posts during earthquakes and the word associations found in those posts. Since important posts are shared by users in SM, the purpose was to identify how the number of posts with unique content on a particular topic varied over a period of time. The present study uses messages generated by...
First, we report the first development of a set of symptoms for a medical condition based exclusively on information collected on the Internet. Second, we lay down a general method for doing so. Third, we introduce the first systematic set of symptoms for temporo-mandibular disorder (TMD) exclusively related to speech and suggest a set of known quantitative parameters for the analysis...
This paper analyzes tweets and their vocabulary in a specific emergency situation, namely earthquakes. It also examines the correlations between several words in the messages and the linear regressions between word usage and earthquake intensity. We compared the vocabulary used in tweets about Romanian earthquakes with the vocabulary of tweets about other European earthquakes.
The goal of this work is to present some possible intruder detection systems and the influence of impulse-like signals on the overall classification accuracy. Two different scenarios are used: in the first scenario five sound classes are considered (the last class consists of impulsive sounds, namely gunshots), while in the second scenario the impulsive sound class is dropped. More classifiers are used...
The goal of this work is to present an audio signal classification system based on Linear Predictive Coding and Random Forests. We consider the problem of multiclass classification with imbalanced datasets. The signals under classification belong to the class of sounds from wildlife intruder detection applications: birds, gunshots, chainsaws, human voice and tractors. The proposed system achieves...
This paper presents a rule-based approach for generating a large phonetic database for Romanian. The knowledge base is developed by means of the GRAALAN (Grammar Abstract Language) system. By inspecting dictionaries and corpora, we generate a phonetic database of over 100,000 lemmas. The rule-based method used to generate the phonetic transcriptions ensures a high degree of accuracy.
The field of digital audio forensics has driven a sustained research effort over the last decade. Current digital audio authentication frameworks include the Electric Network Frequency (ENF) criterion as a must. The ENF-based techniques benefit greatly from the availability of reference databases, which are built using extraction mechanisms that continuously analyze the power line signal. To find...
This paper presents the main improvements recently made to the large-vocabulary continuous speech recognition (LVCSR) system for the Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement consists in the use of DNN-based acoustic models instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant...
Until recently, controlling a “smart home” consisted of setting up a series of applications and automation tools: scheduling when the air conditioning system should cool the room, turning on the lighting system at sunset, or just using one's phone to control several TV appliances or the garage door. Recent advances in speech recognition technology have made voice-controlled smart homes attainable, and many...
The paper explains the relation between prosodic phrases and Information Structure (IS) by decomposing phrases into hierarchies of embedded contrast/communicative units (CUs). At any level of the hierarchy, CUs contain IS partitions supported by two contrasted functional constituents. The functional categories are defined using a two-level IS model. Topic-Focus and CU_predicate-CU_argument are the...
We find ourselves in an era in which education is being reformed while technology is getting better, greater, and more accessible than ever [1]. The Internet of Things is already altering health care, security, utilities, transportation, and household management. The devices themselves might be small, but they bring about major changes in how we live, work, and educate our...
Recognizing emotions in natural or spontaneous speech is extremely difficult compared to doing the same for acted or elicited speech. Speech emotion recognition for real conversation, such as spontaneous speech, requires linguistic information about the speech to be included in the recognition component to achieve a high recognition rate. However, with the lack of digital speech resources...
This paper describes a study of the evolution of the Romanian language of the 18th and 19th centuries in the geographical domain, with the aim of developing automatic recognition and interpretative transcription, from Cyrillic into Latin script, of printed Romanian historical heritage writings. It is well known that the operation of interpretative transcription of texts written in Cyrillic is extremely...
This paper describes a data-driven approach to handling natural language interaction between humans and devices. This approach enables example-based definition and tuning of interaction scenarios. Actions and parameters can be easily configured, requiring no prior knowledge of natural language processing and no previous experience with this type of system. The platform requires a small amount of...
The aim of this work is to provide some insights into the effort of building a representative, wide-coverage audio base of syllables for Romanian. The audio base comprises audio recordings of syllables extracted from the following types of syllable embedding: isolated syllable, isolated word, and continuous speech. The list of syllables has been computed over the syllabified form of single-word...
Automatic Speaker Analysis has largely focused on single aspects of a speaker such as her ID, gender, emotion, personality, or health state. This broadly ignores the interdependency of all the different states and traits that affect the single voice production mechanism available to a human speaker. In other words, sometimes we may sound depressed when we simply have the flu, and hardly find the...