The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The main indication for auditory training is central auditory processing disorder (CAPD), which inevitably develops in patients with the chronic sensorineural hearing loss as a consequence of auditory deprivation. Patients with CAPD have difficulties with understanding complex signals, especially, speech in background noise. The aim of the study was to create the optimal algorithm of auditory training...
Deep Neural Networks (DNN) are the dominant technique widely used in English and Chinese speech recognition currently. However, Tibetan speech recognition research starts late and mainly uses Hidden Markov Model (HMM). In this paper, We show a better method of replacing Gaussian Mixture Models (GMM) by DNN to Tibetan Lhasa dialect speech recognition system. The system contains seven layers of features...
In this paper, a text independent speaker recognition system based on Gaussian mixture models (GMM) was developed with a specific focus on the use of a voice activated detector (VAD) algorithm in the training and testing. At the training level, a modified estimation/maximization (EM) algorithm is used. It is less prone to get trapped around a local maximum and so, it will have more chance to converge...
Recognition of epileptic seizures is an important issue and in certain circumstances it is desirable to have portable equipment implementing the algorithm in order to better monitor the patients. This work considers a widely used EEG database from University of Bonn as reference for comparing our recognition method with other previously reported. In order to perform epileptic seizures we combine a...
The ability to separate speech from non-stationary background disturbances using only a single channel of information has increased significantly with the adoption of deep learning techniques. In these approaches, a time-frequency mask that recovers clean speech from noisy mixtures is learned from data. Recurrent neural networks are particularly well-suited to this sequential prediction task, with...
This paper targets on a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary data source. Previous studies on sound classification are commonly based on cross-validation using a single dataset, without considering training-recognition mismatch. In our study, two experimental setups are used: matched training-recognition condition and mismatched training-recognition...
This paper presents improvements in terms of accuracy for shape object classification using a new low complexity method compared to previous implementation [1]. The method is using echoes generated by a JAVA platform capable of emulate sound propagation in a controlled 2D virtual environment [2][3]. Echoes originate from the ultrasonic waves generated inside a virtual environment which contains geometrical...
Most sequence processing tasks can be cast as a problem of mapping a sequence of observations into a sequence of labels. This is a very difficult problem since the association between input data sequences and output label sequences is not given at the frame level. Recurrent neural networks (RNNs) equipped with connectionist temporal classification (CTC) are among the best tools devised to handle this...
The accuracy of object recognition has been greatly improved due to the rapid development of deep learning, but the deep learning generally requires a lot of training data and the training process is very slow and complex. We propose an incremental object recognition system based on deep learning techniques and speech recognition technology with high learning speed and wide applicability. The system...
Support vector machine (SVM) algorithm received much attention in the research of voiceprint recognition, especially for small sample datasets. However, with the increase of recognition number and speech features number, the rate of model training and recognition is significantly reduced. In order to solve the problem, a new weighted clustering algorithm is proposed, which use “one to one” SVM model...
Flow pattern is one of the most important parameters for gas-liquid two-phase flow. In this work, a new flow pattern identification method based on Convolution Neural Network (CNN) is presented. A 7-layer CNN structure is chosen, and the parameters of this network are determined by a training set. In order to verify the feasibility, experiments were carried out in horizontal pipe with the inner diameter...
Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open...
The non-negative matrix factorization (NMF) approach has shown to work reasonably well for monaural speech enhancement tasks. This paper proposes addressing two shortcomings of the original NMF approach: (1) the objective functions for the basis training and separation (Wiener filtering) are inconsistent (the basis spectra are not trained so that the separated signal becomes optimal); (2) minimizing...
The article describes the conceptual design of the instrumental shell for a pronunciation training simulator creation. The project is based on linguistic knowledge and methods of speech recognition. The discrepancies in phonetic systems of different (native and non-native) languages are taken into account. The main approaches for speech recognition are analyzed and the required components of the simulator...
This papers consists of two parts. The first is a critical review of prior art on adversarial learning, i) identifying some significant limitations of previous works, which have focused mainly on attack exploits and ii) proposing novel defenses against adversarial attacks. The second part is an experimental study considering the adversarial active learning scenario and an investigation of the efficacy...
Identification of spoken word(s) can be used to control external device. This research was result word identification in speech using Mel-Frequency Cepstrum Coefficients (MFCC) and Learning Vector Quantization (LVQ). The output of system operated the computer in certain genre song appropriate with the identified word. Identification was divided into three classes contain words such as "Klasik",...
Recent developments in deep learning methods have greatly influenced the performances of speech recognition systems. In a Hidden Markov model-Deep neural network (HMM-DNN) based speech recognition system, DNNs have been employed to model senones (context dependent states of HMM), where HMMs capture the temporal relations among senones. Due to the use of more deeper networks significant improvement...
The article presents studies on the automatic whispery speech recognition. In the performed research a new corpus with whispery speech has been used. It has been checked how is the speech recognition quality changing at variables sampling frequency and signal frame length. It has been found that the optimal sampling frequency of whispery speech is about 32–48 kHz, while the optimal signal frame length...
Monaural source separation is an important research area which can help to improve the performance of several real-world applications, such as speech recognition and assisted living systems. Huang et al. proposed deep recurrent neural networks (DRNNs) with discriminative criterion objective function to improve the performance of source separation. However, the penalty factor in the objective function...
In this paper, the development of Multilingual Phone Recognition System (MPRS) in the context of Indian languages is described. MPRS is a language independent Phone Recognition System (PRS) that could recognise the phonetic units present in a speech utterance of any language. We have developed two Bilingual and a quadrilingual PRS using four Indian languages — Kannada, Telugu, Bengali, and Odia. International...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.