Non-negative spectrogram decomposition and its variants have been extensively investigated for speech enhancement due to their efficiency in extracting perceptually meaningful components from mixtures. Usually, these approaches are implemented on the condition that training samples for one or more sources are available beforehand. However, in many real-world scenarios, it is often impossible for...
Deep neural networks (DNNs) have achieved great success in automatic speech recognition (ASR) and can be regarded as joint models combining a nonlinear feature transformation and a log-linear classifier. Recently, DNNs have been adopted as regression models to enhance distorted features in noisy conditions, and the enhanced features are used to improve the performance of DNN-based ASR. Previous work...
This paper deals with active noise control (ANC) for impulsive noise sources, for which the filtered-x least mean square (FxLMS) algorithm becomes unstable. By minimizing the fractional lower-order moment, the resulting filtered-x least mean p-power (FxLMP) algorithm has an update vector computed using the sign operator and a fractional power of the residual error signal. This results in improved robustness...
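The update described in this abstract can be illustrated with a minimal sketch. It assumes the textbook form of the FxLMP weight update, w(n+1) = w(n) + mu * p * sign(e(n)) * |e(n)|^(p-1) * x'(n), where x'(n) is the reference signal filtered through an estimate of the secondary path. The function name `fxlmp_update`, the parameter names, the default values, and the sign convention are illustrative assumptions and are not taken from the paper.

```python
import math


def fxlmp_update(w, x_filt, e, mu=0.01, p=1.5):
    """One FxLMP-style weight update (illustrative sketch, not the paper's code).

    w      -- current adaptive filter weights (list of floats)
    x_filt -- filtered reference samples, same length as w
    e      -- residual error sample
    mu     -- step size (assumed value)
    p      -- power of the error norm, typically 1 <= p <= 2 (assumed range)
    """
    if e == 0.0:
        # No error, no correction; also avoids 0 ** negative for p < 1.
        return list(w)
    # Gradient factor: p * sign(e) * |e|^(p-1).
    g = p * math.copysign(abs(e) ** (p - 1.0), e)
    return [wi + mu * g * xi for wi, xi in zip(w, x_filt)]
```

With p = 2 the factor becomes 2e, which is the usual FxLMS correction. With p closer to 1 the correction grows much more slowly with |e|, so a single impulsive error sample moves the weights far less.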
Generally, in multilingual communities, non-native speakers may produce speech sounds that are either part of their own native language or formed by merging characteristics of native pronunciation with non-native pronunciation. Recently, a Two-pass phone clustering based on Confusion Matrix (TCM) approach has been proposed to address the one-to-one phone mappings between Chinese syllables and...
In HMM/DNN automatic speech recognition (ASR) systems, the DNNs model the posterior probabilities of triphone states. However, triphone states are unevenly distributed. In this situation, the training algorithm tends to converge to a local optimum that favors states with abundant data over states with scarce data. Thus, the imbalance of the training data degrades ASR performance, especially for...
In low-resource automatic speech recognition (ASR), one usually resorts to statistical machine translation (SMT) techniques to learn transformation rules that refine the grapheme lexicon. Doing this raises two challenges. One is to generate grapheme sequences from the training data as the targets, which are paired with the original transcripts to train SMT models; the other is to effectively prune the learned...
In recent years, deep neural networks (DNNs) have achieved great success as acoustic models in speech recognition. An important application of DNNs is to derive bottleneck features. In this paper, we first investigate the robustness of bottleneck features generated by three types of DNN structures on the Aurora 4 task without any explicit noise compensation. Second, we propose the node-pruning...
The performance of speaker verification systems (SVS) declines dramatically in noisy environments. To suppress the adverse impact of noise on SVS, this paper investigates employing the nonnegative matrix factorization (NMF) technique to reconstruct the speech based on the pre-trained speech basis matrix (SBM) and noise basis matrix (NBM). The contribution of this research lies in utilizing the...
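As a rough illustration of reconstructing speech from pre-trained basis matrices, the sketch below shows a generic supervised NMF scheme, not the method of this paper (whose specific contribution is truncated in the listing). It assumes a magnitude spectrogram V, fixed bases W = [SBM, NBM], activations H estimated with the standard multiplicative update for the Euclidean cost, and a Wiener-style soft mask. All function names, parameters, and defaults are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf_enhance(v, w_speech, w_noise, n_iter=50, eps=1e-12):
    """Estimate the speech part of a nonnegative spectrogram v (illustrative).

    v        -- noisy magnitude spectrogram, frequency bins x frames
    w_speech -- pre-trained speech basis matrix (bins x speech components)
    w_noise  -- pre-trained noise basis matrix (bins x noise components)
    """
    # Concatenate the fixed bases column-wise: W = [W_speech, W_noise].
    w = [rs + rn for rs, rn in zip(w_speech, w_noise)]
    k_s = len(w_speech[0])
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    # Activations start at 1 and are refined; the bases stay fixed.
    h = [[1.0] * len(v[0]) for _ in range(len(w[0]))]
    for _ in range(n_iter):
        den = matmul(wtw, h)
        # Multiplicative update: H <- H * (W^T V) / (W^T W H).
        h = [[h[i][j] * wtv[i][j] / (den[i][j] + eps)
              for j in range(len(h[0]))] for i in range(len(h))]
    s = matmul(w_speech, h[:k_s])  # speech part of the model
    n = matmul(w_noise, h[k_s:])   # noise part of the model
    # Soft mask applied to the noisy input.
    return [[v[i][j] * s[i][j] / (s[i][j] + n[i][j] + eps)
             for j in range(len(v[0]))] for i in range(len(v))]
```

In practice the enhanced magnitudes would be combined with the noisy phase to resynthesize a waveform before speaker verification; that step is omitted here.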
This paper proposes a semantic music discovery system based on a tag-level factor graph (TFG) model that uses tag probability and content similarity in a unified fashion. The content similarities are calculated from extracted pitch features, while the tag probabilities are obtained from our previous auto-tagging system. The TFG model consists of a set of node and edge feature functions,...