For this year's 36th edition of ICASSP, we received 2946 submissions, which is probably an all-time high: it represents an increase of 5% over last year and 12% over two years ago. The overall acceptance rate was 49%. Distributed over the various technical areas, as covered by the Signal Processing Society Technical Committees (TCs), the submission statistics are as follows:
The organizing committee of ICASSP 2011 is delighted to welcome you to the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), which is being held at the Prague Congress Centre, May 22–27, 2011. This is the flagship conference for the IEEE Signal Processing Society. In 1997, ICASSP was held in Munich, Germany, and now, in 2011, ICASSP is back in Central Europe....
Modern monaural voice and accompaniment separation systems usually consist of two main modules: melody extraction and time-frequency masking. A main distinction between different separation systems lies in what approaches are used for the two modules. Popular techniques for melody extraction include hidden Markov models (HMMs) and non-negative matrix factorization (NMF), and masking includes hard...
This paper concerns the adaptation of spectrum dictionaries in audio source separation with supervised learning. Supposing that samples of the audio sources to separate are available, a filter adaptation in the frequency domain is proposed in the context of Non-Negative Matrix Factorization with the Itakura-Saito divergence. The algorithm is able to retrieve the acoustical filter applied to the sources...
We consider the task of under-determined reverberant audio source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random vector with full-rank spatial covariance matrix. We introduce an inverse Wishart prior over the covariance matrices, whose mean is given by the theory of statistical room acoustics and whose variance...
This paper presents a novel method for solving the permutation problem inherent to frequency domain blind signal separation of multiple simultaneous speakers. Like conventional methods, the proposed method exploits the direction of arrival (DOA) of the different speakers to resolve the permutation. But it is designed to exploit the information from pairs of microphones that are usually discarded because...
We present a semi-supervised source separation methodology to denoise speech by modeling speech as one source and noise as the other source. We model speech using the recently proposed non-negative hidden Markov model, which uses multiple non-negative dictionaries and a Markov chain to jointly model spectral structure and temporal dynamics of speech. We perform separation of the speech and noise using...
We propose an unsupervised inference procedure for audio source separation. Components in nonnegative matrix factorization (NMF) are grouped automatically into audio sources via a penalized maximum likelihood approach. The penalty term we introduce favors sparsity at the group level, and is motivated by the assumption that the local amplitudes of the sources are independent. Our algorithm extends multiplicative...
Multipitch estimation techniques are widely used for music transcription and acquisition of musical data from digital signals. In this paper, we propose a flexible harmonic temporal timbre model to decompose the spectral energy of the signal in the time-frequency domain into individual pitched notes. Each note is modeled with a 2-dimensional Gaussian mixture. Unlike previous approaches, the proposed...
Modern music production often uses pre-recorded pieces of audio, so-called samples, taken from a huge sample database. Consequently, there is an increasing demand to extensively adapt these samples to their intended new musical environment in a flexible way. Such an application, for instance, retroactively changes the key mode of audio recordings, e.g. from a major key to minor key by a frequency...
We propose a new approach for automatic melody extraction from polyphonic audio, based on Probabilistic Latent Component Analysis (PLCA). An audio signal is first divided into vocal and non-vocal segments using a trained Gaussian Mixture Model (GMM) classifier. A statistical model of the non-vocal segments of the signal is then learned adaptively from this particular input music by PLCA. This model...