In this article, we present a target-speaker-dependent speech enhancement system that enhances a specific target talker in the presence of real-life background noise. The proposed system uses a multi-channel processing stage to produce a noise reference signal. This noise reference signal is further used not only to compute the residual noise statistics from the multichannel stage output, but also to...
This paper addresses the problem of joint wideband localization and acquisition of acoustic sources. The source locations as well as acquisition of the original source signals are obtained in a joint fashion by solving a sparse recovery problem. Spatial sparsity is enforced by discretizing the acoustic scene into a grid of predefined dimensions. In practice, energy leakage from the source location...
We propose an efficient method to estimate source power spectral densities (PSDs) in a multi-source reverberant environment using a spherical microphone array. The proposed method utilizes the spatial correlation between the spherical harmonics (SH) coefficients of a sound field to estimate source PSDs. The use of the spatial cross-correlation of the SH coefficients allows us to employ the method...
In this paper, we investigate robust beamforming techniques for wideband signal processing in noisy and reverberant environments. In such environments, steering vector estimation errors are inevitable, leading to a degradation of the beamformer performance. Here, we study two types of beamformers that are robust against steering vector estimation errors. The first type includes robust Capon beamformers,...
The ability to separate speech from non-stationary background disturbances using only a single channel of information has increased significantly with the adoption of deep learning techniques. In these approaches, a time-frequency mask that recovers clean speech from noisy mixtures is learned from data. Recurrent neural networks are particularly well-suited to this sequential prediction task, with...
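Networks of the kind described above are typically trained to predict a time-frequency mask from the noisy mixture. As an illustrative sketch (not the specific system of this abstract), the ideal ratio mask is a common supervised training target when clean speech and noise STFTs are available:

```python
import numpy as np

def ideal_ratio_mask(speech_stft, noise_stft, eps=1e-12):
    """Ideal ratio mask (IRM), a common training target for
    single-channel mask-based speech enhancement networks.

    Both inputs are complex STFTs of shape (freq, time); the returned
    mask lies in [0, 1] and is applied pointwise to the mixture STFT.
    """
    s = np.abs(speech_stft) ** 2
    n = np.abs(noise_stft) ** 2
    return s / (s + n + eps)
```

Multiplying the noisy mixture STFT by this mask (and keeping the mixture phase) recovers an estimate of the clean speech spectrogram.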
This paper addresses the problem of audio source separation from (possibly under-determined) multichannel convolutive mixtures. We propose a separation method based on the convolutive transfer function (CTF) in the short-time Fourier transform domain. For strongly reverberant signals, the CTF is a much more appropriate model than the widely-used multiplicative transfer function approximation. An Expectation-Maximization...
Aliasing is a major problem in any audio signal processing chain involving nonlinearity. The usual approach to antialiasing involves operation at an oversampled rate—usually 4 to 8 times the audio sample rate. Recently, a new approach to antialiasing in the case of memoryless nonlinearities has been proposed, which relies on operations over the antiderivative of the nonlinear function, and which allows...
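The antiderivative method replaces the pointwise nonlinearity with the exact average of the nonlinearity over each inter-sample segment, computed from its antiderivative. A minimal first-order sketch for a tanh waveshaper (an illustrative example, not the cited paper's exact formulation) looks like this:

```python
import numpy as np

def adaa_tanh(x, eps=1e-6):
    """First-order antiderivative antialiasing for a tanh nonlinearity.

    Replaces y[n] = tanh(x[n]) with the exact mean of tanh over the
    segment [x[n-1], x[n]], using the antiderivative F(x) = log(cosh(x)):
        y[n] = (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]).
    """
    # Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2).
    F = np.abs(x) + np.log1p(np.exp(-2.0 * np.abs(x))) - np.log(2.0)
    y = np.empty_like(x)
    y[0] = np.tanh(x[0])
    dx = x[1:] - x[:-1]
    dF = F[1:] - F[:-1]
    # When consecutive samples nearly coincide, fall back to evaluating
    # tanh at the midpoint to avoid dividing by a tiny number.
    small = np.abs(dx) < eps
    mid = np.tanh(0.5 * (x[1:] + x[:-1]))
    y[1:] = np.where(small, mid, dF / np.where(small, 1.0, dx))
    return y
```

Because each output sample is a local average of the shaped signal, the high-frequency images that cause audible aliasing are strongly attenuated without oversampling.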
Multi-channel linear prediction (MCLP) has been shown to be a suitable framework for tackling the problem of blind speech dereverberation. In recent years, a number of adaptive MCLP algorithms have been proposed, whereby the majority operates in the short-time Fourier transform (STFT) domain. In this paper, we focus on the STFT-based Kalman filter solution to the adaptive MCLP task. Similarly to all...
In this paper, we propose a novel recurrent neural network architecture for speech separation. This architecture is constructed by unfolding the iterations of a sequential iterative soft-thresholding algorithm (ISTA) that solves the optimization problem for sparse nonnegative matrix factorization (NMF) of spectrograms. We name this network architecture deep recurrent NMF (DR-NMF). The proposed DR-NMF...
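The iteration being unfolded here is plain ISTA applied to sparse nonnegative coding of a spectrogram against a fixed dictionary. A minimal sketch of that underlying iteration (an illustrative reconstruction, not the DR-NMF network itself) is:

```python
import numpy as np

def ista_nmf(V, W, lam=0.1, n_iter=100):
    """Nonnegative sparse coding by ISTA — the iteration that
    DR-NMF-style networks unfold into recurrent layers.

    Solves  min_{H >= 0}  0.5 * ||V - W H||_F^2 + lam * ||H||_1
    for activations H, given spectrogram V and dictionary W.
    """
    # Step size from the Lipschitz constant of the quadratic term.
    L = np.linalg.norm(W, 2) ** 2
    H = np.zeros((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        grad = W.T @ (W @ H - V)
        # Nonnegative soft-thresholding (the "shrinkage" step that the
        # unfolded network turns into a trainable activation).
        H = np.maximum(H - (grad + lam) / L, 0.0)
    return H
```

Unfolding fixes the number of iterations as network depth and makes `W`, the step size, and the threshold trainable per layer.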
Separating an acoustic signal into desired and undesired components is an important and well-established problem. It is commonly addressed by decomposing spectral magnitudes after exponentiation and the choice of exponent has been studied from numerous perspectives. We present this exponent selection problem as an approximation to the actual underlying geometric situation. This approach makes apparent...
For audio source separation applications, it is common to apply a Wiener-like filtering to a time-frequency (TF) representation of the data, such as the short-time Fourier transform (STFT). This approach, in which the phase of the original mixture is assigned to each component, is limited when sources overlap in the TF domain. In this paper, we propose to improve this technique by accounting for two...
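The baseline this abstract sets out to improve is the standard Wiener-like mask: each source's power spectrogram estimate is turned into a ratio mask, and the mixture phase is reused for every component. A minimal sketch of that baseline (assumed conventional formulation, not the paper's proposed extension):

```python
import numpy as np

def wiener_filter(source_power_specs, mix_stft, eps=1e-12):
    """Wiener-like filtering in the STFT domain.

    source_power_specs: list of nonnegative (freq, time) power
    spectrogram estimates, one per source. mix_stft: complex mixture
    STFT of the same shape. Each returned source estimate inherits the
    mixture phase, which is exactly the limitation discussed above
    when sources overlap in the time-frequency plane.
    """
    total = sum(source_power_specs) + eps
    return [p / total * mix_stft for p in source_power_specs]
```

Because the masks sum to (nearly) one in every time-frequency bin, the source estimates sum back to the mixture; overlapping bins are simply split in proportion to the estimated powers.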
Multichannel audio enhancement and source separation traditionally attempt to isolate a single source and remove all background noise. In listening enhancement applications, however, a portion of the background sources should be retained to preserve the listener's spatial awareness. We describe a time-varying spatial filter designed to apply a different gain to each sound source with minimal distortion...
As part of the 2016 public evaluation challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016), the second task focused on evaluating sound event detection systems using synthetic mixtures of office sounds. This task, which follows the ‘Event Detection-Office Synthetic’ task of DCASE 2013, studies the behaviour of tested algorithms when facing controlled levels of audio...
Many machine learning tasks have been shown solvable with impressive levels of success given large amounts of training data and computational power. For the problems which lack data sufficient to achieve high performance, methods for transfer learning can be applied. These refer to performing the new task while having prior knowledge of the nature of the data, gained by first performing a different...
In this paper, we propose a new time-frequency mask method for computational auditory scene analysis (CASA) based on convex optimization of the binary mask. In the proposed method, the pitch estimation and segment segregation in conventional CASA are completely replaced by the convex optimization of speech power. Considering the cross-correlation between the power spectra of noisy speech and noise...
This paper targets a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary data source. Previous studies on sound classification are commonly based on cross-validation using a single dataset, without considering training-recognition mismatch. In our study, two experimental setups are used: matched training-recognition condition and mismatched training-recognition...
Deep neural networks have been widely applied in the field of environmental sound classification. However, due to the scarcity of carefully labeled data, their training process suffers from over-fitting. Data augmentation is a technique that alleviates this issue. It augments the training set with synthetic data that are created by modifying some parameters of the real data. However, not all kinds...
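Two widely used waveform-level augmentations of the kind referred to here are a random time shift and additive noise at a controlled signal-to-noise ratio. A minimal sketch (generic examples, not the specific augmentations studied in this paper):

```python
import numpy as np

def augment(signal, rng, snr_db=20.0, max_shift=1600):
    """Create a synthetic training example from a real clip by
    modifying two parameters: temporal position and noise level.

    Applies a random circular time shift (label-preserving for most
    scene/event clips) and adds white noise at the requested SNR.
    """
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(signal, shift)
    # Scale the noise so that 10*log10(sig_power / noise_power) = snr_db.
    sig_power = np.mean(out ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    return out + rng.normal(0.0, np.sqrt(noise_power), size=out.shape)
```

As the abstract notes, not all such transformations help equally: an augmentation is only useful if it preserves the class label while covering variation the network will meet at test time.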
This paper deals with the problem of audio source separation. To handle the complex and ill-posed nature of audio source separation, the current state-of-the-art approaches employ deep neural networks to obtain instrumental spectra from a mixture. In this study, we propose a novel network architecture that extends the recently developed densely connected convolutional network (DenseNet),...
Advances in virtual reality have generated substantial interest in accurately reproducing and storing spatial audio in the higher order ambisonics (HOA) representation, given its rendering flexibility. Recent standardization for HOA compression adopted a framework wherein HOA data are decomposed into principal components that are then encoded by standard audio coding, i.e., frequency domain quantization...