Guiding audio source separation by video object information

Sanjeel Parekh; Slim Essid; Alexey Ozerov; Ngoc Q. K. Duong; Patrick Perez; Gael Richard

doi:10.1109/WASPAA.2017.8169995

Source

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 61-65

Abstract

In this work, we propose novel joint and sequential multimodal approaches to single-channel audio source separation in videos. These approaches operate within the popular non-negative matrix factorization (NMF) framework and exploit information about the motion of the sounding objects. Specifically, we present methods that use a non-negative least squares formulation to couple motion and audio information. The proposed techniques generalize recent work on NMF-based motion-informed source separation and extend easily to video data. Experiments on two distinct multimodal datasets of string-instrument performance recordings illustrate their advantages over existing methods.
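To make the general idea concrete, here is a minimal sketch of one simplified *sequential* pipeline. It is not the authors' formulation, and every name in it (`nmf`, `nnls_coupling`, `separate`, the motion-feature matrix `M`) is hypothetical. The sketch rests on the following assumptions:

- The mixture magnitude spectrogram `V` is factorized with Euclidean NMF using Lee-Seung multiplicative updates.
- A nonnegative linear map `A` from per-source motion features `M` (one row per visible source) to the NMF activations `H` is fitted by non-negative least squares. A simple multiplicative NNLS update stands in for whatever solver the paper uses.
- Each NMF component is assigned to the source with the largest coupling coefficient, and the sources are recovered with Wiener-like soft masks.

```python
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Euclidean NMF, V ~ W @ H, via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nnls_coupling(H, M, n_iter=500, eps=1e-12, seed=0):
    """Hypothetical coupling step: find A >= 0 minimizing ||H - A @ M||_F^2.
    Uses multiplicative NNLS updates (assumes H and M are nonnegative)."""
    rng = np.random.default_rng(seed)
    A = rng.random((H.shape[0], M.shape[0])) + eps
    for _ in range(n_iter):
        A *= (H @ M.T) / (A @ M @ M.T + eps)
    return A

def separate(V, M, K):
    """V: (F, T) mixture magnitude spectrogram.
    M: (J, T) nonnegative motion features, one row per visible source (assumed).
    Returns a list of J masked magnitude spectrograms."""
    W, H = nmf(V, K)
    A = nnls_coupling(H, M)
    owner = A.argmax(axis=1)             # source index assigned to each NMF component
    V_hat = W @ H + 1e-12                # full model reconstruction
    sources = []
    for j in range(M.shape[0]):
        idx = owner == j
        V_j = W[:, idx] @ H[idx, :]      # partial reconstruction from source j's components
        sources.append(V_j / V_hat * V)  # Wiener-like soft mask applied to the mixture
    return sources
```

Because every component is assigned to exactly one source, the soft masks sum to one, so the estimated sources add back up to the mixture. A joint approach would instead fit `W`, `H` and `A` together under a single objective rather than in sequence.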