REAL-M: TOWARDS SPEECH SEPARATION ON REAL MIXTURES

Cem Subakan, Mirco Ravanelli, Samuele Cornell, Francois Grondin

SPS

Length: 00:15:14

09 May 2022

In recent years, deep-learning-based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, and the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper helps fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Second, we address the problem of evaluating separation performance on real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates separation performance on real mixtures: its performance predictions correlate well with human opinions. Moreover, when evaluating popular speech separation models, we observe that the performance trends predicted by our estimator on the REAL-M dataset closely follow those achieved on synthetic benchmarks.
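
For reference, the quantity the blind estimator targets is the standard reference-based SI-SNR, which can only be computed when the clean source is known (as in synthetic mixtures). The sketch below computes it with NumPy for a single-channel estimate and reference; it is an illustration of the metric, not the authors' neural estimator, and the function name `si_snr` and the `eps` stabilizer are assumptions of this example.

```python
import numpy as np


def si_snr(estimate, reference, eps: float = 1e-8) -> float:
    """Scale-invariant SNR (dB) of a separated estimate against its clean reference.

    Illustrative sketch: `eps` is an assumed small constant to avoid division by zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if estimate.shape != reference.shape:
        raise ValueError("estimate and reference must have the same shape")
    # Remove the mean so constant offsets do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever the scaled reference does not explain is treated as error.
    error = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(error, error) + eps)
    return float(10.0 * np.log10(ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(16000)  # hypothetical 1 s of audio at 16 kHz
    noisy = source + rng.standard_normal(16000)
    print(si_snr(noisy, source))        # roughly 0 dB: equal-energy interference
    print(si_snr(5.0 * noisy, source))  # same value: rescaling does not matter
```

Because the projection absorbs any gain applied to the estimate, a model cannot raise its score simply by changing output loudness; on real mixtures such as REAL-M, where `reference` does not exist, this value has to be predicted blindly instead.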

Tags:

dataset

blind si-snr estimation

deep learning

source separation

in-the-wild speech separation

Value-Added Bundle(s) Including this Product

ICASSP 2022, May 2022 Virtual and In-Person Conference - Presentation Videos Product Bundle