Towards Unsupervised Learning Of Speech Features In The Wild
Morgane Riviere, Emmanuel Dupoux
SPS
Recent work on unsupervised contrastive learning of speech representations has shown promising results, but so far it has only been tested on clean, curated speech datasets. Can it also be used with unprepared audio data "in the wild"? Here, we explore three problems that may hinder unsupervised learning in the wild: (i) the presence of non-speech data, (ii) noisy or low-quality speech data, and (iii) imbalance in the speaker distribution. We show that on the Libri-light train set, which is itself a clean speech-only dataset, these problems combined can have a performance cost of up to 30% relative on the ABX score. The first two problems can be alleviated by data filtering: voice activity detection selects the speech parts inside a file, and the perplexity of a model trained on clean data helps discard entire files. The third problem can be alleviated by learning a speaker embedding in the predictive segment of the model. Together, these techniques build more robust speech features that can be transferred to an ASR task in the low-resource setting.
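The first filtering step described above, selecting the speech parts inside a file with voice activity detection, can be illustrated with a minimal sketch. The paper does not specify a particular VAD; the simple frame-energy detector below (function name, frame sizes, and threshold are illustrative assumptions, not the authors' method) only shows the general idea of keeping speech frames and dropping the rest before training.

```python
import numpy as np

def energy_vad(wav, sr=16000, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Toy VAD: flag a frame as speech (True) when its log-energy is
    within `threshold_db` dB of the loudest frame in the file.
    This is an illustrative stand-in for a real VAD system."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(wav) - frame) // hop)
    # Per-frame energy, then convert to dB (epsilon avoids log of zero).
    energies = np.array(
        [np.sum(wav[i * hop : i * hop + frame] ** 2) for i in range(n_frames)]
    )
    log_e = 10.0 * np.log10(energies + 1e-10)
    return log_e > (log_e.max() + threshold_db)
```

In a preprocessing pipeline, only the frames flagged `True` would be concatenated into the training stream; in practice a dedicated VAD model with smoothing over frame decisions would replace this energy heuristic.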