
CoGANs for Unsupervised Visual Speech Adaptation to New Speakers

Adriana Fernandez-Lopez, Ali Karaali, Naomi Harte, Federico M. Sukno

04 May 2020

Audio-Visual Speech Recognition (AVSR) faces the difficult task of exploiting acoustic and visual cues simultaneously. Augmenting speech with the visual channel creates its own challenges; for example, every person has unique mouth movements, which makes it very difficult for visual models to generalize. This motivates our focus on the generalization of speaker-independent (SI) AVSR systems, especially in noisy environments, by exploiting the visual domain. Specifically, we are the first to explore the visual adaptation of an SI-AVSR system to an unknown and unlabelled speaker. We adapt an AVSR system trained in a source domain to decode samples in a target domain without needing labels in the target domain. For the domain adaptation to the unknown speaker, we use Coupled Generative Adversarial Networks (CoGANs) to automatically learn a joint distribution of multi-domain images. We evaluate our character-based AVSR system on the TCD-TIMIT dataset and obtain up to a 10% average improvement with respect to its equivalent AVSR system.
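The abstract's key mechanism is the CoGAN's weight sharing: two GANs are tied together so that a single latent code generates a corresponding pair of images, one per visual domain, which is how a joint distribution is learned without paired data. The paper's actual architecture, layer sizes, and training procedure are not given on this page, so the PyTorch sketch below is an assumption-laden illustration of the general CoGAN idea (Liu and Tuzel, 2016), not the authors' implementation: generators share their early layers, and discriminators mirror this by sharing their final layers.

```python
# Minimal CoGAN weight-sharing sketch (illustrative only; all layer
# sizes and the fully connected design are assumptions, not the
# architecture used in the paper).
import torch
import torch.nn as nn

class CoGANGenerators(nn.Module):
    """Two generators that share their early layers, so one latent code
    decodes into a corresponding image pair (e.g. source/target speaker)."""
    def __init__(self, z_dim=100, img_dim=32 * 32):
        super().__init__()
        # Shared trunk: captures high-level content common to both domains.
        self.shared = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
        )
        # Domain-specific heads: render domain-dependent appearance.
        self.head_src = nn.Sequential(nn.Linear(512, img_dim), nn.Tanh())
        self.head_tgt = nn.Sequential(nn.Linear(512, img_dim), nn.Tanh())

    def forward(self, z):
        h = self.shared(z)
        return self.head_src(h), self.head_tgt(h)

class CoGANDiscriminators(nn.Module):
    """Two discriminators that share their last layers, mirroring the
    generators: domain-specific feature extractors, shared classifier."""
    def __init__(self, img_dim=32 * 32):
        super().__init__()
        self.feat_src = nn.Sequential(nn.Linear(img_dim, 512), nn.LeakyReLU(0.2))
        self.feat_tgt = nn.Sequential(nn.Linear(img_dim, 512), nn.LeakyReLU(0.2))
        self.shared = nn.Sequential(
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # real/fake logit
        )

    def forward(self, x_src, x_tgt):
        return self.shared(self.feat_src(x_src)), self.shared(self.feat_tgt(x_tgt))

# One latent sample yields a *pair* of images drawn from the learned
# joint distribution over the two visual domains.
G = CoGANGenerators()
z = torch.randn(4, 100)
img_src, img_tgt = G(z)
print(img_src.shape, img_tgt.shape)  # torch.Size([4, 1024]) twice
```

In an adaptation setting like the one described, the appeal of this coupling is that the target speaker never needs labels: only unlabelled target-domain images are required to train the target-side discriminator, while the shared layers tie the two domains to a common latent representation.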

