Multi-speaker Speech Synthesis from Electromyographic Signals by Soft Speech Unit Prediction

Kevin Scheck (University of Bremen); Tanja Schultz (University of Bremen)

06 Jun 2023

Electromyographic (EMG) signals of articulatory muscles reflect the speech production process even when the user speaks silently, i.e., moves the articulators without producing audible sound. We propose Speech-Unit-based EMG-to-Speech (SU-E2S), a system that uses EMG to synthesize speech which contains the articulated content but is vocalized in another voice, determined by an acoustic reference utterance. It builds on a Voice Conversion (VC) system that decomposes acoustic speech into continuous soft speech units and a speaker embedding and then reconstructs acoustic features from them. SU-E2S performs speech synthesis by predicting soft speech units from EMG and feeding them to the VC system. Experiments show that the SU-E2S output is on par, in terms of intelligibility, with predicting acoustic features directly from EMG, while adding the ability to synthesize speech in other voices.
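As a rough illustration of the pipeline the abstract describes, the sketch below wires an EMG encoder that predicts continuous soft speech units to a VC decoder conditioned on a speaker embedding. All module names, layer choices, and dimensions here are assumptions for illustration; the abstract does not specify the authors' architecture.

```python
# Minimal sketch of the SU-E2S pipeline, assuming PyTorch and
# hypothetical dimensions (8 EMG channels, 256-d units, 80 mel bins).
import torch
import torch.nn as nn

class EMGEncoder(nn.Module):
    """Predicts a sequence of continuous soft speech units from EMG features."""
    def __init__(self, n_emg_channels=8, unit_dim=256, hidden=512):
        super().__init__()
        self.rnn = nn.LSTM(n_emg_channels, hidden, num_layers=3,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, unit_dim)

    def forward(self, emg):                 # emg: (batch, time, channels)
        h, _ = self.rnn(emg)
        return self.proj(h)                 # soft units: (batch, time, unit_dim)

class VCDecoder(nn.Module):
    """Reconstructs acoustic features from soft units plus a speaker embedding."""
    def __init__(self, unit_dim=256, spk_dim=256, n_mels=80, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(unit_dim + spk_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_mels),
        )

    def forward(self, units, spk_emb):      # spk_emb: (batch, spk_dim)
        # Broadcast the speaker embedding over time and decode jointly.
        spk = spk_emb.unsqueeze(1).expand(-1, units.size(1), -1)
        return self.net(torch.cat([units, spk], dim=-1))  # (batch, time, n_mels)

if __name__ == "__main__":
    encoder, decoder = EMGEncoder(), VCDecoder()
    emg = torch.randn(1, 200, 8)            # silently articulated utterance
    spk_emb = torch.randn(1, 256)           # derived from an acoustic reference
    mels = decoder(encoder(emg), spk_emb)   # acoustic features in the target voice
    print(mels.shape)                       # torch.Size([1, 200, 80])
```

Because the speaker identity enters only through the reference embedding, swapping that embedding changes the output voice without retraining the EMG encoder, which is the multi-speaker property the abstract highlights.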
