Self-adaptive Incremental Machine Speech Chain for Lombard TTS with High-granularity ASR Feedback in Dynamic Noise Condition

Sashi Novitasari (Nara Institute of Science and Technology); Sakriani Sakti (Japan Advanced Institute of Science and Technology); Satoshi Nakamura (Nara Institute of Science and Technology, Japan)

06 Jun 2023

A common approach for text-to-speech (TTS) in noisy conditions is offline fine-tuning, which is generally applied to static noise and predefined conditions. We recently proposed a self-adaptive TTS in machine speech chain inference that enables TTS to control its voice in statically and dynamically noisy environments based on auditory feedback from automatic speech recognition (ASR) and speech-to-noise ratio (SNR) recognition. However, that study only investigated the system on synthetic Lombard speech data. Furthermore, the ASR feedback had low granularity, as it was based only on the loss of the positive character class. In this paper, we improve the self-adaptive TTS using higher-granularity ASR feedback at the character-vocabulary level, which considers the losses in both the positive and negative classes. We focus on a self-adaptive incremental TTS (Adapt-ITTS) with a short-term feedback mechanism that aims for low-latency adaptation in dynamically noisy situations. In experiments, our proposed Adapt-ITTS successfully improved intelligibility in noisy conditions on both synthetic Lombard speech data (Wall Street Journal dataset) and natural Lombard speech data (Hurricane dataset).
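
The difference between the two feedback granularities can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the sketch assumes the positive-class loss is the negative log-probability of the reference character, and that the vocabulary-level feedback is a per-class binary cross-entropy. The function names, the toy vocabulary, and the logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert ASR output logits for one character step into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def positive_class_loss(probs, target):
    """Lower-granularity feedback (assumed form): one scalar,
    the negative log-probability of the reference character."""
    return -math.log(probs[target])

def vocabulary_level_losses(probs, target, eps=1e-12):
    """Higher-granularity feedback (assumed form): one loss per vocabulary
    entry. The positive class is penalised for low probability, each
    negative class for high probability (binary cross-entropy per class)."""
    losses = []
    for k, p in enumerate(probs):
        if k == target:
            losses.append(-math.log(max(p, eps)))
        else:
            losses.append(-math.log(max(1.0 - p, eps)))
    return losses

# Hypothetical character vocabulary and ASR logits for a single step.
vocab = ["a", "b", "c", "<space>"]
logits = [2.0, 1.5, -1.0, -2.0]
target = 0  # the reference character is "a"

probs = softmax(logits)
scalar_feedback = positive_class_loss(probs, target)
vector_feedback = vocabulary_level_losses(probs, target)

print("positive-class loss:", round(scalar_feedback, 4))
for ch, loss in zip(vocab, vector_feedback):
    print(f"{ch:8s} loss = {loss:.4f}")
```

Under these assumptions, the scalar feedback only says how poorly the reference character was recognised, while the per-class vector also shows which competing character ("b" here) the ASR confused it with. That extra detail is what higher granularity means in this sketch.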