  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 03:43:32
07 Jun 2021

Recognizing unsegmented conversational speech recorded with distant microphone(s) is a challenging but essential task that must be solved to enable a myriad of new speech applications, such as a communication agent that can understand, respond to, and facilitate our conversations. This task comprises a number of subtasks that have been studied rather independently for a decade, such as multichannel/single-channel source separation, speaker diarization with source number counting, and conversational speech recognition. This tutorial first revisits, with demonstration, current state-of-the-art systems for this task, which were developed for challenges such as CHiME-5 and CHiME-6 and for commercial products. These systems typically consist of a combination of well-established, independently optimized modules. Although these systems are carefully designed to consolidate these independent modules, there is still considerable room for improvement. In the latter part of the tutorial, we introduce a recent research trend that aims to establish an optimal joint neural system that solves these subtasks together, through end-to-end optimization based on a common integrated objective. By showing the potential of such jointly optimized systems, which are now starting to outperform previous top-performing systems in many tasks, we discuss future directions and challenges for this task from both industry and academic perspectives.
