Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Ryandhimas E. Zezario
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:55:28
Speech assessment metrics quantitatively measure specific attributes of speech signals, and they are vital for developing speech-related application systems. The emergence of deep learning models and the need for non-intrusive methods that can accurately evaluate speech quality or intelligibility without requiring ground-truth labels have led to the development of many deep learning-based speech assessment models. In this webinar, we will first discuss the general ideas behind speech assessment metrics, including conventional signal processing-based approaches. Next, we will introduce the general concept of deploying deep learning-based speech assessment models, covering existing strategies, important considerations, and the challenges of model deployment. We will then present our approach, MOSA-Net, a deep learning-based non-intrusive Multi-Objective Speech Assessment model with cross-domain features, which can simultaneously estimate speech quality, intelligibility, and distortion assessment scores for an input speech signal. Lastly, we will show how the speech assessment model can be integrated directly into speech enhancement (SE) to make it more robust: we use the latent representations of MOSA-Net to guide the SE process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach.