Slides: The Changing Landscape of Speech Foundation Models
Shinji Watanabe, Abdelrahman Mohamed, Karen Livescu, Hung-yi Lee, Tara Sainath, Katrin Kirchhoff, Shang-Wen Li
SPS
IEEE Members: $11.00
Non-members: $15.00
Pages/Slides: 150
The paper "Self-Supervised Speech Representation Learning: A Review", published in 2022, focused on how representation learning transformed the landscape of speech perception models and AI applications. However, over the past two years since the article was published, there have been numerous developments in building "Foundation Models" that have blurred the boundaries between domains. Generative models have had the largest share of research innovation due to their impressive performance across many modalities and their applicability to a wider set of scenarios. In this talk, the presenters will connect their 2022 review of self-supervised approaches to the current developments in foundation perception and generative models. They will highlight active directions of research in foundation models, methods to analyze them, and their standing in comparison to other approaches across a wide range of speech applications.