IEEE Members: $11.00
Non-members: $15.00
Length: 01:06:22
Recent months have seen a surge in discussions about the capabilities of text foundation models, particularly large language models (LLMs). Known for their general-purpose processing abilities, LLMs can effectively perform a variety of tasks given appropriate instructions. Unlike text, speech contains rich, hierarchical information, necessitating distinct capabilities for diverse applications. This raises the question: how close are we to developing speech foundation models that can understand and execute task instructions?

This presentation delves into the evolution of foundation models in speech processing, highlighting three significant phases: shared encoders with task-specific heads, universal models with adaptable parameters, and task-instruction models. It begins with an introduction to the Speech Processing Universal PERformance Benchmark (SUPERB), which assesses shared encoders across multiple tasks. The discussion then shifts to the use of prompting in speech language models. The presentation concludes with a focus on Dynamic SUPERB, a project aimed at evaluating task-instruction models in speech processing.
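To make the first phase concrete, the following is a minimal sketch of the "shared encoder with task-specific heads" pattern in PyTorch. It is an illustration only, not the SUPERB implementation: the encoder architecture, layer sizes, task names, and class counts are all assumed for the example.

# Hypothetical sketch: one frozen shared speech encoder, lightweight per-task heads.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Stand-in for a pretrained (e.g., self-supervised) speech encoder."""
    def __init__(self, feat_dim=80, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        out, _ = self.rnn(feats)
        return out                             # (batch, time, hidden_dim)

class TaskHead(nn.Module):
    """Small head trained per downstream task on top of the frozen encoder."""
    def __init__(self, hidden_dim=256, num_classes=10):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_classes)

    def forward(self, enc_out):
        pooled = enc_out.mean(dim=1)           # simple mean pooling over time
        return self.proj(pooled)               # (batch, num_classes)

encoder = SharedEncoder()
for p in encoder.parameters():                 # freeze the shared encoder
    p.requires_grad = False

heads = {                                      # hypothetical tasks and class counts
    "keyword_spotting": TaskHead(num_classes=12),
    "speaker_id": TaskHead(num_classes=1251),
}

feats = torch.randn(4, 100, 80)                # dummy batch of acoustic features
with torch.no_grad():
    enc_out = encoder(feats)                   # shared representation, no gradients
logits = heads["keyword_spotting"](enc_out)    # only the task head is trained

The design choice this pattern reflects is that a single pretrained representation is reused across many tasks, with only small task-specific parameters trained per task; the later phases discussed in the talk (adaptable parameters and task instructions) progressively remove even that per-task specialization.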