Developing Neural Representations For Robust Child-Adult Diarization
Suchitra Krishnamachari, Manoj Kumar, So Hyun Kim, Catherine Lord, Shrikanth Narayanan
SPS
Length: 0:14:26
Automated processing and analysis of child speech has long been acknowledged as a harder problem than understanding adult speech. Specifically, conversations between a child and an adult involve spontaneous speech, which often compounds the idiosyncrasies associated with child speech. In this work, we improve upon the task of speaker diarization (determining who spoke when) from audio of child-adult conversations in naturalistic settings. We select conversations from the autism diagnosis and intervention domains, wherein speaker diarization forms an important step towards computational behavioral analysis in support of clinical research and decision making. We train deep speaker embeddings using publicly available child speech and adult speech corpora, unlike predominant state-of-the-art models, which typically use only adult speech for speaker embedding training. We demonstrate significant relative reductions in diarization error rate (DER) on DIHARD II (dev) sessions containing child speech (22.88%) and on two internal corpora representing interactions involving children with autism: excerpts from ADOSMod3 sessions (33.7%) and a combination of full-length ADOS and BOSCC sessions (44.99%). Further, we validate our improvements in identifying the child speaker (who typically has a short speaking time) using the recall measure. Finally, we analyze the effect of fundamental frequency augmentation and the effects of child age and gender on speaker diarization performance.
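The abstract reports its gains as relative reductions in DER. As a rough illustration of what such a figure means, the sketch below uses the standard definition of DER (missed speech, false alarm, and speaker confusion time divided by total reference speech time) and computes the relative reduction between a baseline and an improved system. The function names and all durations are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' scoring setup (for example, collar or overlap handling) is not described here.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Standard DER: total error time divided by total reference speech time.

    All arguments are durations in seconds.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Relative DER reduction, in percent, of new_der with respect to baseline_der."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical durations in seconds (illustrative only, not from the paper).
baseline = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=50.0, total_speech=400.0
)
improved = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=30.0, total_speech=400.0
)

print(f"baseline DER: {baseline:.3f}")
print(f"improved DER: {improved:.3f}")
print(f"relative reduction: {relative_reduction(baseline, improved):.2f}%")
```

In this toy case only the speaker confusion term shrinks, so the absolute DER drops by five points while the relative reduction is about 20%. Read the same way, the percentages in the abstract are relative to each corpus's baseline DER, not absolute DER values.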