An Attention Model For Hypernasality Prediction In Children With Cleft Palate
Vikram C Mathad, Nancy Scherer, Kathy Chapman, Julie Liss, Visar Berisha
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:12:36
Hypernasality refers to the perception of abnormal nasal resonances in vowels and voiced consonants. Estimating hypernasality severity from connected speech samples involves learning a mapping between frame-level features and utterance-level clinical ratings of hypernasality. However, not all speech frames contribute equally to the perception of hypernasality. In this work, we propose an attention-based bidirectional long short-term memory (BLSTM) model that maps the frame-level features directly to utterance-level ratings by focusing on the specific speech frames that carry hypernasal cues. The model’s performance is evaluated on the Americleft database, which contains speech samples of children with cleft palate together with clinical ratings of hypernasality. We analyze the attention weights over broad phonetic categories and find that the model yields results consistent with what is known in the speech science literature. Further, the correlation between the predicted and perceptual ratings is found to be significant (r=0.684, p
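The mapping from frame-level features to a single utterance-level rating can be illustrated with a small attention-pooling sketch. This is not the authors' implementation: the BLSTM itself is omitted, the `frames` list stands in for its per-frame outputs, and the names `attention_pool`, `predict_rating`, the scoring vector `w`, and the regression parameters `v` and `b` are all hypothetical. The sketch only shows how softmax attention weights let some frames count more than others in the final prediction.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, w):
    """Collapse T frame-level vectors into one utterance-level vector.

    frames: list of T vectors (assumed stand-ins for BLSTM outputs)
    w:      scoring vector (hypothetical learned parameter)
    """
    # One scalar relevance score per frame: dot product with w.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    # Attention weights are non-negative and sum to 1 over the frames.
    alphas = softmax(scores)
    dim = len(frames[0])
    # Context vector: attention-weighted average of the frame vectors.
    context = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    return context, alphas

def predict_rating(frames, w, v, b):
    """Linear regression on the pooled vector (v and b are hypothetical)."""
    context, alphas = attention_pool(frames, w)
    rating = sum(vi * ci for vi, ci in zip(v, context)) + b
    return rating, alphas

# Toy utterance with three 2-dimensional frames; the values are made up.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
w = [1.0, 1.0]
v = [0.5, 0.5]
b = 0.0
rating, alphas = predict_rating(frames, w, v, b)
```

In this toy input the second frame receives the largest weight because it scores highest against `w`. In a trained model, inspecting `alphas` per frame is what would allow the weights to be aggregated over broad phonetic categories.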
Chairs:
Visar Berisha