Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors
José Vicente Egas-López, Gábor Kiss, Sztahó David, Gábor Gosztolya
SPS
Length: 00:11:29
Depression is a frequent and curable psychiatric disorder that detrimentally affects daily activities, harming both workplace productivity and personal relationships. Among many other symptoms, depression is associated with disordered speech production, which may allow automatic screening based on the subject's speech. However, choosing which features to extract from the recordings is not trivial. In this study, we employ x-vectors, a DNN-based feature extraction technique, to detect depression in a Hungarian corpus. We experiment with training custom x-vector extractors, and we also explore the performance of an out-of-domain pre-trained one. Our findings confirm that x-vectors are able to capture meaningful speaker traits that contain information for depression discrimination. We also show that the language of the extractor is of secondary importance compared to the frame-level feature set: our best model, which achieved an AUC score of 0.940 and an RMSE score of 9.54, was trained on log-energies instead of MFCCs.
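For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how an AUC score (for the depressed vs. non-depressed decision) and an RMSE score (for the predicted degree of depression) can be computed. It is not the authors' evaluation code: the function names `auc_score` and `rmse` and all numbers in the usage lines are hypothetical, made up purely for illustration, and the severity scale is left unspecified because the abstract does not name it.

```python
import math


def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. depressed), 0 for the negative class.
    scores: classifier outputs, higher meaning "more likely positive".
    Equals the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted severity scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)

toy_true_severity = [10, 20, 30]
toy_pred_severity = [12, 18, 33]
toy_rmse = rmse(toy_true_severity, toy_pred_severity)
```

An AUC of 1.0 means every positive example is scored above every negative one, while 0.5 corresponds to chance-level ranking. RMSE is expressed in the units of the severity scale being predicted, so lower values indicate closer estimates.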