wikiTAG: Wikipedia-based knowledge embeddings towards improved acoustic event classification
Qin Zhang, Qingming Tang, Chieh-Chi Kao, Ming Sun, Yang Liu, Chao Wang
SPS
Length: 00:13:23
Acoustic event classification (AEC) is the task of determining whether certain events occur in an audio clip. Inspired by previous research [1, 2, 3] showing that embeddings from event labels can be leveraged to facilitate the learning of new detectors with no or limited audio samples, we introduce Wikipedia-based text embeddings as auxiliary information to improve AEC. We describe how to extract label embeddings from multiple Wikipedia texts, and formulate the multi-view aligned AEC problem based on the VGGish model. We show that our "wikiTAG" embeddings encode rich semantic information and are more informative than label embeddings for AEC tasks. Compared to a supervised baseline on AudioSet, the multi-view model with "wikiTAG" embeddings achieves 7.3% and 1.3% relative improvement in mean average precision (mAP) using 10% and full AudioSet for training, respectively. To the authors' knowledge, this is the first work in the AEC domain on building large-scale label representations by leveraging Wikipedia data in a systematic fashion.
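For readers unfamiliar with the metric, the sketch below shows one common way to compute mean average precision for a multi-label task and how a relative improvement over a baseline is derived from two mAP values. It is a minimal illustration, not the authors' evaluation code: the function names and the toy scores, labels, and mAP values are hypothetical and are not results from the paper.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over the ranks k of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positive clips contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average of per-class AP. Rows are clips, columns are event classes."""
    n_classes = len(score_matrix[0])
    aps = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        for c in range(n_classes)
    ]
    return sum(aps) / len(aps)


def relative_improvement(baseline, improved):
    """Relative (not absolute) gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Toy data (hypothetical): 3 clips, 2 event classes.
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.1]]
toy_labels = [[1, 0], [0, 1], [1, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)

# Hypothetical mAP values chosen only to illustrate the arithmetic.
gain = relative_improvement(baseline=0.300, improved=0.3219)
print(f"toy mAP = {toy_map:.4f}, relative gain = {gain:.1%}")
```

Note that a relative improvement is measured against the baseline value, so the same percentage corresponds to a smaller absolute mAP change when the baseline is low.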