  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:06:45
09 May 2022

Sound event detection (SED) consists of two subtasks: predicting the classes of sound events within an audio clip (audio tagging) and indicating the onset and offset times of each event (localization). A common approach to SED with weak labels is multiple instance learning (MIL). However, the general MIL method optimizes only the global loss computed from the aggregated clip-wise predictions and the weak clip labels. It places no direct constraint on the frame-wise predictions, which leads to a large number of unreasonable prediction values. To address this issue, we explore deterministic information that can be used to constrain the frame-wise predictions and, based on it, design a frame loss with two terms. Experimental results on the DCASE2017 Task4 dataset demonstrate that the proposed loss improves the performance of the general MIL method. While this article focuses on SED applications, the proposed methods could be applied widely to MIL problems.
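
To make the distinction between the global loss and a frame-level constraint concrete, here is a minimal sketch in plain Python. It is not the loss proposed in the talk, whose two terms are not specified in this abstract. It assumes max pooling as the aggregation function and binary cross-entropy as the loss, and the function names (clip_predictions, global_loss, negative_frame_loss) are hypothetical. The frame-level term shown uses only one piece of information that follows directly from a weak label: if a class is absent from the clip, it is absent from every frame.

```python
import math

def clip_predictions(frame_probs):
    """Aggregate frame-wise probabilities (frames x classes) into clip-wise
    probabilities. Max pooling is an assumption made for this sketch."""
    n_classes = len(frame_probs[0])
    return [max(frame[c] for frame in frame_probs) for c in range(n_classes)]

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 target y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def global_loss(frame_probs, weak_labels):
    """General MIL loss: compares only the aggregated clip-wise
    predictions with the weak clip labels."""
    clip_probs = clip_predictions(frame_probs)
    return sum(bce(p, y) for p, y in zip(clip_probs, weak_labels)) / len(weak_labels)

def negative_frame_loss(frame_probs, weak_labels):
    """Illustrative frame-level term (an assumption, not the talk's loss):
    for every class whose weak label is 0, each frame's target is 0."""
    terms = [
        bce(frame[c], 0)
        for frame in frame_probs
        for c, y in enumerate(weak_labels)
        if y == 0
    ]
    return sum(terms) / len(terms) if terms else 0.0

# Three frames, two classes; class 0 is present in the clip, class 1 is not.
frames = [[0.1, 0.2], [0.9, 0.3], [0.2, 0.1]]
labels = [1, 0]
print(global_loss(frames, labels))
print(negative_frame_loss(frames, labels))
```

With max pooling, the global loss depends only on the single highest-scoring frame per class, so the remaining frames receive no direct supervision. The frame-level term above penalizes every frame of an absent class, which is the kind of direct constraint the abstract says the general MIL method lacks.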
