Pitch Estimation Via Self-Supervision

Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirovic

Length: 12:06
04 May 2020

We present a method for estimating the fundamental frequency of monophonic audio, a task often referred to as pitch estimation. In contrast to existing methods, our neural network can be trained entirely on unlabeled data using self-supervision; a small amount of labeled data is needed only to map the network outputs to absolute pitch values. The key observation is that if two examples are created from the same audio clip by pitch-shifting each copy by a known amount, the difference between the correct outputs is known, even though the actual pitch of the original clip is not. Somewhat surprisingly, this idea, combined with an auxiliary reconstruction loss, is sufficient to train a pitch estimation model. Our results show that the method achieves accuracy comparable to fully supervised models on monophonic audio, without requiring large labeled datasets. In addition, we train a voicing detection output in the same model, again without using any labels.
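
The relative-pitch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `relative_pitch_loss`, the choice of a Huber penalty, the semitone units, and the `scale` factor are all assumptions made for illustration, and the auxiliary reconstruction loss is omitted. The sketch only shows that the training signal depends on the known difference between the two shifts, never on the true pitch.

```python
import numpy as np

def relative_pitch_loss(y1, y2, shift1, shift2, scale=1.0, delta=1.0):
    """Huber penalty on (predicted difference) - (known shift difference).

    y1, y2         : model outputs for the two pitch-shifted copies
    shift1, shift2 : known shifts applied to each copy (assumed in semitones)
    scale          : hypothetical factor converting semitones to output units
    delta          : Huber threshold (an assumption, not from the abstract)
    """
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    k1, k2 = np.asarray(shift1, dtype=float), np.asarray(shift2, dtype=float)
    err = (y1 - y2) - scale * (k1 - k2)
    abs_err = np.abs(err)
    # Quadratic near zero, linear for large errors.
    huber = np.where(abs_err <= delta, 0.5 * err**2, delta * (abs_err - 0.5 * delta))
    return float(np.mean(huber))

# Toy data: the true pitches are never passed to the loss.
true_pitch = np.array([60.0, 62.5, 57.0, 69.0])
k1 = np.array([-3, 0, 2, 5])
k2 = np.array([1, -2, 2, -4])

# An "ideal" model: tracks pitch up to an unknown constant offset.
offset = 3.7
y1 = true_pitch + k1 + offset
y2 = true_pitch + k2 + offset
loss_ideal = relative_pitch_loss(y1, y2, k1, k2)

# A degenerate model that ignores its input and always outputs zero.
loss_const = relative_pitch_loss(np.zeros(4), np.zeros(4), k1, k2)

print(loss_ideal, loss_const)
```

In this toy setup the ideal model reaches (numerically) zero loss for any value of `offset`, which is why a loss of this kind cannot recover absolute pitch on its own and why a small labeled set is still needed to calibrate the output.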
