28 Jun 2022

Self-supervised pretraining methods for computer vision learn transferable representations from large collections of unlabeled images. Several methods from the field match or even outperform ImageNet weights when fine-tuning on downstream tasks, creating the impression that self-supervised weights are superior. We challenge this belief and show that state-of-the-art self-supervised methods match ImageNet weights in either classification or object detection, but not in both. Furthermore, we demonstrate in experiments on image classification, object detection, instance segmentation, and keypoint detection that a more sophisticated supervised training protocol can greatly improve upon ImageNet weights, at least matching and usually outperforming state-of-the-art self-supervised methods.
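To make the evaluation setup concrete, the sketch below illustrates the kind of fine-tuning comparison the abstract describes: the same backbone is initialized either from supervised ImageNet weights or from a self-supervised checkpoint, then trained end-to-end on a downstream task. This is a minimal PyTorch/torchvision illustration under assumed settings; the dataset size, optimizer, and hyperparameters are placeholders, not the paper's actual training protocol.

```python
# Illustrative fine-tuning setup (assumed, not the paper's exact protocol).
import torch
import torch.nn as nn
from torchvision import models

# Initialize a ResNet-50 with supervised ImageNet weights. A self-supervised
# checkpoint (e.g. from MoCo or SimCLR) would instead be loaded into the
# same architecture via model.load_state_dict(...).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Replace the classification head to match the downstream task.
num_downstream_classes = 10  # hypothetical downstream task size
model.fc = nn.Linear(model.fc.in_features, num_downstream_classes)

# Placeholder optimizer settings; the paper's protocol is not specified here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step: all layers are updated, not just the new head."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same pattern extends to the other downstream tasks mentioned (object detection, instance segmentation, keypoint detection) by plugging the pretrained backbone into the corresponding task head instead of a linear classifier.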