28 Oct 2020

Online distillation can dynamically adapt to changes of distribution in a target domain by continuously updating a smaller student model from a live video stream. However, online distillation degrades overall accuracy because the student model overfits to the current distribution rather than covering recently observed distributions. The student model is trained on sequentially incoming data, and its parameters are overwritten to fit the current distribution; as a result, the student model forgets recent distributions. To overcome this problem, we propose a new training framework that uses a cache. Our framework temporarily stores incoming frames and the teacher model's outputs in a cache and trains the student model with data selected from the cache. Since our approach trains the student model not only on incoming data but also on past data, it can improve overall accuracy while adapting to changes of distribution without overfitting. To use the limited cache size efficiently, we also propose a loss-aware cache algorithm that prioritizes training data by its loss value. Our experiments show that training with a cache improves accuracy compared with plain online distillation, and that the loss-aware cache algorithm outperforms a cache algorithm modeled on traditional offline training.
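The abstract does not include an implementation, but the idea of a loss-aware cache can be illustrated with a minimal sketch. The Python/PyTorch code below is an assumption-laden illustration, not the authors' method: class and parameter names (LossAwareCache, capacity, batch_size) are hypothetical, the distillation loss is stood in for by a simple MSE between student and teacher outputs, and eviction keeps the highest-loss cached frames on the assumption that they are the most informative to replay.

```python
import heapq
import itertools
import random
import torch

class LossAwareCache:
    """Hypothetical loss-aware cache: retains the frames whose distillation
    loss is highest and evicts low-loss entries when capacity is reached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []                    # min-heap keyed by loss value
        self.counter = itertools.count()  # tie-breaker so heapq never compares tensors

    def add(self, frame, teacher_output, loss):
        entry = (float(loss), next(self.counter), frame, teacher_output)
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, entry)
        elif entry[0] > self.heap[0][0]:
            # Replace the lowest-loss entry with the higher-loss newcomer.
            heapq.heapreplace(self.heap, entry)

    def sample(self, batch_size):
        # Draw a replay batch of cached frames and their teacher outputs.
        batch = random.sample(self.heap, min(batch_size, len(self.heap)))
        frames = torch.stack([e[2] for e in batch])
        targets = torch.stack([e[3] for e in batch])
        return frames, targets


def cached_distillation_step(student, teacher, frame, cache, optimizer, batch_size=8):
    """One sketched step: distill the incoming frame, store it with its loss,
    then update the student on a batch replayed from the cache."""
    with torch.no_grad():
        teacher_out = teacher(frame.unsqueeze(0)).squeeze(0)

    student_out = student(frame.unsqueeze(0)).squeeze(0)
    loss = torch.nn.functional.mse_loss(student_out, teacher_out)
    cache.add(frame.detach(), teacher_out, loss.detach())

    frames, targets = cache.sample(batch_size)
    replay_loss = torch.nn.functional.mse_loss(student(frames), targets)

    optimizer.zero_grad()
    replay_loss.backward()
    optimizer.step()
    return replay_loss.item()
```

Because the replay batch mixes current and past frames, the student is never updated on the newest distribution alone, which is the property the framework relies on to avoid forgetting recent distributions.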
