A Simple yet Effective Approach to Structured Knowledge Distillation

Wenye Lin (Tsinghua Shenzhen International Graduate School, Tsinghua University); Yangming Li (Tencent AI Lab); Lemao Liu (Tencent AI Lab); Shuming Shi (Tsinghua University); Hai-Tao Zheng (Tsinghua University)

07 Jun 2023

Structured prediction models aim at solving tasks where the output is a complex structure rather than a single variable. Performing knowledge distillation for such problems is non-trivial due to their exponentially large output space. Previous works address this problem by developing particular distillation strategies (e.g., dynamic programming) that are both complicated and of low run-time efficiency. In this work, we propose an approach that is much simpler in its formulation, far more efficient to train than existing methods, and even better-performing than our baselines. Specifically, we transfer the knowledge from a teacher model to its student by locally matching their computations on all internal structures rather than the final outputs. In this manner, we avoid time-consuming techniques such as Monte Carlo sampling for decoding output structures, permitting parallel computation and efficient training. Moreover, we show that this encourages the student model to better mimic the internal behavior of the teacher model. Experiments on two structured prediction tasks demonstrate that our approach not only halves the time cost but also outperforms previous methods on two widely adopted benchmark datasets.
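To make the idea of "locally matching computations on internal structures" concrete, the sketch below illustrates one possible instantiation for a CRF-style sequence labeller. It is not the authors' released code: the factor names (emission and transition scores), the tensor shapes, and the plain mean-squared-error objective are illustrative assumptions. The point it demonstrates is that matching the teacher's local factor scores, rather than its distribution over whole output sequences, requires no dynamic programming or sampling and reduces to parallel tensor operations.

import torch
import torch.nn.functional as F

def local_matching_distill_loss(
    teacher_emissions: torch.Tensor,    # (B, T, L) teacher per-token label scores
    student_emissions: torch.Tensor,    # (B, T, L) student per-token label scores
    teacher_transitions: torch.Tensor,  # (L, L) teacher label-transition scores
    student_transitions: torch.Tensor,  # (L, L) student label-transition scores
    mask: torch.Tensor,                 # (B, T) 1 for real tokens, 0 for padding
) -> torch.Tensor:
    """Match the teacher's local factors instead of its sequence-level
    distribution over the exponentially large output space."""
    mask = mask.unsqueeze(-1).float()
    # Unary (per-position) factors: squared error summed over labels,
    # averaged over valid tokens.
    emission_loss = (
        mask * (student_emissions - teacher_emissions.detach()) ** 2
    ).sum() / mask.sum()
    # Pairwise (transition) factors: a single small matrix, matched directly.
    transition_loss = F.mse_loss(student_transitions, teacher_transitions.detach())
    return emission_loss + transition_loss

In practice such a term would be added to the student's usual supervised training loss with a weighting coefficient; because every operation above is elementwise, it parallelizes trivially across the batch and sequence dimensions.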
