Multi-Task Distillation: Towards Mitigating The Negative Transfer In Multi-Task Learning
Ze Meng, Xin Yao, Lifeng Sun
SPS
Length: 00:07:32
In this paper, we propose a top-down mechanism for alleviating negative transfer in multi-task learning (MTL). MTL aims to learn general meta-knowledge by sharing inductive bias among tasks, thereby improving generalization. However, MTL suffers from negative transfer: because tasks compete, improving performance on one task can degrade performance on others. As a multi-objective optimization problem, MTL usually involves a trade-off among the individual performances of the different tasks. Inspired by knowledge distillation, which transfers knowledge from a teacher model to a student model without significant performance loss, we propose multi-task distillation to cope with negative transfer, turning the multi-objective problem into a multi-teacher knowledge distillation problem. Specifically, we first collect task-specific Pareto-optimal teacher models, and then use multi-teacher knowledge distillation to achieve high individual performance on every task in a single student model without a trade-off. Moreover, we propose a multi-task warm-up initialization and a teacher experience pool to accelerate our method. Extensive experiments on various benchmark datasets demonstrate that our method outperforms state-of-the-art multi-task learning algorithms and the single-task training baseline.
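The multi-teacher distillation step described above can be illustrated with a minimal sketch. The abstract does not specify the loss, so this example assumes the common temperature-scaled soft-target formulation (KL divergence between teacher and student output distributions), averaged over tasks. The function names, the task names, and the temperature value are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Average soft-target loss over tasks (an assumed formulation).

    Both arguments map a task name to the logits produced for that task:
    one task-specific teacher per task, and one student head per task.
    """
    if not teacher_logits:
        raise ValueError("at least one teacher is required")
    loss = 0.0
    for task, t_logits in teacher_logits.items():
        p_teacher = softmax(t_logits, temperature)
        p_student = softmax(student_logits[task], temperature)
        # The T^2 factor keeps the loss scale comparable across temperatures.
        loss += temperature ** 2 * kl_divergence(p_teacher, p_student)
    return loss / len(teacher_logits)


# Hypothetical two-task example: each teacher supervises its own student head.
teachers = {"task_a": [2.0, 0.5, -1.0], "task_b": [0.1, 1.2, 0.3]}
student = {"task_a": [1.5, 0.7, -0.8], "task_b": [0.0, 1.0, 0.5]}
print(multi_teacher_distillation_loss(student, teachers))
```

Because each task's student head is matched only against its own teacher, the per-task terms do not pull against each other the way jointly weighted task losses can, which is the intuition behind replacing the multi-objective trade-off with distillation. How the teachers are selected and how the warm-up initialization and experience pool fit in are not covered by this sketch.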