LIGHTPOSE: A LIGHTWEIGHT AND EFFICIENT MODEL WITH TRANSFORMER FOR HUMAN POSE ESTIMATION
Xiyang Liu, Peng Li, Ding Ni, Yan Wang, Hui Xue
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:06:30
Predicting keypoints from high-resolution heatmaps has become a popular approach to human pose estimation. However, this approach requires up-sampling or deconvolution operations, which make it difficult to accelerate model inference. Predicting keypoints on low-resolution heatmaps instead gives unsatisfactory performance because of serious quantization errors. To resolve this trade-off, we propose jointly training the heatmap and the center offset on low-resolution heatmaps to reduce quantization errors, which achieves performance comparable to that of high-resolution heatmaps while reducing computational complexity. In addition, we use a transformer to enhance the representation ability of low-resolution features, instead of adding network layers or enlarging the convolution kernels. The transformer brings a significant performance improvement at little computational cost. Combining these two modules, we design a new lightweight pose estimation model named LightPose. Experimental results show that, compared with HRNet, our method achieves state-of-the-art performance on the COCO and MPII datasets while reducing the number of parameters by 86% and GFLOPs by 67%.
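To make the quantization argument concrete, here is a minimal sketch of how a keypoint could be recovered from a low-resolution heatmap plus a per-cell offset. It is not the authors' implementation: the abstract does not specify the offset parameterization, so the function names `encode_keypoint` and `decode_keypoints`, the `(K, 2, h, w)` offset layout, the fractional offset measured from the cell's top-left corner, and the `stride` parameter are all assumptions made for illustration.

```python
import numpy as np


def encode_keypoint(xy, stride, heatmap_size):
    """Split a full-resolution keypoint into a heatmap cell and a sub-cell offset.

    Assumed convention (not from the source): the offset is the fractional
    part of the keypoint position expressed in heatmap cells.
    """
    x, y = xy
    w, h = heatmap_size
    fx, fy = x / stride, y / stride
    cx = min(int(np.floor(fx)), w - 1)
    cy = min(int(np.floor(fy)), h - 1)
    offset = np.array([fx - cx, fy - cy], dtype=np.float32)
    return (cx, cy), offset


def decode_keypoints(heatmaps, offsets, stride):
    """Recover (x, y) coordinates in input pixels.

    heatmaps: (K, h, w) array, one channel per keypoint.
    offsets:  (K, 2, h, w) array holding (dx, dy) per keypoint and cell
              (hypothetical layout).
    Returns a (K, 2) array of (x, y).
    """
    K, h, w = heatmaps.shape
    # Locate the peak cell of each keypoint channel.
    flat_idx = heatmaps.reshape(K, -1).argmax(axis=1)
    cy, cx = np.unravel_index(flat_idx, (h, w))
    k = np.arange(K)
    # Read the predicted sub-cell offset at the peak cell.
    dx = offsets[k, 0, cy, cx]
    dy = offsets[k, 1, cy, cx]
    # Refine the cell index with the offset, then map back to input pixels.
    return np.stack([(cx + dx) * stride, (cy + dy) * stride], axis=1)
```

With a stride of 8, decoding the peak cell alone can only return coordinates that are multiples of 8, so the localization error can reach several pixels. Adding the offset before scaling removes that rounding step, which is the intuition behind training the heatmap and offset jointly at low resolution.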