SPATIAL-TEMPORAL TRANSFORMER NETWORK FOR HUMAN MOCAP DATA RECOVERY
Jijin Zhang, Jingliang Peng, Na Lv
Human Motion Capture (MoCap) has emerged as the most popular method for human animation production. However, due to joint occlusion, marker shedding, and equipment imprecision, the raw motion data are often corrupted, resulting in missing entries. To address this issue, this paper proposes a method for recovering missing motion data based on attention-based transformers. The proposed model consists of two levels of transformers and a regression head. The first-level transformer extracts the spatial features within each frame, and the second-level transformer integrates the per-frame features across time to capture temporal dependencies. The integrated features are then fed to the regression head to derive the complete motion. Extensive experiments on the CMU database demonstrate that the proposed model consistently outperforms other state-of-the-art methods in recovery accuracy.
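To make the two-level architecture concrete, the following is a minimal sketch in PyTorch of a spatial-temporal transformer with a regression head. The layer sizes, the use of mean pooling over joints, the 31-joint skeleton, and the reliance on PyTorch's built-in TransformerEncoder are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class SpatialTemporalTransformer(nn.Module):
    """Hypothetical sketch of a two-level (spatial, then temporal) transformer."""

    def __init__(self, num_joints=31, joint_dim=3, d_model=64,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.joint_embed = nn.Linear(joint_dim, d_model)

        # Level 1: spatial transformer, attends across joints within a frame.
        spatial_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.spatial_encoder = nn.TransformerEncoder(spatial_layer, n_layers)

        # Level 2: temporal transformer, attends across frames of the clip.
        temporal_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(temporal_layer, n_layers)

        # Regression head maps each fused frame feature back to joint coordinates.
        self.head = nn.Linear(d_model, num_joints * joint_dim)

    def forward(self, x):
        # x: (batch, frames, joints, joint_dim), with zeros at missing entries (assumption).
        b, t, j, c = x.shape
        tokens = self.joint_embed(x.reshape(b * t, j, c))    # joint tokens per frame
        spatial = self.spatial_encoder(tokens)                # (b*t, j, d_model)
        frame_feat = spatial.mean(dim=1).reshape(b, t, -1)    # pool joints -> frame feature
        temporal = self.temporal_encoder(frame_feat)          # (b, t, d_model)
        return self.head(temporal).reshape(b, t, j, c)        # recovered motion

# Usage sketch: recover a corrupted clip of 60 frames and 31 joints.
model = SpatialTemporalTransformer()
corrupted = torch.randn(1, 60, 31, 3)
recovered = model(corrupted)   # (1, 60, 31, 3)

In this reading, the spatial encoder treats the joints of one frame as a token sequence, the temporal encoder treats the resulting frame features as a second token sequence, and the regression head produces the complete pose for every frame, matching the pipeline described in the abstract.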