MULTI-MODAL FUSION WITH OBSERVATION POINTS FOR SKELETON ACTION RECOGNITION
Iqbal Singh, Xiaodan Zhu, Michael Greenspan
Current methods for skeleton-based action recognition compute features from the given skeleton joint information. We show that introducing new observation points into skeleton motion sequences, and using them to create fused representations from multiple modalities such as joints and bones, can enhance the discriminative power of the original modalities. Moreover, such representations can be used to create new streams in multi-stream networks that fuse constructively with streams trained on the original modalities, effectively exhibiting a dual behaviour and collectively boosting the performance of the network even further. We present one possible multi-modal fusion system with a single observation point that can easily be incorporated into existing networks and improves state-of-the-art results on two popular action recognition datasets, J-HMDB and Kinetics-Skeleton.
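As a rough illustration of the core idea, and not the paper's exact formulation, the sketch below builds a fused joint-and-bone representation relative to a single observation point. The toy 5-joint skeleton, the choice of the sequence centroid as the observation point, and fusion by channel-wise concatenation are all assumptions made for this example.

```python
import numpy as np

# Hypothetical skeleton sequence: T frames, V joints, 3D coordinates.
# The 5-joint parent chain below is a toy graph, not the paper's skeleton.
T, V = 32, 5
joints = np.random.randn(T, V, 3).astype(np.float32)
parents = [0, 0, 1, 2, 3]  # joint 0 is the root

# Bone modality: vector from each joint's parent to the joint itself.
bones = joints - joints[:, parents, :]

# One possible observation point (an assumption here): the centroid of
# all joints over the whole sequence.
obs_point = joints.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 3)

# Express joints relative to the observation point; bones are already
# translation-invariant. Fuse the two modalities by concatenating along
# the channel axis to form one new input stream.
joints_rel = joints - obs_point
fused = np.concatenate([joints_rel, bones], axis=-1)  # shape (T, V, 6)

print(fused.shape)  # (32, 5, 6) -- a fused stream for a multi-stream network
```

Such a fused stream could then be trained alongside the original joint and bone streams, with the per-stream scores combined at the output, which is the dual behaviour the abstract describes.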