VVS: Action Recognition With Virtual View Synthesis
Gao Peng, Yong-Lu Li, Hao Zhu, Jiajun Tang, Jin Xia, Cewu Lu
Length: 00:05:41
Action recognition research is usually conducted in the single-view setting, but human action is often not single-view in nature. Many simple actions combine body movements observed from the third-person view with visual guidance from the first-person view. Linking data from the two viewpoints is therefore critical for action recognition algorithms. Currently, aligned multi-view datasets are small in scale, which limits progress in this direction of research. To alleviate this data limitation, we present the novel Virtual View Synthesis (VVS) module. Instead of training and testing on small-scale multi-view data, VVS is first pre-trained on multi-view data to generalize the multi-view "supervisory attention". It is then incorporated into a single-view action recognition model to transfer the ability to better observe the existing view based on experience from another view. Extensive experiments demonstrate that VVS improves strong baselines on several single-view action recognition benchmarks.
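The abstract does not give implementation details, so the following is a minimal PyTorch-style sketch of the two-stage idea it describes: an attention head pre-trained on aligned multi-view data to imitate the paired view's "supervisory attention", then reused to reweight single-view features. All names (`VirtualViewSynthesis`, `attn_head`, `pretrain_loss`), tensor shapes, and the MSE supervision are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VirtualViewSynthesis(nn.Module):
    """Sketch of a VVS-style module: from features of the available view
    (e.g. third-person), predict a spatial attention map imitating what
    an unavailable paired view (e.g. first-person) would highlight."""

    def __init__(self, in_channels: int = 512):
        super().__init__()
        # Lightweight attention head; the exact architecture is an assumption.
        self.attn_head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) single-view features from a backbone.
        attn = self.attn_head(feats)   # (B, 1, H, W), values in [0, 1]
        return feats * (1.0 + attn)    # reweight features, residual-style


def pretrain_loss(vvs: VirtualViewSynthesis,
                  feats_view_a: torch.Tensor,
                  attn_from_view_b: torch.Tensor) -> torch.Tensor:
    """Stage 1 (pre-training on aligned multi-view data): supervise the
    predicted attention with attention derived from the paired view."""
    pred_attn = vvs.attn_head(feats_view_a)
    return F.mse_loss(pred_attn, attn_from_view_b)


if __name__ == "__main__":
    vvs = VirtualViewSynthesis(in_channels=512)
    feats = torch.randn(2, 512, 7, 7)   # dummy third-person features
    target = torch.rand(2, 1, 7, 7)     # dummy first-person attention target
    print(pretrain_loss(vvs, feats, target).item())
    # Stage 2 (not shown): plug vvs into a single-view recognition model,
    # applying vvs(feats) before the classification head.
```

In this reading, stage 1 distills the second view's attention into the module, and stage 2 lets a purely single-view model benefit from that distilled "experience from another view" at inference time, with no multi-view data required.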