SCRATCHHOI: TRAINING HUMAN-OBJECT INTERACTION DETECTORS FROM SCRATCH
Jun Yi Lim, Vishnu Monn Baskaran, Joanne Mun Yee Lim, Ricky Sutopo, Kok Sheik Wong, Massimo Tistarelli
SPS
Transformer-based approaches have achieved outstanding performance in human-object interaction (HOI) detection. However, these approaches rely on underlying object detectors that have undergone large-scale pre-training on the ImageNet and MS-COCO datasets. This dependence limits the freedom to explore novel architectural designs and introduces a learning bias that leads to ineffective HOI representation learning. In this paper, we propose ScratchHOI, a transformer-based HOI detection method that can be trained from scratch, eliminating the need for pre-trained object detectors. ScratchHOI uses dynamic and static affinity-based feature aggregation to process both local and long-range visual information. To further improve detection performance, it also applies dynamic and interactive anchor refinement to objects and interactions. Experiments on the HICO-Det dataset show that ScratchHOI achieves performance competitive with other state-of-the-art approaches across a variety of evaluation metrics.