PARTS BASED ATTENTION FOR HIGHLY OCCLUDED PEDESTRIAN DETECTION WITH TRANSFORMERS
K.N Ajay Shastry, Jayesh Chaudhari, Daksh Thapar, Aditya Nigam, Chetan Arora
SPS
Despite significant progress in pedestrian detection over the last decade, detecting pedestrians under heavy occlusion remains a challenging problem. In state-of-the-art (SOTA) convolutional neural network (CNN) based models, the failure is attributed to non-maximum suppression (NMS), which often erroneously deletes true positives when one pedestrian occludes another. SOTA transformer-based models have no such NMS step, yet they still fail to detect highly occluded pedestrians. In this paper, we study the reasons for these failures. We observe that such models first predict key-points and then compute attention at those specific key-points. Our analysis reveals that the key-points show no preference for semantically important body parts. Under heavy occlusion, such key-points end up attending to non-discriminative regions or background, leading to false negatives. Taking inspiration from the conventional wisdom of detecting objects through their parts, we bias the attention of the proposed transformer architecture towards semantically important and highly discriminative human body parts. This intervention yields SOTA results on the benchmark CityPersons and Caltech datasets, achieving miss rates (lower is better) of 30.75% and 32.96%, respectively, against 32.6% and 38.2% for the current SOTA.
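To make the NMS failure mode concrete, here is a minimal sketch of standard greedy NMS. It is not the authors' code or the post-processing of any particular detector. It assumes axis-aligned (x1, y1, x2, y2) boxes and an IoU threshold of 0.5, a common default. The box coordinates and scores are hypothetical: they describe two different pedestrians, the rear one largely hidden behind the front one, and the detector has found both.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Repeatedly keep the highest-scoring box and drop every remaining box
    whose IoU with it exceeds iou_thresh. Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


# Hypothetical scene: two distinct pedestrians, the rear one heavily occluded.
front = (100.0, 50.0, 160.0, 230.0)  # fully visible pedestrian
rear = (115.0, 45.0, 175.0, 225.0)   # occluded pedestrian standing just behind
boxes = [front, rear]
scores = [0.95, 0.80]  # both pedestrians were detected before NMS

print(round(iou(front, rear), 3))  # 0.574: overlap exceeds the 0.5 threshold
print(greedy_nms(boxes, scores))   # [0]: the occluded true positive is deleted
```

Because the two true boxes overlap more than the threshold, NMS treats the occluded pedestrian as a duplicate of the visible one and removes it. Raising the threshold (for example `greedy_nms(boxes, scores, iou_thresh=0.6)` keeps both, returning `[0, 1]`) avoids this particular miss but lets more genuine duplicate detections survive. This trade-off is why NMS-free transformer detectors are attractive for crowded scenes.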