Self-enhanced training framework for referring expression grounding
Yitao Chen, Ruoyi Du, Kongming Liang, Zhanyu Ma
Weakly-supervised referring expression grounding (REG) aims to locate the image region described by a query sentence when the mapping between the referential region and the query is not available during training. Noticing the significant performance gap between fully- and weakly-supervised approaches, we develop a Self-Enhanced Training (SET) framework in this paper. Specifically, we first train the network under a weakly-supervised setting. Then, the model outputs are collected, filtered according to their confidence scores, and used as pseudo-labels. Finally, with the help of these pseudo-labels, we tune the model under a fully-supervised setting. The SET framework provides a simple way of generating pseudo-labels that bridge weak and full supervision. Experimental results demonstrate that models trained through our SET framework outperform existing traditional methods on the RefCOCO, RefCOCO+, and RefCOCOg datasets. The code is available at https://github.com/HTDL98/SET-framework.
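The confidence-based filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Prediction` and `PseudoLabel` types, the `select_pseudo_labels` function, and the threshold value of 0.8 are hypothetical, chosen only to show how high-confidence outputs of a weakly-supervised model might be kept as pseudo-labels for the later fully-supervised tuning stage.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Prediction:
    """Hypothetical output of the weakly-supervised model for one query."""
    query_id: str
    region_scores: Sequence[float]  # one confidence score per candidate region


@dataclass
class PseudoLabel:
    """Hypothetical pseudo-label: the region assumed to match the query."""
    query_id: str
    region_index: int
    confidence: float


def select_pseudo_labels(
    predictions: Sequence[Prediction],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> List[PseudoLabel]:
    """Keep the top-scoring region of each query if its score passes the threshold."""
    labels: List[PseudoLabel] = []
    for pred in predictions:
        if not pred.region_scores:
            continue  # no candidate regions, so nothing to label
        # Index of the highest-scoring candidate region.
        best = max(range(len(pred.region_scores)), key=pred.region_scores.__getitem__)
        confidence = pred.region_scores[best]
        if confidence >= threshold:
            labels.append(PseudoLabel(pred.query_id, best, confidence))
    return labels


if __name__ == "__main__":
    preds = [
        Prediction("q1", [0.10, 0.85, 0.05]),  # confident: kept
        Prediction("q2", [0.40, 0.35, 0.25]),  # ambiguous: dropped
    ]
    for label in select_pseudo_labels(preds):
        print(label)
```

Under these assumptions, only queries whose best region is scored confidently contribute pseudo-labels, so ambiguous predictions do not feed noisy supervision into the fully-supervised stage.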