Contextual Label Transformation For Scene Graph Generation
Wonhee Lee, Sungeun Kim, Gunhee Kim
SPS
Length: 00:07:50
For scene graph generation, it is crucial to understand the relationships between objects within the context of the image. We design a label transformation method based on a Transformer-VAE (Variational Autoencoder) structure that converts bounding box labels, in an unsupervised manner, into auxiliary labels that capture each object's context. The model is then trained jointly on the auxiliary labels, bounding box labels, and relation labels in a multi-task fashion. Our approach does not require any external datasets or language priors and is applicable to any graph generation model that infers the relationship between pairs of objects. We validate our method's effectiveness and scalability with state-of-the-art scene graph generation models on the VRD and VG datasets.
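
The multi-task training described above can be illustrated with a minimal sketch of a joint objective. This is not the authors' implementation: it assumes the auxiliary labels are discrete class indices and that each task uses a cross-entropy loss, neither of which the abstract states. The names `cross_entropy`, `multitask_loss`, and `aux_weight` are hypothetical and introduced only for illustration.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(obj_logits, obj_label,
                   aux_logits, aux_label,
                   rel_logits, rel_label,
                   aux_weight=1.0):
    """Sum of the three task losses.

    Assumption: auxiliary labels are class indices, and aux_weight is a
    hypothetical knob balancing the auxiliary task against the other two.
    """
    return (cross_entropy(obj_logits, obj_label)
            + aux_weight * cross_entropy(aux_logits, aux_label)
            + cross_entropy(rel_logits, rel_label))


# Toy usage: made-up logits for one object pair.
loss = multitask_loss([2.0, 0.5, -1.0], 0,   # object class head
                      [0.1, 0.3], 1,         # auxiliary (context) head
                      [0.0, 1.5, 0.2], 1)    # relation head
```

Setting `aux_weight=0.0` recovers a baseline trained only on bounding box and relation labels, which is one simple way to isolate the contribution of the auxiliary task in a sketch like this.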