Poster 10 Oct 2023

Dataset bias has been identified as a major challenge in image captioning. When predicting a word, a captioning model should rely on the visual evidence associated with that word, but it often exploits contextual evidence induced by dataset bias instead, producing biased captions, especially when the dataset is skewed toward specific situations. To address this problem, we approach it from a causal inference perspective and design a causal graph. Based on this graph, we propose C2Cap, a CLIP confounder-free captioning network that uses a global visual confounder to control the confounding factors in the image and trains the model to produce debiased captions. We validate the proposed method on the MSCOCO benchmark and demonstrate its effectiveness.
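To make the confounder idea concrete, below is a minimal sketch, not the authors' released code, of a backdoor-adjustment-style module that attends over a fixed dictionary of global visual confounder prototypes built from CLIP image embeddings and mixes the expected confounder into the visual feature before decoding. All class names, dimensions, and the prototype-dictionary construction are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConfounderAdjustedAttention(nn.Module):
    """Approximate P(caption | do(image)) by attending over a dictionary of
    global visual confounder prototypes (assumed setup, e.g. k-means centroids
    of CLIP image embeddings over the training set)."""

    def __init__(self, feat_dim: int, confounder_dict: torch.Tensor):
        super().__init__()
        # confounder_dict: [K, feat_dim] confounder prototypes (assumption).
        self.register_buffer("confounders", confounder_dict)
        self.query = nn.Linear(feat_dim, feat_dim)
        self.key = nn.Linear(feat_dim, feat_dim)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: [B, feat_dim] global CLIP image embedding.
        q = self.query(visual_feats)                              # [B, D]
        k = self.key(self.confounders)                            # [K, D]
        attn = F.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)   # [B, K]
        z = attn @ self.confounders                               # expected confounder [B, D]
        # Condition the decoder on both the observed feature and the expected
        # confounder, a common approximation of the backdoor adjustment.
        return self.fuse(torch.cat([visual_feats, z], dim=-1))


# Usage sketch: 512-d CLIP features, 100 hypothetical confounder prototypes.
if __name__ == "__main__":
    dictionary = torch.randn(100, 512)      # placeholder prototypes
    module = ConfounderAdjustedAttention(512, dictionary)
    feats = torch.randn(4, 512)             # a batch of CLIP image embeddings
    adjusted = module(feats)                # [4, 512], fed to the caption decoder
    print(adjusted.shape)
```

The adjusted feature would replace the raw CLIP embedding as the visual input to the caption decoder; the exact fusion and training objective in C2Cap may differ.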
