SCENE TEXT SEGMENTATION BY PAIRED DATA SYNTHESIS
Quang-Vinh Dang, Guee-Sang Lee
SPS
The scene text segmentation task has numerous practical applications. However, the number of images in the available scene text segmentation datasets is too small to effectively train deep learning-based models, leading to limited performance. To address this problem, we approach segmentation from two directions: paired data synthesis and methodology. The former is realized through the proposed Text Image-conditional GANs, which generate realistic paired data. Before training the proposed GANs, we exploit real-world images with a self-supervised pre-training scheme based on inpainting, so that the synthesized data is realistic. The latter is carried out by the proposed scene text segmentation network, called the Multi-task Cascade Transformer, which optimizes learning from the generated paired data. It comprises two auxiliary tasks and one main task for text segmentation. The auxiliary tasks learn which text regions to focus on and learn the structure of text through its fonts, thereby supporting the main task. We evaluate our method on three publicly available scene text segmentation datasets, namely ICDAR13 FST, Total Text, and TextSeg, to demonstrate its effectiveness. Our experimental results outperform those of existing methods.
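The multi-task training described above, where two auxiliary tasks support one main segmentation task, is typically realized as a weighted sum of per-task losses. The sketch below illustrates that general pattern only; the function name, task names, and weights are hypothetical and do not come from the paper.

```python
def multitask_loss(main_seg_loss: float,
                   aux_region_loss: float,
                   aux_font_loss: float,
                   w_region: float = 0.5,
                   w_font: float = 0.5) -> float:
    """Combine the main text-segmentation loss with the two auxiliary
    losses (text-region attention and font structure) as a weighted sum.
    The weights here are illustrative, not the paper's actual values."""
    return main_seg_loss + w_region * aux_region_loss + w_font * aux_font_loss
```

In practice the three loss terms would each be computed from the corresponding network head, and the weights tuned on a validation set.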