  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
Poster 09 Oct 2023

Semi-supervised video object segmentation (VOS) methods aim to segment target objects given pixel-level annotations of the first frame. Many methods employ Transformer-based attention modules to propagate the first-frame annotations to the most similar patches or pixels in subsequent frames. Although these methods achieve impressive results, they remain prone to errors in challenging scenes with multiple overlapping objects. To tackle this problem, we propose an object-centric VOS (OCVOS) method that exploits query-based Transformer decoder blocks. After aggregating target-object information with a typical matching-based approach, the Transformer decoder extracts object-wise information by interacting with object queries. In this way, the proposed method considers not only global and contextual information but also object-centric representations. We validate its effectiveness in inducing object-wise information, compared to existing methods, on the DAVIS and YouTube-VOS benchmarks.
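The core mechanism described above, object queries interacting with aggregated pixel features inside a Transformer decoder block, can be sketched minimally as a single cross-attention step. The following numpy snippet is an illustrative sketch only: the single-head, projection-free attention, the residual update, and all dimensions are assumptions for clarity, not the authors' actual OCVOS implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_decoder_step(obj_queries, pixel_feats):
    """One simplified cross-attention step of a query-based decoder block.

    obj_queries: (num_objects, d) -- one learnable query per target object
    pixel_feats: (num_pixels, d)  -- features aggregated by the matching stage
    Each object query attends over all pixel features and pools an
    object-wise representation, added back via a residual connection.
    """
    d = obj_queries.shape[-1]
    attn = softmax(obj_queries @ pixel_feats.T / np.sqrt(d), axis=-1)
    return obj_queries + attn @ pixel_feats

rng = np.random.default_rng(0)
queries = rng.standard_normal((3, 16))    # 3 hypothetical target objects
features = rng.standard_normal((100, 16)) # 100 flattened pixel features
out = query_decoder_step(queries, features)
print(out.shape)  # one refined representation per object: (3, 16)
```

In a full decoder block this step would be interleaved with self-attention among the queries and feed-forward layers; the sketch keeps only the query-to-feature interaction that yields per-object representations.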
