Mask-VIT: An Object Mask Embedding in Vision Transformer For Fine-Grained Visual Classification
Tong Su, Shuo Ye, Chengqun Song, Jun Cheng
SPS
In video compression, the luma channel can be useful for predicting the chroma channels (Cb, Cr), as demonstrated by the Cross-Component Linear Model (CCLM) used in the Versatile Video Coding (VVC) standard. More recently, it has been shown that neural networks can capture the relationship among different channels even better. In this paper, a new attention-based neural network is proposed for cross-component intra prediction. With the goal of simplifying neural network design, the new framework consists of four branches: a boundary branch and a luma branch for extracting features from reference samples, an attention branch for fusing the first two branches, and a prediction branch for computing the predicted chroma samples. The proposed scheme is integrated into the VVC test model together with one additional binary block-level syntax flag, which indicates whether a given block makes use of the proposed method. Experimental results demonstrate BD-rate reductions of 0.33%/1.61%/1.33% on the Y/Cb/Cr components, respectively, on top of the VVC Test Model (VTM) 7.0, which uses CCLM.
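For readers unfamiliar with the CCLM baseline mentioned above, the idea of predicting chroma from luma with a linear model can be sketched as follows. This is only an illustrative, simplified sketch and not the VVC derivation or the proposed network: the function names `fit_linear_model` and `predict_chroma` are hypothetical, the fit uses a single minimum and maximum reference point with floating-point arithmetic, and luma is assumed to be already down-sampled to the chroma resolution.

```python
def fit_linear_model(luma_ref, chroma_ref):
    """Fit chroma ~ alpha * luma + beta from neighbouring reference samples.

    Simplified assumption: the line passes through the reference points with
    the smallest and largest luma value (no integer arithmetic, no averaging).
    """
    if len(luma_ref) != len(chroma_ref) or not luma_ref:
        raise ValueError("reference lists must be non-empty and of equal length")
    i_min = min(range(len(luma_ref)), key=luma_ref.__getitem__)
    i_max = max(range(len(luma_ref)), key=luma_ref.__getitem__)
    span = luma_ref[i_max] - luma_ref[i_min]
    if span == 0:
        # Flat luma neighbourhood: fall back to the mean chroma value.
        return 0.0, sum(chroma_ref) / len(chroma_ref)
    alpha = (chroma_ref[i_max] - chroma_ref[i_min]) / span
    beta = chroma_ref[i_min] - alpha * luma_ref[i_min]
    return alpha, beta


def predict_chroma(luma_block, alpha, beta, bit_depth=8):
    """Apply the linear model per sample and clip to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max_val, max(0, round(alpha * y + beta))) for y in row]
        for row in luma_block
    ]


if __name__ == "__main__":
    # Hypothetical reference samples taken from the block boundary.
    alpha, beta = fit_linear_model([100, 120, 140, 160], [60, 70, 80, 90])
    print(alpha, beta)
    print(predict_chroma([[100, 200], [0, 250]], alpha, beta))
```

A neural cross-component predictor such as the one described in the abstract replaces this single line fit with learned features, but the inputs (boundary reference samples and co-located luma) play the same role.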