Depthformer: Multiscale Vision Transformer for Monocular Depth Estimation with Global Local Information Fusion
Ashutosh Agarwal, Chetan Arora
SPS
Length: 00:15:15
Facial image compression is crucial in many areas, such as social media and video surveillance. Given the sparsity of facial features, sparse representation (SR) has been applied to compress facial images: each image patch is sparsely represented by a small number of dictionary atoms, which saves bit-rate. Along this line, we propose SFIC, the first end-to-end sparsity-driven facial image compression network. In the proposed network, traditional convolutional sparse coding (CSC) is turned into a learnable CSC block. This block is combined with the discrete wavelet transform (DWT) to form the sparsity encoding module (SEM). This is the first time CSC has been explored for facial image compression. On the decoding side, a corresponding sparsity decoding module (SDM) decodes the image, and a quality enhancement module (QEM) further improves the quality of the decoded image. Experimental results show that SFIC achieves bit-rate savings of 74%, 55%, and 33% over JPEG, JPEG 2000, and HEVC, respectively.
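To make "each image patch is sparsely represented by a small number of dictionary atoms" concrete, here is a minimal sketch of patch-based sparse coding with ISTA (iterative shrinkage-thresholding). It is not the SFIC implementation. The abstract does not say how the CSC block is made learnable, and the functions `soft_threshold` and `ista` and the parameters `lam` and `n_iter` are hypothetical names chosen for illustration. The sketch uses an ordinary matrix dictionary `D`. Convolutional sparse coding (CSC) replaces the product `D @ z` with a sum of convolutions between filters and sparse feature maps.

```python
import numpy as np


def soft_threshold(x, thresh):
    """Elementwise soft-thresholding: the proximal operator of thresh * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)


def ista(D, y, lam=0.1, n_iter=200):
    """Approximately solve min_z 0.5*||y - D z||^2 + lam*||z||_1 with ISTA.

    D: (patch_dim, n_atoms) dictionary; each column is one atom.
    y: (patch_dim,) vectorized image patch.
    Returns the sparse code z of shape (n_atoms,).
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient of the
    # quadratic term: the largest eigenvalue of D^T D (squared spectral norm of D).
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                # gradient of 0.5*||y - D z||^2
        z = soft_threshold(z - grad / L, lam / L)  # gradient step, then shrink
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((64, 128))
    D /= np.linalg.norm(D, axis=0)              # unit-norm atoms
    z_true = np.zeros(128)
    z_true[[3, 40, 100]] = [1.0, -0.8, 0.5]     # patch built from 3 atoms
    y = D @ z_true
    z = ista(D, y, lam=0.01, n_iter=2000)
    print("nonzeros in code:", np.count_nonzero(np.abs(z) > 1e-3))
    print("relative residual:", np.linalg.norm(D @ z - y) / np.linalg.norm(y))
```

Only the few nonzero coefficients in `z` (with their positions) would need to be entropy-coded, which is where the bit-rate saving of sparse representation comes from. One widely used way to make such a solver learnable is LISTA, which unrolls a fixed number of these iterations and learns the matrices and thresholds. Whether SFIC's learnable CSC block follows that pattern is an assumption, not something the abstract states.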