Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Global Local Information Fusion

Ashutosh Agarwal, Chetan Arora

SPS Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:15:15

07 Oct 2022

Facial image compression is crucial in many areas, such as social media and video surveillance. Because facial features are sparse, sparse representation (SR) has been applied to compress facial images: each image patch is represented by a small number of dictionary atoms, which reduces the bit-rate. Along this line, we propose the first end-to-end sparsity-driven facial image compression network, named SFIC. In the proposed network, traditional convolutional sparse coding (CSC) is turned into a learnable CSC block, which is combined with the discrete wavelet transform (DWT) to form the sparsity encoding module (SEM). This is the first time that CSC has been explored in facial image compression. On the decoding side, a corresponding sparsity decoding module (SDM) decodes the image, and we further propose a quality enhancement module (QEM) to enhance the quality of the decoded image. The experimental results verify that the proposed SFIC network achieves 74%, 55%, and 33% bit-rate savings over JPEG, JPEG-2000, and HEVC, respectively.
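To make the sparse-representation idea in the abstract concrete, the following is a minimal sketch of representing a patch with a few dictionary atoms. It is not the SFIC network and not convolutional sparse coding: it uses plain matching pursuit, and the function names, the 4-sample patch, and the Haar-like toy dictionary are all hypothetical choices made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(patch, atoms, n_atoms):
    """Greedily approximate `patch` using at most `n_atoms` unit-norm atoms.

    Returns ({atom_index: coefficient}, residual).
    """
    residual = list(patch)
    coeffs = {}
    for _ in range(n_atoms):
        # Correlate the current residual with every atom.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] = coeffs.get(best, 0.0) + scores[best]
        # Subtract the selected atom's contribution from the residual.
        residual = [r - scores[best] * a
                    for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Hypothetical toy dictionary: four orthonormal Haar-like atoms.
h = 0.5
s = 1 / math.sqrt(2)
ATOMS = [
    [h, h, h, h],
    [h, h, -h, -h],
    [s, -s, 0.0, 0.0],
    [0.0, 0.0, s, -s],
]

patch = [4.0, 4.0, 2.0, 2.0]
coeffs, residual = matching_pursuit(patch, ATOMS, n_atoms=2)
print(coeffs)    # indices and weights of the atoms that were selected
print(residual)  # what the selected atoms fail to explain
```

In this toy case two of the four atoms reconstruct the patch, so only two (index, coefficient) pairs would need to be stored. Storing fewer coefficients per patch is the general intuition behind sparsity-based bit-rate savings; how SFIC learns and encodes its representation is described only at the level of the abstract above.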

Tags:

International Conference on Image Processing

IEEE ICIP 2022

icip

Value-Added Bundle(s) Including this Product

ICIP 2022, October 16-19, 2022, Bordeaux, France - Presentation Videos Product Bundle
