Bridging The Gap Between Image Coding For Machines and Humans
Nam Le, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed Rezazadegan Tavakoli, Emre Aksu, Miska Hannuksela, Esa Rahtu
SPS
In object tracking systems, clients often capture video, encode it, and transmit it to a server that performs the actual machine task. In this paper we propose an alternative architecture in which we instead transmit features to the server. Specifically, we partition the Joint Detection and Embedding (JDE) person tracking network into client-side and server-side sub-networks and code the intermediate tensors, i.e., features. The features are compressed for transmission using a Deep Neural Network (DNN) that we design and train specifically for carrying out the tracking task. The DNN uses trainable non-uniform quantizers, conditional probability estimators, and hierarchical coding, concepts that have previously been used in neural network-based image and video compression. Additionally, the DNN includes a novel parameterized dual-path layer that comprises an autoencoder in one path and a convolutional layer in the other. The tensors output by the two paths are summed before being consumed by subsequent layers. The parameter value for this dual-path layer controls the output channel count and, correspondingly, the bitrate of the transmitted bitstream. We demonstrate that our model improves coding efficiency by 43.67% over the state-of-the-art Versatile Video Coding (VVC) standard, which codes the source video in the pixel domain.
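The dual-path layer described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration with made-up tensor shapes, a made-up bottleneck size, and random weights; the actual JDE feature dimensions, the autoencoder depth, and the trained parameters are not specified in the abstract, so everything concrete here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # Pointwise (1x1) convolution: x has shape (C_in, H, W),
    # w has shape (C_out, C_in); result has shape (C_out, H, W).
    return np.einsum("oc,chw->ohw", w, x)

def dual_path_layer(x, c_out, bottleneck=4):
    """Sketch of the parameterized dual-path layer: an autoencoder path
    and a convolution path whose outputs are summed element-wise.
    c_out is the parameter controlling the output channel count
    (and hence the bitrate of the transmitted features).
    Weights are random here purely for illustration; in the paper
    they would be learned end-to-end for the tracking task."""
    c_in = x.shape[0]
    # Autoencoder path: squeeze to a bottleneck, then expand to c_out.
    w_enc = rng.standard_normal((bottleneck, c_in)) * 0.1
    w_dec = rng.standard_normal((c_out, bottleneck)) * 0.1
    ae_out = conv1x1(conv1x1(x, w_enc), w_dec)
    # Convolution path: direct mapping to c_out channels.
    w_conv = rng.standard_normal((c_out, c_in)) * 0.1
    conv_out = conv1x1(x, w_conv)
    # The two paths are added before subsequent layers consume the result.
    return ae_out + conv_out

# Hypothetical intermediate JDE feature tensor: 64 channels, 8x8 spatial.
x = rng.standard_normal((64, 8, 8))
y = dual_path_layer(x, c_out=16)
print(y.shape)  # (16, 8, 8)
```

Note how a smaller `c_out` shrinks the tensor that must be entropy-coded and transmitted, which is the mechanism the abstract describes for trading bitrate against task accuracy.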