Poster 09 Oct 2023

Recently, HTTP adaptive streaming (HAS) has become the standard approach for over-the-top (OTT) video streaming services due to its ability to provide smooth playback. In HAS, each stream representation is encoded at a target bitrate, and together these representations span a wide range of operating bitrates known as the bitrate ladder. In the past, a fixed bitrate ladder applied to all videos has been widely used. However, such a method does not account for video content, which can vary considerably in motion, texture, and scene complexity. Moreover, building a per-title bitrate ladder through exhaustive encoding is expensive due to the large encoding parameter space. Thus, alternative solutions that enable accurate and efficient per-title bitrate ladder prediction are in great demand. Furthermore, self-attention-based architectures have achieved remarkable performance in large language models (LLMs) and, with Vision Transformers (ViTs), in computer vision tasks. Therefore, this paper investigates the capability of ViTs to build an efficient bitrate ladder without performing any encoding. We provide the first in-depth analysis of the prediction accuracy and the complexity overhead induced by the ViT model when predicting the bitrate ladder on a large-scale video dataset. The source code of the proposed solution and the dataset will be made publicly available.
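To make the idea concrete, the sketch below shows one plausible way to regress per-title ladder points from sampled frames with a ViT backbone. It is not the authors' implementation: the torchvision ViT-B/16 backbone, the 8-frame sampling, the fixed number of ladder rungs, and the small regression head are all assumptions made for illustration.

```python
# Minimal sketch: predict a per-title bitrate ladder from video frames with a ViT,
# without running any trial encodes. Backbone, rung count, and head are assumptions.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

class LadderPredictor(nn.Module):
    def __init__(self, num_rungs: int = 6):
        super().__init__()
        self.backbone = vit_b_16(weights=None)      # pretrained weights optional
        self.backbone.heads = nn.Identity()         # expose the 768-d CLS embedding
        self.head = nn.Sequential(                  # hypothetical regression head
            nn.Linear(768, 256), nn.ReLU(),
            nn.Linear(256, num_rungs),              # one predicted bitrate per rung
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, 3, 224, 224) sampled from each title
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))  # (b*t, 768) per-frame features
        feats = feats.view(b, t, -1).mean(dim=1)     # average-pool over frames
        return self.head(feats)                      # predicted bitrates, e.g. in kbps

if __name__ == "__main__":
    model = LadderPredictor(num_rungs=6)
    clip = torch.randn(1, 8, 3, 224, 224)            # 8 sampled frames, ImageNet size
    print(model(clip).shape)                         # torch.Size([1, 6])
```

In such a setup the predicted rung bitrates would be trained against ladder points obtained offline (e.g., from convex-hull analysis of trial encodes), so inference on a new title needs only frame sampling and a single forward pass.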
