Deep Metric Learning-Based Semi-Supervised Regression With Alternate Learning
Adina Zell, Gencer Sumbul, Begüm Demir
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 00:09:51
Transformer has shown outstanding performance in time-series data processing, which can definitely facilitate quality assessment of video sequences. However, the quadratic time and memory complexities of Transformer potentially impede its application to long video sequences. in this work, we study a mechanism of sharing attention across video clips in video quality assessment (VQA) scenario. Consequently, an efficient architecture based on integrating shared multi-head attention (MHA) into Transformer is proposed for VQA, which greatly ease the time and memory complexities. A long video sequence is first divided into individual clips. The quality features derived by an image quality model on each frame in a clip are aggregated by a shared MHA layer. The aggregated features across all clips are then fed into a global Transformer encoder for quality prediction at sequence level. The proposed model with a lightweight architecture demonstrates promising performance in no-reference VQA (NR-VQA) modelling on publicly available data-bases. The source code can be found at https://github.com/junyongyou/lagt_vqa.