Pure Versus Hybrid Transformers For Multi-Modal Brain Tumor Segmentation: A Comparative Study
Gustavo Andrade-Miranda, Vincent Jaouen, Vincent Bourbonne, François Lucia, Dimitris Visvikis, Pierre-Henri Conze
SPS
Length: 00:07:18
There is ongoing debate about the accuracy of vision transformers (ViTs) compared with ConvNets for image processing tasks. Image quality assessment (IQA), and particularly 360-degree IQA (360-IQA), lacks insight into their performance and robustness compared with the widely used ConvNets. This paper investigates transfer learning from two pre-trained ViT variants and two ConvNets (ResNet-50 and EfficientNet-B3) for 360-degree image quality assessment, focusing on (i) prediction accuracy and generalization ability and (ii) adaptation to the specific characteristics of 360-degree images. Furthermore, the influence of adaptive patch sampling, compared with simply using the equirectangular content, is analyzed for each architecture. Experiments on the publicly available OIQA, CVIQ and MVAQD datasets show that ResNet-50 outperforms both ViTs and EfficientNet-B3 while requiring less computation time. Moreover, the base ViT variant outperforms the larger one. Finally, except on CVIQ, both ViTs and ConvNets benefit from the adaptive sampling strategy, highlighting the value of taking 360-degree characteristics into account.
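The abstract does not specify how the adaptive patch sampling works. One 360-degree characteristic that such a strategy can account for is the equirectangular projection stretching regions near the poles, so that sampling patches uniformly over pixels over-represents them. The following is a minimal sketch under that assumption only, not the authors' method. It contrasts drawing patch centres uniformly on the sphere with drawing them uniformly over equirectangular pixels. All function names (`sphere_uniform_centers`, `naive_pixel_uniform_centers`, `equatorial_fraction`) are hypothetical.

```python
import math
import random


def sphere_uniform_centers(width, height, n, seed=0):
    """Hypothetical helper: draw n patch centres uniformly over the sphere and
    return them as (col, row) pixels of a width x height equirectangular image."""
    rng = random.Random(seed)
    centers = []
    for _ in range(n):
        lon = rng.uniform(-math.pi, math.pi)
        # sin(latitude) uniform in [-1, 1] gives equal density per unit sphere area.
        lat = math.asin(rng.uniform(-1.0, 1.0))
        col = round((lon + math.pi) / (2 * math.pi) * (width - 1))
        row = round((math.pi / 2 - lat) / math.pi * (height - 1))  # row 0 = north pole
        centers.append((col, row))
    return centers


def naive_pixel_uniform_centers(width, height, n, seed=0):
    """Hypothetical baseline: centres uniform over equirectangular pixels,
    which over-represents the stretched polar regions."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(n)]


def equatorial_fraction(centers, height, band_deg=30.0):
    """Fraction of centres whose latitude lies within +/- band_deg of the equator."""
    inside = 0
    for _, row in centers:
        lat_deg = 90.0 - row / (height - 1) * 180.0
        if abs(lat_deg) <= band_deg:
            inside += 1
    return inside / len(centers)


if __name__ == "__main__":
    W, H = 2048, 1024
    # Sphere-uniform sampling: roughly sin(30 deg) = 0.5 of centres near the equator.
    print(equatorial_fraction(sphere_uniform_centers(W, H, 20000), H))
    # Pixel-uniform sampling: roughly 60/180, about 0.33, near the equator.
    print(equatorial_fraction(naive_pixel_uniform_centers(W, H, 20000), H))
```

Under this assumption, the sphere-uniform sampler places more patches in the less-distorted equatorial band. A per-image quality score could then be obtained by pooling the network's per-patch predictions, for example by averaging. The pooling rule is also an assumption and is not stated in the abstract.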