From U-Net To Transformers: Navigating Through Key Advances In Medical Image Segmentation: Part 1
Vishal Patel, Jeya Maria Jose Valanarasu
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 02:19:38
Medical image segmentation plays a pivotal role in computer-aided diagnosis systems, which support clinical decision-making. Segmenting a region of interest such as an organ or lesion from a medical image or scan is critical because the segmentation captures details like the volume, shape, and location of that region. State-of-the-art methods for medical image segmentation across most modalities, including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US), are now based on deep learning. These methods help radiologists produce annotations faster and with less manual labor. In this tutorial, we go through the key advances from convolutional networks to transformers and examine why and how these advances have impacted medical image segmentation.
CNN Based Methods: The introduction of U-Net in 2015 caused a revolution in medical image segmentation: it surpassed previous segmentation methods by a large margin and was easy to train for specific tasks. U-Net uses an encoder-decoder architecture built from convolutional neural networks that takes a 2D image as input and outputs a segmentation map. Later, 3D U-Net was proposed for volumetric segmentation. Many subsequent methods improved on the core U-Net/3D U-Net architecture. U-Net++ uses nested and dense skip connections to further reduce the semantic gap between the feature maps of the encoder and decoder. UNet3+ uses full-scale skip connections, which connect feature maps across different scales. V-Net uses volumetric convolutions instead of processing the input volumes slice-wise. KiU-Net combines feature maps from both undercomplete and overcomplete deep networks so that the network learns to segment both small and large segmentation masks effectively. nnU-Net shows that properly configuring and tuning a plain U-Net is enough to achieve strong performance.
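To make the encoder-decoder idea concrete, the following is a minimal sketch that only tracks feature-map shapes through a U-Net-style network; it performs no actual convolutions. The function name `unet_shape_trace` and the defaults (64 base channels, 4 levels, size-preserving convolutions, 2x down/upsampling, channel doubling per level, 2 output classes) are illustrative assumptions and are not taken from the tutorial.

```python
def unet_shape_trace(height, width, base_channels=64, depth=4, num_classes=2):
    """Trace (stage, channels, height, width) through a U-Net-style network.

    Assumptions (illustrative only): size-preserving convolutions,
    2x downsampling per encoder level, 2x upsampling per decoder level,
    and channel count doubling at each encoder level.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    trace = []
    skips = []
    channels, h, w = base_channels, height, width

    # Encoder: record each feature map for its skip connection, then downsample.
    for level in range(depth):
        trace.append((f"enc{level}", channels, h, w))
        skips.append((channels, h, w))
        channels, h, w = channels * 2, h // 2, w // 2

    # Bottleneck: the coarsest resolution and the widest feature map.
    trace.append(("bottleneck", channels, h, w))

    # Decoder: upsample, concatenate the matching encoder map, then fuse.
    for level in reversed(range(depth)):
        skip_channels, skip_h, skip_w = skips[level]
        h, w = h * 2, w * 2
        assert (h, w) == (skip_h, skip_w), "skip and decoder sizes must match"
        upsampled_channels = channels // 2  # upsampling halves the channels
        trace.append((f"cat{level}", upsampled_channels + skip_channels, h, w))
        channels = skip_channels  # fusing convolutions reduce the width again
        trace.append((f"dec{level}", channels, h, w))

    # Segmentation head: one output channel per class at input resolution.
    trace.append(("head", num_classes, h, w))
    return trace


if __name__ == "__main__":
    for stage in unet_shape_trace(256, 256):
        print(stage)
```

The trace shows why skip connections matter in this design: each decoder level receives a concatenation of upsampled coarse features and same-resolution encoder features, and the final map has the same spatial size as the input image.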
Transformer Based Methods: TransUNet proposed a methodology for multi-organ segmentation that uses a transformer as an additional layer in the bottleneck of a U-Net architecture. It encodes tokenized image patches from a convolutional neural network (CNN) feature map as the input sequence for extracting global context. Medical Transformer introduces a transformer-based gated axial attention mechanism for 2D medical image segmentation so that transformers can be trained in the low-data regime. UNETR introduces a transformer-based method for 3D volumetric segmentation. Multi-Compound Transformer (MCTrans) incorporates rich feature learning and semantic structure mining into a unified framework: it embeds multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention, rather than the single-scale attention used in previous works. Swin TransUNet uses shifted windows and window attention to extract hierarchical features from the input image.
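As a rough illustration of two ideas mentioned above, tokenizing a feature map into a sequence and restricting attention to local windows, the sketch below uses plain Python lists. The helper names (`tokenize_feature_map`, `partition_windows`, `attention_pairs`) are hypothetical, the feature vectors are stand-in tuples, and the window shifting step used by Swin-style models is deliberately left out. It is not the implementation of any of the cited methods.

```python
def tokenize_feature_map(feature_map):
    """Flatten an H x W grid of feature vectors into H*W tokens (row-major)."""
    return [vector for row in feature_map for vector in row]


def partition_windows(feature_map, window):
    """Split an H x W grid into non-overlapping window x window token groups."""
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append([
                feature_map[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ])
    return windows


def attention_pairs(num_tokens):
    """Number of query-key pairs that self-attention scores for one sequence."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Stand-in 8 x 8 feature map; each "vector" is just its (row, col) index.
    grid = [[(r, c) for c in range(8)] for r in range(8)]

    tokens = tokenize_feature_map(grid)
    windows = partition_windows(grid, window=4)

    print("global tokens:", len(tokens))
    print("global attention pairs:", attention_pairs(len(tokens)))
    print("windows:", len(windows))
    print("windowed attention pairs:",
          sum(attention_pairs(len(w)) for w in windows))
```

Global attention over all tokens lets every position see every other one, which is the source of the global context, but the number of scored pairs grows quadratically with the token count. Window attention scores pairs only inside each window, which is cheaper; shifting the windows between layers, not shown here, is what lets information cross window boundaries.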