
Vision Transformers in Image Restoration: A Survey

Overview
Journal Sensors (Basel)
Publisher MDPI
Specialty Biotechnology
Date 2023 Mar 11
PMID 36904589
Abstract

The Vision Transformer (ViT) architecture has been remarkably successful in image restoration. For a long time, Convolutional Neural Networks (CNNs) predominated in most computer vision tasks. Now, both CNNs and ViTs are effective approaches for restoring a higher-quality version of a low-quality input image. This study examines the efficiency of ViT in image restoration extensively. ViT architectures are classified by image restoration task. Seven tasks are considered: Image Super-Resolution, Image Denoising, General Image Enhancement, JPEG Compression Artifact Reduction, Image Deblurring, Removing Adverse Weather Conditions, and Image Dehazing. The outcomes, advantages, limitations, and possible areas for future research are detailed. Overall, incorporating ViT into new architectures for image restoration is becoming the norm. This is due to several advantages over CNNs: better efficiency, especially when more data are fed to the network; robustness in feature extraction; and a feature learning approach that better captures the variances and characteristics of the input. Nevertheless, some drawbacks exist: the need for more data to show the benefits of ViT over CNNs, the increased computational cost due to the complexity of the self-attention block, a more challenging training process, and the lack of interpretability. These drawbacks indicate the future research directions that should be targeted to increase the efficiency of ViT in the image restoration domain.
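
The computational cost attributed to the self-attention block can be illustrated with a rough count. The sketch below is not taken from the survey: it assumes a plain ViT that splits the image into non-overlapping square patches and applies global self-attention, so the attention score matrix has one entry per pair of patches. The function names `num_patches` and `attention_matrix_entries` and the patch size of 16 are hypothetical choices for illustration.

```python
def num_patches(height, width, patch):
    """Number of non-overlapping patch tokens for an image (assumes exact tiling)."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def attention_matrix_entries(height, width, patch):
    """Entries in one global self-attention score matrix: tokens squared."""
    n = num_patches(height, width, patch)
    return n * n


# Doubling the image side quadruples the token count and multiplies
# the attention matrix size by sixteen.
for side in (224, 448, 896):
    tokens = num_patches(side, side, 16)
    entries = attention_matrix_entries(side, side, 16)
    print(f"{side}x{side}: {tokens} tokens, {entries} attention entries")
```

Under these assumptions the cost grows quadratically with the number of patches, which is one reason high-resolution restoration tasks are expensive for global attention.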

Citing Articles

MIMO-Uformer: A Transformer-Based Image Deblurring Network for Vehicle Surveillance Scenarios.

Zhang J, Cheng B, Zhang T, Zhao Y, Fu T, Wu Z J Imaging. 2024; 10(11).

PMID: 39590738 PMC: 11596006. DOI: 10.3390/jimaging10110274.


Non-small cell lung cancer detection through knowledge distillation approach with teaching assistant.

Pavel M, Islam R, Babor S, Mehadi R, Khan R PLoS One. 2024; 19(11):e0306441.

PMID: 39504338 PMC: 11540227. DOI: 10.1371/journal.pone.0306441.


A joint learning framework for multisite CBCT-to-CT translation using a hybrid CNN-transformer synthesizer and a registration network.

Hu Y, Cheng M, Wei H, Liang Z Front Oncol. 2024; 14:1440944.

PMID: 39175474 PMC: 11338897. DOI: 10.3389/fonc.2024.1440944.


Multi-Branch Network for Color Image Denoising Using Dilated Convolution and Attention Mechanisms.

Duong M, Nguyen Thi B, Lee S, Hong M Sensors (Basel). 2024; 24(11).

PMID: 38894398 PMC: 11175289. DOI: 10.3390/s24113608.


GNViT- An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model.

P V, M I PLoS One. 2024; 19(3):e0301174.

PMID: 38527074 PMC: 10962840. DOI: 10.1371/journal.pone.0301174.

