
A Multimodal Transformer to Fuse Images and Metadata for Skin Disease Classification

Overview
Journal: Vis Comput
Date: 2022 May 11
PMID: 35540957
Abstract

Skin diseases are rising in prevalence, and diagnosing them remains a challenging clinical task. Deep learning could help to meet this challenge. In this study, a novel neural network is proposed for the classification of skin diseases. Because the datasets used in the research consist of skin disease images and clinical metadata, we propose a novel multimodal Transformer with two encoders, one for images and one for metadata, and one decoder that fuses the multimodal information. In the proposed network, a suitable Vision Transformer (ViT) model serves as the backbone to extract deep image features. The metadata are treated as labels, and a new Soft Label Encoder (SLE) is designed to embed them. In the decoder, a novel Mutual Attention (MA) block is proposed to better fuse the image features and metadata features. To evaluate the model's effectiveness, extensive experiments were conducted on a private skin disease dataset and on the benchmark dataset ISIC 2018. Compared with state-of-the-art methods, the proposed model shows better performance and represents an advancement in skin disease diagnosis.
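
The abstract does not spell out the internals of the Mutual Attention block, so the following is only a minimal sketch of one plausible reading: bidirectional scaled dot-product cross-attention, in which image tokens attend over metadata tokens and vice versa, followed by pooling and concatenation. The function names (`cross_attention`, `mutual_attention_fuse`), the mean pooling, and the absence of learned projections are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token attends over `context`, used as both keys and values.

    Hypothetical simplification: no learned Q/K/V projections, single head.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    count = len(tokens)
    return [sum(t[j] for t in tokens) / count for j in range(len(tokens[0]))]


def mutual_attention_fuse(image_tokens, meta_tokens):
    """Assumed fusion: attend in both directions, pool each side, concatenate."""
    image_side = cross_attention(image_tokens, meta_tokens)  # image queries metadata
    meta_side = cross_attention(meta_tokens, image_tokens)   # metadata queries image
    return mean_pool(image_side) + mean_pool(meta_side)


# Toy inputs standing in for ViT patch features and embedded metadata.
image_tokens = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.4, 0.1], [0.0, 0.7, 0.2, 0.3]]
meta_tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
fused = mutual_attention_fuse(image_tokens, meta_tokens)
```

In this sketch the fused vector has twice the token dimension (here 8 values) and would be passed to a classification head; a real model would add learned projections, multiple heads, and residual connections.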

Citing Articles

Minimal sourced and lightweight federated transfer learning models for skin cancer detection.

Khullar V, Kaur P, Gargrish S, Mishra A, Singh P, Diwakar M Sci Rep. 2025; 15(1):2605.

PMID: 39837883 PMC: 11750969. DOI: 10.1038/s41598-024-82402-x.


Advancing healthcare through multimodal data fusion: a comprehensive review of techniques and applications.

Teoh J, Dong J, Zuo X, Lai K, Hasikin K, Wu X PeerJ Comput Sci. 2024; 10:e2298.

PMID: 39650483 PMC: 11623190. DOI: 10.7717/peerj-cs.2298.


Impact of metadata in multimodal classification of bone tumours.

Hinterwimmer F, Guenther M, Consalvo S, Neumann J, Gersing A, Woertler K BMC Musculoskelet Disord. 2024; 25(1):822.

PMID: 39427131 PMC: 11490032. DOI: 10.1186/s12891-024-07934-9.


Multimodal Transformer Model Using Time-Series Data to Classify Winter Road Surface Conditions.

Moroto Y, Maeda K, Togo R, Ogawa T, Haseyama M Sensors (Basel). 2024; 24(11).

PMID: 38894233 PMC: 11174985. DOI: 10.3390/s24113440.


Exploring the influence of transformer-based multimodal modeling on clinicians' diagnosis of skin diseases: A quantitative analysis.

Zhang Y, Hu Y, Li K, Pan X, Mo X, Zhang H Digit Health. 2024; 10:20552076241257087.

PMID: 38784049 PMC: 11113036. DOI: 10.1177/20552076241257087.

