
BPI-MVQA: a Bi-branch Model for Medical Visual Question Answering

Overview
Journal BMC Med Imaging
Publisher BioMed Central
Specialty Radiology
Date 2022 Apr 29
PMID 35488285
Abstract

Background: Visual question answering in the medical domain (VQA-Med) exhibits great potential for enhancing confidence in diagnosing diseases and helping patients better understand their medical conditions. One of the challenges in VQA-Med is how to better understand and combine the semantic features of medical images (e.g., X-rays, magnetic resonance imaging (MRI)) and answer the corresponding questions accurately on unlabeled medical datasets.

Method: We propose a novel Bi-branched model based on Parallel networks and Image retrieval for Medical Visual Question Answering (BPI-MVQA). The first branch is a transformer structure built on a parallel network, which combines the complementary strengths of image sequence-feature and spatial-feature extraction; multi-modal features are fused implicitly through the multi-head self-attention mechanism. The second branch retrieves images with similar features, as extracted by a VGG16 network, and uses their associated text descriptions as labels.
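The retrieval branch described above can be sketched as a nearest-neighbour lookup over image feature vectors. The following is a minimal illustrative sketch, not the paper's implementation: the `retrieve_label` helper and the toy feature vectors are assumptions standing in for pooled VGG16 activations and a real image gallery.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_label(query_feat, gallery):
    """Return the text description of the most similar gallery image.

    gallery: list of (feature_vector, text_description) pairs, where each
    feature vector stands in for a pooled VGG16 activation of one image.
    """
    best_text, best_score = None, -1.0
    for feat, text in gallery:
        score = cosine_similarity(query_feat, feat)
        if score > best_score:
            best_score, best_text = score, text
    return best_text, best_score
```

In this sketch, the text attached to the best-matching image serves directly as the answer label for an open-ended question, which mirrors the role the paper assigns to the retrieval branch.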

Result: The BPI-MVQA model achieves state-of-the-art results on three VQA-Med datasets, exceeding the previous best main metric scores by 0.2%, 1.4%, and 1.1%, respectively.

Conclusion: The evaluation results support the effectiveness of the BPI-MVQA model for VQA-Med. The bi-branch design helps the model answer different types of visual questions. The parallel network extracts image features from multiple angles, helping the model better understand the semantic information of the image and achieve higher accuracy on the multi-classification questions in VQA-Med. In addition, image retrieval helps the model answer irregular, open-ended questions by drawing on the information provided by similar images. The comparison with state-of-the-art methods on three datasets also shows that our method brings substantial improvement to VQA-Med systems.

Citing Articles

A scoping review on multimodal deep learning in biomedical images and texts.

Sun Z, Lin M, Zhu Q, Xie Q, Wang F, Lu Z. J Biomed Inform. 2023; 146:104482.

PMID: 37652343 PMC: 10591890. DOI: 10.1016/j.jbi.2023.104482.


Vision-Language Model for Visual Question Answering in Medical Imagery.

Bazi Y, Rahhal M, Bashmal L, Zuair M. Bioengineering (Basel). 2023; 10(3).

PMID: 36978771 PMC: 10045796. DOI: 10.3390/bioengineering10030380.
