Towards a Holistic Framework for Multimodal LLM in 3D Brain CT Radiology Report Generation

Overview

Journal: Nat Commun

Specialty: Biology

Date: 2025 Mar 6

PMID: 40050277

Authors

Cheng-Yi Li

Kao-Jung Chang

Cheng-Fu Yang

Hsin-Yu Wu

Wenting Chen

Hritik Bansal

Ling Chen

Yi-Ping Yang

Yu-Chun Chen

Shih-Pin Chen

Shih-Jen Chen

Jiing-Feng Lirng

Kai-Wei Chang

Shih-Hwa Chiou

Abstract

Multi-modal large language models (MLLMs) have transformed the landscape of modern healthcare, with automated radiology report generation (RRG) emerging as a cutting-edge application. While 2D MLLM-based RRG is well established, its utility for 3D medical images remains largely unexplored. To address this gap, we curate the 3D-BrainCT dataset (18,885 text-scan pairs) and develop BrainGPT, a clinically visual instruction-tuned (CVIT) model designed for 3D CT RRG. Because we observed that traditional LLM metrics failed to gauge the diagnostic quality of the generated reports, we propose feature-oriented radiology task evaluation (FORTE), an evaluation scheme that captures the clinical essence of the generated reports. Here we show that BrainGPT achieves an average FORTE F1-score of 0.71 (degree = 0.661; landmark = 0.706; feature = 0.693; impression = 0.779) and that 74% of BrainGPT-generated reports were indistinguishable from human-written ground truth in a Turing-like test. Together, our work establishes a comprehensive framework encompassing dataset curation, anatomy-aware model fine-tuning, and the development of robust evaluation metrics for RRG. By sharing our experience in 3D MLLM-based RRG, we aim to accelerate progress in human-machine collaboration for next-generation healthcare.
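
As a rough illustration of the arithmetic behind the reported average, the sketch below takes the four per-category F1-scores quoted in the abstract and computes their unweighted mean. The `keyword_f1` helper is a hypothetical, generic set-overlap F1 included only to show how a per-category score of this kind could be computed; the abstract does not specify how FORTE extracts or matches terms, so this should not be read as the authors' implementation.

```python
from statistics import mean

# Per-category FORTE F1-scores as reported in the abstract.
forte_f1 = {
    "degree": 0.661,
    "landmark": 0.706,
    "feature": 0.693,
    "impression": 0.779,
}


def average_f1(scores):
    """Unweighted (macro) mean of the per-category F1-scores."""
    return mean(scores.values())


def keyword_f1(predicted, reference):
    """Hypothetical set-based F1 between two keyword collections.

    Assumption for illustration only: each report has already been
    reduced to a set of category keywords. This is a generic F1,
    not the FORTE procedure itself.
    """
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


print(round(average_f1(forte_f1), 2))  # 0.71
```

The four category scores sum to 2.839, so their mean is about 0.710, which matches the average of 0.71 stated in the abstract under the assumption that the categories are weighted equally.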

Citing Articles

Towards a holistic framework for multimodal LLM in 3D brain CT radiology report generation.

Li C, Chang K, Yang C, Wu H, Chen W, Bansal H, et al. Nat Commun. 2025; 16(1):2258.

PMID: 40050277. PMC: 11885477. DOI: 10.1038/s41467-025-57426-0.
