
Towards a Holistic Framework for Multimodal LLM in 3D Brain CT Radiology Report Generation

Overview
Journal: Nat Commun
Specialty: Biology
Date: 2025 Mar 6
PMID: 40050277
Abstract

Multi-modal large language models (MLLMs) have transformed the landscape of modern healthcare, with automated radiology report generation (RRG) emerging as a cutting-edge application. While 2D MLLM-based RRG is well established, its utility for 3D medical images remains largely unexplored. To address this gap, we curate the 3D-BrainCT dataset (18,885 text-scan pairs) and develop BrainGPT, a clinically visual instruction-tuned (CVIT) model designed for 3D CT RRG. Because traditional LLM metrics failed to gauge the diagnostic quality of the generated reports, we propose feature-oriented radiology task evaluation (FORTE), an evaluation scheme that captures the clinical essence of the generated reports. Here we show that BrainGPT achieves an average FORTE F1-score of 0.71 (degree = 0.661; landmark = 0.706; feature = 0.693; impression = 0.779) and that 74% of BrainGPT-generated reports were indistinguishable from human-written ground truth in a Turing-like test. Together, our work establishes a comprehensive framework encompassing dataset curation, anatomy-aware model fine-tuning, and the development of robust evaluation metrics for RRG. By sharing our experience in 3D MLLM-based RRG, we aim to accelerate progress in human-machine collaboration for next-generation healthcare.
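The reported average FORTE F1-score can be checked against the four per-category values given in the abstract. The sketch below assumes the average is a simple unweighted mean across the categories. The abstract does not state the averaging scheme, so this is an assumption, and the names `forte_f1` and `macro_average` are illustrative only.

```python
# Per-category FORTE F1-scores as reported in the abstract.
forte_f1 = {
    "degree": 0.661,
    "landmark": 0.706,
    "feature": 0.693,
    "impression": 0.779,
}


def macro_average(scores):
    """Unweighted mean of per-category scores (assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


average_f1 = macro_average(forte_f1)
print(f"average FORTE F1 = {average_f1:.2f}")
```

Under this assumption the mean is 0.70975, which rounds to the reported 0.71.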

Citing Articles

Towards a holistic framework for multimodal LLM in 3D brain CT radiology report generation.

Li C, Chang K, Yang C, Wu H, Chen W, Bansal H Nat Commun. 2025; 16(1):2258.

PMID: 40050277 PMC: 11885477. DOI: 10.1038/s41467-025-57426-0.
