Natural Language-based Machine Learning Models for the Annotation of Clinical Radiology Reports

Overview

Journal: Radiology

Specialty: Radiology

Date: 2018 Jan 31

PMID: 29381109

Citations: 53

Authors

John Zech

Margaret Pain

Joseph Titano

Marcus Badgeley

Javin Schefflein

Andres Su

Anthony Costa

Joshua Bederson

Joseph Lehar

Eric Karl Oermann

Abstract

Purpose: To compare different methods for generating features from radiology reports and to develop a method to automatically identify findings in these reports.

Materials and Methods: In this study, 96 303 head computed tomography (CT) reports were obtained. The linguistic complexity of these reports was compared with that of alternative corpora. Head CT reports were preprocessed, and machine-analyzable features were constructed by using bag-of-words (BOW), word embedding, and latent Dirichlet allocation-based approaches. Ultimately, 1004 head CT reports were manually labeled for findings of interest by physicians, and a subset of these findings were deemed critical. Lasso logistic regression was used to train models for physician-assigned labels on 602 of 1004 head CT reports (60%) using the constructed features, and the performance of these models was validated on the held-out 402 of 1004 reports (40%). Models were scored by area under the receiver operating characteristic curve (AUC), and aggregate AUC statistics were reported for (a) all labels, (b) critical labels, and (c) the presence of any critical finding in a report. Sensitivity, specificity, accuracy, and F1 score were reported for the best-performing model's (a) predictions of all labels and (b) identification of reports containing critical findings.

Results: The best-performing model (BOW with unigrams, bigrams, and trigrams plus average word embeddings vector) had a held-out AUC of 0.966 for identifying the presence of any critical head CT finding and an average AUC of 0.957 across all head CT findings. Sensitivity and specificity for identifying the presence of any critical finding were 92.59% (175 of 189) and 89.67% (191 of 213), respectively. Average sensitivity and specificity across all findings were 90.25% (1898 of 2103) and 91.72% (18 351 of 20 007), respectively. Simpler BOW methods achieved results competitive with those of more sophisticated approaches, with an AUC for the presence of any critical finding of 0.951 for unigram BOW versus 0.966 for the best-performing model. The Yule I of the head CT corpus was 34, markedly lower than that of the Reuters corpus (103) or I2B2 discharge summaries (271), indicating lower linguistic complexity.

Conclusion: Automated methods can be used to identify findings in radiology reports. The success of this approach benefits from the standardized language of these reports. With this method, a large labeled corpus can be generated for applications such as deep learning.

© RSNA, 2018. Online supplemental material is available for this article.
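The sensitivity and specificity figures above follow directly from the counts given in parentheses, and the short Python sketch below reproduces that arithmetic. It is illustrative only and is not the authors' code: the `ConfusionCounts` class and `from_abstract` helper are hypothetical names introduced here. False negatives and false positives are inferred as the remainder of each denominator. The accuracy and F1 values it prints are derived from those same counts and are not quoted from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    @property
    def sensitivity(self) -> float:
        # Fraction of truly positive cases that were flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of truly negative cases that were not flagged.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


def from_abstract(tp: int, positives: int, tn: int, negatives: int) -> ConfusionCounts:
    """Build counts from 'x of y' pairs as printed in the abstract."""
    return ConfusionCounts(tp=tp, fn=positives - tp, tn=tn, fp=negatives - tn)


# Presence of any critical finding: 175 of 189 positives, 191 of 213 negatives.
any_critical = from_abstract(175, 189, 191, 213)

# Counts pooled across all finding labels: 1898 of 2103 and 18 351 of 20 007.
all_findings = from_abstract(1898, 2103, 18351, 20007)

for name, counts in [("any critical finding", any_critical),
                     ("all findings (pooled)", all_findings)]:
    print(f"{name}: "
          f"sensitivity={counts.sensitivity:.2%}, "
          f"specificity={counts.specificity:.2%}, "
          f"accuracy={counts.accuracy:.2%}, "
          f"F1={counts.f1:.3f}")
```

The 189 positive and 213 negative reports sum to 402, which matches the size of the held-out set, so the "any critical finding" counts appear to be at the report level.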
