Machine Learning Natural Language Processing for Identifying Venous Thromboembolism: Systematic Review and Meta-analysis

Overview

Journal Blood Adv

Specialty Hematology

Date 2024 Mar 24

PMID 38522096

Authors

Barbara D Lam

Pavlina Chrysafi

Thita Chiasakul

Harshit Khosla

Dimitra Karagkouni

Megan McNichol

Alys Adamski

Nimia Reyes

Karon Abe

Simon Mantha

Ioannis S Vlachos

Jeffrey I Zwicker

Rushad Patell

Abstract

Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the challenges of manual medical record review and diagnosis code interpretation. Natural language processing (NLP) can automate the process. Rule-based NLP methods are effective but time-consuming. Machine learning (ML)-NLP methods present a promising solution. We conducted a systematic review and meta-analysis of studies published before May 2023 that used ML-NLP to identify VTE diagnoses in electronic health records. Four reviewers screened all manuscripts, excluding studies that used only a rule-based method. A meta-analysis evaluated the pooled performance of each study's best-performing model for identifying pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), each with a 95% confidence interval (CI), were calculated with a random-effects model using the DerSimonian and Laird method. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review, and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941), and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, demonstrating fair quality. The highest-performing models used vectorization rather than bag-of-words, together with deep-learning techniques such as convolutional neural networks. There was significant heterogeneity among the studies, and only 4 validated their models on an external data set. Further standardization of ML studies can help progress this novel technology toward real-world implementation.
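For readers unfamiliar with the pooling step named in the abstract, the following is a minimal sketch of DerSimonian and Laird random-effects pooling of a proportion such as sensitivity. It is not the authors' code. The logit transformation, the 0.5 continuity correction, the function name `pool_dl`, and the study counts are all assumptions made for illustration; the abstract does not state which transformation or software was used.

```python
import math


def pool_dl(events, totals):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    Assumption: proportions are pooled on the logit scale, which the
    abstract does not specify. Returns (estimate, ci_low, ci_high, tau2).
    """
    effects, variances = [], []
    for e, n in zip(events, totals):
        # Assumed continuity correction for studies with 0% or 100%.
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        effects.append(math.log(e / (n - e)))        # logit of the proportion
        variances.append(1.0 / e + 1.0 / (n - e))    # approximate variance of the logit

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Back-transform the mean and its 95% CI to the proportion scale.
    return expit(mean), expit(mean - 1.96 * se), expit(mean + 1.96 * se), tau2


if __name__ == "__main__":
    # Hypothetical true-positive counts and numbers of VTE-positive cases;
    # these are NOT data from the included studies.
    true_positives = [45, 88, 120, 30]
    positives = [50, 92, 135, 31]
    est, low, high, tau2 = pool_dl(true_positives, positives)
    print(f"pooled sensitivity {est:.3f} (95% CI, {low:.3f}-{high:.3f}), tau^2 {tau2:.3f}")
```

The same function could be applied to specificity, PPV, and NPV by passing the corresponding numerators and denominators. Dedicated meta-analysis packages handle details this sketch omits, such as alternative transformations and heterogeneity statistics.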

Citing Articles

Artificial intelligence in thrombosis: transformative potential and emerging challenges.

Al Raizah A, Alrizah M. Thromb J. 2025; 23(1):2.

PMID: 39825337 PMC: 11740475. DOI: 10.1186/s12959-025-00690-3.
