PMID: 38522096

Machine Learning Natural Language Processing for Identifying Venous Thromboembolism: Systematic Review and Meta-analysis

Abstract

Venous thromboembolism (VTE) is a leading cause of preventable in-hospital mortality. Monitoring VTE cases is limited by the difficulty of manual medical record review and of interpreting diagnosis codes. Natural language processing (NLP) can automate this process. Rule-based NLP methods are effective but time-consuming, and machine learning (ML)-based NLP methods offer a promising alternative. We conducted a systematic review and meta-analysis of studies published before May 2023 that used ML-NLP to identify VTE diagnoses in electronic health records. Four reviewers screened all manuscripts, excluding studies that used only a rule-based method. A meta-analysis evaluated the pooled performance of the best-performing model from each study that evaluated for pulmonary embolism and/or deep vein thrombosis. Pooled sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with 95% confidence intervals (CIs), were calculated with a random-effects model using the DerSimonian and Laird method. Study quality was assessed using an adapted TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) tool. Thirteen studies were included in the systematic review, and 8 had data available for meta-analysis. Pooled sensitivity was 0.931 (95% CI, 0.881-0.962), specificity 0.984 (95% CI, 0.967-0.992), PPV 0.910 (95% CI, 0.865-0.941), and NPV 0.985 (95% CI, 0.977-0.990). All studies met at least 13 of the 21 NLP-modified TRIPOD items, indicating fair quality. The highest-performing models used vectorization rather than bag-of-words representations, as well as deep-learning techniques such as convolutional neural networks. There was significant heterogeneity among the studies, and only 4 validated their models on an external data set. Further standardization of ML studies can help move this novel technology toward real-world implementation.
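The abstract names the DerSimonian and Laird random-effects method for pooling sensitivity, specificity, PPV, and NPV. The following is a minimal sketch of one common way to apply it to proportions, not the review's actual procedure: each study's 2x2 table is turned into (events, total) pairs, each proportion is moved to the logit scale, the between-study variance tau² is estimated by the method of moments, studies are reweighted, and the pooled value is back-transformed. The logit transform, the 0.5 continuity correction, and the helper names (`metric_counts`, `logit_with_variance`, `dersimonian_laird`, `pool_proportion`) are hypothetical choices for illustration, and the 2x2 tables in the usage section are made up.

```python
import math
from typing import Dict, Sequence, Tuple

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def metric_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[int, int]]:
    """Map one study's 2x2 table to (events, total) pairs for each metric."""
    return {
        "sensitivity": (tp, tp + fn),  # detected VTE / all true VTE
        "specificity": (tn, tn + fp),  # correct negatives / all true non-VTE
        "ppv": (tp, tp + fp),          # true VTE / all flagged positive
        "npv": (tn, tn + fn),          # true non-VTE / all flagged negative
    }


def logit_with_variance(events: int, total: int) -> Tuple[float, float]:
    """Logit of a proportion and its approximate within-study variance.

    Assumption (hypothetical choice): add 0.5 to events and non-events
    only when one of them is zero, so the logit stays finite.
    """
    x, n = float(events), float(total)
    if x == 0 or x == n:
        x += 0.5
        n += 1.0
    y = math.log(x / (n - x))          # logit(p) = log(p / (1 - p))
    v = 1.0 / x + 1.0 / (n - x)        # delta-method variance of the logit
    return y, v


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]   # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return y_re, se_re, tau2


def inv_logit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pool_proportion(counts: Sequence[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Pool (events, total) pairs; returns (estimate, lower 95% CI, upper 95% CI)."""
    pairs = [logit_with_variance(e, n) for e, n in counts]
    effects = [y for y, _ in pairs]
    variances = [v for _, v in pairs]
    y, se, _tau2 = dersimonian_laird(effects, variances)
    return inv_logit(y), inv_logit(y - Z_95 * se), inv_logit(y + Z_95 * se)


if __name__ == "__main__":
    # Made-up 2x2 tables (tp, fp, fn, tn) for three hypothetical studies.
    toy_tables = [(90, 8, 10, 892), (45, 3, 5, 447), (180, 20, 10, 790)]
    for metric in ("sensitivity", "specificity", "ppv", "npv"):
        counts = [metric_counts(*t)[metric] for t in toy_tables]
        est, lo, hi = pool_proportion(counts)
        print(f"{metric}: {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Working on the logit scale keeps back-transformed intervals between 0 and 1. Other transformations (such as the Freeman-Tukey double arcsine) are also used for pooling proportions, and bivariate models are common when sensitivity and specificity are pooled jointly, so this sketch should be read as an illustration of the DerSimonian and Laird weighting rather than the authors' exact pipeline.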

Citing Articles

Artificial intelligence in thrombosis: transformative potential and emerging challenges.

Al Raizah A, Alrizah M. Thromb J. 2025;23(1):2.

PMID: 39825337. PMC: 11740475. DOI: 10.1186/s12959-025-00690-3.
