
Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study

Overview
Publisher JMIR Publications
Date 2024 Jul 30
PMID 39079116
Abstract

Background: Under- or late identification of pulmonary embolism (PE), a thrombosis of 1 or more pulmonary arteries that seriously threatens patients' lives, is a major challenge confronting modern medicine.

Objective: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records.

Methods: We collected demographic, comorbidity, and medication data for 2568 patients with PE and 52,598 control patients. We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained a random forest ML algorithm to detect PE at the earliest possible point in a patient's hospitalization, namely at the time of admission. We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which can cause PE to be misdiagnosed.
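The scale of the imbalance described above can be made concrete with a short calculation. The sketch below uses only the two cohort sizes reported in the Methods; the function names are hypothetical, and the inverse-frequency weighting shown is a generic remedy (the same heuristic scikit-learn calls "balanced" class weights), not one of the 2 methods the authors developed, which the abstract does not describe.

```python
# Cohort sizes reported in the Methods paragraph.
N_PE = 2568
N_CONTROL = 52_598


def always_negative_accuracy(n_pos: int, n_neg: int) -> float:
    """Overall accuracy of a classifier that labels every patient as non-PE."""
    return n_neg / (n_pos + n_neg)


def balanced_class_weights(n_pos: int, n_neg: int) -> dict:
    """Inverse-frequency weights: n_total / (n_classes * n_class).

    Generic illustration only; NOT the authors' imbalance-handling method.
    """
    n_total = n_pos + n_neg
    return {
        "PE": n_total / (2 * n_pos),
        "non-PE": n_total / (2 * n_neg),
    }


prevalence = N_PE / (N_PE + N_CONTROL)
imbalance_ratio = N_CONTROL / N_PE
baseline_accuracy = always_negative_accuracy(N_PE, N_CONTROL)
weights = balanced_class_weights(N_PE, N_CONTROL)

print(f"PE prevalence in the cohort: {prevalence:.1%}")
print(f"Control patients per PE patient: {imbalance_ratio:.1f}")
print(f"Accuracy when always predicting non-PE: {baseline_accuracy:.1%}")
print(f"Illustrative class weights: {weights}")
```

A classifier that never predicts PE would be right for roughly 19 of every 20 patients in this cohort while detecting no PE case at all, which is why overall accuracy alone is a misleading target here.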

Results: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and usage of anticoagulants, obtaining an 80% geometric mean value for the PE and non-PE classification accuracies. Although on hospital admission only 4% (1942/46,639) of the patients had a diagnosis of PE, we identified 2 clustering schemes comprising subgroups in which more than 61% (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2) of the patients were positive for PE. One subgroup in the first clustering scheme included 36% (705/1942) of all patients with PE. Compared with patients in the other subgroups of this scheme, these patients were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only those patients with a past diagnosis of pneumonia.
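For readers unfamiliar with the metric, the geometric mean of the two class accuracies is the square root of sensitivity (accuracy on PE patients) times specificity (accuracy on non-PE patients). The sketch below is a minimal illustration: the confusion-matrix counts are hypothetical, chosen only to reproduce an 80% value, while the subgroup counts are those reported in the Results. The "enrichment" ratio is a derived illustration, not a figure stated in the abstract.

```python
import math


def geometric_mean_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)  # accuracy on PE patients
    specificity = tn / (tn + fp)  # accuracy on non-PE patients
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts (not from the study): 80% accuracy in each class.
g_balanced = geometric_mean_accuracy(tp=80, fn=20, tn=800, fp=200)

# A classifier that never predicts PE scores zero, whatever its overall accuracy.
g_always_negative = geometric_mean_accuracy(tp=0, fn=100, tn=1000, fp=0)

# Counts reported in the Results for clustering scheme 1.
pe_in_subgroup, subgroup_size = 705, 1120
pe_total, patients_total = 1942, 46_639

subgroup_pe_rate = pe_in_subgroup / subgroup_size
overall_pe_rate = pe_total / patients_total
share_of_all_pe = pe_in_subgroup / pe_total
enrichment = subgroup_pe_rate / overall_pe_rate  # derived, for illustration

print(f"G-mean, balanced classifier: {g_balanced:.2f}")
print(f"G-mean, always non-PE: {g_always_negative:.2f}")
print(f"PE rate in subgroup: {subgroup_pe_rate:.1%}")
print(f"PE rate overall: {overall_pe_rate:.1%}")
print(f"Share of all PE patients in subgroup: {share_of_all_pe:.1%}")
print(f"Enrichment over baseline: {enrichment:.1f}x")
```

Because the geometric mean collapses to zero when either class is ignored, it rewards models that perform well on both the rare PE class and the common non-PE class.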

Conclusions: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. Despite a highly imbalanced scenario that undermines accurate PE prediction, and although they used only information available from the patient's medical history, our models were both accurate and informative, enabling the identification of patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. The fact that we did not restrict our patients to those at high risk for PE according to previously published scales (eg, Wells or revised Geneva scores) enabled us to accurately assess the application of ML on raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations.

Citing Articles

Large Language Model Approach for Zero-Shot Information Extraction and Clustering of Japanese Radiology Reports: Algorithm Development and Validation.

Yamagishi Y, Nakamura Y, Hanaoka S, Abe O. JMIR Cancer. 2025; 11:e57275.

PMID: 39864093 PMC: 11867198. DOI: 10.2196/57275.
