PMID: 39719477

Pulmonologists-level Lung Cancer Detection Based on Standard Blood Test Results and Smoking Status Using an Explainable Machine Learning Approach

Abstract

Lung cancer (LC) remains the leading cause of cancer-related mortality, largely because it is diagnosed at a late stage. Effective strategies for early detection are therefore essential. In recent years, machine learning (ML) has shown considerable potential in healthcare by facilitating the detection of various diseases. In this retrospective development and validation study, we developed an ML model based on dynamic ensemble selection (DES) for LC detection. The model uses standard blood sample analyses and smoking history data from a large at-risk population in Denmark. The study includes all patients examined on suspicion of LC in the Region of Southern Denmark from 2009 to 2018. We validated the predictions of the DES model and compared them with diagnoses provided by five pulmonologists. Among the 38,944 patients, 9,940 had complete data, of whom 2,505 (25%) had LC. The DES model achieved an area under the ROC curve of 0.77±0.01, sensitivity of 76.2%±2.04%, specificity of 63.8%±2.3%, positive predictive value of 41.6%±1.2%, and F-score of 53.8%±1.0%. The DES model outperformed all five pulmonologists, achieving a sensitivity 6.5% higher than their average. The model identified smoking status, lactate dehydrogenase, age, total calcium levels, low values of sodium, leucocytes, neutrophil count, and C-reactive protein as the most important factors for LC detection. The results show that the ML approach can detect LC successfully, surpassing the pulmonologists' performance. Incorporating clinical and laboratory data in future risk assessment models can improve decision-making and facilitate timely referrals.
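The reported metrics are linked by standard definitions, and a short calculation shows how they fit together. The sketch below is illustrative only and is not the authors' code: it assumes the F-score is the usual F1 (the harmonic mean of positive predictive value and sensitivity) and that the prevalence in the evaluated data matches the overall 2,505 of 9,940 cases. The function names are hypothetical.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence                 # fraction of all patients that are true positives
    false_pos = (1 - specificity) * (1 - prevalence)    # fraction that are false positives
    return true_pos / (true_pos + false_pos)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Point estimates taken from the abstract (assumption: treated as exact values).
prevalence = 2505 / 9940
ppv = ppv_from_rates(sensitivity=0.762, specificity=0.638, prevalence=prevalence)
f1 = f1_score(precision=0.416, recall=0.762)

print(f"prevalence={prevalence:.3f}  implied PPV={ppv:.3f}  implied F1={f1:.3f}")
```

Under these assumptions the implied values land close to the reported 41.6% and 53.8%. Small differences are expected because the abstract reports means with standard deviations, presumably across repeated evaluations, rather than a single confusion matrix.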

Citing Articles

Lung Cancer Detection Using Bayesian Networks: A Retrospective Development and Validation Study on a Danish Population of High-Risk Individuals.

Henriksen M, Van Daalen F, Wee L, Hansen T, Jensen L, Brasen C. Cancer Med. 2025; 14(3):e70458.

PMID: 39887592 PMC: 11783238. DOI: 10.1002/cam4.70458.
