
An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction

Overview
Journal: Sensors (Basel)
Publisher: MDPI
Specialty: Biotechnology
Date: 2022 Sep 23
PMID: 36146145
Abstract

Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both fields in an interpretable way for physicians remains elusive. We propose a two-phase data analytic framework that is capable of classifying survival status for 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time-points (phase I) and predicting the number of survival months within 3 years (phase II) using recent Surveillance, Epidemiology, and End Results (SEER) data from 2010 to 2017. In this study, we employ three analytical models (general linear model (GLM), extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and the one-hot encoding approach. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phases I and II by exploiting GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach to develop performant, computationally efficient, and interpretable methods in comparison to black-box models, (b) visualize top factors impacting survival odds by utilizing the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach.
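The two-phase idea and the use of GLM coefficients as odds ratios can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the feature names (`stage`, `age`), the coefficient values, and the helper functions `one_hot`, `fit_logistic`, and `fit_linear` are all hypothetical, and data balancing and feature selection are omitted. Phase I is approximated by a logistic GLM fitted with Newton's method for a single time-point, and phase II by ordinary least squares on survival months.

```python
import numpy as np

def one_hot(values, categories):
    """One-hot encode, dropping the first category as the reference level."""
    values = np.asarray(values)
    return np.column_stack([(values == c).astype(float) for c in categories[1:]])

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic GLM via Newton's method; returns [intercept, coefficients...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))      # predicted probabilities
        w = p * (1.0 - p)                            # variance weights
        grad = Xb.T @ (y - p) - ridge * beta
        hess = (Xb * w[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)    # Newton update
    return beta

def fit_linear(X, y):
    """Gaussian GLM (ordinary least squares); returns [intercept, coefficients...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

# Synthetic cohort (hypothetical features, not SEER data).
rng = np.random.default_rng(0)
n = 2000
stage = rng.choice(["I", "II", "III"], size=n)
age = rng.normal(65, 10, size=n)
X = np.column_stack([one_hot(stage, ["I", "II", "III"]), (age - 65) / 10])

# Phase I: classify survival status at one time-point (here, 1 year).
true_logit = 1.5 + X @ np.array([-0.7, -1.6, -0.4])   # assumed effects
survived_1yr = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)
beta_phase1 = fit_logistic(X, survived_1yr)
odds_ratios = np.exp(beta_phase1[1:])   # order: stage II, stage III, age (+10 years)

# Phase II: predict survival months within 3 years (capped at 36).
months = np.clip(18 + X @ np.array([-4.0, -9.0, -2.0]) + rng.normal(0, 3, n), 0, 36)
beta_phase2 = fit_linear(X, months)

for name, ratio in zip(["stage II vs I", "stage III vs I", "age +10y"], odds_ratios):
    print(f"{name}: odds ratio {ratio:.2f}")
print("phase II coefficients (months):", np.round(beta_phase2[1:], 2))
```

An odds ratio below 1 means the feature lowers the odds of survival relative to the reference level, which is the kind of per-feature reading the abstract attributes to GLM coefficients. In a fuller version, one such phase I model would be fitted per time-point (0.5 to 3 years).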
