An Ensemble Learning with Active Sampling to Predict the Prognosis of Postoperative Non-small Cell Lung Cancer Patients

Overview
Publisher Biomed Central
Date 2022 Sep 19
PMID 36123745
Abstract

Background: Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prognostic prediction models.

Methods: In this study, we present a novel approach, ensemble learning with active sampling (ELAS), to tackle the imbalanced data problem in NSCLC prognostic prediction. ELAS first applies an active sampling mechanism that queries the most informative samples and uses them to update the base classifier, giving it a new perspective. This training process is repeated until not enough samples are queried. Next, an internal validation set is used to evaluate the base classifiers, and the best-performing ones are combined into the ensemble model. In addition, we set up multiple initial training data seeds and internal validation sets to ensure the stability and generalization of the model.
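
The loop described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the base classifier, the query criterion, the stopping threshold, or the number of classifiers kept, so the nearest-centroid learner, the margin-based query rule, and the parameters `batch`, `margin`, `n_seeds` and `top_k` are all hypothetical choices made only to keep the example self-contained.

```python
import random

def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

class CentroidClassifier:
    """Toy base learner (an assumption): signed difference of distances to class centroids."""
    def fit(self, X, y):
        self.mu = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            self.mu[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x):
        def dist(m):
            return sum((a - b) ** 2 for a, b in zip(x, m)) ** 0.5
        return dist(self.mu[0]) - dist(self.mu[1])  # > 0 leans positive

def active_sampling_run(X, y, seed_idx, pool_idx, batch=5, margin=1.0):
    """Retrain on the seed plus queried samples; return every classifier produced."""
    train, pool, models = list(seed_idx), list(pool_idx), []
    while True:
        clf = CentroidClassifier().fit([X[i] for i in train], [y[i] for i in train])
        models.append(clf)
        # Assumed query rule: the pool samples closest to the decision boundary.
        near = sorted((abs(clf.score(X[i])), i) for i in pool)
        queried = [i for s, i in near if s < margin][:batch]
        if len(queried) < batch:  # stop once not enough samples are queried
            return models
        train += queried
        pool = [i for i in pool if i not in queried]

def elas_sketch(X, y, n_seeds=3, top_k=2, seed=0):
    """Repeat over several seeds and validation splits; keep the best classifiers of each run."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_seeds):
        pos = [i for i, t in enumerate(y) if t == 1]
        neg = [i for i, t in enumerate(y) if t == 0]
        rng.shuffle(pos)
        rng.shuffle(neg)
        # Stratified split: a quarter of each class forms the internal validation set.
        val = pos[:len(pos) // 4] + neg[:len(neg) // 4]
        rest_pos, rest_neg = pos[len(pos) // 4:], neg[len(neg) // 4:]
        seed_idx = rest_pos[:3] + rest_neg[:3]  # small balanced initial training seed
        pool_idx = rest_pos[3:] + rest_neg[3:]
        models = active_sampling_run(X, y, seed_idx, pool_idx)
        val_labels = [y[i] for i in val]
        models.sort(key=lambda m: auroc([m.score(X[i]) for i in val], val_labels),
                    reverse=True)
        kept += models[:top_k]
    return kept

def ensemble_score(models, x):
    """Average the member scores; higher means more likely positive."""
    return sum(m.score(x) for m in models) / len(models)

# Synthetic imbalanced data (10:1), purely for illustration.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20
models = elas_sketch(X, y)
print(len(models), auroc([ensemble_score(models, x) for x in X], y))
```

Starting each run from a small balanced seed and querying only near-boundary samples is one way to keep the majority class from dominating training, which is the motivation the abstract gives for active sampling. The validation step then filters out classifiers that a particular seed happened to make poor.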

Results: We verified the effectiveness of ELAS on a real clinical dataset of 1848 postoperative NSCLC patients. Averaged over 6 prognostic tasks, ELAS achieved the best AUROC (0.736) and AUPRC (0.453), a significant improvement over SVM, AdaBoost, Bagging, SMOTE and TomekLinks.
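
For readers less familiar with AUPRC, the sketch below shows one common estimator, average precision. The abstract does not say how AUPRC was computed, so this is an assumption for illustration only, and the function name `average_precision` is hypothetical. Unlike AUROC, whose chance level is 0.5, a random ranking scores an AUPRC close to the fraction of positives, so on highly imbalanced data the two numbers are not read on the same scale.

```python
def average_precision(scores, labels):
    """Average precision: the mean of precision values taken at each positive's rank."""
    # Rank samples from highest to lowest score (ties keep their input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / hits

# Two positives ranked 1st and 3rd out of four samples.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```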

Conclusions: We conclude that ELAS can effectively alleviate the imbalanced data problem in NSCLC prognostic prediction and shows good potential for future postoperative NSCLC prognostic prediction.

Citing Articles

Integrating Omics Data and AI for Cancer Diagnosis and Prognosis.

Ozaki Y, Broughton P, Abdollahi H, Valafar H, Blenda A. Cancers (Basel). 2024; 16(13).

PMID: 39001510 PMC: 11240413. DOI: 10.3390/cancers16132448.
