
Identification of a Sixteen-gene Prognostic Biomarker for Lung Adenocarcinoma Using a Machine Learning Method

Overview
Journal: J Cancer
Specialty: Oncology
Date: 2020 Jan 21
PMID: 31956375
Citations: 30
Abstract

Lung adenocarcinoma (LUAD) accounts for a majority of cancer-related deaths worldwide annually. Identifying prognostic biomarkers and predicting prognosis for LUAD patients is therefore necessary.

In this study, LUAD RNA-Seq data and clinical data from The Cancer Genome Atlas (TCGA) were divided into TCGA cohort I (n = 338) and cohort II (n = 168). Cohort I was used for model construction; cohort II and data from the Gene Expression Omnibus (GSE72094 cohort, n = 393; GSE11969 cohort, n = 149) were used for validation. First, survival-related seed genes were selected from cohort I using a machine learning model (random survival forest, RSF); then, to improve prediction accuracy, a forward selection model was used to identify the prognosis-related key genes among the seed genes using the clinically-integrated RNA-Seq data. Second, a survival risk score system was constructed from these key genes in cohort II, the GSE72094 cohort and the GSE11969 cohort, and evaluation metrics such as the hazard ratio (HR), p value and C-index were calculated to validate the proposed method. Third, the developed approach was compared with five previous prediction models. Finally, bioinformatics analyses (pathway, heatmap, protein-gene interaction network) were applied to the identified seed genes and key genes.

Based on the RSF model and clinically-integrated RNA-Seq data, we identified sixteen key genes that formed the prognostic gene expression signature. These sixteen key genes achieved strong prognostic power for LUAD patients in cohort II (HR = 3.80, p = 1.63e-06, C-index = 0.656), and were further validated in the GSE72094 cohort (HR = 4.12, p = 1.34e-10, C-index = 0.672) and the GSE11969 cohort (HR = 3.87, p = 6.81e-07, C-index = 0.670).

The results from the three independent validation cohorts showed that, compared with the traditional Cox model and with the use of standalone RNA-Seq data, the machine-learning-based method effectively improved the prediction accuracy of LUAD prognosis, and the derived model was also superior to the five existing prediction models. KEGG pathway analysis found that eleven of the sixteen genes were associated with nicotine addiction. Thirteen of the sixteen genes were reported for the first time as LUAD prognosis-related key genes. In conclusion, we developed a sixteen-gene prognostic marker for LUAD, which may provide a powerful prognostic tool for precision oncology.
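The abstract evaluates a gene-based survival risk score with the C-index. As a rough illustration of what that metric measures, the sketch below computes a linear risk score and Harrell's C-index on toy data. It is not the authors' pipeline: the gene names (`GENE_A`, `GENE_B`), the coefficients, and the patient values are all hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum of coefficient * expression value."""
    return [sum(coefficients[g] * row[g] for g in coefficients) for row in expression]


def harrell_c_index(times, events, scores):
    """Fraction of comparable patient pairs ranked correctly by the risk score.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (was not censored). Tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier time is censored, so the order is unknown
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0  # higher risk score failed first
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: two made-up genes and four made-up patients.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.4}
expression = [
    {"GENE_A": 2.0, "GENE_B": 0.5},
    {"GENE_A": 1.0, "GENE_B": 1.5},
    {"GENE_A": 0.2, "GENE_B": 2.0},
    {"GENE_A": 1.5, "GENE_B": 0.1},
]
times = [5, 20, 30, 3]   # follow-up time, arbitrary units
events = [1, 1, 0, 1]    # 1 = death observed, 0 = censored

scores = risk_scores(expression, coefficients)
c_index = harrell_c_index(times, events, scores)
print(f"C-index on toy data: {c_index:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported values of roughly 0.66 to 0.67.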

Citing Articles

Machine learning reveals glycolytic key gene in gastric cancer prognosis.

Li N, Zhang Y, Zhang Q, Jin H, Han M, Guo J. Sci Rep. 2025; 15(1):8688.

PMID: 40082583 PMC: 11906761. DOI: 10.1038/s41598-025-93512-5.


Mendelian Randomization Study on hs-CRP and Dyslipidemia in Koreans: Identification of Novel SNP rs76400217.

Huang X, Han Y, Kim M. Int J Mol Sci. 2025; 26(2).

PMID: 39859220 PMC: 11764716. DOI: 10.3390/ijms26020506.


Rank-Based Greedy Model Averaging for High-Dimensional Survival Data.

He B, Ma S, Zhang X, Zhu L. J Am Stat Assoc. 2024; 118(544):2658-2670.

PMID: 39552724 PMC: 11566305. DOI: 10.1080/01621459.2022.2070070.


Characterization of a ferroptosis-related gene signature predicting survival and immunotherapeutic response in lung adenocarcinoma.

Zhang C, Su Y, Wang H, Dang D, Huang X, Shi S. Aging (Albany NY). 2024; 16(18):12608-12622.

PMID: 39311766 PMC: 11466487. DOI: 10.18632/aging.206110.


Identification of an 11-miRNA-regulated and surface-protein genes signature predicts the prognosis of lung adenocarcinoma based on multi-omics study.

Guo K, Qu Z, Yu Y, Zou C. Am J Transl Res. 2024; 16(5):1568-1586.

PMID: 38883394 PMC: 11170602. DOI: 10.62347/CWMT4815.

