
Gene Expression Value Prediction Based on XGBoost Algorithm

Overview
Journal: Front Genet
Date: 2019 Nov 30
PMID: 31781160
Citations: 75
Abstract

Gene expression profiling is widely used to characterize cell status, to reflect the health of the body, and to diagnose genetic diseases. Although the cost of genome-wide expression profiling has been gradually decreasing in recent years, collecting expression profiles for thousands of genes remains very expensive. Because gene expression values are usually highly correlated in humans, the expression values of the remaining target genes can be predicted from the values of 943 landmark genes. We therefore designed an algorithm for predicting gene expression values based on XGBoost, which integrates multiple tree models and offers stronger interpretability. We tested the XGBoost model on the GEO dataset and an RNA-seq dataset and compared the results with those of existing models. Experiments showed that the XGBoost model achieved a significantly lower overall error than the existing D-GEX algorithm, linear regression, and KNN methods. In conclusion, the XGBoost algorithm outperforms these existing models and is a significant contribution to the toolbox for gene expression value prediction.
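The idea of predicting a target gene from landmark genes with an ensemble of trees can be illustrated with a minimal sketch. This is not the authors' implementation and not XGBoost itself: it is plain gradient boosting on depth-1 trees (stumps) in pure Python, without the regularization and second-order terms that XGBoost adds. The data are synthetic stand-ins (3 hypothetical "landmark" values per sample instead of 943), and mean absolute error is assumed here as the error measure, since the abstract only refers to an "overall error".

```python
import random

random.seed(0)

def make_samples(n):
    """Synthetic stand-in data: 3 'landmark' values and 1 'target' value per sample."""
    X, y = [], []
    for _ in range(n):
        row = [random.random() for _ in range(3)]
        X.append(row)
        # Assumed relationship, for illustration only.
        y.append(2.0 * row[0] - row[1] + random.gauss(0, 0.05))
    return X, y

def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def fit_boosted(X, y, rounds=60, lr=0.2):
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        pred = [p + lr * (lv if row[j] <= t else rv) for row, p in zip(X, pred)]
    return base, lr, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)

def mean_abs_error(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

X_train, y_train = make_samples(60)
X_test, y_test = make_samples(40)

model = fit_boosted(X_train, y_train)
boosted_mae = mean_abs_error([predict(model, r) for r in X_test], y_test)

# Baseline: always predict the mean target value seen in training.
train_mean = sum(y_train) / len(y_train)
baseline_mae = mean_abs_error([train_mean] * len(y_test), y_test)

print("boosted stumps MAE:", round(boosted_mae, 3))
print("mean baseline MAE: ", round(baseline_mae, 3))
```

In a realistic setting one such model would be fitted per target gene, and a library such as XGBoost would replace the hand-written boosting loop.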

Citing Articles

Machine learning uncovers novel sex-specific dementia biomarkers linked to autism and eye diseases.

Khan A, Ghasemi A, Ingram K, Ay A. J Alzheimers Dis Rep. 2025; 9:25424823251317177.

PMID: 40034518 PMC: 11864256. DOI: 10.1177/25424823251317177.


Predicting microRNA target genes using pan-cancer correlation patterns.

Lin S, Qiu P. BMC Genomics. 2025; 26(1):77.

PMID: 39871129 PMC: 11773953. DOI: 10.1186/s12864-025-11254-0.


Intelligent in-cell electrophysiology: Reconstructing intracellular action potentials using a physics-informed deep learning model trained on nanoelectrode array recordings.

Rahmani K, Yang Y, Foster E, Tsai C, Meganathan D, Alvarez D. Nat Commun. 2025; 16(1):657.

PMID: 39809732 PMC: 11733287. DOI: 10.1038/s41467-024-55571-6.


Single-Cell and Transcriptome Analysis of Periodontitis: Molecular Subtypes and Biomarkers Linked to Mitochondrial Dysfunction and Immunity.

Ma S, He H, Ren X. J Inflamm Res. 2025; 17:11659-11678.

PMID: 39741754 PMC: 11687296. DOI: 10.2147/JIR.S498739.


Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets.

Wu Z, Sun Y, Zhao X, Liu Z, Zhou W, Niu Y. NAR Genom Bioinform. 2024; 6(4):lqae184.

PMID: 39735343 PMC: 11672113. DOI: 10.1093/nargab/lqae184.

