Overcome Support Vector Machine Diagnosis Overfitting

Overview
Journal: Cancer Inform
Publisher: Sage Publications
Date: 2015 Jan 10
PMID: 25574125
Citations: 37
Abstract

Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, no previous research has analyzed their overfitting in disease diagnosis based on high-dimensional omics data, an analysis that is essential for avoiding deceptive diagnostic results and enhancing clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We found that SVM-based disease diagnosis would inevitably encounter overfitting under a Gaussian kernel because of the large data variations generated by high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad hoc parameter-tuning approaches, it not only robustly overcomes the overfitting problem but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm, Gene-Switch-Marker (GSM), which captures meaningful biomarkers by taking advantage of SVM overfitting on single genes.
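The claim that large data variations push a Gaussian-kernel SVM toward overfitting can be illustrated with a small sketch. It is not the authors' analysis and does not implement their sparse-coding kernel. It assumes i.i.d. Gaussian features as a hypothetical stand-in for high-throughput profiles, and the helper names (`simulate_profiles`, `mean_off_diagonal`) are illustrative only. With the kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), when squared distances between samples are much larger than 2 sigma^2, every off-diagonal entry of the kernel matrix collapses toward 0, so the matrix is effectively the identity.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma):
    """Gram matrix K[i][j] = k(x_i, x_j) over a list of samples."""
    return [[gaussian_kernel(xi, xj, sigma) for xj in samples] for xi in samples]


def mean_off_diagonal(K):
    """Average of K[i][j] over i != j (the diagonal is always 1 for this kernel)."""
    n = len(K)
    total = sum(K[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def simulate_profiles(n_samples, n_genes, spread, seed=0):
    """Hypothetical stand-in for omics profiles: i.i.d. N(0, spread^2) features."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, spread) for _ in range(n_genes)] for _ in range(n_samples)]


if __name__ == "__main__":
    X = simulate_profiles(n_samples=20, n_genes=500, spread=10.0)
    # sigma = 1: squared distances (about 1e5) dwarf 2 * sigma^2, so every
    # off-diagonal entry underflows toward 0 and K is effectively the identity.
    print("sigma = 1:", mean_off_diagonal(kernel_matrix(X, sigma=1.0)))
    # Bandwidth scaled to the data spread: off-diagonal entries sit near exp(-1).
    scaled_sigma = 10.0 * math.sqrt(500)
    print("scaled sigma:", mean_off_diagonal(kernel_matrix(X, sigma=scaled_sigma)))
```

When the kernel matrix is close to the identity, any labeling of the training samples is separable, so training accuracy can be perfect. A new sample, however, has kernel values near 0 against every training sample. Its decision value f(x) = sum_i alpha_i y_i k(x_i, x) + b therefore collapses to the bias b, and every test sample receives the same label. Rescaling sigma, as in the second call, is the kind of ad hoc tuning the abstract contrasts with its proposed approach.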

Citing Articles

Identifying discriminative features of brain network for prediction of Alzheimer's disease using graph theory and machine learning.

Karim S, Fahad M, Rathore R. Front Neuroinform. 2024; 18:1384720.

PMID: 38957548 PMC: 11217540. DOI: 10.3389/fninf.2024.1384720.


Intracranial EEG signals disentangle multi-areal neural dynamics of vicarious pain perception.

Tan H, Zeng X, Ni J, Liang K, Xu C, Zhang Y. Nat Commun. 2024; 15(1):5203.

PMID: 38890380 PMC: 11189531. DOI: 10.1038/s41467-024-49541-1.


Morphological Species Delimitation in The Western Pond Turtle (): Can Machine Learning Methods Aid in Cryptic Species Identification?

Burroughs R, Parham J, Stuart B, Smits P, Angielczyk K. Integr Org Biol. 2024; 6(1):obae010.

PMID: 38689939 PMC: 11058871. DOI: 10.1093/iob/obae010.


Super-Enhancers and Their Parts: From Prediction Efforts to Pathognomonic Status.

Vasileva A, Gladkova M, Ashniev G, Osintseva E, Orlov A, Kravchuk E. Int J Mol Sci. 2024; 25(6).

PMID: 38542080 PMC: 10969950. DOI: 10.3390/ijms25063103.


Machine learning-based algorithms applied to drug prescriptions and other healthcare services in the Sicilian claims database to identify acromegaly as a model for the earlier diagnosis of rare diseases.

Crisafulli S, Fontana A, L'Abbate L, Vitturi G, Cozzolino A, Gianfrilli D. Sci Rep. 2024; 14(1):6186.

PMID: 38485706 PMC: 10940660. DOI: 10.1038/s41598-024-56240-w.

