
Machine Learning-based Receiver Operating Characteristic (ROC) Curves for Crisp and Fuzzy Classification of DNA Microarrays in Cancer Research

Overview
Date 2008 Dec 17
PMID 19079753
Citations 10
Abstract

Receiver operating characteristic (ROC) curves were generated to obtain classification area under the curve (AUC) as a function of feature standardization, fuzzification, and sample size from nine large sets of cancer-related DNA microarrays. Classifiers used included k-nearest neighbor (kNN), naïve Bayes classifier (NBC), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), learning vector quantization (LVQ1), logistic regression (LOG), polytomous logistic regression (PLOG), artificial neural networks (ANN), particle swarm optimization (PSO), constricted particle swarm optimization (CPSO), kernel regression (RBF), radial basis function networks (RBFN), gradient descent support vector machines (SVMGD), and least squares support vector machines (SVMLS). For each data set, AUC was determined for combinations of sample size and total sum[-log(p)] of feature t-tests, with and without feature standardization, and with (fuzzy) and without (crisp) fuzzification of features. Altogether, 2,123,530 classification runs were made. At the largest sample size, ANN resulted in a fitted AUC of 90%, while PSO resulted in the lowest fitted AUC of 72.1%. AUC values derived from 4NN were the most dependent on sample size, while those from PSO were the least dependent. ANN depended the most on total statistical significance of features used based on sum[-log(p)], whereas PSO was the least dependent. Standardization of features increased AUC by 8.1% for PSO and decreased it by 0.2% for QDA, while fuzzification increased AUC by 9.4% for PSO and reduced AUC by 3.8% for QDA. AUC determination in planned microarray experiments without standardization and fuzzification of features will benefit the most if CPSO is used for lower levels of feature significance (i.e., sum[-log(p)] ~ 50) and ANN is used for greater levels of significance (i.e., sum[-log(p)] ~ 500).
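The quantity sum[-log(p)] used above can be written out as follows. This is only a sketch of the notation: the symbols S, m, and p_j are introduced here for illustration, and the base of the logarithm is not stated in the abstract, so base 10 is an assumption made for the worked number.

```latex
% S: total statistical significance of the m features used (notation assumed here)
% p_j: p-value of the t-test for feature j
S = \sum_{j=1}^{m} \bigl[-\log(p_j)\bigr]

% Worked example, assuming base-10 logarithms:
% ten features, each with p_j = 10^{-5}, give
S = 10 \times \bigl[-\log_{10}(10^{-5})\bigr] = 10 \times 5 = 50
```

Under that assumption, a value near 50 corresponds to a modest number of strongly significant features, and a value near 500 to roughly ten times as much accumulated significance.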
When only standardization of features is performed, studies are likely to benefit most by using CPSO for low levels of feature statistical significance and LVQ1 for greater levels of significance. Studies involving only fuzzification of features should employ LVQ1 because of the substantial gain in AUC observed and the low computational expense of LVQ1. Lastly, PSO resulted in significantly greater levels of AUC (89.5% average) when both feature standardization and fuzzification were performed. In consideration of the data sets used and the factors influencing AUC that were investigated, LVQ1 is recommended if low-expense computation is desired. However, if computational expense is of less concern, then PSO or CPSO is recommended.
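For readers unfamiliar with the two basic quantities in the abstract, the following is a minimal sketch of feature standardization and of AUC computed from classifier scores. It is not the authors' implementation: the function names `standardize` and `auc`, the z-score form of standardization, the pairwise (Mann-Whitney) form of AUC, and the toy data are all assumptions chosen for illustration.

```python
import statistics


def standardize(values):
    """Z-score one feature: subtract the mean, divide by the sample standard deviation.

    Assumption: z-scoring is used here as a common form of standardization;
    the abstract does not state which form the study used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive class, 0 = negative class.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(round(auc(scores, labels), 3))  # 8 of 9 pairs ordered correctly -> 0.889

z = standardize(scores)
print(round(auc(z, labels), 3))  # unchanged: 0.889
```

The second result equals the first because z-scoring a single score is a monotone transformation and AUC depends only on rank order. Standardization can change AUC only when several features are combined inside a classifier, for example through the distances used by kNN or LVQ1.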

Citing Articles

Identifying Candidate Gene-Disease Associations via Graph Neural Networks.

Cinaglia P, Cannataro M Entropy (Basel). 2023; 25(6).

PMID: 37372253 PMC: 10296901. DOI: 10.3390/e25060909.


Computation of the distribution of model accuracy statistics in machine learning: Comparison between analytically derived distributions and simulation-based methods.

Huang A, Huang S Health Sci Rep. 2023; 6(4):e1214.

PMID: 37091362 PMC: 10119581. DOI: 10.1002/hsr2.1214.


A comprehensive survey on computational learning methods for analysis of gene expression data.

Bhandari N, Walambe R, Kotecha K, Khare S Front Mol Biosci. 2022; 9:907150.

PMID: 36458095 PMC: 9706412. DOI: 10.3389/fmolb.2022.907150.


Predictions from algorithmic modeling result in better decisions than from data modeling for soybean iron deficiency chlorosis.

Xu Z, Kurek A, Cannon S, Beavis W PLoS One. 2021; 16(7):e0240948.

PMID: 34242220 PMC: 8270216. DOI: 10.1371/journal.pone.0240948.


3-Dimensional facial expression recognition in human using multi-points warping.

Agbolade O, Nazri A, Yaakob R, Ghani A, Cheah Y BMC Bioinformatics. 2019; 20(1):619.

PMID: 31791234 PMC: 6889223. DOI: 10.1186/s12859-019-3153-2.

