
A Regression-based Differential Expression Detection Algorithm for Microarray Studies with Ultra-low Sample Size

Overview
Journal PLoS One
Date 2015 Mar 5
PMID 25738861
Citations 5
Abstract

Global gene expression analysis using microarrays and, more recently, RNA-seq has allowed investigators to understand biological processes at a systems level. However, identifying differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, which limits the usability of the tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies that uses generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data and rank genes by importance. In place of cross-validation, which most similar methods require but which is not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. This simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. The method is easily adaptable to data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
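
The ranking step described in the abstract (fit a penalized binomial classifier, then order genes by importance) can be illustrated with a minimal sketch. This is not the authors' PED algorithm: a plain ridge-penalized logistic regression is swapped in as a stand-in penalty, the data are synthetic, and every name and parameter (`rank_genes_by_penalized_logistic`, `lam`, `lr`, `n_iter`, the 3-versus-3 design, the spiked genes) is a hypothetical choice made only for illustration. The simulation-based step that builds the final gene list is not sketched, because the abstract does not specify it.

```python
import numpy as np


def rank_genes_by_penalized_logistic(X, y, lam=1.0, lr=0.01, n_iter=2000):
    """Rank genes by |coefficient| from a ridge-penalized logistic fit.

    Assumption: the ridge penalty is a stand-in for the PED penalty,
    which is not implemented here.
    X: (samples, genes) expression matrix; y: 0/1 group labels.
    """
    # Standardize each gene so coefficient magnitudes are comparable.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Gradient of the mean logistic loss plus (lam / 2) * ||w||^2.
        grad_w = X.T @ (prob - y) / n + lam * w
        grad_b = np.mean(prob - y)
        w -= lr * grad_w
        b -= lr * grad_b
    order = np.argsort(-np.abs(w))  # most important gene first
    return order, w


# Synthetic ultra-low-n experiment: 3 controls vs 3 treated, 200 genes.
rng = np.random.default_rng(0)
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
X = rng.normal(size=(6, 200))
X[y == 1, :5] += 8.0  # hypothetical: genes 0-4 are truly differential

order, w = rank_genes_by_penalized_logistic(X, y)
print("Top 10 ranked gene indices:", order[:10])
```

With only six samples, some noise genes will separate the groups by chance, which is why a ranking alone is not enough and a principled cutoff along the ranked list is needed.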

Citing Articles

Comparative analysis of tissue-specific genes in maize based on machine learning models: CNN performs technically best, LightGBM performs biologically soundest.

Wang Z, Zhu Y, Liu Z, Li H, Tang X, Jiang Y Front Genet. 2023; 14:1190887.

PMID: 37229198 PMC: 10203421. DOI: 10.3389/fgene.2023.1190887.


The Gene Family: From Embryo to Disease.

Nalamalapu R, Yue M, Stone A, Murphy S, Saha M Front Mol Neurosci. 2021; 14:672511.

PMID: 34262434 PMC: 8273234. DOI: 10.3389/fnmol.2021.672511.


Xenopus embryos show a compensatory response following perturbation of the Notch signaling pathway.

Solini G, Pownall M, Hillenbrand M, Tocheny C, Paudel S, Halleran A Dev Biol. 2020; 460(2):99-107.

PMID: 31899211 PMC: 7263880. DOI: 10.1016/j.ydbio.2019.12.016.


Genomic signature of parity in the breast of premenopausal women.

Santucci-Pereira J, Zeleniuch-Jacquotte A, Afanasyeva Y, Zhong H, Slifker M, Peri S Breast Cancer Res. 2019; 21(1):46.

PMID: 30922380 PMC: 6438043. DOI: 10.1186/s13058-019-1128-x.


Automated Classification of Benign and Malignant Proliferative Breast Lesions.

Radiya-Dixit E, Zhu D, Beck A Sci Rep. 2017; 7(1):9900.

PMID: 28852119 PMC: 5575012. DOI: 10.1038/s41598-017-10324-y.
