
Supervised Group Lasso with Applications to Microarray Data Analysis

Overview
Publisher Biomed Central
Specialty Biology
Date 2007 Feb 24
PMID 17316436
Citations 50
Abstract

Background: Tremendous effort has been devoted to identifying genes for the diagnosis and prognosis of diseases using microarray gene expression data. Gene expression data have been shown to have cluster structure, where the clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into account.

Results: We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data.
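The two-step procedure described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it uses a squared-error loss in place of the classification and survival likelihoods, fixes the number of clusters `k` instead of choosing it with the Gap method, and takes the tuning parameters `lam1` and `lam2` as given rather than selecting them by V-fold cross validation. The function names, the proximal gradient solver, and the square-root group-size weights in the penalty are all illustrative choices.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for
        (1 / 2n) * ||y - X b||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2.
    `groups` is a list of index arrays; singleton groups give the ordinary Lasso.
    """
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n
        for g in groups:
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            # Block soft-thresholding: shrink the whole group, or zero it out.
            z[g] = 0.0 if norm <= thresh else z[g] * (1 - thresh / norm)
        beta = z
    return beta

def supervised_group_lasso(X, y, k, lam1, lam2):
    """X is samples x genes. Returns (coefficients, cluster label per gene)."""
    labels = kmeans(X.T, k)  # cluster the genes (columns of X)
    kept = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        # Step 1: Lasso within the cluster (every gene is its own group).
        singletons = [np.array([i]) for i in range(idx.size)]
        b = group_lasso(X[:, idx], y, singletons, lam1)
        selected = idx[b != 0]
        if selected.size:
            kept.append(selected)
    beta = np.zeros(X.shape[1])
    if kept:
        # Step 2: group Lasso across clusters, using only the genes kept in step 1.
        cols = np.concatenate(kept)
        bounds = np.cumsum([0] + [len(g) for g in kept])
        local = [np.arange(bounds[i], bounds[i + 1]) for i in range(len(kept))]
        beta[cols] = group_lasso(X[:, cols], y, local, lam2)
    return beta, labels

# Synthetic example: 40 samples, 12 "genes", only gene 0 drives the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=40)
beta, labels = supervised_group_lasso(X, y, k=3, lam1=0.1, lam2=0.05)
print("selected genes:", np.flatnonzero(beta))
```

Treating the Lasso as a group Lasso with singleton groups lets one solver serve both steps. In practice, `lam1` and `lam2` would be tuned separately by cross validation, as the abstract describes.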

Conclusion: We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods.

Citing Articles

T2-FLAIR imaging-based radiomic features for predicting early postoperative recurrence of grade II gliomas.

Wang Z, Shu J, Feng L. Future Oncol. 2024; 20(35):2757-2764.

PMID: 39268928 PMC: 11572138. DOI: 10.1080/14796694.2024.2397327.


Explainable AI: A review of applications to neuroimaging data.

Farahani F, Fiok K, Lahijanian B, Karwowski W, Douglas P. Front Neurosci. 2022; 16:906290.

PMID: 36583102 PMC: 9793854. DOI: 10.3389/fnins.2022.906290.


A comprehensive survey on computational learning methods for analysis of gene expression data.

Bhandari N, Walambe R, Kotecha K, Khare S. Front Mol Biosci. 2022; 9:907150.

PMID: 36458095 PMC: 9706412. DOI: 10.3389/fmolb.2022.907150.


Bayesian approach for predicting responses to therapy from high-dimensional time-course gene expression profiles.

Fukushima A, Sugimoto M, Hiwa S, Hiroyasu T. BMC Bioinformatics. 2021; 22(1):132.

PMID: 33736614 PMC: 7977599. DOI: 10.1186/s12859-021-04052-4.


Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data.

Jiang L, Greenwood C, Yao W, Li L. Sci Rep. 2020; 10(1):9747.

PMID: 32546735 PMC: 7297975. DOI: 10.1038/s41598-020-66466-z.

