Variable Selection for Model-based High-dimensional Clustering and Its Application to Microarray Data

Overview

Journal: Biometrics

Publisher: Oxford University Press

Specialty: Public Health

Date: 2007 Nov 1

PMID: 17970821

Citations: 26

Authors

Sijian Wang

Ji Zhu

Abstract

Variable selection in high-dimensional clustering analysis is an important yet challenging problem. In this article, we propose two methods that simultaneously separate data points into similar clusters and select informative variables that contribute to the clustering. Our methods are in the framework of penalized model-based clustering. Unlike the classical L1-norm penalization, the penalty terms that we propose make use of the fact that parameters belonging to one variable should be treated as a natural "group." Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.
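The contrast the abstract draws between elementwise and group-wise penalization can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify their two penalties, so the group step below substitutes a generic group-lasso (L2-norm) shrinkage rule as a stand-in. The function names, the toy cluster means, and the penalty level `lam` are all hypothetical.

```python
import math

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: the shrinkage step for an L1 penalty."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def l1_shrink(means, lam):
    """Shrink every cluster mean of every variable independently."""
    return [[soft_threshold(m, lam) for m in row] for row in means]

def group_shrink(means, lam):
    """Shrink all cluster means of one variable together.

    Assumption: a group-lasso style rule, used here only for illustration.
    Each row is scaled by max(0, 1 - lam / ||row||_2), so a variable is
    either kept in every cluster or zeroed out in every cluster.
    """
    out = []
    for row in means:
        norm = math.sqrt(sum(m * m for m in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * m for m in row])
    return out

def selected(means):
    """Indices of variables with at least one nonzero cluster mean."""
    return [j for j, row in enumerate(means) if any(m != 0.0 for m in row)]

# Hypothetical cluster means: rows are variables, columns are K = 3 clusters.
means = [
    [2.0, -1.5, 0.3],   # informative variable with one small mean
    [0.2, -0.1, 0.15],  # noninformative variable (all means near zero)
]
lam = 0.5  # hypothetical penalty level

l1_result = l1_shrink(means, lam)
group_result = group_shrink(means, lam)

print("L1 keeps variables:   ", selected(l1_result))
print("Group keeps variables:", selected(group_result))
```

In this toy case both rules drop the second variable, but the L1 rule also zeroes the small third-cluster mean of the first variable, while the group rule keeps or removes each variable's means as a unit.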

Citing Articles

Sparse kernel k-means clustering.

Park B, Park C, Hong S, Choi H. J Appl Stat. 2025; 52(1):158-182.

PMID: 39811085 PMC: 11727190. DOI: 10.1080/02664763.2024.2362266.

Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data.

Wang M, Allen G. J Mach Learn Res. 2021; 22.

PMID: 34744522 PMC: 8570363.

Cancer stem cell transcriptome landscape reveals biomarkers driving breast carcinoma heterogeneity.

Zhang Z, Chen X, Zhang J, Dai X. Breast Cancer Res Treat. 2021; 186(1):89-98.

PMID: 33389402. DOI: 10.1007/s10549-020-06045-y.

Discovering a sparse set of pairwise discriminating features in high-dimensional data.

Melton S, Ramanathan S. Bioinformatics. 2020; 37(2):202-212.

PMID: 32730566 PMC: 8599814. DOI: 10.1093/bioinformatics/btaa690.

DNA methylation profiles capturing breast cancer heterogeneity.

Chen X, Zhang J, Dai X. BMC Genomics. 2019; 20(1):823.

PMID: 31699026 PMC: 6839140. DOI: 10.1186/s12864-019-6142-y.