
Noise-robust Soft Clustering of Gene Expression Time-course Data

Overview
Specialty: Biology
Date: 2005 Aug 4
PMID: 16078370
Citations: 214
Abstract

Clustering is an important tool in microarray data analysis. This unsupervised learning technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied so far produce hard partitions of the data, i.e. each gene is assigned to exactly one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for microarray time-course data, where gene clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise. To overcome the limitations of hard clustering, we applied soft clustering, which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well each cluster represents the genes assigned to it. This can be used for a more targeted search for regulatory elements. Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more robust to noise, so a priori pre-filtering of genes can be avoided. This prevents the exclusion of biologically relevant genes from the data analysis. Soft clustering was implemented here using the fuzzy c-means algorithm. Procedures to find optimal clustering parameters were developed. A software package for soft clustering has been developed based on the open-source statistical language R. The package, called Mfuzz, is freely available.
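
To illustrate what "soft" assignment means in practice, here is a minimal sketch of the standard fuzzy c-means updates in plain Python. It is not the Mfuzz implementation or its API (Mfuzz is an R package). The function names, the toy expression profiles, the fuzzifier value m = 2 and the deterministic farthest-point initialisation are all assumptions made for this example.

```python
import math


def memberships_for(data, centers, m):
    """Fuzzy membership of every item in every cluster; each row sums to 1."""
    rows = []
    for x in data:
        dists = [math.dist(x, c) for c in centers]
        if min(dists) == 0.0:
            # Item sits exactly on a center: share membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            # Standard update: membership falls off with distance^(2/(m-1)).
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows


def farthest_point_init(data, n_clusters):
    """Deterministic start (an assumption of this sketch): spread centers out."""
    centers = [list(data[0])]
    while len(centers) < n_clusters:
        nxt = max(data, key=lambda x: min(math.dist(x, c) for c in centers))
        centers.append(list(nxt))
    return centers


def fuzzy_c_means(data, n_clusters, m=2.0, max_iter=200, tol=1e-9):
    """Return (centers, memberships); m > 1 controls how soft the clusters are."""
    centers = farthest_point_init(data, n_clusters)
    for _ in range(max_iter):
        u = memberships_for(data, centers, m)
        new_centers = []
        for i in range(n_clusters):
            # Each center is the mean of all items, weighted by membership^m.
            w = [row[i] ** m for row in u]
            new_centers.append([
                sum(wj * x[d] for wj, x in zip(w, data)) / sum(w)
                for d in range(len(data[0]))
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(data, centers, m)


# Hypothetical three-point time courses: rising, falling, and one flat profile.
profiles = [
    [0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [0.0, 0.9, 1.9],  # rising
    [2.0, 1.0, 0.0], [2.1, 1.0, 0.1], [1.9, 1.1, 0.0],  # falling
    [1.0, 1.0, 1.0],                                     # flat, between both
]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

In this toy data the rising and falling profiles receive a membership close to 1 in their own cluster, while the flat profile is split roughly evenly between the two. A hard partition would force the flat profile into one cluster and discard that information; the membership values are what the abstract refers to as the internal cluster structure.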

Citing Articles

Combined transcriptomic and proteomic analyses reveal relevant myelin features in mice with ischemic stroke.

Qian Q, Lyu H, Wang W, Wang Q, Li D, Liu X Funct Integr Genomics. 2025; 25(1):64.

PMID: 40085348 DOI: 10.1007/s10142-025-01573-6.


PRKD2 as a novel target for targeting the diabetes-osteoporosis nexus.

Chen R, Yang C, Xiao H, Yang A, Chen C, Yang F Sci Rep. 2025; 15(1):4703.

PMID: 39922871 PMC: 11807170. DOI: 10.1038/s41598-025-89235-2.


Cardiac repair using regenerating neonatal heart tissue-derived extracellular vesicles.

Li H, Liu Y, Lin Y, Li S, Liu C, Cai A Nat Commun. 2025; 16(1):1292.

PMID: 39900896 PMC: 11790877. DOI: 10.1038/s41467-025-56384-x.


Molecular mechanisms and comparative transcriptomics of diapause in two corn rootworm species ( spp.).

Lecheta M, Nielson C, French B, Nadeau E, Teets N Curr Res Insect Sci. 2025; 7:100104.

PMID: 39895870 PMC: 11786089. DOI: 10.1016/j.cris.2024.100104.


LcProt: Proteomics-based identification of plasma biomarkers for lung cancer multievent, a multicentre study.

Liang H, Wang R, Cheng R, Ye Z, Zhao N, Zhao X Clin Transl Med. 2025; 15(1):e70160.

PMID: 39783847 PMC: 11714244. DOI: 10.1002/ctm2.70160.