Clustering of Gene Expression Data: Performance and Similarity Analysis

Overview
Publisher Biomed Central
Specialty Biology
Date 2007 Jan 16
PMID 17217511
Citations 11
Abstract

Background: DNA microarray technology is an innovative methodology in experimental molecular biology that has produced huge amounts of valuable gene expression profile data. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. Evaluating which clustering algorithms are feasible and applicable is therefore becoming an important issue in bioinformatics research.

Results: In this paper we first experimentally study three major clustering algorithms, Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self-Organizing Tree Algorithm (SOTA), using yeast (Saccharomyces cerevisiae) gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to analyze the similarity of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM, while HC is the least efficient. The similarity analysis shows that, given a target cluster, Cluster Diff can efficiently determine the closest match from a set of clusters. It is therefore an effective approach for evaluating different clustering algorithms.
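The closest-match idea described in the Results can be illustrated with a small sketch. The abstract does not say which similarity measure Cluster Diff uses, so the Jaccard index below is an assumption chosen for illustration, and the function names, cluster labels, and gene identifiers are all hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: genes shared by both clusters divided by genes in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_match(target: Set[str],
                  candidates: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the name and score of the candidate cluster most similar to target."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    best = max(candidates, key=lambda name: jaccard(target, candidates[name]))
    return best, jaccard(target, candidates[best])


# Hypothetical data: one target cluster from one algorithm and
# three candidate clusters produced by another algorithm.
target = {"YAL001C", "YAL002W", "YAL003W", "YAL005C"}
candidates = {
    "B1": {"YAL001C", "YAL002W", "YAL003W"},
    "B2": {"YAL005C", "YAL007C"},
    "B3": {"YAL008W", "YAL009W"},
}

name, score = closest_match(target, candidates)
print(name, round(score, 2))  # B1 0.75
```

Here B1 shares three of the four genes in the union with the target, so it scores 0.75 and is returned as the closest match. Any other set-overlap measure could be substituted for `jaccard` without changing the surrounding logic.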

Conclusion: HC methods allow a convenient visual representation of genes, but they are neither robust nor efficient. SOM is more robust against noise, but it has the disadvantage that the number of clusters has to be fixed beforehand. SOTA combines the advantages of both hierarchical and SOM clustering: it allows a visual representation of the clusters and their structure and is not sensitive to noise. SOTA is also more flexible than the other two clustering methods. With our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby compare different clustering methods.
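To make the point about SOM concrete, the following is a minimal sketch of a generic one-dimensional self-organizing map in plain Python. It is not the implementation used in the paper. The function names, default parameters, decay schedule, and expression values are assumptions for illustration only. Note that `n_nodes`, the number of clusters, must be supplied before training starts.

```python
import math
import random
from typing import List, Sequence


def best_node(nodes: List[List[float]], x: Sequence[float]) -> int:
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(nodes[j], x)))


def train_som(profiles: List[List[float]], n_nodes: int,
              epochs: int = 50, lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D SOM; the number of nodes (clusters) is fixed up front."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Initialise each node from a randomly chosen expression profile.
    nodes = [list(rng.choice(profiles)) for _ in range(n_nodes)]
    sigma0 = max(n_nodes / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01     # neighbourhood width shrinks
        for x in rng.sample(profiles, len(profiles)):
            bmu = best_node(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes


def assign(profiles: List[List[float]], nodes: List[List[float]]) -> List[int]:
    """Label each profile with the index of its closest node."""
    return [best_node(nodes, x) for x in profiles]


# Hypothetical expression profiles (rows are genes, columns are conditions).
profiles = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.2], [2.0, 2.1, 1.9], [2.2, 1.8, 2.0]]
nodes = train_som(profiles, n_nodes=2)
labels = assign(profiles, nodes)
print(labels)
```

Changing the number of clusters means retraining with a different `n_nodes`, which is the limitation the Conclusion refers to. Tree-growing methods such as SOTA instead add nodes during training.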

Citing Articles

Automatic design of gene regulatory mechanisms for spatial pattern formation.

Mousavi R, Lobo D. NPJ Syst Biol Appl. 2024; 10(1):35.

PMID: 38565850 PMC: 10987498. DOI: 10.1038/s41540-024-00361-5.


Serum microRNA as a potential biomarker for the activity of thyroid eye disease.

Kim N, Choung H, Kim Y, Woo S, Yang M, Khwarg S. Sci Rep. 2023; 13(1):234.

PMID: 36604580 PMC: 9816116. DOI: 10.1038/s41598-023-27483-w.


Bioinformatic Analysis of Temporal and Spatial Proteome Alternations During Infections.

Rahmatbakhsh M, Gagarinova A, Babu M. Front Genet. 2021; 12:667936.

PMID: 34276775 PMC: 8283032. DOI: 10.3389/fgene.2021.667936.


Measuring similarity between gene interaction profiles.

Barido-Sottani J, Chapman S, Kosman E, Mushegian A. BMC Bioinformatics. 2019; 20(1):435.

PMID: 31438841 PMC: 6704681. DOI: 10.1186/s12859-019-3024-x.


Tumor Necrosis Factor Alpha and Insulin-Like Growth Factor 1 Induced Modifications of the Gene Expression Kinetics of Differentiating Skeletal Muscle Cells.

Meyer S, Krebs S, Thirion C, Blum H, Krause S, Pfaffl M. PLoS One. 2015; 10(10):e0139520.

PMID: 26447881 PMC: 4598026. DOI: 10.1371/journal.pone.0139520.

