
Unsupervised Gene Selection Using Biological Knowledge: Application in Sample Clustering

Overview
Publisher Biomed Central
Specialty Biology
Date 2017 Nov 24
PMID 29166852
Citations 9
Abstract

Background: Classifying biological samples from gene expression data is a basic building block for several bioinformatics problems, such as diagnosing cancer and other diseases and planning appropriate treatment. A major challenge in sample classification is handling high-dimensional, redundant gene expression data. Gene (feature) selection plays a major role in reducing the complexity of handling such data.

Results: This paper explores the use of biological knowledge from the Gene Ontology database to select a suitable subset of genes, which is then used to cluster samples. The proposed feature selection technique is unsupervised: it uses no class label information during gene selection. Finally, a multi-objective clustering approach is applied to cluster the available samples in the reduced gene space.

Conclusions: The reported results show that incorporating biological knowledge into gene selection not only reduces the dimensionality of the feature space to a great extent but also improves the accuracy of sample classification. The reduced gene space is validated using biological significance tests. To demonstrate the superiority of the proposed gene-selection-based sample clustering technique, a thorough comparative analysis against state-of-the-art techniques has also been performed.
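
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of label-free gene selection driven by Gene Ontology annotations. It is not the paper's algorithm. It assumes each gene comes with a set of GO term IDs, measures redundancy with Jaccard similarity between those sets, and greedily drops genes whose annotation overlaps too much with a gene already kept. The function names, the threshold value, and the toy annotations and expression values are all hypothetical.

```python
# Illustrative sketch only; NOT the method from the paper.
# Assumption: redundancy between genes is approximated by the Jaccard
# similarity of their GO term sets. All data below are toy values.

def jaccard(a, b):
    """Jaccard similarity between two sets of GO term IDs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_genes(go_annotations, threshold=0.5):
    """Greedily keep genes that are not redundant with any kept gene.

    No class labels are used, so the selection is unsupervised.
    Genes are visited in sorted order to make the result deterministic.
    """
    selected = []
    for gene in sorted(go_annotations):
        terms = go_annotations[gene]
        if all(jaccard(terms, go_annotations[s]) < threshold for s in selected):
            selected.append(gene)
    return selected

def reduce_samples(expression, genes, selected):
    """Project each sample (row) onto the columns of the selected genes."""
    idx = [genes.index(g) for g in selected]
    return [[row[i] for i in idx] for row in expression]

# Toy GO annotations (hypothetical assignment of terms to genes).
go_annotations = {
    "geneA": {"GO:0006915", "GO:0008283"},
    "geneB": {"GO:0006915", "GO:0008283"},                 # identical to geneA
    "geneC": {"GO:0006412"},
    "geneD": {"GO:0006412", "GO:0006915", "GO:0008283"},   # overlaps geneA
}

genes = ["geneA", "geneB", "geneC", "geneD"]
expression = [            # rows are samples, columns follow `genes`
    [1.0, 1.1, 5.0, 2.0],
    [0.9, 1.0, 0.2, 1.5],
]

selected = select_genes(go_annotations)
reduced = reduce_samples(expression, genes, selected)
print(selected)
print(reduced)
```

The reduced matrix would then be passed to a clustering algorithm. The multi-objective clustering step mentioned in the Results is not reproduced here, since the abstract gives no details about its objectives or search procedure.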

Citing Articles

Biologically weighted LASSO: enhancing functional interpretability in gene expression data analysis.

Mongardi S, Cascianelli S, Masseroli M Bioinformatics. 2024; 40(10).

PMID: 39412436 PMC: 11639179. DOI: 10.1093/bioinformatics/btae605.


CogNet: classification of gene expression data based on ranked active-subnetwork-oriented KEGG pathway enrichment analysis.

Yousef M, Ulgen E, Sezerman O PeerJ Comput Sci. 2021; 7:e336.

PMID: 33816987 PMC: 7959595. DOI: 10.7717/peerj-cs.336.


Multi-view feature selection for identifying gene markers: a diversified biological data driven approach.

Acharya S, Cui L, Pan Y BMC Bioinformatics. 2020; 21(Suppl 18):483.

PMID: 33375940 PMC: 7772934. DOI: 10.1186/s12859-020-03810-0.


Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data.

Yousef M, Kumar A, Bakir-Gungor B Entropy (Basel). 2020; 23(1).

PMID: 33374969 PMC: 7821996. DOI: 10.3390/e23010002.


Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions.

Mahendran N, Vincent P, Srinivasan K, Chang C Front Genet. 2020; 11:603808.

PMID: 33362861 PMC: 7758324. DOI: 10.3389/fgene.2020.603808.

