
A Multi-objective Gene Clustering Algorithm Guided by Apriori Biological Knowledge with Intensification and Diversification Strategies

Overview
Journal BioData Min
Publisher Biomed Central
Specialty Biology
Date 2018 Aug 14
PMID 30100924
Citations 2
Abstract

Background: Biologists aim to understand the genetic background of diseases, metabolic disorders and other genetic conditions. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genes under different conditions. Clustering is one of the main techniques used to analyse these data; it aims to find groups of genes that share some criterion, such as a similar expression profile. However, the problem of finding groups is normally multidimensional, which makes it necessary to treat clustering as a multi-objective problem in which several cluster validity indexes are optimised simultaneously. These indexes are usually based on criteria such as compactness and separation, which may not be sufficient because they cannot guarantee clusters that have both similar expression patterns and biological coherence.
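As a rough illustration of the compactness and separation criteria mentioned above, the sketch below uses one common pair of definitions: the mean distance of each expression profile to its cluster centroid, and the minimum distance between centroids. These definitions, the function names and the toy data are assumptions made for illustration; the abstract does not say which validity indexes the authors use.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of expression profiles."""
    return tuple(fmean(coords) for coords in zip(*points))


def compactness(clusters):
    """Mean distance of every profile to its own cluster centroid (lower is better)."""
    distances = [dist(p, centroid(c)) for c in clusters for p in c]
    return fmean(distances)


def separation(clusters):
    """Smallest distance between any two cluster centroids (higher is better)."""
    cents = [centroid(c) for c in clusters]
    return min(dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])


# Hypothetical two-condition expression profiles grouped into two clusters.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(4.0, 1.0), (6.0, 1.0)],
]
print(compactness(clusters), separation(clusters))
```

Neither quantity uses any functional annotation, which is the limitation the Background points to: a clustering can score well on both and still group genes that are biologically unrelated.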

Method: We propose a Multi-Objective Clustering algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK) to find clusters of genes with high levels of co-expression and biological coherence, as well as good compactness and separation. Cluster quality indexes are used to simultaneously optimise gene relationships at the expression level and at the level of biological functionality. Our proposal also includes intensification and diversification strategies to improve the search process.
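Optimising several indexes simultaneously generally means keeping the candidate clusterings that no other candidate beats on every objective (the Pareto front). The sketch below shows that dominance test for two objectives to be minimised. It is a generic illustration, not the MOC-GaPBK procedure itself; the score values and the function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimised.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidate clusterings:
# (expression-based index, biology-based index), both to be minimised.
candidates = [(0.30, 0.80), (0.45, 0.40), (0.50, 0.90), (0.70, 0.35)]
front = pareto_front(candidates)
print(front)  # [0, 1, 3]
```

The third candidate is dropped because the first is better on both indexes; the remaining three represent different trade-offs between expression similarity and biological coherence.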

Results: The effectiveness of the proposed algorithm is demonstrated on four publicly available datasets. Comparative studies of the use of different objective functions and other widely used microarray clustering techniques are reported. Statistical, visual and biological significance tests are carried out to show the superiority of the proposed algorithm.

Conclusions: Integrating a-priori biological knowledge into a multi-objective approach and using intensification and diversification strategies allow the proposed algorithm to find solutions with higher quality than other microarray clustering techniques available in the literature in terms of co-expression, biological coherence, compactness and separation.

Citing Articles

A hybrid multi-objective whale optimization algorithm for analyzing microarray data based on Apache Spark.

AbdelAziz A, Soliman T, Ghany K, Sewisy A PeerJ Comput Sci. 2021; 7:e416.

PMID: 33834101 PMC: 8022636. DOI: 10.7717/peerj-cs.416.


RoCoLe: A coffee leaf images dataset for evaluation of machine learning based methods in plant diseases recognition.

Parraga-Alava J, Cusme K, Loor A, Santander E Data Brief. 2019; 25:104414.

PMID: 31516934 PMC: 6727496. DOI: 10.1016/j.dib.2019.104414.
