Use of Semisupervised Clustering and Feature-Selection Techniques for Identification of Co-expressed Genes

Overview

Journal IEEE J Biomed Health Inform

Specialties Biomedical Engineering
Medical Informatics

Date 2015 Jul 25

PMID 26208367

Citations 2

Authors

Sriparna Saha

Abhay Kumar Alok

Asif Ekbal

Abstract

Studying the patterns hidden in gene-expression data helps in understanding the functionality of genes. Clustering techniques are widely used to identify natural partitionings in gene-expression data. Feature selection is a key issue in constraining dimensionality, because not all features are important from a clustering point of view. Moreover, a limited amount of supervised information can help fine-tune the obtained clustering solution. In this paper, the problem of simultaneous feature selection and semisupervised clustering is formulated as a multiobjective optimization (MOO) task. A modern simulated annealing-based MOO technique, AMOSA, is used as the underlying optimization methodology. Features and cluster centers are encoded in the form of a string, and genes are assigned to clusters using a point symmetry-based distance. Six optimization criteria based on several internal and external cluster validity indices are used. The supervised information is generated using a popular clustering technique, fuzzy C-means. An appropriate subset of features, the proper number of clusters, and the proper partitioning are determined using the search capability of AMOSA. The effectiveness of the proposed semisupervised clustering technique, Semi-FeaClustMOO, is demonstrated on five publicly available benchmark gene-expression datasets. Comparisons with existing techniques for gene-expression data clustering indicate the superiority of the proposed technique. Statistical and biological significance tests have also been carried out.
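The string encoding and the point symmetry-based assignment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the flat string layout (a feature mask followed by a fixed number of cluster centers), the neighbor count `knear=2`, and the function names `decode`, `ps_distance`, and `assign` are all assumptions made for illustration. The distance used here is one common form of the point symmetry-based distance (mean distance from the reflected point to its nearest neighbors, multiplied by the Euclidean distance to the center). The AMOSA search, the validity indices, and the variable number of clusters are not reproduced.

```python
import numpy as np

def decode(string, n_features, max_clusters):
    """Split a hypothetical flat string into a feature mask and cluster centers."""
    mask = string[:n_features] > 0.5                      # assumed: first n_features genes of the string select features
    centers = string[n_features:].reshape(max_clusters, n_features)
    return mask, centers

def ps_distance(x, center, data, knear=2):
    """Point symmetry-based distance of x with respect to center (assumed form)."""
    reflected = 2.0 * center - x                          # mirror image of x about the center
    dists = np.linalg.norm(data - reflected, axis=1)      # distances from the mirror image to all points
    d_sym = np.sort(dists)[:knear].mean()                 # mean distance to the knear nearest points
    d_euc = np.linalg.norm(x - center)                    # ordinary Euclidean distance to the center
    return d_sym * d_euc

def assign(data, mask, centers, knear=2):
    """Assign each row of data to the center with the smallest point symmetry-based distance."""
    sub = data[:, mask]                                   # keep only the selected features
    sub_centers = centers[:, mask]
    labels = np.empty(len(sub), dtype=int)
    for i, x in enumerate(sub):
        labels[i] = np.argmin([ps_distance(x, c, sub, knear) for c in sub_centers])
    return labels
```

In this sketch a feature excluded by the mask has no effect on the assignment, which is the sense in which feature selection and clustering are decided together by a single string.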

Citing Articles

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

Parraga-Alava J, Dorn M, Inostroza-Ponta M. BioData Min. 2018; 11:16.

PMID: 30100924 PMC: 6081857. DOI: 10.1186/s13040-018-0178-4.

Unsupervised gene selection using biological knowledge: application in sample clustering.

Acharya S, Saha S, Nikhil N. BMC Bioinformatics. 2017; 18(1):513.

PMID: 29166852 PMC: 5700545. DOI: 10.1186/s12859-017-1933-0.