Quantitative Synteny Scoring Improves Homology Inference and Partitioning of Gene Families

Overview
Publisher Biomed Central
Specialty Biology
Date 2014 Feb 26
PMID 24564516
Citations 2
Abstract

Background: Clustering sequences into families has long been an important step in the characterization of genes and proteins. Many algorithms have been developed for this purpose, most of which are based either on direct similarity between gene pairs or on a network structure in which the edge weights of the constructed graph are derived from similarity. However, conserved synteny is an important signal that can help distinguish homology, and it has not been utilized to its fullest potential.

Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets: one consisting of fully sequenced eukaryotic genomes and one of synthetic data.

Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy and the definition of the synteny scores are the most valuable contributions of GenFamClust.
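
The abstract does not spell out how the synteny score or the homology decision is computed, so the following is only a minimal sketch of the general idea of combining pairwise similarity with a neighborhood-based synteny signal and then grouping inferred homologs into families. Everything in it is an assumption made for illustration and is not the published GenFamClust definition: the function names (`neighbors`, `synteny_score`, `is_homolog`, `cluster`), the window size `k`, the rule of taking the best similarity among neighboring genes, and all threshold values are hypothetical.

```python
from itertools import product


def neighbors(order, gene, k):
    """Genes within k positions of `gene` in a gene-order list, excluding the gene itself."""
    i = order.index(gene)
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def synteny_score(a, b, order_a, order_b, sim, k=2):
    """Hypothetical synteny score: best similarity between any neighbor of a and any neighbor of b.

    `sim` maps frozenset({gene1, gene2}) to a similarity in [0, 1]; missing pairs count as 0.
    """
    pairs = product(neighbors(order_a, a, k), neighbors(order_b, b, k))
    return max((sim.get(frozenset(p), 0.0) for p in pairs), default=0.0)


def is_homolog(similarity, synteny, strong=0.5, weak=0.2, syn_min=0.7):
    """Hypothetical decision rule: strong similarity alone, or weak similarity backed by synteny."""
    return similarity >= strong or (similarity >= weak and synteny >= syn_min)


def cluster(genes, homolog_pairs):
    """Group genes into families as connected components (union-find) of the homology graph."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in homolog_pairs:
        parent[find(a)] = find(b)

    families = {}
    for g in genes:
        families.setdefault(find(g), set()).add(g)
    return sorted(families.values(), key=lambda f: sorted(f))


# Toy data (invented): two chromosomes with conserved gene order.
genome_a = ["a1", "a2", "a3"]
genome_b = ["b1", "b2", "b3"]
sim = {
    frozenset({"a1", "b1"}): 0.9,
    frozenset({"a2", "b2"}): 0.3,  # too weak to call on similarity alone
    frozenset({"a3", "b3"}): 0.8,
}

syn = synteny_score("a2", "b2", genome_a, genome_b, sim, k=1)
call = is_homolog(sim[frozenset({"a2", "b2"})], syn)
```

In this toy case the pair `a2`/`b2` has weak direct similarity but strongly similar flanking genes, which is the kind of situation where a synteny signal could tip a homology call that similarity alone would miss.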

Citing Articles

B Cell Receptor Activation Predominantly Regulates AKT-mTORC1/2 Substrates Functionally Related to RNA Processing.

Mohammad D, Ali R, Turunen J, Nore B, Smith C PLoS One. 2016; 11(8):e0160255.

PMID: 27487157 PMC: 4972398. DOI: 10.1371/journal.pone.0160255.


GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm.

Ali R, Muhammad S, Arvestad L BMC Evol Biol. 2016; 16(1):120.

PMID: 27260514 PMC: 4893229. DOI: 10.1186/s12862-016-0684-2.
