
Improvement of Domain-level Ortholog Clustering by Optimizing Domain-specific Sum-of-pairs Score

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2014 Jun 3
PMID: 24885064
Citations: 6
Abstract

Background: Identification of ortholog groups is a crucial step in the comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of them do not evaluate orthology at the sub-gene level. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but this approach often fails to determine appropriate boundaries.

Results: We developed a method to improve domain-level ortholog classification using multiple alignment information. This method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline, DomRefine, that improves domain-level clustering by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust using a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as the reference data. We observed that the agreement between the resulting classification and the classifications in the reference databases improved at almost every step in the refinement pipeline. Moreover, the refined classification showed better agreement than the classification in the eggNOG database when TIGRFAMs was used as the reference database.
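To make the idea of a domain-specific sum-of-pairs score concrete, the following is a minimal illustrative sketch, not the DomRefine implementation. It assumes a toy scoring scheme (`MATCH`, `MISMATCH`, `GAP` are hypothetical values, whereas a real scorer would likely use a substitution matrix and gap penalties), a multiple alignment given as equal-length strings, and domains given as hypothetical `(sequence, start, end, cluster)` tuples over alignment columns. Within each column, only pairs of residues assigned to the same domain-level cluster contribute to the total.

```python
from itertools import combinations

# Hypothetical toy scores; these values are assumptions for illustration only.
MATCH, MISMATCH, GAP = 1, -1, -1


def pair_score(a, b):
    """Score one aligned pair of characters ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def dsp_score(alignment, domains):
    """Sum pairwise column scores over sequences sharing a domain cluster.

    alignment: dict mapping sequence name -> aligned string (equal lengths)
    domains:   list of (name, start, end, cluster), columns in [start, end)
    """
    ncols = len(next(iter(alignment.values())))
    total = 0
    for col in range(ncols):
        # Collect (cluster, residue) for every domain covering this column.
        members = [
            (cluster, alignment[name][col])
            for name, start, end, cluster in domains
            if start <= col < end
        ]
        for (c1, r1), (c2, r2) in combinations(members, 2):
            if c1 == c2:  # only pairs within the same domain-level cluster
                total += pair_score(r1, r2)
    return total


# Toy example: p1 looks like a fusion of a p2-like and a p3-like region.
alignment = {
    "p1": "ACDEFGHIK",
    "p2": "ACDE-----",
    "p3": "----FGHIK",
}

# Option 1: treat all three proteins as one undivided cluster "A".
whole = [("p1", 0, 9, "A"), ("p2", 0, 9, "A"), ("p3", 0, 9, "A")]

# Option 2: split p1 at column 4 into two domains, clusters "A" and "B".
split = [("p1", 0, 4, "A"), ("p1", 4, 9, "B"),
         ("p2", 0, 4, "A"), ("p3", 4, 9, "B")]

score_whole = dsp_score(alignment, whole)
score_split = dsp_score(alignment, split)
print(score_whole, score_split)
```

In this toy case the split assignment scores higher than the undivided one, because the gap-heavy pairs no longer count against the cluster. A score of this kind can therefore be used to compare alternative domain boundaries for the same alignment.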

Conclusions: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combined with a rapid domain-level ortholog clustering method, such as DomClust, it can be used to create a high-quality ortholog database that can serve as a solid basis for various comparative genome analyses.

Citing Articles

Ten Years of Collaborative Progress in the Quest for Orthologs.

Linard B, Ebersberger I, McGlynn S, Glover N, Mochizuki T, Patricio M Mol Biol Evol. 2021; 38(8):3033-3045.

PMID: 33822172 PMC: 8321534. DOI: 10.1093/molbev/msab098.


The Quest for Orthologs benchmark service and consensus calls in 2020.

Altenhoff A, Garrayo-Ventas J, Cosentino S, Emms D, Glover N, Hernandez-Plaza A Nucleic Acids Res. 2020; 48(W1):W538-W545.

PMID: 32374845 PMC: 7319555. DOI: 10.1093/nar/gkaa308.


MBGD update 2018: microbial genome database based on hierarchical orthology relations covering closely related and distantly related comparisons.

Uchiyama I, Mihara M, Nishide H, Chiba H, Kato M Nucleic Acids Res. 2018; 47(D1):D382-D389.

PMID: 30462302 PMC: 6324027. DOI: 10.1093/nar/gky1054.


Inferring Orthologs: Open Questions and Perspectives.

Tekaia F Genomics Insights. 2016; 9:17-28.

PMID: 26966373 PMC: 4778853. DOI: 10.4137/GEI.S37925.


Construction of an ortholog database using the semantic web technology for integrative analysis of genomic data.

Chiba H, Nishide H, Uchiyama I PLoS One. 2015; 10(4):e0122802.

PMID: 25875762 PMC: 4395280. DOI: 10.1371/journal.pone.0122802.

