» Articles » PMID: 20439257

A Low-polynomial Algorithm for Assembling Clusters of Orthologous Groups from Intergenomic Symmetric Best Matches

Overview
Journal Bioinformatics
Specialty Biology
Date 2010 May 5
PMID 20439257
Citations 137
Authors
Affiliations
Soon will be listed here.
Abstract

Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined.

Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g(3)) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g(6)). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g(3)log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs.

Availability And Implementation: C++ source code freely available for download at ftp.ncbi.nih.gov/pub/wolf/COGs/COGsoft/.

Supplementary Information: Supplementary materials are available at Bioinformatics online.

Citing Articles

Microbial Rumen proteome analysis suggests Firmicutes and Bacteroidetes as key producers of lignocellulolytic enzymes and carbohydrate-binding modules.

da Silva Pereira M, Alcantara L, Martins de Freitas L, de Oliveira Ferreira A, Lopes Leal P Braz J Microbiol. 2025; .

PMID: 39932663 DOI: 10.1007/s42770-025-01627-8.


A sweeping view of avian mycoplasmas biology drawn from comparative genomic analyses.

Yacoub E, Baby V, Sirand-Pugnet P, Arfi Y, Mardassi H, Blanchard A BMC Genomics. 2025; 26(1):24.

PMID: 39789465 PMC: 11720521. DOI: 10.1186/s12864-024-11201-5.


Pangenome-Wide Association Study in the Family Reveals Key Evolutionary Aspects of Their Relationship with Their Hosts.

Salgado-Morales R, Barba-Xochipa K, Martinez-Ocampo F, Dantan-Gonzalez E, Hernandez-Mendoza A, Quiterio-Trenado M Int J Mol Sci. 2024; 25(23).

PMID: 39684382 PMC: 11641800. DOI: 10.3390/ijms252312671.


Description of two novel species isolated from human nasal passages and skin.

Popowitch E, Tran T, Escapa I, Bhatt E, Sozat A, Ahmed N bioRxiv. 2024; .

PMID: 39605597 PMC: 11601546. DOI: 10.1101/2024.11.21.624533.


Metabolic capabilities are highly conserved among human nasal-associated species in pangenomic analyses.

Tran T, F Escapa I, Roberts A, Gao W, Obawemimo A, Segre J mSystems. 2024; 9(12):e0113224.

PMID: 39508593 PMC: 11651106. DOI: 10.1128/msystems.01132-24.


References
1.
Remm M, Storm C, Sonnhammer E . Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001; 314(5):1041-52. DOI: 10.1006/jmbi.2000.5197. View

2.
Koonin E . Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005; 39:309-38. DOI: 10.1146/annurev.genet.39.073003.114725. View

3.
Forest F . Calibrating the Tree of Life: fossils, molecules and evolutionary timescales. Ann Bot. 2009; 104(5):789-94. PMC: 2749537. DOI: 10.1093/aob/mcp192. View

4.
Jensen L, Julien P, Kuhn M, von Mering C, Muller J, Doerks T . eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res. 2007; 36(Database issue):D250-4. PMC: 2238944. DOI: 10.1093/nar/gkm796. View

5.
Vilella A, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E . EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2008; 19(2):327-35. PMC: 2652215. DOI: 10.1101/gr.073585.107. View