
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes

Overview
Journal: Genome Res
Specialty: Genetics
Date: 2003 Sep 4
PMID: 12952885
Citations: 3411
Abstract

The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
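The Markov Cluster step mentioned in the abstract can be illustrated with a minimal sketch. This is not the OrthoMCL pipeline: the abstract does not say how similarity weights are derived, so the protein names, edge weights, inflation value, and thresholds below are all hypothetical choices made for illustration. The sketch only shows the general idea of Markov clustering, which alternates expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) until the flow matrix stabilizes.

```python
def normalize_columns(matrix):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        for i in range(n):
            result[i][j] = matrix[i][j] / total if total else 0.0
    return result


def expand(matrix):
    """Expansion: square the matrix, i.e. take two-step random walks."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric weight matrix; returns sorted lists of node indices."""
    n = len(weights)
    # Self-loops are added before normalization (a common MCL convention).
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        new = inflate(expand(m), inflation)
        delta = max(abs(new[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Hypothetical proteins and similarity weights (not taken from the paper).
names = ["human_A1", "human_A2", "fly_A", "yeast_B", "worm_B", "fly_B"]
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),   # family A
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),   # family B
         (2, 3, 0.1)]                             # weak spurious link

weights = [[0.0] * len(names) for _ in names]
for a, b, w in edges:
    weights[a][b] = weights[b][a] = w

clusters = mcl(weights)
groups = [[names[i] for i in cluster] for cluster in clusters]
print(groups)
```

In this toy graph the weak link between the two families carries little flow, and inflation suppresses it further on each round, so the two families are expected to end up in separate groups. A higher inflation value generally yields finer clusters and a lower value coarser ones.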

Citing Articles

Revealing Genomic Traits and Evolutionary Insights of Oryza officinalis from Southern China Through Genome Assembly and Transcriptome Analysis.

Chen C, Hu H, Guo H, Xia X, Zhang Z, Nong B. Rice (N Y). 2025; 18(1):15.

PMID: 40082317 PMC: 11906960. DOI: 10.1186/s12284-025-00769-5.


Revisiting the druggable genome using predicted structures and data mining.

Godinez-Macias K, Chen D, Wallis J, Siegel M, Adam A, Bopp S. NPJ Drug Discov. 2025; 2(1):3.

PMID: 40066064 PMC: 11892419. DOI: 10.1038/s44386-025-00006-5.


Multiomics analysis provides insights into musk secretion in muskrat and musk deer.

Wang T, Yang M, Shi X, Tian S, Li Y, Xie W. Gigascience. 2025; 14.

PMID: 40036429 PMC: 11878540. DOI: 10.1093/gigascience/giaf006.


The knockout of Gγ subunit HvGS3 by CRISPR/Cas9 gene editing improves the lodging resistance of barley through dwarfing and stem strengthening.

Jiang Y, Xue R, Chang Y, Cao D, Liu B, Li Y. Theor Appl Genet. 2025; 138(3):61.

PMID: 40014102 DOI: 10.1007/s00122-025-04853-8.


Properties of "Stable" Mosquito Cytochrome P450 Enzymes.

Tzotzos G. Insects. 2025; 16(2).

PMID: 40003814 PMC: 11855896. DOI: 10.3390/insects16020184.

