
Towards Understanding the First Genome Sequence of a Crenarchaeon by Genome Annotation Using Clusters of Orthologous Groups of Proteins (COGs)

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2001 Feb 24
PMID: 11178258
Citations: 49
Abstract

Background: Standard archival sequence databases were not designed as tools for genome annotation and are far from optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea: Aeropyrum pernix, the first member of the Crenarchaeota to be sequenced, and Pyrococcus abyssi.

Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 A. pernix proteins that could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase over the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was also observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix.
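The idea of building a distance matrix from the co-occurrence of genomes in COGs can be illustrated with a minimal sketch. The abstract does not state which distance measure was used, so the Jaccard distance below is an assumption, and the COG identifiers and genome memberships are hypothetical toy data, not results from the study.

```python
from itertools import combinations

# Hypothetical toy data: COG identifier -> genomes represented in that COG.
# These identifiers and memberships are made up for illustration only.
cog_members = {
    "COG_A": {"A_pernix", "P_abyssi", "P_horikoshii", "E_coli"},
    "COG_B": {"A_pernix", "P_abyssi", "P_horikoshii"},
    "COG_C": {"P_abyssi", "P_horikoshii"},
    "COG_D": {"A_pernix", "E_coli"},
    "COG_E": {"E_coli"},
}


def genome_profiles(cog_members):
    """Invert the mapping: genome -> set of COGs it is represented in."""
    profiles = {}
    for cog, genomes in cog_members.items():
        for genome in genomes:
            profiles.setdefault(genome, set()).add(cog)
    return profiles


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; an assumed measure, not the paper's stated one."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Return sorted genome names and a symmetric dict of pairwise distances."""
    names = sorted(profiles)
    dist = {(name, name): 0.0 for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    return names, dist


names, dist = distance_matrix(genome_profiles(cog_members))
for x, y in combinations(names, 2):
    print(f"{x:13s} {y:13s} {dist[(x, y)]:.2f}")
```

A matrix of this kind could then be passed to any standard clustering or tree-building routine; genomes with similar repertoires of conserved genes end up close together regardless of how the distance is later visualized.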

Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
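A differential genome display of the kind described here can be thought of as set filtering over COG membership. The sketch below is only an illustration under stated assumptions: the function names, COG identifiers, and memberships are hypothetical, and the "missing" filter requires presence in every listed euryarchaeon, which is a simplification of the abstract's "conserved in Euryarchaeota and most bacteria".

```python
# Hypothetical toy data: COG identifier -> genomes represented in that COG.
cog_members = {
    "COG_1": {"A_pernix", "E_coli", "B_subtilis"},
    "COG_2": {"A_pernix", "P_abyssi", "M_jannaschii", "E_coli"},
    "COG_3": {"P_abyssi", "M_jannaschii", "E_coli", "B_subtilis"},
    "COG_4": {"P_abyssi", "E_coli"},
}

# Hypothetical stand-in for the set of euryarchaeal genomes.
euryarchaea = {"P_abyssi", "M_jannaschii"}


def present_without(cog_members, target, excluded):
    """COGs that include `target` but none of the genomes in `excluded`."""
    return sorted(
        cog for cog, genomes in cog_members.items()
        if target in genomes and not (genomes & excluded)
    )


def missing_from(cog_members, target, required):
    """COGs that include every genome in `required` but lack `target`."""
    return sorted(
        cog for cog, genomes in cog_members.items()
        if target not in genomes and required <= genomes
    )


only_non_euryarchaeal = present_without(cog_members, "A_pernix", euryarchaea)
unexpectedly_missing = missing_from(cog_members, "A_pernix", euryarchaea)
print("A. pernix COGs without euryarchaeal members:", only_non_euryarchaeal)
print("COGs shared by the euryarchaea but lacking A. pernix:", unexpectedly_missing)
```

The first filter corresponds to the kind of query that yields COGs containing A. pernix but no euryarchaeal member; the second corresponds to the search for broadly conserved proteins that are absent from A. pernix.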

Citing Articles

Discovery of : A Novel Fungus Identified Using Genome Sequencing and Metabolomic Analysis.

Rana S, Singh S J Fungi (Basel). 2024; 10(11).

PMID: 39590710 PMC: 11596026. DOI: 10.3390/jof10110791.


Insights into the genomic architecture of a newly discovered endophytic species belonging to the complex from India.

Rana S, Singh S Front Microbiol. 2023; 14:1266620.

PMID: 38088969 PMC: 10712836. DOI: 10.3389/fmicb.2023.1266620.


Diversity, structure, and distribution of bacterioplankton and diazotroph communities in the Bay of Bengal during the winter monsoon.

Wu C, Narale D, Cui Z, Wang X, Liu H, Xu W Front Microbiol. 2022; 13:987462.

PMID: 36532434 PMC: 9748438. DOI: 10.3389/fmicb.2022.987462.


An integrated metabolome and transcriptome approach reveals the fruit flavor and regulatory network during jujube fruit development.

Lu D, Zhang L, Wu Y, Pan Q, Zhang Y, Liu P Front Plant Sci. 2022; 13:952698.

PMID: 36212371 PMC: 9537746. DOI: 10.3389/fpls.2022.952698.


Genome Features and AntiSMASH Analysis of an Endophytic Strain sp. R1.

Liu Y, Xu M, Tang Y, Shao Y, Wang H, Zhang H Metabolites. 2022; 12(6).

PMID: 35736454 PMC: 9229708. DOI: 10.3390/metabo12060521.

