
Distinguishing Microbial Genome Fragments Based on Their Composition: Evolutionary and Comparative Genomic Perspectives

Overview
Date 2010 Mar 25
PMID 20333228
Citations 12
Abstract

It is well known that patterns of nucleotide composition vary within and among genomes, although the reasons why these variations exist are not completely understood. Between-genome compositional variation has been exploited to assign environmental shotgun sequences to their most likely originating genomes, whereas within-genome variation has been used to identify recently acquired genetic material such as pathogenicity islands. Recent sequence assignment techniques have achieved high levels of accuracy on artificial data sets, but the relative difficulty of distinguishing lineages with varying degrees of relatedness, and different types of genomic sequence, has not been examined in depth. We investigated the compositional differences in a set of 774 sequenced microbial genomes, finding rapid divergence among closely related genomes, but also convergence of compositional patterns among genomes with similar habitats. Support vector machines were then used to distinguish all pairs of genomes based on genome fragments 500 nucleotides in length. The nearly 300,000 accuracy scores obtained from these trials were used to construct general models of distinguishability versus taxonomic and compositional indices of genomic divergence. Unusual genome pairs were evident from their large residuals relative to the fitted model, and we identified several factors including genome reduction, putative lateral genetic transfer, and habitat convergence that influence the distinguishability of genomes. The positional, compositional, and functional context of a fragment within a genome has a strong influence on its likelihood of correct classification, but in a way that depends on the taxonomic and ecological similarity of the comparator genome.
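The abstract does not specify the fragment representation, SVM kernel, or training protocol. The following is therefore only a minimal Python sketch of the pairwise set-up, under stated assumptions. Each 500-nt fragment is represented by its trinucleotide (k = 3) frequencies. A linear SVM is fitted by Pegasos-style stochastic subgradient descent, swapped in so the example needs no third-party libraries; it is not necessarily the solver the authors used. Half of each pair's fragments are held out to score accuracy. All function names and the synthetic GC-biased genomes are hypothetical illustrations, not data from the study.

```python
import random
from itertools import product


def kmer_profile(seq, k=3):
    """Relative frequencies of all 4**k k-mers over ACGT (k = 3 is an assumption).
    k-mers containing other characters (e.g. N) are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fragments(genome, length=500):
    """Non-overlapping fragments of the given length; a short tail is dropped."""
    return [genome[i:i + length]
            for i in range(0, len(genome) - length + 1, length)]


def standardize(train, test):
    """Z-score each feature using training-set statistics only."""
    d = len(train[0])
    mean = [sum(x[j] for x in train) / len(train) for j in range(d)]
    std = [(sum((x[j] - mean[j]) ** 2 for x in train) / len(train)) ** 0.5 or 1.0
           for j in range(d)]

    def scale(rows):
        # z-score features, then append a constant 1.0 so w also learns a bias
        return [[(x[j] - mean[j]) / std[j] for j in range(d)] + [1.0] for x in rows]

    return scale(train), scale(test)


def train_linear_svm(X, y, lam=1e-3, epochs=20, seed=0):
    """Linear SVM (hinge loss, L2 penalty) fitted by Pegasos-style stochastic
    subgradient descent; labels must be +1 or -1. Hypothetical stand-in solver."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1.0:  # hinge loss active: step toward y * x
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def pair_accuracy(genome_a, genome_b, k=3, length=500, seed=0):
    """Held-out accuracy of telling fragments of genome_a (+1) from genome_b (-1)."""
    data = [(kmer_profile(f, k), 1) for f in fragments(genome_a, length)]
    data += [(kmer_profile(f, k), -1) for f in fragments(genome_b, length)]
    random.Random(seed).shuffle(data)
    half = len(data) // 2  # assumed 50/50 train/test split
    train, test = data[:half], data[half:]
    X_train, X_test = standardize([x for x, _ in train], [x for x, _ in test])
    w = train_linear_svm(X_train, [lab for _, lab in train], seed=seed)
    hits = sum(predict(w, x) == lab for x, (_, lab) in zip(X_test, test))
    return hits / len(test)


def random_genome(n, gc, seed):
    """Synthetic toy sequence with i.i.d. bases at the given GC fraction."""
    rng = random.Random(seed)
    weights = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
    return "".join(rng.choices("GCAT", weights=weights, k=n))


if __name__ == "__main__":
    a = random_genome(50_000, gc=0.30, seed=1)
    b = random_genome(50_000, gc=0.70, seed=2)
    print(f"held-out accuracy: {pair_accuracy(a, b):.3f}")
```

Calling `pair_accuracy` once for every unordered pair of n genomes gives n(n - 1)/2 scores; for the 774 genomes studied here that is 774 × 773 / 2 = 299,151, consistent with the "nearly 300,000" accuracy scores reported above. The toy genomes differ only in GC content and are trivially separable; pairs whose compositions are closer would be harder to tell apart.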

Citing Articles

The GC% landscape of the Nucleocytoviricota.

Witt A, Carvalho J, Serafim M, Arias N, Rodrigues R, Abrahao J. Braz J Microbiol. 2024; 55(4):3373-3387.

PMID: 39180708. PMC: 11711839. DOI: 10.1007/s42770-024-01496-7.


kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species.

Mouratidis I, Baltoumas F, Chantzi N, Patsakis M, Chan C, Montgomery A. Comput Struct Biotechnol J. 2024; 23:1919-1928.

PMID: 38711760. PMC: 11070822. DOI: 10.1016/j.csbj.2024.04.050.


Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks.

Kohls M, Kircher M, Krepel J, Liebig P, Jung K. Genes (Basel). 2021; 12(11).

PMID: 34828361. PMC: 8624964. DOI: 10.3390/genes12111755.


Development of self-compressing BLSOM for comprehensive analysis of big sequence data.

Kikuchi A, Ikemura T, Abe T. Biomed Res Int. 2015; 2015:506052.

PMID: 26495297. PMC: 4606171. DOI: 10.1155/2015/506052.


A Markovian analysis of bacterial genome sequence constraints.

Skewes A, Welch R. PeerJ. 2013; 1:e127.

PMID: 24010012. PMC: 3757466. DOI: 10.7717/peerj.127.

