Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

Overview
Journal Front Microbiol
Specialty Microbiology
Date 2017 Feb 4
PMID 28154557
Citations 7
Abstract

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
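
The two generic ingredients named in the abstract, TF-IDF weighting over words of length k and extraction of maximal cliques from a graph, can be illustrated with a small Python sketch. This is not the authors' pipeline: the function names, the toy sequences, the particular TF-IDF variant (relative frequency times log(N/df)), and the toy graph are all assumptions made for illustration, and the step that maps high-scoring words to lateral regions and genes is not shown.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(groups, k):
    """Score each k-mer per group (hypothetical helper).

    tf  = relative frequency of the word within the group
    idf = log(number of groups / number of groups containing the word)
    """
    counts = {
        name: Counter(w for s in seqs for w in kmers(s, k))
        for name, seqs in groups.items()
    }
    n_groups = len(counts)
    df = Counter(w for c in counts.values() for w in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            w: (f / total) * math.log(n_groups / df[w]) for w, f in c.items()
        }
    return scores


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each node to the set of its neighbours (undirected graph).
    """
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Toy input (assumed): three groups of sequences, word length k = 4.
toy_groups = {"A": ["ACGTACGT"], "B": ["ACGTTTTT"], "C": ["GGGGCCCC"]}
toy_scores = tfidf(toy_groups, k=4)

# Toy graph (assumed): nodes stand for groups of genomes, edges for inferred LGT.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
cliques = maximal_cliques(toy_graph)
maximum_clique = max(cliques, key=len)
```

In this sketch a word shared by every group receives a score of zero, while a word confined to one group scores higher, which is the property that makes TF-IDF useful for flagging group-specific signal. A maximum clique is simply the largest of the maximal cliques; rerunning both steps over a range of k would show which nodes stay in cliques as k changes.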

Citing Articles

Genetic Transfer in Action: Uncovering DNA Flow in an Extremophilic Microbial Community.

Van Etten J, Stephens T, Bhattacharya D Environ Microbiol. 2025; 27(2):e70048.

PMID: 39900484 PMC: 11790422. DOI: 10.1111/1462-2920.70048.


Utilization of a natural language processing-based approach to determine the composition of artifact residues.

Nguyen T, Brownstein K BMC Bioinformatics. 2024; 25(1):311.

PMID: 39333884 PMC: 11437931. DOI: 10.1186/s12859-024-05888-2.


Alignment-Free Sequence Analysis and Applications.

Ren J, Bai X, Lu Y, Tang K, Wang Y, Reinert G Annu Rev Biomed Data Sci. 2019; 1:93-114.

PMID: 31828235 PMC: 6905628. DOI: 10.1146/annurev-biodatasci-080917-013431.


k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank.

Bernard G, Greenfield P, Ragan M, Chan C mSystems. 2018; 3(6).

PMID: 30505941 PMC: 6247013. DOI: 10.1128/mSystems.00257-18.


Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer.

Tang K, Lu Y, Sun F Front Microbiol. 2018; 9:711.

PMID: 29713314 PMC: 5911508. DOI: 10.3389/fmicb.2018.00711.

