
Integrating Chromatin Conformation Information in a Self-supervised Learning Model Improves Metagenome Binning

Overview
Journal: PeerJ
Date: 2023 Sep 27
PMID: 37753177
Abstract

Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies-Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR's ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr.
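The self-supervised workflow described in the abstract (learn from high-quality bins, predict same-genome scaffold pairs, then merge bins and recruit unbinned scaffolds) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy data, the length-normalized contact score, and the single learned cutoff that stands in for the model. The abstract does not specify the features or the classifier that metaBAT-LR actually uses.

```python
from itertools import combinations

def contact_score(a, b, contacts, lengths):
    """Hi-C read pairs linking scaffolds a and b, per kb^2 (hypothetical feature)."""
    links = contacts.get((a, b), 0) + contacts.get((b, a), 0)
    return links / ((lengths[a] / 1000) * (lengths[b] / 1000))

def labeled_pairs(trusted_bins):
    """Self-supervision: a pair is positive if both scaffolds share a trusted bin."""
    owner = {s: name for name, members in trusted_bins.items() for s in members}
    return [(a, b, owner[a] == owner[b]) for a, b in combinations(sorted(owner), 2)]

def fit_threshold(pairs, contacts, lengths):
    """Pick the score cutoff with the highest accuracy on the labeled pairs.
    This one-parameter 'model' is a stand-in, not the published method."""
    scored = [(contact_score(a, b, contacts, lengths), same) for a, b, same in pairs]
    cutoffs = sorted({score for score, _ in scored})
    return max(cutoffs, key=lambda t: sum((score >= t) == same for score, same in scored))

def refine(bins, unbinned, same_genome):
    """Merge bins joined by any predicted same-genome pair, then recruit scaffolds."""
    merged = [set(members) for members in bins.values()]
    changed = True
    while changed:  # repeat until no two bins can be merged
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            if any(same_genome(a, b) for a in merged[i] for b in merged[j]):
                merged[i] |= merged.pop(j)
                changed = True
                break
    leftover = []
    for s in unbinned:  # attach each unbinned scaffold to the first matching bin
        home = next((m for m in merged if any(same_genome(s, t) for t in m)), None)
        if home is None:
            leftover.append(s)
        else:
            home.add(s)
    return merged, leftover

# Toy data (invented): eight 10 kb scaffolds and a few Hi-C link counts.
lengths = {f"s{i}": 10_000 for i in range(1, 9)}
contacts = {("s1", "s2"): 50, ("s3", "s4"): 40, ("s1", "s3"): 2,
            ("s2", "s4"): 1, ("s5", "s6"): 45, ("s6", "s7"): 60}
trusted = {"A": ["s1", "s2"], "B": ["s3", "s4"]}  # high-quality bins

threshold = fit_threshold(labeled_pairs(trusted), contacts, lengths)

def same_genome(a, b):
    return contact_score(a, b, contacts, lengths) >= threshold

merged, leftover = refine({"X": ["s5"], "Y": ["s6"]}, ["s7", "s8"], same_genome)
print(threshold, [sorted(b) for b in merged], leftover)
```

On this toy input, the two single-scaffold bins X and Y are merged, s7 is recruited into the merged bin through its links to s6, and s8 stays unbinned because it has no Hi-C contacts. A real implementation would likely use a richer feature set and a trained classifier in place of the single cutoff.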
