
VirRep: a Hybrid Language Representation Learning Framework for Identifying Viruses from Human Gut Metagenomes

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2024 Jul 4
PMID: 38965579
Abstract

Identifying viruses from metagenomes is a common step to explore the virus composition in the human gut. Here, we introduce VirRep, a hybrid language representation learning framework, for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods.

Citing Articles

Complementary insights into gut viral genomes: a comparative benchmark of short- and long-read metagenomes using diverse assemblers and binners.

Wang H, Sun C, Li Y, Chen J, Zhao X, Chen W. Microbiome. 2024; 12(1):260.

PMID: 39707560 PMC: 11660840. DOI: 10.1186/s40168-024-01981-z.


ViraLM: empowering virus discovery through the genome foundation model.

Peng C, Shang J, Guan J, Wang D, Sun Y. Bioinformatics. 2024; 40(12).

PMID: 39579086 PMC: 11631183. DOI: 10.1093/bioinformatics/btae704.


VirRep: a hybrid language representation learning framework for identifying viruses from human gut metagenomes.

Dong Y, Chen W, Zhao X. Genome Biol. 2024; 25(1):177.

PMID: 38965579 PMC: 11229495. DOI: 10.1186/s13059-024-03320-9.
