VirRep: a Hybrid Language Representation Learning Framework for Identifying Viruses from Human Gut Metagenomes
Identifying viruses in metagenomes is a common step in exploring the viral composition of the human gut. Here, we introduce VirRep, a hybrid language representation learning framework for identifying viruses from human gut metagenomes. VirRep combines a context-aware encoder and an evolution-aware encoder to improve sequence representation by incorporating k-mer patterns and sequence homologies. Benchmarking on both simulated and real datasets with varying viral proportions demonstrates that VirRep outperforms state-of-the-art methods. When applied to fecal metagenomes from a colorectal cancer cohort, VirRep identifies 39 high-quality viral species associated with the disease, many of which cannot be detected by existing methods.
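To make the phrase "k-mer patterns" concrete, the following is a minimal sketch of how a DNA sequence can be split into overlapping k-mers and summarized as a frequency profile. It is an illustration only and not VirRep's implementation: the function names `kmer_tokens` and `kmer_profile`, the choice of k = 3, and the handling of ambiguous bases are all assumptions made for this example.

```python
from collections import Counter


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers (hypothetical helper).

    Returns an empty list when the sequence is shorter than k.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_profile(seq, k=3):
    """Return relative k-mer frequencies (hypothetical helper).

    Assumption for this sketch: k-mers containing anything other than
    A, C, G, or T (for example N) are skipped.
    """
    valid = set("ACGT")
    tokens = [t for t in kmer_tokens(seq, k) if set(t) <= valid]
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

For instance, `kmer_tokens("ACGTAC", 3)` yields the four tokens ACG, CGT, GTA, and TAC. A learned encoder would typically consume such tokens as input, whereas the frequency profile is the simpler, hand-crafted counterpart.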
Dong Y, Chen W, Zhao X Genome Biol. 2024; 25(1):177.
PMID: 38965579 PMC: 11229495. DOI: 10.1186/s13059-024-03320-9.