
MetaPhage: an Automated Pipeline for Analyzing, Annotating, and Classifying Bacteriophages in Metagenomics Sequencing Data

Overview
Journal mSystems
Specialty Microbiology
Date 2022 Sep 7
PMID 36069454
Abstract

Phages are the most abundant biological entities on the planet, and they play an important role in controlling the density, diversity, and network interactions of bacterial communities through predation and gene transfer. To date, a variety of bacteriophage identification tools have been developed that differ in the phage mining strategies they use, the input files they require, and the results they produce. However, new users attempting bacteriophage analysis can struggle to select the best methods and to interpret the variety of results produced. Here, we present MetaPhage, a comprehensive reads-to-report pipeline that streamlines the use of multiple phage miners and generates an exhaustive report. The report both summarizes and visualizes the key findings and enables further exploration of key results via interactive, filterable tables. The pipeline is implemented in Nextflow, a widely adopted workflow manager that enables optimized parallelization of tasks in different locations, from a local server to the cloud, and its use of containerized packages ensures reproducible results. MetaPhage is designed for scalability and reproducibility, and it can easily be expanded to include new miners and methods as they are developed in this continuously growing field. MetaPhage is freely available under a GPL-3.0 license at https://github.com/MattiaPandolfoVR/MetaPhage.

Importance

Bacteriophages (viruses that infect bacteria) are the most abundant biological entities on Earth and are increasingly studied as members of the resident microbiota in many environments, from oceans to soils and the human gut. Identifying them is important for better understanding complex bacterial dynamics and microbial ecosystem function. A variety of metagenomic bacteriophage identification tools have been developed that differ in the phage mining strategies they use, the input files they require, and the results they produce. To facilitate the management and execution of such a complex multi-tool workflow, we developed MetaPhage (MP), a comprehensive reads-to-report pipeline that streamlines the use of multiple phage miners and generates an exhaustive report. The pipeline is implemented in Nextflow, a widely adopted workflow manager that enables optimized parallelization of tasks. MetaPhage is designed for scalability and reproducibility and offers installation-free, dependency-free, and conflict-free workflow execution.

Citing Articles

Metagenomic investigation of viruses in green sea turtles.

Li H, Chen Y, Xia Z, Zhuang D, Cong F, Lian Y. Front Microbiol. 2025; 16:1492038.

PMID: 39911250. PMC: 11794262. DOI: 10.3389/fmicb.2025.1492038.


Cave Pools in Carlsbad Caverns National Park Contain Diverse Bacteriophage Communities and Novel Viral Sequences.

Ulbrich J, Jobe N, Jones D, Kieft T. Microb Ecol. 2024; 87(1):163.

PMID: 39724159. PMC: 11671562. DOI: 10.1007/s00248-024-02479-9.


Gastrointestinal jumbo phages possess independent synthesis and utilization systems of NAD.

Li C, Liu K, Gu C, Li M, Zhou P, Chen L. Microbiome. 2024; 12(1):268.

PMID: 39707494. PMC: 11662467. DOI: 10.1186/s40168-024-01984-w.


Surface microlayer-mediated virome dissemination in the Central Arctic.

Rahlff J, Westmeijer G, Weissenbach J, Antson A, Holmfeldt K. Microbiome. 2024; 12(1):218.

PMID: 39449105. PMC: 11515562. DOI: 10.1186/s40168-024-01902-0.


The phageome of patients with ulcerative colitis treated with donor fecal microbiota reveals markers associated with disease remission.

Majzoub M, Paramsothy S, Haifer C, Parthasarathy R, Borody T, Leong R. Nat Commun. 2024; 15(1):8979.

PMID: 39420033. PMC: 11487140. DOI: 10.1038/s41467-024-53454-4.

