Scalable and Versatile Container-based Pipelines for De Novo Genome Assembly and Bacterial Annotation

Overview
Journal F1000Res
Date 2023 Nov 16
PMID 37970066
Abstract

Advancements in DNA sequencing technology have transformed the field of bacterial genomics, allowing for faster and more cost-effective chromosome-level assemblies than were possible a decade ago. However, transforming raw reads into a complete genome model remains a significant computational challenge because of the varying quality and quantity of data obtained from different sequencing instruments, as well as the intrinsic characteristics of the genome and the desired analyses. To address this issue, we have developed a set of container-based pipelines using Nextflow, offering both common workflows for inexperienced users and high levels of customization for experienced ones. Their processing strategies adapt to the sequencing data type, and their modularity enables the incorporation of new components to address the community's evolving needs. The pipelines cover three parts: quality control, de novo genome assembly, and bacterial genome annotation. In particular, the genome annotation pipeline provides a comprehensive overview of the genome, including standard gene prediction and functional inference, as well as predictions relevant to clinical applications such as virulence and resistance gene annotation, secondary metabolite detection, prophage and plasmid prediction, and more. The annotation results are presented in reports, genome browsers, and a web-based application that enables users to explore and interact with them. Overall, our user-friendly pipelines offer a seamless integration of computational tools to facilitate routine bacterial genomics research. Their effectiveness is illustrated by analyzing the sequencing data of a clinical sample of Klebsiella pneumoniae.
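
The modular, data-type-aware design described in the abstract can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' Nextflow implementation: the names `Sample`, `quality_control`, `assemble`, `annotate`, and `run_pipeline`, and the read-type labels "short", "long", and "hybrid", are all hypothetical. Each stage only records what it would do; no real bioinformatics tool is called.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """A sequencing sample. The read_type labels are assumed, not from the paper."""
    name: str
    read_type: str  # hypothetical labels: "short", "long", or "hybrid"
    log: List[str] = field(default_factory=list)


# A stage takes a sample and returns it after recording its step.
Stage = Callable[[Sample], Sample]

# Hypothetical mapping from data type to an assembly strategy label.
ASSEMBLY_STRATEGIES = {
    "short": "short-read assembly",
    "long": "long-read assembly",
    "hybrid": "hybrid assembly",
}


def quality_control(sample: Sample) -> Sample:
    # Placeholder: a real stage would run read QC tools here.
    sample.log.append(f"qc:{sample.read_type}")
    return sample


def assemble(sample: Sample) -> Sample:
    # Pick the strategy from the data type, mirroring the "adaptable" design.
    if sample.read_type not in ASSEMBLY_STRATEGIES:
        raise ValueError(f"unsupported read type: {sample.read_type!r}")
    sample.log.append(f"assembly:{ASSEMBLY_STRATEGIES[sample.read_type]}")
    return sample


def annotate(sample: Sample) -> Sample:
    # Placeholder: a real stage would run gene prediction and annotation.
    sample.log.append("annotation")
    return sample


# The three parts named in the abstract, in order.
DEFAULT_STAGES: List[Stage] = [quality_control, assemble, annotate]


def run_pipeline(sample: Sample, stages: List[Stage] = DEFAULT_STAGES) -> Sample:
    """Apply each stage in order; new stages can be appended to the list."""
    for stage in stages:
        sample = stage(sample)
    return sample


if __name__ == "__main__":
    result = run_pipeline(Sample(name="isolate_1", read_type="hybrid"))
    print(result.log)
```

Keeping the stages as a plain ordered list is one simple way to express modularity: a new component is added by appending another function with the same signature, without changing the existing stages.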

Citing Articles

The GEA pipeline for characterizing Escherichia coli and Salmonella genomes.

Dickey A, Schmidt J, Bono J, Guragain M Sci Rep. 2024; 14(1):13257.

PMID: 38858528 PMC: 11164923. DOI: 10.1038/s41598-024-63832-z.
