PMID: 37036103

The GEN-ERA Toolbox: Unified and Reproducible Workflows for Research in Microbial Genomics

Abstract

Background: Microbial culture collections play a key role in taxonomy by studying the diversity of their strains and providing well-characterized biological material to the scientific community for fundamental and applied research. These microbial resource centers thus need to implement new standards in species delineation, including whole-genome sequencing and phylogenomics. In this context, the genomic needs of the Belgian Coordinated Collections of Microorganisms were studied, resulting in the GEN-ERA toolbox, a unified set of bioinformatic workflows dedicated to both bacteria and small eukaryotes (e.g., yeasts).

Findings: This public toolbox allows researchers without specific training in bioinformatics to perform robust phylogenomic analyses. It covers all steps from genome downloading and quality assessment, including estimation of genomic contamination, to tree reconstruction. It also offers workflows for average nucleotide identity comparisons and metabolic modeling.
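
To make the notion of an average nucleotide identity comparison concrete, the following is a minimal sketch of the underlying idea only: the mean percent identity over pairs of already-aligned genome fragments. It is not the method implemented in the GEN-ERA workflows, which the abstract does not describe; the function names and the toy fragments are hypothetical, and real tools also handle fragment selection and alignment.

```python
def fragment_identity(a: str, b: str) -> float:
    """Percent identity between two aligned fragments of equal length.

    Columns where either fragment has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned fragments must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_nucleotide_identity(pairs) -> float:
    """Mean percent identity over a list of aligned fragment pairs."""
    identities = [fragment_identity(a, b) for a, b in pairs]
    if not identities:
        raise ValueError("at least one fragment pair is required")
    return sum(identities) / len(identities)


# Toy, hypothetical data: two aligned fragment pairs from two genomes.
pairs = [("ACGTACGT", "ACGTACGA"), ("GGCCTTAA", "GGCCTTAA")]
print(average_nucleotide_identity(pairs))
```

In this toy input the first pair differs at one of eight positions and the second pair is identical, so the average reflects both fragments equally.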

Technical Details: Each Nextflow workflow is launched by a single command and is available on the GEN-ERA GitHub repository (https://github.com/Lcornet/GENERA). All workflows run in Singularity containers to increase reproducibility.
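
As an illustration of what a single-command launch of a containerized Nextflow workflow can look like, the sketch below assembles such a command in Python. It relies only on the standard Nextflow options `run`, `-with-singularity`, and `-resume`, plus the `--name value` convention for pipeline parameters. The script name `workflow.nf`, the image name `container.sif`, and the `cpu` parameter are hypothetical placeholders, not the actual file names or options of the GEN-ERA workflows; consult the repository for those.

```python
import shlex


def build_nextflow_command(workflow, image, params=None, resume=False):
    """Build a `nextflow run` command that executes inside a Singularity image.

    workflow: path to a Nextflow script (hypothetical placeholder here)
    image:    path to a Singularity image (hypothetical placeholder here)
    params:   pipeline parameters, passed as `--name value`
    resume:   if True, add Nextflow's `-resume` option to reuse cached results
    """
    cmd = ["nextflow", "run", workflow, "-with-singularity", image]
    if resume:
        cmd.append("-resume")
    for name, value in (params or {}).items():
        cmd.extend([f"--{name}", str(value)])
    return cmd


# Hypothetical names, for illustration only.
cmd = build_nextflow_command("workflow.nf", "container.sif", {"cpu": 4})
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Nextflow and Singularity are installed; keeping the command as a list avoids shell-quoting problems.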

Testing: The toolbox was developed for a diverse range of microorganisms, including bacteria and fungi. It was further tested on an empirical dataset of 18 (meta)genomes of early-branching Cyanobacteria, providing the most up-to-date phylogenomic analysis of the Gloeobacterales order, the first group to diverge in the evolutionary tree of Cyanobacteria.

Conclusion: The GEN-ERA toolbox can be used to perform fully reproducible comparative genomic and metabolic analyses on prokaryotes and small eukaryotes. Although designed for routine bioinformatics in culture collections, it can also be used by any researcher interested in microbial taxonomy, as exemplified by our case study on Gloeobacterales.

Citing Articles

Metagenome quality metrics and taxonomical annotation visualization through the integration of MAGFlow and BIgMAG.

Yepes-Garcia J, Falquet L F1000Res. 2024; 13:640.

PMID: 39360247 PMC: 11445639. DOI: 10.12688/f1000research.152290.2.


TADA: taxonomy-aware dataset aggregator.

Hagglund E, Andersson S, Guy L Bioinformatics. 2023; 39(12).

PMID: 38060257 PMC: 10733731. DOI: 10.1093/bioinformatics/btad742.


The GEN-ERA toolbox: unified and reproducible workflows for research in microbial genomics.

Cornet L, Durieu B, Baert F, Dhooge E, Colignon D, Meunier L Gigascience. 2023; 12.

PMID: 37036103 PMC: 10084500. DOI: 10.1093/gigascience/giad022.
