
Gotree/Goalign: Toolkit and Go API to Facilitate the Development of Phylogenetic Workflows

Overview
Specialty Biology
Date 2021 Aug 16
PMID 34396097
Citations 42
Abstract

Phylogenetics is now central to studies in many fields, ranging from comparative genomics to molecular epidemiology. However, phylogenetic analysis workflows are usually complex and difficult to implement, as they are often composed of many small, recurring, but important data manipulation steps. These include file reformatting, sequence renaming, tree re-rooting, tree comparison, and bootstrap support computation. Such steps are often performed by custom scripts or by several heterogeneous tools, which can be error-prone and hard to maintain, and can produce results that are challenging to reproduce. For all these reasons, developing and reusing phylogenetic workflows is often a complex task. We identified many operations that are part of most phylogenetic analyses and implemented them in a toolkit called Gotree/Goalign. The Gotree/Goalign toolkit implements more than 120 user-friendly commands and an API dedicated to multiple sequence alignment and phylogenetic tree manipulations. It is developed in Go, which makes executables easy to install, easy to integrate into workflow environments, and parallelizable when possible. Moreover, Go is a compiled language, which accelerates computations compared to interpreted languages. This toolkit is freely available on most platforms (Linux, macOS and Windows) and most architectures (amd64, i386) on GitHub at https://github.com/evolbioinfo/gotree, Bioconda and DockerHub.
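As a rough illustration of the kind of small, recurring manipulation the abstract mentions (here, sequence renaming), the following Go sketch uses only the standard library. It does not use the Gotree/Goalign API, and it is not the authors' implementation; the function name `renameFasta` and the sample data are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// renameFasta is a hypothetical helper (not part of Gotree/Goalign).
// It rewrites FASTA header lines using the names map; headers with no
// entry in the map, and all sequence lines, are copied unchanged.
// Assumption: each line fits in bufio.Scanner's default buffer (64 KiB).
func renameFasta(input string, names map[string]string) string {
	var out strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, ">") {
			old := strings.TrimSpace(line[1:])
			if renamed, ok := names[old]; ok {
				line = ">" + renamed
			}
		}
		out.WriteString(line)
		out.WriteByte('\n')
	}
	return out.String()
}

func main() {
	// Made-up two-sequence alignment for illustration only.
	fasta := ">seq1\nACGT\n>seq2\nAC-T\n"
	names := map[string]string{"seq1": "Homo_sapiens"}
	fmt.Print(renameFasta(fasta, names))
}
```

Even a step this small needs decisions about unmapped names and input limits, which is the sort of detail that tends to diverge between ad hoc scripts.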

Citing Articles

Convergent evolution of oxidized sugar metabolism in commensal and pathogenic microbes in the inflamed gut.

Levy S, Jiang A, Grant M, Arp G, Minabou Ndjite G, Jiang X Nat Commun. 2025; 16(1):1121.

PMID: 39875389 PMC: 11775122. DOI: 10.1038/s41467-025-56332-9.


Genome sequences of four Ixodes species expands understanding of tick evolution.

Cerqueira de Araujo A, Noel B, Bretaudeau A, Labadie K, Boudet M, Tadrent N BMC Biol. 2025; 23(1):17.

PMID: 39838418 PMC: 11752866. DOI: 10.1186/s12915-025-02121-1.


Viral niche-partitioning: comparative genomics of giant viruses across environmental gradients in a high Arctic freshwater-saltwater lake.

Pitot T, Girard C, Rapp J, Somerville V, Culley A, Vincent W ISME Commun. 2025; 5(1):ycae155.

PMID: 39834781 PMC: 11745019. DOI: 10.1093/ismeco/ycae155.


Exploring SNP filtering strategies: the influence of strict vs soft core.

Taouk M, Featherstone L, Taiaroa G, Seemann T, Ingle D, Stinear T Microb Genom. 2025; 11(1.

PMID: 39812553 PMC: 11734701. DOI: 10.1099/mgen.0.001346.


Whole-genome automated assembly pipeline for strains from reference, and clinical samples using the integrated CtGAP pipeline.

Olagoke O, Aziz A, Zhu L, Read T, Dean D NAR Genom Bioinform. 2025; 7(1):lqae187.

PMID: 39781511 PMC: 11704784. DOI: 10.1093/nargab/lqae187.

