Topiary: Pruning the Manual Labor from Ancestral Sequence Reconstruction

Overview
Journal Protein Sci
Specialty Biochemistry
Date 2022 Dec 24
PMID 36565302
Abstract

Ancestral sequence reconstruction (ASR) is a powerful tool for studying the evolution of proteins and thereby gaining deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) performs taxonomically informed sequence quality control and redundancy reduction; (3) constructs a multiple sequence alignment; (4) generates a maximum-likelihood gene tree; (5) reconciles the gene tree to the species tree; (6) reconstructs ancestral amino acid sequences; and (7) determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. Topiary achieves this by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
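
To make the idea of redundancy reduction in step (2) concrete, the following is a minimal sketch of a greedy identity filter. It is not topiary's algorithm, which the abstract describes as taxonomically informed; this toy version ignores taxonomy entirely. The function names, the toy sequences, and the 0.85 cutoff are all hypothetical, and the sequences are assumed to be pre-aligned and of equal length.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def reduce_redundancy(seqs: dict[str, str], cutoff: float = 0.85) -> dict[str, str]:
    """Greedy filter: keep a sequence only if its identity to every
    already-kept sequence is below `cutoff` (hypothetical threshold)."""
    kept: dict[str, str] = {}
    for name, seq in seqs.items():
        if all(percent_identity(seq, k) < cutoff for k in kept.values()):
            kept[name] = seq
    return kept


# Toy, made-up sequences for illustration only.
toy = {
    "seq_a": "MKTAYIAKQR",
    "seq_b": "MKTAYIAKQK",  # differs from seq_a at one position
    "seq_c": "MSTPYLGKHR",  # differs from seq_a at five positions
}
reduced = reduce_redundancy(toy)
print(sorted(reduced))
```

Because the filter is greedy, the result depends on input order: the first member of each near-identical group is the one retained. A real pipeline would typically choose which member to keep using quality or taxonomic criteria rather than position in the input.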

Citing Articles

Ancestral Reconstruction and the Evolution of Protein Energy Landscapes.

Chisholm L, Orlandi K, Phillips S, Shavlik M, Harms M Annu Rev Biophys. 2023; 53(1):127-146.

PMID: 38134334 PMC: 11192866. DOI: 10.1146/annurev-biophys-030722-125440.


Evolutionary analysis reveals the origin of sodium coupling in glutamate transporters.

Reddy K, Rasool B, Akher F, Kutlesic N, Pant S, Boudker O bioRxiv. 2023.

PMID: 38106174 PMC: 10723334. DOI: 10.1101/2023.12.03.569786.


Cortical interneurons: fit for function and fit to function? Evidence from development and evolution.

Keijser J, Sprekeler H Front Neural Circuits. 2023; 17:1172464.

PMID: 37215503 PMC: 10192557. DOI: 10.3389/fncir.2023.1172464.


Topiary: Pruning the manual labor from ancestral sequence reconstruction.

Orlandi K, Phillips S, Sailer Z, Harman J, Harms M Protein Sci. 2022; 32(2):e4551.

PMID: 36565302 PMC: 9847077. DOI: 10.1002/pro.4551.


Evolution avoids a pathological stabilizing interaction in the immune protein S100A9.

Harman J, Reardon P, Costello S, Warren G, Phillips S, Connor P Proc Natl Acad Sci U S A. 2022; 119(41):e2208029119.

PMID: 36194634 PMC: 9565474. DOI: 10.1073/pnas.2208029119.
