Topiary: Pruning the Manual Labor from Ancestral Sequence Reconstruction
Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
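As a rough illustration of the only manual input the abstract describes, a spreadsheet with a handful of seed sequences, the sketch below builds and sanity-checks such a table using Python's standard library. The column names (`name`, `species`, `sequence`), the helper functions, and the rows are assumptions made for illustration; the sequences are invented placeholder strings, not real proteins. Consult the topiary documentation for the actual seed-spreadsheet format.

```python
import csv
import io

# Hypothetical column layout for a seed spreadsheet (an assumption, not
# necessarily the format topiary expects).
SEED_COLUMNS = ["name", "species", "sequence"]

# Toy placeholder rows: the sequences are invented strings, not real proteins.
SEED_ROWS = [
    {"name": "paralogA", "species": "Homo sapiens", "sequence": "MSTAKLVGG"},
    {"name": "paralogA", "species": "Danio rerio", "sequence": "MSTPKLVGA"},
    {"name": "paralogB", "species": "Homo sapiens", "sequence": "MTTAKIVEG"},
]

# The 20 standard amino acids, as one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def validate_seed_rows(rows):
    """Return a list of human-readable problems found in the seed rows."""
    problems = []
    for i, row in enumerate(rows):
        # Every column must be present and non-empty.
        for column in SEED_COLUMNS:
            if not row.get(column, "").strip():
                problems.append(f"row {i}: missing {column}")
        # Flag characters that are not standard amino acid codes.
        bad = set(row.get("sequence", "").upper()) - VALID_RESIDUES
        if bad:
            problems.append(f"row {i}: unexpected characters {sorted(bad)}")
    return problems


def seed_rows_to_csv(rows):
    """Serialize the seed rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SEED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    issues = validate_seed_rows(SEED_ROWS)
    if issues:
        print("\n".join(issues))
    else:
        print(seed_rows_to_csv(SEED_ROWS), end="")
```

Checking the table before handing it to a pipeline is a small design choice that catches empty cells and stray characters early, before any BLAST search or alignment is run.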
Topiary: Pruning the manual labor from ancestral sequence reconstruction.
Orlandi K, Phillips S, Sailer Z, Harman J, Harms M Protein Sci. 2022; 32(2):e4551.
PMID: 36565302 PMC: 9847077. DOI: 10.1002/pro.4551.