
Phylostratr: a Framework for Phylostratigraphy

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2019 Mar 16
PMID: 30873536
Citations: 20
Abstract

Motivation: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene's phylostratum.
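The definition above can be illustrated with a minimal sketch. It is not phylostratr's implementation or API (phylostratr is an R package); the function name, the toy lineage, and the species-to-clade mapping are all assumptions made for illustration. Each target species is labeled with the youngest clade it shares with the focal species, and a gene's phylostratum is the oldest such clade among the species where a homolog was found.

```python
from typing import Dict, Iterable, List

# Toy lineage of the focal species, ordered from youngest to oldest clade.
# Illustrative only; a real analysis would use many more levels.
LINEAGE: List[str] = [
    "Arabidopsis thaliana",
    "Brassicaceae",
    "Viridiplantae",
    "Eukaryota",
    "cellular organisms",
]

# Youngest clade each target species shares with the focal species (toy data).
SPECIES_TO_CLADE: Dict[str, str] = {
    "Brassica rapa": "Brassicaceae",
    "Oryza sativa": "Viridiplantae",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Homo sapiens": "Eukaryota",
    "Escherichia coli": "cellular organisms",
}


def assign_phylostratum(
    hit_species: Iterable[str],
    species_to_clade: Dict[str, str],
    lineage: List[str],
) -> str:
    """Return the oldest clade in `lineage` that contains a homolog.

    `hit_species` lists the target species in which a homolog was inferred.
    A gene with no hits stays in the focal species itself (index 0).
    """
    deepest = 0
    for species in hit_species:
        level = lineage.index(species_to_clade[species])
        deepest = max(deepest, level)
    return lineage[deepest]


# Hypothetical genes: one with hits out to rice, one with no hits at all.
print(assign_phylostratum(["Brassica rapa", "Oryza sativa"], SPECIES_TO_CLADE, LINEAGE))
print(assign_phylostratum([], SPECIES_TO_CLADE, LINEAGE))
```

In this toy setup the first gene is assigned to Viridiplantae and the second, having no detected homologs, remains species-specific.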

Results: We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades but not in intermediate ones; proteome quality assessments; false-positive diagnostics; and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.
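One of the diagnostics listed above, detecting genes with homologs in old clades but not in intermediate ones, can be sketched as follows. This is a hypothetical illustration, not phylostratr's own code: the function name `skipped_strata` and the toy lineage are assumptions. Given the clades in which a gene has an inferred homolog, it reports every intermediate clade, younger than the assigned phylostratum, that has no hit.

```python
from typing import Iterable, List

# Toy lineage of the focal species, ordered from youngest to oldest clade.
LINEAGE: List[str] = [
    "Arabidopsis thaliana",
    "Brassicaceae",
    "Viridiplantae",
    "Eukaryota",
    "cellular organisms",
]


def skipped_strata(hit_clades: Iterable[str], lineage: List[str]) -> List[str]:
    """List intermediate clades without hits below the gene's oldest hit.

    Index 0 is the focal species, where the gene is present by definition,
    so only levels strictly between 0 and the oldest hit are inspected.
    """
    present = {lineage.index(clade) for clade in hit_clades}
    if not present:
        return []  # species-specific gene: nothing can be skipped
    oldest = max(present)
    return [lineage[i] for i in range(1, oldest) if i not in present]


# A gene with hits in Brassicaceae and in prokaryotes, but nothing in between.
print(skipped_strata(["Brassicaceae", "cellular organisms"], LINEAGE))
# -> ['Viridiplantae', 'Eukaryota']
```

A non-empty result flags a gene for closer inspection; such a pattern is consistent with, for example, a false-positive hit in the old clade or missing data in the intermediate ones, and the sketch does not attempt to distinguish between these.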

Availability and Implementation: Source code is available at https://github.com/arendsee/phylostratr.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Expression of Random Sequences and de novo Evolved Genes From the Mouse in Human Cells Reveals Functional Diversity and Specificity.

Aldrovandi S, Fajardo Castro J, Ullrich K, Karger A, Luria V, Tautz D Genome Biol Evol. 2024; 16(12).

PMID: 39663928 PMC: 11635099. DOI: 10.1093/gbe/evae175.


De Novo Emerged Gene Search in Eukaryotes with DENSE.

Roginski P, Grandchamp A, Quignot C, Lopes A Genome Biol Evol. 2024; 16(8).

PMID: 39212967 PMC: 11363675. DOI: 10.1093/gbe/evae159.


Chromosome-scale genome assembly and annotation of the tetraploid potato cultivar Diacol Capiro adapted to the Andean region.

Reyes-Herrera P, Delgadillo-Duran D, Flores-Gonzalez M, Mueller L, Cristancho M, Barrero L G3 (Bethesda). 2024; 14(9).

PMID: 39058924 PMC: 11537804. DOI: 10.1093/g3journal/jkae139.


Functional annotation and meta-analysis of maize transcriptomes reveal genes involved in biotic and abiotic stress.

Hayford R, Haley O, Cannon E, Portwood 2nd J, Gardiner J, Andorf C BMC Genomics. 2024; 25(1):533.

PMID: 38816789 PMC: 11137889. DOI: 10.1186/s12864-024-10443-7.


Maize Feature Store: A centralized resource to manage and analyze curated maize multi-omics features for machine learning applications.

Sen S, Woodhouse M, Portwood 2nd J, Andorf C Database (Oxford). 2023; 2023.

PMID: 37935586 PMC: 10634621. DOI: 10.1093/database/baad078.