
Extraction of Phenotypic Traits from Taxonomic Descriptions for the Tree of Life Using Natural Language Processing

Overview
Journal Appl Plant Sci
Date 2018 May 8
PMID 29732265
Citations 7
Abstract

Premise of the Study: Phenotypic data sets are necessary to elucidate the genealogy of life, but assembling phenotypic data for taxa across the tree of life can be technically challenging and prohibitively time-consuming. We describe a semi-automated protocol to facilitate and expedite the assembly of phenotypic character matrices of plants from formal taxonomic descriptions. This pipeline uses new natural language processing (NLP) techniques and a glossary of more than 9,000 botanical terms.
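To illustrate what glossary-driven extraction involves, the sketch below tags character states in a description by looking up each word in a small glossary and attaching it to the most recently mentioned organ in the same sentence. It is a minimal sketch under stated assumptions, not the ETC algorithm: the `GLOSSARY` and `ORGANS` tables and the `tag_description` function are hypothetical, and the real glossary is far larger and the real parser far more sophisticated.

```python
import re

# Hypothetical mini-glossary mapping a term to its character category.
# The glossary used by the pipeline has more than 9,000 terms; these are
# illustrative entries only.
GLOSSARY = {
    "ovate": "shape",
    "lanceolate": "shape",
    "glabrous": "pubescence",
    "pubescent": "pubescence",
    "green": "coloration",
    "brown": "coloration",
}

# Hypothetical list of organ names that glossary terms can attach to.
ORGANS = {"leaves", "leaf", "cones", "cone", "bark", "seeds", "seed"}


def tag_description(text: str) -> list[tuple[str, str, str]]:
    """Return (organ, category, state) triples found in a description.

    Sentences are split on '.' and ';'. Within a sentence, each glossary
    term is attached to the most recently mentioned organ; terms that
    appear before any organ are skipped.
    """
    triples = []
    for sentence in re.split(r"[.;]", text.lower()):
        organ = None  # organ context resets at every sentence boundary
        for token in re.findall(r"[a-z]+", sentence):
            if token in ORGANS:
                organ = token
            elif token in GLOSSARY and organ is not None:
                triples.append((organ, GLOSSARY[token], token))
    return triples


if __name__ == "__main__":
    for triple in tag_description("Leaves ovate, glabrous, green. Bark brown."):
        print(triple)
```

A flat lookup like this cannot resolve ranges, negation, or comparisons ("leaves longer than bracts"), which is why a full NLP pipeline is needed for real descriptions.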

Methods and Results: Our protocol includes the Explorer of Taxon Concepts (ETC), an online application that assembles taxon-by-character matrices from taxonomic descriptions, and MatrixConverter, a Java application that enables users to evaluate and discretize the characters extracted by ETC. We demonstrate this protocol using descriptions from Araucariaceae.
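The two steps of this protocol, assembling a taxon-by-character matrix and then discretizing its characters, can be sketched as follows. This is an illustrative sketch, not the code of ETC or MatrixConverter: `build_matrix`, `discretize`, the taxon names, and the measurements are hypothetical, and equal-width binning is only one assumed way to turn a continuous character into discrete states.

```python
from typing import Optional

# Hypothetical extracted values: taxon -> {character: value}. Numbers stand
# for measurements; strings stand for categorical states.
Extracted = dict[str, dict[str, object]]


def build_matrix(data: Extracted) -> tuple[list[str], list[str], list[list[object]]]:
    """Assemble a taxon-by-character matrix; cells with no extracted value are None."""
    taxa = sorted(data)
    characters = sorted({c for chars in data.values() for c in chars})
    rows = [[data[t].get(c) for c in characters] for t in taxa]
    return taxa, characters, rows


def discretize(values: list[Optional[float]], n_states: int) -> list[str]:
    """Code a continuous character as states '0'..str(n_states - 1) by equal-width bins.

    Missing values are coded '?', the usual missing-data symbol in
    phylogenetic character matrices.
    """
    present = [v for v in values if v is not None]
    if not present:
        return ["?"] * len(values)
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_states or 1.0  # all values equal -> everything in state 0
    states = []
    for v in values:
        if v is None:
            states.append("?")
        else:
            # Clamp so the maximum value lands in the last bin, not a new one.
            states.append(str(min(int((v - lo) / width), n_states - 1)))
    return states


if __name__ == "__main__":
    data = {
        "Taxon A": {"leaf length (mm)": 10.0, "leaf shape": "ovate"},
        "Taxon B": {"leaf length (mm)": 25.0},
        "Taxon C": {"leaf length (mm)": 40.0, "leaf shape": "lanceolate"},
    }
    taxa, characters, rows = build_matrix(data)
    print(characters)
    for taxon, row in zip(taxa, rows):
        print(taxon, row)
    lengths = [row[characters.index("leaf length (mm)")] for row in rows]
    print(discretize(lengths, n_states=3))  # ['0', '1', '2']
```

Equal-width binning is easy to reproduce but sensitive to outliers, which is one reason a user evaluating extracted characters may prefer to choose state boundaries by hand.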

Conclusions: The NLP pipeline unlocks the phenotypic data found in taxonomic descriptions and makes them usable for evolutionary analyses.

Citing Articles

FloraTraiter: Automated parsing of traits from descriptive biodiversity literature.

Folk R, Guralnick R, LaFrance R. Appl Plant Sci. 2024; 12(1):e11563.

PMID: 38369975 PMC: 10873814. DOI: 10.1002/aps3.11563.


Diatoms.org: supporting taxonomists, connecting communities.

Spaulding S, Potapova M, Bishop I, Lee S, Gasperak T, Jovanoska E. Diatom Res. 2022; 36(4):291-304.

PMID: 35958044 PMC: 9359083. DOI: 10.1080/0269249X.2021.2006790.


Inferring microbiota functions from taxonomic genes: a review.

Djemiel C, Maron P, Terrat S, Dequiedt S, Cottin A, Ranjard L. Gigascience. 2022; 11(1).

PMID: 35022702 PMC: 8756179. DOI: 10.1093/gigascience/giab090.


Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait.

Singh G, Papoutsoglou E, Keijts-Lalleman F, Vencheva B, Rice M, Visser R. BMC Plant Biol. 2021; 21(1):198.

PMID: 33894758 PMC: 8070292. DOI: 10.1186/s12870-021-02943-5.


Biodiversity data integration-the significance of data resolution and domain.

König C, Weigelt P, Schrader J, Taylor A, Kattge J, Kreft H. PLoS Biol. 2019; 17(3):e3000183.

PMID: 30883539 PMC: 6445469. DOI: 10.1371/journal.pbio.3000183.

