Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life

Overview
Journal PLoS One
Date 2014 Mar 6
PMID 24594988
Citations 9
Abstract

Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.
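Both workflows are evaluated with an F1 score, the harmonic mean of precision and recall. A minimal sketch of that calculation follows, assuming the evaluation reduces to counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and for illustration only; they are not taken from the paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives  # items the workflow flagged
    actual = true_positives + false_negatives     # items in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not the paper's data.
example = f1_score(true_positives=80, false_positives=10, false_negatives=10)
print(round(example, 3))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a workflow cannot reach a high F1 score by maximizing precision alone or recall alone.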

Citing Articles

The changing landscape of text mining: a review of approaches for ecology and evolution.

Farrell M, Le Guillarme N, Brierley L, Hunter B, Scheepens D, Willoughby A. Proc Biol Sci. 2024; 291(2027):20240423.

PMID: 39082244 PMC: 11289731. DOI: 10.1098/rspb.2024.0423.


Past and future uses of text mining in ecology and evolution.

Farrell M, Brierley L, Willoughby A, Yates A, Mideo N. Proc Biol Sci. 2022; 289(1975):20212721.

PMID: 35582795 PMC: 9114983. DOI: 10.1098/rspb.2021.2721.


Biodiversity Observations Miner: A web application to unlock primary biodiversity data from published literature.

Munoz G, Kissling W, van Loon E. Biodivers Data J. 2019; 7:e28737.

PMID: 30692868 PMC: 6344444. DOI: 10.3897/BDJ.7.e28737.


Semi-automatic Extraction of Plants Morphological Characters from Taxonomic Descriptions Written in Spanish.

Mora M, Araya J. Biodivers Data J. 2018; 6:e21282.

PMID: 29991903 PMC: 6030177. DOI: 10.3897/BDJ.6.e21282.


The importance of digitized biocollections as a source of trait data and a new VertNet resource.

Guralnick R, Zermoglio P, Wieczorek J, LaFrance R, Bloom D, Russell L. Database (Oxford). 2016; 2016.

PMID: 28025346 PMC: 5199146. DOI: 10.1093/database/baw158.

