
Text-mined Fossil Biodiversity Dynamics Using Machine Learning

Overview
Journal Proc Biol Sci
Specialty Biology
Date 2019 Apr 25
PMID 31014224
Citations 5
Abstract

Documented occurrences of fossil taxa are the empirical foundation for understanding large-scale biodiversity changes and evolutionary dynamics in deep time. The fossil record contains vast numbers of understudied taxa. Yet the compilation of huge volumes of data remains a labour-intensive impediment to a more complete understanding of Earth's biodiversity history. Even so, many occurrence records of species and genera in these taxa can be uncovered in the palaeontological literature. Here, we extract observations of fossils and their inferred ages from unstructured text in books and scientific articles using machine-learning approaches. We use Bryozoa, a group of marine invertebrates with a rich fossil record, as a case study. Building on recent advances in computational linguistics, we develop a pipeline to recognize taxonomic names and geologic time intervals in published literature and use supervised learning to machine-read whether the species in question occurred in a given age interval. Intermediate machine error rates appear comparable to human error rates in a simple trial, and the resulting genus richness curves capture the main features of published fossil diversity studies of bryozoans. We believe our automated pipeline, which greatly reduced the time required to compile our dataset, can help others compile similar data for other taxa.
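
The pipeline described in the abstract (recognize taxon names and time intervals, decide which pairs are true occurrences, then summarize genus richness) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The genus list, the interval list, the regular expressions and the sample text are all hypothetical, and the supervised classification step is replaced by simply accepting every taxon-interval pair that co-occurs in a sentence.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical toy vocabularies; a real system would need full taxonomic
# and chronostratigraphic dictionaries.
GENERA = ["Fenestella", "Microporella"]
INTERVALS = ["Devonian", "Miocene", "Pliocene"]

# A genus from the list followed by a lowercase species epithet.
TAXON = re.compile(r"\b(" + "|".join(GENERA) + r")\s+([a-z]+)\b")
INTERVAL = re.compile(r"\b(" + "|".join(INTERVALS) + r")\b")


def candidate_pairs(text):
    """Return (genus, species, interval) triples that share a sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        taxa = TAXON.findall(sentence)
        ages = INTERVAL.findall(sentence)
        pairs.extend((g, s, a) for (g, s), a in product(taxa, ages))
    return pairs


def genus_richness(pairs):
    """Count distinct genera per interval.

    Assumption: every candidate is treated as a true occurrence. In the
    abstract, a supervised model makes this decision instead.
    """
    genera_by_interval = defaultdict(set)
    for genus, _species, interval in pairs:
        genera_by_interval[interval].add(genus)
    return {interval: len(g) for interval, g in genera_by_interval.items()}


# Invented toy sentences for illustration; not real occurrence records.
sample = (
    "Fenestella antiqua was collected from Devonian limestone. "
    "Microporella ciliata is common in Miocene and Pliocene deposits."
)
pairs = candidate_pairs(sample)
richness = genus_richness(pairs)
```

Sentence-level co-occurrence overgenerates, since a sentence can mention a taxon and an interval without asserting that the taxon occurred in that interval. That is the gap the supervised step in the abstract is meant to close.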

Citing Articles

Evaluating the feasibility of automating dataset retrieval for biodiversity monitoring.

Fuster-Calvo A, Valentin S, Tamayo W, Gravel D. PeerJ. 2025; 13:e18853.

PMID: 39897501 PMC: 11786708. DOI: 10.7717/peerj.18853.


Automated graptolite identification at high taxonomic resolution using residual networks.

Niu Z, Jia S, Xu H. iScience. 2024; 27(1):108549.

PMID: 38213629 PMC: 10783601. DOI: 10.1016/j.isci.2023.108549.


From beasts to bytes: Revolutionizing zoological research with artificial intelligence.

Zhang Y, Luo Z, Sun Y, Liu J, Chen Z. Zool Res. 2023; 44(6):1115-1131.

PMID: 37933101 PMC: 10802096. DOI: 10.24272/j.issn.2095-8137.2023.263.


Challenges and directions in analytical paleobiology.

Dillon E, Dunne E, Womack T, Kouvari M, Larina E, Claytor J. Paleobiology. 2023; 49(3):377-393.

PMID: 37809321 PMC: 7615171. DOI: 10.1017/pab.2023.3.


Enhancing georeferenced biodiversity inventories: automated information extraction from literature records reveal the gaps.

Kopperud B, Lidgard S, Liow L. PeerJ. 2022; 10:e13921.

PMID: 35999848 PMC: 9393005. DOI: 10.7717/peerj.13921.

