
The Changing Landscape of Text Mining: a Review of Approaches for Ecology and Evolution

Overview
Journal: Proc Biol Sci
Specialty: Biology
Date: 2024 Jul 31
PMID: 39082244
Abstract

In ecology and evolutionary biology, the synthesis and modelling of data from published literature are commonly used to generate insights and test theories across systems. However, the tasks of searching, screening, and extracting data from literature are often arduous. Researchers may manually process hundreds to thousands of articles for systematic reviews, meta-analyses, and compiling synthetic datasets. As relevant articles expand to tens or hundreds of thousands, computer-based approaches can increase the efficiency, transparency and reproducibility of literature-based research. Methods available for text mining are rapidly changing owing to developments in machine learning-based language models. We review the growing landscape of approaches, mapping them onto three broad paradigms (frequency-based approaches, traditional Natural Language Processing and deep learning-based language models). This serves as an entry point to learn foundational and cutting-edge concepts, vocabularies, and methods to foster integration of these tools into ecological and evolutionary research. We cover approaches for modelling ecological texts, generating training data, developing custom models and interacting with large language models and discuss challenges and possible solutions to implementing these methods in ecology and evolution.
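The first paradigm named above, frequency-based approaches, can be illustrated with a minimal bag-of-words sketch. It assumes abstracts are already available as plain strings. The stop-word list, the keyword set and the function names (`tokenize`, `bag_of_words`, `keyword_score`, `rank_for_screening`) are hypothetical and are not taken from the review. Each document is reduced to word counts with word order ignored, and documents are then ranked for screening by how often the chosen keywords occur.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real pipelines use larger curated lists.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "on", "with", "was", "were", "is"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, keep runs of letters and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(text: str) -> Counter:
    """Represent a document as a term -> count mapping, discarding word order."""
    return Counter(tokenize(text))


def keyword_score(text: str, keywords: set[str]) -> int:
    """Total number of occurrences of the (hypothetical) screening keywords."""
    counts = bag_of_words(text)
    return sum(counts[k] for k in keywords)  # missing keys count as 0


def rank_for_screening(docs: dict[str, str], keywords: set[str]) -> list[tuple[str, int]]:
    """Return (doc_id, score) pairs sorted from most to fewest keyword hits."""
    scored = [(doc_id, keyword_score(text, keywords)) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up example abstracts, for illustration only.
    abstracts = {
        "doc1": "Predation rates of birds on insects were measured in forest plots.",
        "doc2": "We model predation and predator abundance across predation gradients.",
        "doc3": "Soil nitrogen cycling was compared among grassland sites.",
    }
    for doc_id, score in rank_for_screening(abstracts, {"predation", "predator"}):
        print(doc_id, score)
```

Running the script prints `doc2 3`, `doc1 1` and `doc3 0`. This representation discards word order, so a phrase such as "no evidence of predation" still counts as a hit. The other two paradigms, traditional Natural Language Processing and deep learning-based language models, model more of a text's structure.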

Citing Articles

Evaluating the feasibility of automating dataset retrieval for biodiversity monitoring.

Fuster-Calvo A, Valentin S, Tamayo W, Gravel D. PeerJ. 2025; 13:e18853.

PMID: 39897501. PMC: 11786708. DOI: 10.7717/peerj.18853.
