
A Demonstration of Unsupervised Machine Learning in Species Delimitation

Overview
Date 2019 Jul 20
PMID 31323334
Citations 19
Abstract

One major challenge to delimiting species with genetic data is successfully differentiating population structure from species-level divergence, an issue exacerbated in taxa inhabiting naturally fragmented habitats. Many fields of science are now using machine learning, and in evolutionary biology supervised machine learning has recently been used to infer species boundaries. These supervised methods require training data with associated labels. Conversely, unsupervised machine learning (UML) uses inherent data structure and does not require user-specified training labels, potentially providing more objectivity in species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three UML approaches (random forests, variational autoencoders, t-distributed stochastic neighbor embedding) for species delimitation in an arachnid taxon with high population genetic structure (Opiliones, Laniatores, Metanonychus). We find that UML approaches successfully cluster samples according to species-level divergences and not high levels of population structure, while model-based validation methods severely over-split putative species. UML offers intuitive data visualization in two-dimensional space and the ability to accommodate various data types, and it has potential in many areas of systematic and evolutionary biology. We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa with diverse biological characteristics.
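
To make the general idea concrete, here is a minimal sketch of one of the approaches named above, t-distributed stochastic neighbor embedding, applied to a genotype matrix and followed by clustering in the two-dimensional embedding. It is not the authors' pipeline. The data are simulated, the helper names (`simulate_genotypes`, `embed_and_cluster`) are hypothetical, and the use of scikit-learn with k-means and a fixed number of clusters is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def simulate_genotypes(n_groups=3, n_per_group=10, n_snps=200, seed=0):
    """Simulate a samples-by-SNPs matrix of 0/1/2 genotype counts.

    Assumption: each group has its own independent allele frequencies,
    a crude stand-in for deeply diverged lineages (not real data).
    """
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(n_groups):
        freqs = rng.uniform(0.05, 0.95, size=n_snps)
        # Diploid genotype = number of copies of the alternate allele.
        blocks.append(rng.binomial(2, freqs, size=(n_per_group, n_snps)))
    return np.vstack(blocks).astype(float)


def embed_and_cluster(genotypes, n_clusters, perplexity=5.0, seed=0):
    """Embed samples in 2D with t-SNE, then cluster the embedding."""
    # Perplexity must be smaller than the number of samples.
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(genotypes)
    # No species labels are used; only the number of clusters is supplied.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(embedding)
    return embedding, labels


genotypes = simulate_genotypes()
embedding, labels = embed_and_cluster(genotypes, n_clusters=3)
print(embedding.shape, sorted(set(labels.tolist())))
```

The two columns of `embedding` can be plotted directly as a scatter plot coloured by `labels`. Note that fixing `n_clusters` in advance is itself a user decision; in practice the number of clusters would need to be chosen by some separate criterion, which this sketch does not attempt.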

Citing Articles

Phylogenomics of the rarest animals: a second species of Micrognathozoa identified by machine learning.

Sato S, Appeldorff C, Wangensteen O, Garces-Pastor S, Laumer C, Herranz M Proc Biol Sci. 2025; 292(2041):20242867.

PMID: 39968621 PMC: 11836703. DOI: 10.1098/rspb.2024.2867.


Deforestation-induced Hybridization in Philippine Frogs Creates a Distinct Phenotype With an Inviable Genotype.

Chan K, Hime P, Brown R Heredity (Edinb). 2025; .

PMID: 39956873 DOI: 10.1038/s41437-025-00748-y.


Understanding species limits through the formation of phylogeographic lineages.

Burbrink F, Myers E, Pyron R Ecol Evol. 2024; 14(10):e70263.

PMID: 39364037 PMC: 11446989. DOI: 10.1002/ece3.70263.


Linear Morphometry of Male Genitalia Distinguishes the Ant Genera and (Hymenoptera: Formicidae) in Madagascar.

Rasoarimalala N, Ramiadantsoa T, Rakotonirina J, Fisher B Insects. 2024; 15(8).

PMID: 39194810 PMC: 11354313. DOI: 10.3390/insects15080605.


Long-term climatic stability drives accumulation and maintenance of divergent freshwater fish lineages in a temperate biodiversity hotspot.

Buckley S, Brauer C, Unmack P, Hammer M, Adams M, Beatty S Heredity (Edinb). 2024; 133(3):149-159.

PMID: 38918613 PMC: 11349885. DOI: 10.1038/s41437-024-00700-6.

