
Applications of Random Forest Feature Selection for Fine-scale Genetic Population Assignment

Overview
Journal: Evol Appl
Specialty: Biology
Date: 2018 Feb 2
PMID: 29387152
Citations: 24
Abstract

Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than F_ST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F_ST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
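As a rough illustration of the ranking baseline the abstract compares against, the sketch below scores each SNP with a simple heterozygosity-based F_ST and keeps the top-ranked loci as a panel. It is not the authors' pipeline: the 0/1/2 genotype coding, the Nei-style estimator (H_T - H_S) / H_T, the toy data, and the function names `allele_freq`, `fst_per_locus` and `top_fst_panel` are all assumptions made for this example, and the estimator used in the study may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical toy data: genotypes coded as copies (0, 1 or 2) of one allele,
# one row per individual and one column per SNP, grouped by population.
TOY: Dict[str, List[List[int]]] = {
    "A": [[2, 1, 0], [2, 1, 0], [2, 1, 1], [2, 1, 1]],
    "B": [[0, 1, 2], [0, 1, 1], [0, 1, 1], [0, 1, 0]],
}


def allele_freq(genotypes: Sequence[int]) -> float:
    """Frequency of the counted allele among diploid individuals."""
    return sum(genotypes) / (2 * len(genotypes))


def fst_per_locus(pops: Dict[str, List[List[int]]], locus: int) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic locus.

    Populations are weighted equally; this is an assumed, simplified
    estimator chosen only to keep the example short.
    """
    freqs = [allele_freq([ind[locus] for ind in inds]) for inds in pops.values()]
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                            # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)   # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_fst_panel(pops: Dict[str, List[List[int]]], panel_size: int) -> List[int]:
    """Indices of the `panel_size` loci with the highest F_ST, best first."""
    n_loci = len(next(iter(pops.values()))[0])
    scores = [fst_per_locus(pops, j) for j in range(n_loci)]
    ranked = sorted(range(n_loci), key=lambda j: scores[j], reverse=True)
    return ranked[:panel_size]


if __name__ == "__main__":
    print(top_fst_panel(TOY, 2))
```

A random forest-based selection would keep the same "score, rank, take the top k" shape but replace the per-locus F_ST score with an importance measure from a classifier trained to predict population of origin. Each candidate panel size (50-700 markers in the abstract) would then be evaluated by its self-assignment accuracy.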

Citing Articles

Unlocking the geography of Azobé timber (Lophira alata): revealing spatial genetic structure beyond species boundaries.

Rocha Venancio Meyer-Sand B, Boeschoten L, Bouka G, Ciliane-Madikou J, de Groot G, de Vries N. BMC Plant Biol. 2025; 25(1):315.

PMID: 40075285 PMC: 11899005. DOI: 10.1186/s12870-025-06287-2.


Artificial Intelligence (AI)-driven approach to climate action and sustainable development.

Cho H, Ackom E. Nat Commun. 2025; 16(1):1228.

PMID: 39890783 PMC: 11785942. DOI: 10.1038/s41467-024-53956-1.


Radiomics based on 18F-FDG PET for predicting treatment response and prognosis in newly diagnosed diffuse large B-cell lymphoma patients: do lesion selection and segmentation methods matter?

Zhou Y, Zhou X, Xu Y, Ma X, Tian R. Quant Imaging Med Surg. 2025; 15(1):103-120.

PMID: 39839002 PMC: 11744140. DOI: 10.21037/qims-24-585.


Radiomics based on multiple machine learning methods for diagnosing early bone metastases not visible on CT images.

Wang H, Qiu J, Lu W, Xie J, Ma J. Skeletal Radiol. 2024; 54(2):335-343.

PMID: 39028463 DOI: 10.1007/s00256-024-04752-x.


PET-based radiomic feature based on the cross-combination method for predicting the mid-term efficacy and prognosis in high-risk diffuse large B-cell lymphoma patients.

Chen M, Rong J, Zhao J, Teng Y, Jiang C, Chen J. Front Oncol. 2024; 14:1394450.

PMID: 38903712 PMC: 11188321. DOI: 10.3389/fonc.2024.1394450.

