
Discriminant Analysis of Principal Components: a New Method for the Analysis of Genetically Structured Populations

Overview
Journal BMC Genet
Publisher Biomed Central
Date 2010 Oct 19
PMID 20950446
Citations 1543
Abstract

Background: The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations.

Results: We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach makes it possible to extract rich information from genetic data, providing assignment of individuals to groups, a visual assessment of between-population differentiation, and the contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza.
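
The pipeline described in the Results can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the synthetic data matrix, all function names, the farthest-point K-means initialisation, the BIC formula `n*log(WSS/n) + k*log(n)`, the curvature ("elbow") rule used to pick the number of clusters, and the nearest-centroid assignment are assumptions made for illustration only. The sketch reduces the data with PCA, runs K-means for increasing k on the retained components, selects k from the BIC curve, and then computes discriminant axes on the components together with a per-variable contribution to each axis.

```python
import numpy as np

def pca_scores(X, n_pcs):
    """Centre X; return (PC scores, variable loadings) for the first n_pcs axes."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs], Vt[:n_pcs].T

def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centers = [Z[0]]
    while len(centers) < k:
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centre as the mean of its members; keep it if empty.
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    wss = ((Z - centers[labels]) ** 2).sum()  # within-cluster sum of squares
    return labels, wss

def bic(wss, n, k):
    """Assumed BIC form for a K-means fit; lower is better."""
    return n * np.log(wss / n) + k * np.log(n)

def discriminant_axes(Z, labels):
    """Axes maximising between-group relative to within-group variance of Z.
    Assumes every label in 0..max(labels) has at least one member."""
    n, k = len(Z), labels.max() + 1
    grand = Z.mean(axis=0)
    W = np.zeros((Z.shape[1], Z.shape[1]))
    B = np.zeros_like(W)
    for g in range(k):
        Zg = Z[labels == g]
        mg = Zg.mean(axis=0)
        W += (Zg - mg).T @ (Zg - mg)
        B += len(Zg) * np.outer(mg - grand, mg - grand)
    # Whiten with the Cholesky factor of the pooled within-group covariance,
    # so the problem becomes a symmetric eigendecomposition.
    L = np.linalg.cholesky(W / (n - k))
    T = np.linalg.solve(L, B)
    A = np.linalg.solve(L, T.T).T
    vals, U = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1][: k - 1]
    return np.linalg.solve(L.T, U[:, order])

# Synthetic stand-in for a genetic data matrix: 3 groups of 20 individuals,
# 12 numeric variables, each group shifted on its own block of 4 variables.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 12))
for g in range(3):
    X[truth == g, 4 * g: 4 * g + 4] += 5.0

# Step 1: PCA. Step 2: K-means for k = 1..6 on the retained PCs, scored by BIC.
Z, pc_load = pca_scores(X, n_pcs=4)
fits = {k: kmeans(Z, k) for k in range(1, 7)}
bics = {k: bic(wss, len(Z), k) for k, (_, wss) in fits.items()}

# Illustrative selection rule: the k where the BIC curve bends most sharply.
curv = {k: bics[k - 1] - 2 * bics[k] + bics[k + 1] for k in range(2, 6)}
k_best = max(curv, key=curv.get)
labels = fits[k_best][0]

# Step 3: discriminant analysis on the PCs, using the inferred clusters.
D = discriminant_axes(Z, labels)
coords = Z @ D  # individuals on the discriminant axes (Z is already centred)
centroids = np.array([coords[labels == g].mean(axis=0) for g in range(k_best)])
assigned = ((coords[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)

# Contribution of each original variable to each axis: squared loadings,
# normalised so that every column sums to 1.
loadings = pc_load @ D
contrib = loadings ** 2 / (loadings ** 2).sum(axis=0)
```

Running discriminant analysis on a few principal components rather than on the raw variables keeps the within-group covariance matrix small and invertible even when there are far more variables than individuals. The number of retained components (`n_pcs`) is a free parameter in this sketch.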

Conclusions: Analysis of simulated data revealed that our approach generally performs better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and graphical representation of between-group structures make it possible to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.

Citing Articles

Transmission of ceftazidime-avibactam-resistant among pets, veterinarians and animal hospital environment.

Dai H, Shao D, Song Y, An Q, Zhang Z, Zhang H Biosaf Health. 2025; 6(3):191-198.

PMID: 40078730 PMC: 11895028. DOI: 10.1016/j.bsheal.2024.03.004.


A Scoping Review of Infrared Spectroscopy and Machine Learning Methods for Head and Neck Precancer and Cancer Diagnosis and Prognosis.

Alajaji S, Sabzian R, Wang Y, Sultan A, Wang R Cancers (Basel). 2025; 17(5).

PMID: 40075644 PMC: 11899414. DOI: 10.3390/cancers17050796.


Determining population structure from k-mer frequencies.

Hrytsenko Y, Daniels N, Schwartz R PeerJ. 2025; 13:e18939.

PMID: 40061228 PMC: 11890038. DOI: 10.7717/peerj.18939.


Population Structure of the Invasive Asian Tiger Mosquito, Aedes albopictus, in Europe.

Corley M, Cosme L, Armbruster P, Beebe N, Bega A, Boyer S Ecol Evol. 2025; 15(3):e71009.

PMID: 40060725 PMC: 11886418. DOI: 10.1002/ece3.71009.


Examining the coupling relationship between industrial upgrading and eco-environmental system in resource-based cities in China.

Lei Y, Chen Y, Zhang L, Lu Y Front Public Health. 2025; 13:1527306.

PMID: 40027498 PMC: 11868083. DOI: 10.3389/fpubh.2025.1527306.

