
Correcting Population Stratification in Genetic Association Studies Using a Phylogenetic Approach

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2010 Jan 26
PMID: 20097913
Citations: 21
Abstract

Motivation: The rapid development of genotyping technology and the extensive cataloguing of single nucleotide polymorphisms (SNPs) across the human genome have made genetic association studies the mainstream approach for gene mapping of complex human diseases. For many diseases, the most practical design is population-based, using unrelated individuals. Although this design offers easier sample collection and greater power than family-based designs, unrecognized population stratification in the study samples can produce both false-positive and false-negative findings and can obscure true association signals if it is not appropriately corrected.

Methods: We report PHYLOSTRAT, a new method that corrects for population stratification by combining a phylogeny constructed from SNP genotypes with principal coordinates from multi-dimensional scaling (MDS) analysis. This hybrid approach efficiently captures both discrete and admixed population structures.
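As a rough illustration of the hybrid idea (discrete group membership plus continuous MDS coordinates used together as covariates), the following is a minimal sketch and not the PHYLOSTRAT implementation. It assumes genotypes coded 0/1/2, uses a simple allele-sharing distance, and substitutes average-linkage clustering for a real phylogeny. All function names and the simulated data are hypothetical.

```python
import numpy as np

def allele_sharing_distance(G):
    """Pairwise distance between individuals; G is (individuals x SNPs), coded 0/1/2."""
    G = np.asarray(G, dtype=float)
    # Mean absolute genotype difference per SNP, scaled to lie in [0, 1].
    return np.abs(G[:, None, :] - G[None, :, :]).mean(axis=2) / 2.0

def classical_mds(D, k=2):
    """Principal coordinates from a distance matrix (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]           # indices of the k largest eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

def average_linkage_clusters(D, n_clusters):
    """Agglomerative average-linkage clustering (assumed stand-in for a phylogeny)."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = np.empty(D.shape[0], dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

def structure_covariates(G, n_clusters=2, n_coords=2):
    """Combine discrete cluster indicators with continuous MDS coordinates."""
    D = allele_sharing_distance(G)
    labels = average_linkage_clusters(D, n_clusters)
    coords = classical_mds(D, n_coords)
    # Indicator columns for clusters 1..n_clusters-1 (cluster 0 is the baseline).
    onehot = (labels[:, None] == np.arange(1, n_clusters)[None, :]).astype(float)
    return labels, np.hstack([onehot, coords])

# Simulated example: two populations with very different allele frequencies.
rng = np.random.default_rng(0)
G = np.vstack([rng.binomial(2, 0.1, size=(20, 200)),
               rng.binomial(2, 0.9, size=(20, 200))])
labels, X = structure_covariates(G, n_clusters=2, n_coords=2)
```

The resulting matrix `X` could then be included as covariates in an association model. The number of clusters and coordinates are free choices here, not values taken from the paper.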

Results: Through extensive simulations, analysis of a synthetic genome-wide association dataset created from Human Genome Diversity Project data, and analysis of a lactase-height dataset, we show that our method corrects for population stratification more efficiently than several existing correction methods, including EIGENSTRAT, a hybrid approach based on MDS and clustering, and STRATSCORE, in that it requires fewer random SNPs to infer population structure. By combining the flexibility and hierarchical nature of phylogenetic trees with the ability of MDS to represent admixture, our hybrid approach effectively captures the complex structure of human populations.
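To make concrete what correcting for stratification means, here is a toy simulation, unrelated to the datasets above, in which a SNP has no true effect on the phenotype yet appears associated until a population covariate is added. It assumes the population label is known. In practice it would be inferred from random SNPs, and the variable names and parameter values are illustrative only.

```python
import numpy as np

def ols_coefficients(X, y):
    """Least-squares coefficients with an intercept prepended as column 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 2000                                       # individuals per population
pop = np.repeat([0.0, 1.0], n)                 # population membership (assumed known)
freq = np.where(pop == 0, 0.1, 0.9)            # allele frequency differs by population
g = rng.binomial(2, freq).astype(float)        # genotype coded 0/1/2
y = 1.0 * pop + rng.normal(size=2 * n)         # phenotype depends on population only

# Naive model: phenotype ~ genotype (confounded by population).
naive_effect = ols_coefficients(g, y)[1]

# Adjusted model: phenotype ~ genotype + population covariate.
adjusted_effect = ols_coefficients(np.column_stack([g, pop]), y)[1]
```

In this setup `naive_effect` is clearly nonzero even though the SNP is not causal, while `adjusted_effect` shrinks toward zero once population membership enters the model.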

Software Availability: Code can be downloaded from http://people.pcbi.upenn.edu/~lswang/phylostrat/

Contact: mingyao@upenn.edu; lswang@upenn.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Determining population structure from k-mer frequencies.

Hrytsenko Y, Daniels N, Schwartz R. PeerJ. 2025; 13:e18939.

PMID: 40061228 PMC: 11890038. DOI: 10.7717/peerj.18939.


Alcohol consumption and allergic diseases: Mendelian randomization evidence from China.

Zhu C, Beatty T, Li Y, Chen G, Zhao Q, Chen Q. Glob Health Action. 2025; 17(1):2442788.

PMID: 39838956 PMC: 11755739. DOI: 10.1080/16549716.2024.2442788.


Unlocking genetic diversity for low-input systems in a changing climate through participatory characterization and GWAS of lentil landraces.

Lorenzetti E, Macharia M, Mager S, Dell'Acqua M, Carlesi S, Barberi P. Sci Rep. 2024; 14(1):31979.

PMID: 39738775 PMC: 11685781. DOI: 10.1038/s41598-024-83516-y.


Expression profiles of east-west highly differentiated genes in Uyghur genomes.

Ning Z, Tan X, Yuan Y, Huang K, Pan Y, Tian L. Natl Sci Rev. 2023; 10(4):nwad077.

PMID: 37138773 PMC: 10150800. DOI: 10.1093/nsr/nwad077.


Mendelian Randomization and the Environmental Epigenetics of Health: a Systematic Review.

Grau-Perez M, Agha G, Pang Y, Bermudez J, Tellez-Plaza M. Curr Environ Health Rep. 2019; 6(1):38-51.

PMID: 30773605 DOI: 10.1007/s40572-019-0226-3.

