PMID: 20221249

Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-wide Genotyping of Pooled Samples

Abstract

As we move beyond the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine-map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry and population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples on genome-wide SNP arrays as a viable option for efficiently and inexpensively estimating admixture proportions and identifying ancestry-informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort, using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. We demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs can differentiate two closely related populations (HapMap JPT and CHB).
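The abstract does not spell out the regression model, so the following is only a minimal sketch of one plausible formulation, not the authors' implementation. It assumes the pooled allele frequency at each SNP is approximately a weighted average of the reference-panel frequencies, with weights (the admixture proportions) that sum to one, and fits those weights by ordinary least squares. The function names `estimate_admixture` and `candidate_aims`, the `min_delta` threshold, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_admixture(pool_freq, ref_freqs):
    """Least-squares admixture proportions constrained to sum to 1.

    pool_freq : (n_snps,) allele frequencies estimated from the pooled sample
    ref_freqs : (n_snps, n_refs) allele frequencies in the reference panels
    Assumed model (not taken from the paper): pool ~= ref_freqs @ proportions.
    """
    pool_freq = np.asarray(pool_freq, dtype=float)
    ref_freqs = np.asarray(ref_freqs, dtype=float)
    last = ref_freqs[:, -1]
    # Substitute a_K = 1 - (a_1 + ... + a_{K-1}) to enforce the sum-to-one
    # constraint, then solve the reduced unconstrained problem.
    X = ref_freqs[:, :-1] - last[:, None]
    y = pool_freq - last
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Note: proportions are not forced to be non-negative in this sketch.
    return np.append(coef, 1.0 - coef.sum())


def candidate_aims(freq_a, freq_b, min_delta=0.5):
    """Indices of SNPs whose absolute allele frequency difference between
    two populations is at least min_delta (an arbitrary illustrative cutoff)."""
    delta = np.abs(np.asarray(freq_a, dtype=float) - np.asarray(freq_b, dtype=float))
    return np.flatnonzero(delta >= min_delta)


# Synthetic demonstration: two made-up reference panels and a pooled sample
# simulated as an 80/20 mixture with measurement noise.
rng = np.random.default_rng(0)
refs = rng.uniform(0.05, 0.95, size=(5000, 2))
true_props = np.array([0.8, 0.2])
pool = refs @ true_props + rng.normal(0.0, 0.02, size=5000)

est = estimate_admixture(pool, refs)
aims = candidate_aims(refs[:, 0], refs[:, 1])
print("estimated proportions:", est)
print("number of candidate AIMs:", aims.size)
```

In practice the reference frequencies would come from panels such as the HapMap populations, and the pooled frequencies would first pass the quality control filters described above. A non-negativity constraint on the proportions may also be desirable when more than two reference panels are used.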

Citing Articles

The Effect of Continuous Selection in KiwiCross Composite Breed on Breed Ancestry and Productivity Performance.

Jaafar M, Harris B, Huson H. Animals (Basel). 2025; 15(2).

PMID: 39858175 PMC: 11758328. DOI: 10.3390/ani15020175.


Population structure and breed identification of Chinese indigenous sheep breeds using whole genome SNPs and InDels.

Zhao C, Wang D, Yang C, Chen Y, Teng J, Zhang X. Genet Sel Evol. 2024; 56(1):60.

PMID: 39227836 PMC: 11370120. DOI: 10.1186/s12711-024-00927-1.


Definition of metafounders based on population structure analysis.

Anglhuber C, Edel C, Pimentel E, Emmerling R, Gotz K, Thaller G. Genet Sel Evol. 2024; 56(1):43.

PMID: 38844876 PMC: 11536677. DOI: 10.1186/s12711-024-00913-7.


Heritability and variance component estimation for feed and water intake behaviors of feedlot cattle.

Dressler E, Shaffer W, Bruno K, Krehbiel C, Calvo-Lorenzo M, Richards C. J Anim Sci. 2023; 101.

PMID: 37967310 PMC: 10699840. DOI: 10.1093/jas/skad386.


A Comprehensive Genomic Analysis of Chinese Indigenous Ningxiang Pigs: Genomic Breed Compositions, Runs of Homozygosity, and Beyond.

Yin S, Li Z, Yang F, Guo H, Zhao Q, Zhang Y. Int J Mol Sci. 2023; 24(19).

PMID: 37833998 PMC: 10572203. DOI: 10.3390/ijms241914550.

