A Comprehensive Evaluation of Polygenic Score and Genotype Imputation Performances of Human SNP Arrays in Diverse Populations

Overview
Journal Sci Rep
Specialty Science
Date 2022 Oct 20
PMID 36266455
Abstract

Despite the widespread use of next-generation sequencing technologies, microarray-based genotyping combined with the imputation of untyped variants remains a cost-effective means of interrogating genetic variation across the human genome. This technology is widely used in genome-wide association studies (GWAS) at biobank scale and, more recently, in polygenic score (PGS) analysis to predict and stratify disease risk. Over the last decade, human genotyping arrays have grown tremendously in both number and content, making a comprehensive evaluation of their performance increasingly important. Here, we performed a comprehensive performance assessment of 23 available human genotyping arrays in 6 ancestry groups using diverse public and in-house datasets. The analyses focus on estimating the performance of the derived imputation (in terms of accuracy and coverage) and of PGS (in terms of concordance with PGS estimated from whole-genome sequencing data) in three different traits and diseases. We found that arrays with a higher number of SNPs are not necessarily the ones with higher imputation performance, whereas arrays that are well optimized for the targeted population can provide very good imputation performance. In addition, PGS estimated from imputed SNP array data is highly correlated with PGS estimated from whole-genome sequencing data in most cases. When optimal arrays are used, the correlations of PGS between the two types of data are higher than 0.97; interestingly, however, high-density arrays can result in lower PGS performance. Our results underline the importance of selecting a suitable genotyping array for PGS applications. Finally, we developed a web tool that provides interactive analyses of tag SNP content and imputation performance based on population and genomic regions of interest. This study can serve as a practical guide for researchers designing genotyping array-based studies.
The tool is available at: https://genome.vinbigdata.org/tools/saa/.
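
The PGS concordance measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes a PGS is computed as a weighted sum of allele dosages and that concordance is measured as the Pearson correlation between scores derived from imputed array data and scores derived from whole-genome sequencing (WGS) data. The function names, effect weights, and genotype values below are hypothetical toy inputs and are not taken from the study, whose actual pipeline is not described in the abstract.

```python
from math import sqrt


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (each in 0..2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-SNP effect weights (e.g. from GWAS summary statistics).
weights = [0.12, -0.05, 0.30]

# Toy data: rows are individuals, columns are SNPs.
# WGS genotypes are hard calls; imputed array genotypes are fractional dosages.
wgs_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
imputed_dosages = [
    [0.1, 1.0, 1.9],
    [0.9, 1.2, 0.0],
    [2.0, 0.1, 0.8],
    [0.0, 1.8, 2.0],
]

pgs_wgs = [polygenic_score(g, weights) for g in wgs_genotypes]
pgs_array = [polygenic_score(g, weights) for g in imputed_dosages]

# Concordance of array-derived PGS with WGS-derived PGS.
concordance = pearson(pgs_wgs, pgs_array)
print(f"PGS correlation (array vs WGS): {concordance:.3f}")
```

In this framing, a correlation close to 1 means the array-plus-imputation workflow ranks individuals almost identically to WGS for the trait in question, which is the sense in which the abstract reports correlations above 0.97 for optimal arrays.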

Citing Articles

Bridging genomics' greatest challenge: The diversity gap.

Corpas M, Pius M, Poburennaya M, Guio H, Dwek M, Nagaraj S. Cell Genom. 2024; 5(1):100724.

PMID: 39694036 PMC: 11770215. DOI: 10.1016/j.xgen.2024.100724.


Adult Onset Foveomacular Vitelliform Dystrophy Shows Genetic Overlap With Age-Related Macular Degeneration.

Jaskoll S, Kramer A, Elbaz-Hayoun S, Rinsky B, Eandi C, Grunin M. Invest Ophthalmol Vis Sci. 2024; 65(13):53.

PMID: 39585675 PMC: 11601137. DOI: 10.1167/iovs.65.13.53.


Commonly used genomic arrays may lose information due to imperfect coverage of discovered variants for autism spectrum disorder.

Yao M, Daniels J, Grosvenor L, Morrill V, Feinberg J, Bakulski K. J Neurodev Disord. 2024; 16(1):54.

PMID: 39266988 PMC: 11397030. DOI: 10.1186/s11689-024-09571-8.


Employing emerging technologies such as motion capture to study the complex interplay between genotype and power-related performance traits.

Papadimitriou I. Front Physiol. 2024; 15:1407753.

PMID: 38841210 PMC: 11150552. DOI: 10.3389/fphys.2024.1407753.


Recent advances in polygenic scores: translation, equitability, methods and FAIR tools.

Xiang R, Kelemen M, Xu Y, Harris L, Parkinson H, Inouye M. Genome Med. 2024; 16(1):33.

PMID: 38373998 PMC: 10875792. DOI: 10.1186/s13073-024-01304-9.

