Strategies for Processing and Quality Control of Illumina Genotyping Arrays

Overview
Journal Brief Bioinform
Specialty Biology
Date 2017 Mar 24
PMID 28334151
Citations 36
Abstract

Illumina genotyping arrays have powered thousands of large-scale genome-wide association studies over the past decade. Yet, because of the tremendous volume and complicated genetic assumptions of Illumina genotyping data, processing and quality control (QC) of these data remain a challenge. Thorough QC ensures the accurate identification of single-nucleotide polymorphisms and is required for the correct interpretation of genetic association results. By processing genotyping data on >100 000 subjects from >10 major Illumina genotyping arrays, we have accumulated extensive experience in handling some of the most peculiar scenarios related to the processing and QC of Illumina genotyping data. Here, we describe strategies for processing Illumina genotyping data from the raw data to an analysis-ready format, and we elaborate on the QC procedures required at each processing step. High-quality Illumina genotyping data sets can be obtained by following our detailed QC strategies.

Citing Articles

Transcriptional impacts of substance use disorder and HIV on human ventral midbrain neurons and microglia.

Wilson A, Jacobs M, Lambert T, Valada A, Meloni G, Gilmore E. bioRxiv. 2025.

PMID: 39974894 PMC: 11838593. DOI: 10.1101/2025.02.05.636667.


GWAS links to neuropsychiatric symptoms in mild cognitive impairment and dementia.

Vattathil S, Blostein F, Miller-Fleming T, Davis L, Wingo T, Wingo A. medRxiv. 2025.

PMID: 39974048 PMC: 11838693. DOI: 10.1101/2025.01.31.25321498.


PGSXplorer: an integrated nextflow pipeline for comprehensive quality control and polygenic score model development.

Yaras T, Oktay Y, Karakulah G. PeerJ. 2025; 13:e18973.

PMID: 39959831 PMC: 11829630. DOI: 10.7717/peerj.18973.


A brain DNA co-methylation network analysis of psychosis in Alzheimer's disease.

Kouhsar M, Weymouth L, Smith A, Imm J, Bredemeyer C, Wedatilake Y. Alzheimers Dement. 2025; 21(2):e14501.

PMID: 39936280 PMC: 11815327. DOI: 10.1002/alz.14501.


Severe COVID-19 disease is associated with genetic factors affecting plasma ACE2 receptor and CRP concentrations.

Vogi V, Haschka D, Forer L, Schwendinger S, Petzer V, Coassin S. Sci Rep. 2025; 15(1):4708.

PMID: 39922945 PMC: 11807156. DOI: 10.1038/s41598-025-89306-4.

