
Exploiting Linkage Disequilibrium for Ultrahigh-Dimensional Genome-Wide Data with an Integrated Statistical Approach

Overview
Journal Genetics
Specialty Genetics
Date 2015 Dec 15
PMID 26661113
Citations 3
Abstract

Genome-wide data with millions of single-nucleotide polymorphisms (SNPs) can be highly correlated due to linkage disequilibrium (LD). The ultrahigh dimensionality of big data brings unprecedented challenges to statistical modeling, such as noise accumulation, the curse of dimensionality, computational burden, spurious correlations, and processing and storage bottlenecks. Traditional statistical approaches lose their power due to $p \gg n$ (n is the number of observations and p is the number of SNPs) and the complex correlation structure among SNPs. In this article, we propose an integrated distance correlation ridge regression (DCRR) approach to accommodate the ultrahigh dimensionality, joint polygenic effects of multiple loci, and the complex LD structures. Initially, a distance correlation (DC) screening approach is used to extensively remove noise, after which LD structure is addressed using a ridge penalized multiple logistic regression (LRR) model. The false discovery rate, true positive discovery rate, and computational cost were simultaneously assessed through a large number of simulations. A binary trait of Arabidopsis thaliana, the hypersensitive response to the bacterial elicitor AvrRpm1, was analyzed in 84 inbred lines (28 susceptible and 56 resistant) with 216,130 SNPs. Compared to previous SNP discovery methods implemented on the same data set, the DCRR approach successfully detected the causative SNP while dramatically reducing spurious associations and computational time.
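
The two-stage idea described in the abstract (distance correlation screening followed by a ridge penalized logistic regression on the surviving SNPs) can be illustrated with a minimal sketch. This is not the authors' implementation. The simulated 0/1 genotypes, the top-k cutoff `keep`, the penalty `lam`, the learning rate, and the plain gradient-descent solver are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def centered_distances(v):
    """Double-centered matrix of pairwise absolute distances for a 1-D sample."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(r) / n for r in d]  # equals the column means: d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def mean_product(a, b):
    """Mean of the elementwise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def dc_screen(snps, y, keep):
    """Stage 1: rank SNP columns by distance correlation with y, keep the top `keep`."""
    scores = [distance_correlation(col, y) for col in snps]
    return sorted(range(len(snps)), key=lambda j: scores[j], reverse=True)[:keep]


def fit_ridge_logistic(rows, y, lam=1.0, lr=0.5, n_iter=500):
    """Stage 2: gradient descent on logistic loss plus (lam / 2) * ||w||^2.

    The intercept b is left unpenalized. lam, lr and n_iter are assumed values.
    """
    n, k = len(rows), len(rows[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(rows, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Simulated data (assumption): 60 inbred lines, 40 SNPs coded 0/1.
rng = random.Random(0)
N, P, CAUSAL, LD_PARTNER = 60, 40, 7, 8
snps = [[rng.randint(0, 1) for _ in range(N)] for _ in range(P)]
# A neighbouring SNP in strong LD with the causal one (10% of alleles flipped).
snps[LD_PARTNER] = [g if rng.random() > 0.1 else 1 - g for g in snps[CAUSAL]]
# Binary trait driven by the causal SNP, with 5% of labels flipped.
y = [g if rng.random() > 0.05 else 1 - g for g in snps[CAUSAL]]

kept = dc_screen(snps, y, keep=5)
rows = [[snps[j][i] for j in kept] for i in range(N)]
w, b = fit_ridge_logistic(rows, y)
weights = dict(zip(kept, w))
print("SNPs kept after screening:", kept)
print("Ridge logistic weights:", weights)
```

In this sketch the screening step shrinks the problem from p candidate SNPs to a handful, so the penalized regression only has to fit the retained columns jointly. The ridge penalty keeps the fit stable when retained SNPs are strongly correlated, as the causal SNP and its LD partner are here.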

Citing Articles

Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data.

Dai X, Fu G, Zhao S, Zeng Y Genes (Basel). 2021; 12(5).

PMID: 34068248 PMC: 8153154. DOI: 10.3390/genes12050736.


Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions.

Lees J, Mai T, Galardini M, Wheeler N, Horsfield S, Parkhill J mBio. 2020; 11(4).

PMID: 32636251 PMC: 7343994. DOI: 10.1128/mBio.01344-20.


Detecting PCOS susceptibility loci from genome-wide association studies via iterative trend correlation based feature screening.

Dai X, Fu G, Reese R BMC Bioinformatics. 2020; 21(1):177.

PMID: 32366216 PMC: 7199379. DOI: 10.1186/s12859-020-3492-z.
