
Penalized Regression for Genome-wide Association Screening of Sequence Data

Overview
Publisher World Scientific
Specialty Biology
Date 2010 Dec 2
PMID 21121038
Citations 19
Abstract

Whole exome and whole genome sequencing are likely to be potent tools in the study of common diseases and complex traits. Despite this promise, some very difficult issues in data management and statistical analysis must be squarely faced. The number of rare variants identified by sequencing is apt to be much larger than the number of common variants encountered in current association studies. The low frequencies of rare variants alone will make association testing difficult. This article extends the penalized regression framework for model selection in genome-wide association data to sequencing data with both common and rare variants. Previous research has shown that lasso penalties discourage irrelevant predictors from entering a model. The Euclidean penalties dealt with here group variants by gene or pathway. Pertinent biological information can be incorporated by calibrating penalties by weights. The current paper examines some of the tradeoffs in using pure lasso penalties, pure group penalties, and mixtures of the two types of penalty. All of the computational and statistical advantages of lasso penalized estimation are retained in this richer setting. The overall strategy is implemented in the free statistical genetics analysis software MENDEL and illustrated on both simulated and real data.
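To make the mixture of penalties concrete, the sketch below evaluates one common form of a combined lasso and group (Euclidean) penalty on a coefficient vector: a lasso term that sums absolute coefficients, plus a group term that sums the Euclidean norm of each gene's or pathway's coefficients. This is an illustration only, not MENDEL's implementation. The function name, the parameter names (lam_lasso, lam_group, weights), and the choice to apply weights per group are assumptions that do not come from the abstract.

```python
import math


def mixed_penalty(beta, groups, lam_lasso, lam_group, weights=None):
    """Evaluate a lasso plus group (Euclidean) penalty.

    beta      : list of regression coefficients (intercept excluded).
    groups    : list of index lists, one per gene or pathway (assumed layout).
    lam_lasso : nonnegative tuning constant for the lasso term (assumed name).
    lam_group : nonnegative tuning constant for the group term (assumed name).
    weights   : optional per-group weights; every group gets weight 1 if omitted.
    """
    if weights is None:
        weights = [1.0] * len(groups)

    # Lasso term: sum of absolute values, penalizing each variant separately.
    lasso_term = lam_lasso * sum(abs(b) for b in beta)

    # Group term: weighted sum of Euclidean norms, one norm per group.
    group_term = lam_group * sum(
        w * math.sqrt(sum(beta[j] ** 2 for j in idx))
        for w, idx in zip(weights, groups)
    )
    return lasso_term + group_term


# Two hypothetical genes: variants 0-1 belong to the first, 2-3 to the second.
example = mixed_penalty([3.0, 4.0, 0.0, -2.0], [[0, 1], [2, 3]], 1.0, 1.0)
```

Setting lam_group to 0 recovers a pure lasso penalty, and setting lam_lasso to 0 recovers a pure group penalty, which are the two extremes the abstract compares against mixtures.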

Citing Articles

Multivariate genome-wide association analysis by iterative hard thresholding.

Chu B, Ko S, Zhou J, Jensen A, Zhou H, Sinsheimer J. Bioinformatics. 2023; 39(4).

PMID: 37067496 PMC: 10133532. DOI: 10.1093/bioinformatics/btad193.


Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity.

Chu B, Keys K, German C, Zhou H, Zhou J, Sobel E. Gigascience. 2020; 9(6).

PMID: 32491161 PMC: 7268817. DOI: 10.1093/gigascience/giaa044.


Efficient Signal Inclusion With Genomic Applications.

Jeng X, Zhang T, Tzeng J. J Am Stat Assoc. 2020; 114(528):1787-1799.

PMID: 31929665 PMC: 6953619. DOI: 10.1080/01621459.2018.1518236.


OPENMENDEL: a cooperative programming project for statistical genetics.

Zhou H, Sinsheimer J, Bates D, Chu B, German C, Ji S. Hum Genet. 2019; 139(1):61-71.

PMID: 30915546 PMC: 6763373. DOI: 10.1007/s00439-019-02001-z.


The Fraction of Rhinovirus Detections Attributable to Mild and Severe Respiratory Illness in a Setting of High Human Immunodeficiency Virus Prevalence, South Africa, 2013-2015.

Hellferscee O, Treurnicht F, Walaza S, du Plessis M, von Gottberg A, Wolter N. J Infect Dis. 2018; 219(11):1697-1704.

PMID: 30590585 PMC: 7804373. DOI: 10.1093/infdis/jiy725.

