
Evaluating Methods for the Analysis of Rare Variants in Sequence Data

Overview
Journal BMC Proc
Publisher Biomed Central
Specialty Biology
Date 2012 Mar 1
PMID 22373354
Citations 26
Abstract

A number of rare variant statistical methods have been proposed for the analysis of the impending wave of next-generation sequencing data. To date, there are few direct comparisons of these methods on real sequence data. Furthermore, there is a strong need for practical advice on proper analytic strategies for rare variant analysis. As part of Genetic Analysis Workshop 17, we compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and cumulative minor allele test) on simulated phenotype and next-generation sequencing data. Overall, we find that all analyzed methods have serious practical limitations in identifying causal genes. Specifically, no method achieves a true discovery rate (the percentage of truly causal genes among all genes identified as significantly associated with the phenotype) above 5%. Further exploration shows that all methods suffer from inflated false-positive rates (the chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal SNPs and causal SNPs. Moreover, the observed true-positive rates (the chance that a truly causal gene will be identified as significantly associated with the phenotype) were very low (<19%) for all four methods. The combination of larger than anticipated false-positive rates, low true-positive rates, and only about 1% of all genes being causal yields poor discriminatory ability for all four methods. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
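Why a low causal fraction, a low true-positive rate, and an inflated false-positive rate together drive the true discovery rate down can be seen from the expected-count identity TDR = π·TPR / (π·TPR + (1 − π)·FPR), where π is the fraction of genes that are causal. The sketch below is a minimal illustration, assuming every causal gene shares one TPR and every noncausal gene shares one FPR; the function names are hypothetical, and the FPR values in the loop are illustrative inputs, not results reported in this study. Only π ≈ 0.01 and the 19% TPR ceiling come from the abstract.

```python
def true_discovery_rate(prior_causal: float, tpr: float, fpr: float) -> float:
    """Expected fraction of significant genes that are truly causal.

    prior_causal: fraction of all genes that are causal (pi)
    tpr: chance that a causal gene is declared significant
    fpr: chance that a noncausal gene is declared significant
    """
    true_hits = prior_causal * tpr            # expected causal genes flagged, per gene tested
    false_hits = (1.0 - prior_causal) * fpr   # expected noncausal genes flagged, per gene tested
    return true_hits / (true_hits + false_hits)


def max_fpr_for_tdr(prior_causal: float, tpr: float, target_tdr: float) -> float:
    """FPR at which the expected TDR falls exactly to target_tdr.

    Obtained by solving target = t / (t + (1 - pi) * fpr) for fpr, with t = pi * tpr.
    """
    true_hits = prior_causal * tpr
    return true_hits * (1.0 - target_tdr) / (target_tdr * (1.0 - prior_causal))


if __name__ == "__main__":
    pi, tpr = 0.01, 0.19           # ~1% causal genes; 19% is the upper bound on observed TPR
    for fpr in (0.01, 0.05, 0.10):  # hypothetical false-positive rates, for illustration only
        print(f"FPR={fpr:.2f}  TDR={true_discovery_rate(pi, tpr, fpr):.3f}")
    cutoff = max_fpr_for_tdr(pi, tpr, 0.05)
    print(f"At TPR={tpr}, any FPR above {cutoff:.4f} gives TDR <= 5%")
```

Under these assumptions, even at the most favorable TPR of 19%, a false-positive rate of only a few percent is enough to pull the true discovery rate to 5% or below, because noncausal genes outnumber causal ones roughly 99 to 1.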

Citing Articles

Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements.

Iniguez-Munoz S, Llinas-Arias P, Ensenyat-Mendez M, Bedoya-Lopez A, Orozco J, Cortes J. Cell Mol Life Sci. 2024; 81(1):274.

PMID: 38902506 PMC: 11335195. DOI: 10.1007/s00018-024-05314-z.


Improving the filtering of false positive single nucleotide variations by combining genomic features with quality metrics.

Eren K, Cinar E, Karakurt H, Ozgur A. Bioinformatics. 2023; 39(12).

PMID: 38019945 PMC: 10692869. DOI: 10.1093/bioinformatics/btad694.


Quantitative trait locus (xQTL) approaches identify risk genes and drug targets from human non-coding genomes.

Bykova M, Hou Y, Eng C, Cheng F. Hum Mol Genet. 2022; 31(R1):R105-R113.

PMID: 36018824 PMC: 9989738. DOI: 10.1093/hmg/ddac208.


Pathway analysis with next-generation sequencing data.

Zhao J, Zhu Y, Boerwinkle E, Xiong M. Eur J Hum Genet. 2014; 23(4):507-15.

PMID: 24986826 PMC: 4666565. DOI: 10.1038/ejhg.2014.121.


A method to incorporate prior information into score test for genetic association studies.

Zakharov S, Teoh G, Salim A, Thalamuthu A. BMC Bioinformatics. 2014; 15:24.

PMID: 24450486 PMC: 3904928. DOI: 10.1186/1471-2105-15-24.

