
A Statistical Framework for SNP Calling, Mutation Discovery, Association Mapping and Population Genetical Parameter Estimation from Sequencing Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2011 Sep 10
PMID 21903627
Citations 3183
Abstract

Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. in multi-sample low-coverage sequencing or somatic mutation discovery). These applications call for new methods that analyze sequence data with uncertainty.

Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly from sequencing data, without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves accuracy comparable to that of alternative methods in estimating site allele count, inferring the allele frequency spectrum and mapping associations. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that, for discovering rare events, mismapping is frequently the leading source of errors.
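The idea of working from sequencing data rather than from called genotypes can be illustrated with a short Python sketch. It rests on simplifying assumptions of our own, not the paper's exact models: a biallelic site, independent base errors given by Phred-scaled qualities, reads drawn uniformly from a sample's chromosomes, and Hardy-Weinberg equilibrium for the frequency step. It computes per-sample genotype likelihoods, estimates the site's alternate-allele frequency by expectation-maximization (EM) without fixing any genotype, and computes a likelihood for the total alternate-allele count across samples by dynamic programming. The function names and the (is_ref, error_prob) read representation are hypothetical; this is not the samtools implementation.

```python
import math
from typing import List, Tuple

# Hypothetical read representation at one site: (is_ref, error_prob),
# where is_ref says whether the base matches the reference allele and
# error_prob is the base-error probability, e.g. 10 ** (-Q / 10) for Phred Q.
Base = Tuple[bool, float]


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)


def genotype_likelihoods(bases: List[Base], ploidy: int = 2) -> List[float]:
    """Return P(bases | g) for g = 0..ploidy alternate alleles.

    Simplified biallelic model (an assumption): each read comes from one of
    the `ploidy` chromosomes chosen uniformly, and any sequencing error turns
    one allele into the other, independently across reads.
    """
    likelihoods = []
    for g in range(ploidy + 1):
        lk = 1.0
        for is_ref, eps in bases:
            if is_ref:
                # Ref base: read a ref chromosome correctly, or misread an alt one.
                p = ((ploidy - g) * (1 - eps) + g * eps) / ploidy
            else:
                # Alt base: misread a ref chromosome, or read an alt one correctly.
                p = ((ploidy - g) * eps + g * (1 - eps)) / ploidy
            lk *= p
        likelihoods.append(lk)
    return likelihoods


def hwe_prior(phi: float, ploidy: int = 2) -> List[float]:
    """Binomial (Hardy-Weinberg) prior on g given alt-allele frequency phi."""
    return [math.comb(ploidy, g) * phi ** g * (1 - phi) ** (ploidy - g)
            for g in range(ploidy + 1)]


def estimate_allele_freq(all_lks: List[List[float]], ploidy: int = 2,
                         max_iter: int = 200, tol: float = 1e-10) -> float:
    """EM estimate of the site's alt-allele frequency from genotype
    likelihoods, averaging over genotypes instead of calling them."""
    phi = 0.5
    for _ in range(max_iter):
        prior = hwe_prior(phi, ploidy)
        expected_alt = 0.0
        for lks in all_lks:
            # E-step: posterior over g for this sample, then its expected g.
            post = [lk * pr for lk, pr in zip(lks, prior)]
            total = sum(post)
            expected_alt += sum(g * p for g, p in enumerate(post)) / total
        # M-step: expected alt alleles divided by total chromosomes.
        new_phi = expected_alt / (ploidy * len(all_lks))
        if abs(new_phi - phi) < tol:
            return new_phi
        phi = new_phi
    return phi


def allele_count_likelihoods(all_lks: List[List[float]],
                             ploidy: int = 2) -> List[float]:
    """Likelihood of the data given k alt alleles in total, k = 0..M.

    Dynamic programming over samples; assumes (our choice) that all genotype
    configurations with the same total k are equally likely a priori, which
    weights each configuration by its binomial multiplicity.
    """
    z = [1.0]  # z[j]: weighted sum over the samples so far with j alt alleles
    for lks in all_lks:
        new_z = [0.0] * (len(z) + ploidy)
        for j, zj in enumerate(z):
            for g, lk in enumerate(lks):
                new_z[j + g] += zj * math.comb(ploidy, g) * lk
        z = new_z
    total_chroms = ploidy * len(all_lks)
    return [zk / math.comb(total_chroms, k) for k, zk in enumerate(z)]
```

For example, three diploid samples with five reference reads, three reference plus three alternate reads, and four alternate reads (every base at error probability 0.001, i.e. Phred 30) give a frequency estimate near 0.5 and an allele-count likelihood that peaks at three alternate alleles, although no individual genotype is ever called.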

Availability: http://samtools.sourceforge.net.

Contact: hengli@broadinstitute.org.

Citing Articles

Comparative analysis of genotype imputation strategies for SNPs calling from RNA-seq.

Guo K, Zhong Z, Zeng H, Zhang C, Chitotombe T, Teng J. BMC Genomics. 2025; 26(1):245.

PMID: 40082746. PMC: 11907794. DOI: 10.1186/s12864-025-11411-5.

Single-cell RNA-seq data have prevalent blood contamination but can be rescued by Originator, a computational tool separating single-cell RNA-seq by genetic and contextual information.

Unjitwattana T, Huang Q, Yang Y, Tao L, Yang Y, Zhou M. Genome Biol. 2025; 26(1):52.

PMID: 40069819. PMC: 11895284. DOI: 10.1186/s13059-025-03495-9.

Integrated functional genomic analysis identifies regulatory variants underlying a major QTL for disease resistance in European sea bass.

Mukiibi R, Ferraresso S, Franch R, Peruzza L, Dalla Rovere G, Babbucci M. BMC Biol. 2025; 23(1):75.

PMID: 40069747. PMC: 11899128. DOI: 10.1186/s12915-025-02180-4.

Advantages of Mutant Generation by Genome Rearrangements of Non-Conventional Yeast via Direct Nuclease Transfection.

Oda A, Yasukawa T, Tamura M, Sano A, Masuo N, Ohta K. Genes Cells. 2025; 30(2):e70010.

PMID: 40065658. PMC: 11894362. DOI: 10.1111/gtc.70010.

A 6.49-Mb inversion associated with the purple embryo spot trait in potato.

Wang P, Cheng L, Pan J, Ma L, Hu X, Zhang Z. aBIOTECH. 2025; 6(1):22-32.

PMID: 40060175. PMC: 11889318. DOI: 10.1007/s42994-025-00197-5.
