
Analysis of Context-dependent Errors for Illumina Sequencing

Overview
Specialty Biology
Date 2012 Jul 20
PMID 22809341
Citations 9
Abstract

The new generation of short-read sequencing technologies requires reliable measures of data quality, and such measures are especially important for variant calling. In SNP calling in particular, a large number of false-positive SNPs (FP-SNPs) may be obtained, so putative SNPs must be distinguished from sequencing and other errors. We found that distinguishing an FP-SNP depends not only on the probability of a sequencing error (i.e. the quality value) but also on the conditional probability of "correcting" that error (the probability of the "second best call", conditional on the first call). Surprisingly, around 80% of mismatches can be "corrected" with this second call. Another way to reduce the rate of FP-SNPs is to identify DNA motifs that appear prone to sequencing errors and to attach a corresponding conditional quality value to these motifs. We have developed several measures to distinguish between sequencing errors and candidate SNPs, based on a base call's nucleotide context and its mismatch type. In addition, we suggest a simple method to correct the majority of mismatches, based on the conditional probability of their second-best intensity call. To each mismatch we attach a corresponding second-call confidence (quality value) of being corrected.
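The idea of a conditional "second call" quality value can be illustrated with a minimal sketch. This is not the authors' implementation: the input layout, the grouping key (nucleotide context plus first call), the function names, and the toy records are all assumptions made for illustration. The only standard ingredient is the Phred definition Q = -10 log10(P). The sketch counts, for each group of mismatches, how often the second-best call equals the reference base, and converts the remaining probability of not being corrected into a Phred-like score.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Standard Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p, cap=60.0):
    """Phred-scale a probability; cap the score so p == 0 stays finite."""
    if p <= 0:
        return cap
    return min(cap, -10 * math.log10(p))


def second_call_quality(mismatches):
    """Per (context, first_call) group, estimate how often the second-best
    call equals the reference base.

    `mismatches` is an iterable of (context, first_call, second_call, ref_base)
    tuples, one per mismatching base call. This layout is hypothetical.
    Returns {(context, first_call): (p_corrected, phred_like_score)}.
    """
    total = defaultdict(int)
    corrected = defaultdict(int)
    for context, first, second, ref in mismatches:
        key = (context, first)
        total[key] += 1
        if second == ref:
            corrected[key] += 1

    result = {}
    for key, n in total.items():
        p_corrected = corrected[key] / n
        # Score the chance that the second call does NOT recover the reference.
        result[key] = (p_corrected, error_prob_to_phred(1 - p_corrected))
    return result


# Toy records (invented for illustration, not data from the paper):
# context "GG", first call "T", reference "G"; the second call is "G" in 4 of 5.
toy = [("GG", "T", "G", "G")] * 4 + [("GG", "T", "A", "G")]
table = second_call_quality(toy)
```

In this toy table the group `("GG", "T")` has a corrected fraction of 4/5. Grouping by context and mismatch type mirrors the abstract's description only loosely; a real analysis would need aligned reads and per-cycle intensity calls, which are outside the scope of this sketch.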
