
BayesCall: A Model-based Base-calling Algorithm for High-throughput Short-read Sequencing

Overview
Journal Genome Res
Specialty Genetics
Date 2009 Aug 8
PMID 19661376
Citations 44
Abstract

Extracting sequence information from raw fluorescence images is the foundation of several high-throughput sequencing platforms. Some of the main challenges associated with this technology include reducing the error rate, assigning accurate base-specific quality scores, and reducing the cost of sequencing by increasing the throughput per run. To demonstrate how computational advances can help meet these challenges, a novel model-based base-calling algorithm, BayesCall, is introduced for the Illumina sequencing platform. Because it is founded on the tools of statistical learning, BayesCall is flexible enough to incorporate various features of the sequencing process. In particular, it can easily incorporate time-dependent parameters and model residual effects. This new approach significantly improves accuracy over Illumina's base-caller Bustard, particularly in the later cycles of a sequencing run. For 76-cycle data on a standard viral sample, phiX174, BayesCall reduces Bustard's average per-base error rate by approximately 51%. The probability of observing each base can be readily computed in BayesCall, and this probability can be transformed into a useful base-specific quality score with high discrimination ability. A detailed study of BayesCall's performance is presented here.
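The abstract does not state which transformation maps a base probability to a quality score. As a minimal sketch, assuming the widely used Phred convention Q = -10 log10(1 - p), where p is the probability of the called base, the step could look like the following. The function name `call_base`, the dictionary input, and the cap on the score are illustrative assumptions and are not taken from BayesCall itself.

```python
import math


def call_base(posteriors, min_error=1e-10):
    """Pick the most probable base and return (base, Phred-scaled quality).

    `posteriors` maps each base to a non-negative weight; the weights are
    normalized here, so they need not sum to exactly 1.
    Assumption: quality uses the Phred convention Q = -10 * log10(P(error)).
    """
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("at least one base must have positive probability")

    probs = {base: weight / total for base, weight in posteriors.items()}
    called = max(probs, key=probs.get)

    # Sum the probabilities of the other bases directly; this is more
    # accurate than computing 1 - p when p is very close to 1.
    error = sum(p for base, p in probs.items() if base != called)

    # Floor the error probability so the score stays finite
    # (with min_error = 1e-10 the score is capped at 100).
    error = max(error, min_error)
    return called, -10.0 * math.log10(error)


if __name__ == "__main__":
    base, quality = call_base({"A": 0.99, "C": 0.005, "G": 0.003, "T": 0.002})
    print(base, round(quality, 2))
```

Under this convention an error probability of 0.01 corresponds to a score of about 20, and each additional 10 points means a tenfold lower error probability.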

Citing Articles

Soil Microbial Community Characteristics and Their Effect on Tea Quality under Different Fertilization Treatments in Two Tea Plantations.

Lei Y, Ding D, Duan J, Luo Y, Huang F, Kang Y. Genes (Basel). 2024; 15(5).

PMID: 38790239 PMC: 11121415. DOI: 10.3390/genes15050610.


BEERS2: RNA-Seq simulation through high fidelity in silico modeling.

Brooks T, Lahens N, Mrcela A, Sarantopoulou D, Nayak S, Naik A. Brief Bioinform. 2024; 25(3).

PMID: 38605641 PMC: 11009461. DOI: 10.1093/bib/bbae164.


Optocoder: computational decoding of spatially indexed bead arrays.

Senel E, Rajewsky N, Karaiskos N. NAR Genom Bioinform. 2022; 4(2):lqac042.

PMID: 35685220 PMC: 9172073. DOI: 10.1093/nargab/lqac042.


Spatial transcriptomic and single-nucleus analysis reveals heterogeneity in a gigantic single-celled syncytium.

Gerber T, Loureiro C, Schramma N, Chen S, Jain A, Weber A. Elife. 2022; 11.

PMID: 35195068 PMC: 8865844. DOI: 10.7554/eLife.69745.


Comparative Analysis of Soil Microbiome Profiles in the Companion Planting of White Clover and Orchard Grass Using 16S rRNA Gene Sequencing Data.

Chen L, Li D, Shao Y, Adni J, Wang H, Liu Y. Front Plant Sci. 2020; 11:538311.

PMID: 33042174 PMC: 7530175. DOI: 10.3389/fpls.2020.538311.

