Probabilistic Base Calling of Solexa Sequencing Data
Overview
Background: Solexa/Illumina short-read, ultra-high-throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges. Currently, a fair proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology.
Results: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads.
Conclusion: We show that, compared with Solexa's data processing pipeline, the method improves genome coverage and the number of usable tags by an average of 15%. An R package is provided that allows fast and accurate base calling from Solexa's fluorescence intensity files and produces informative diagnostic plots.
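The two ideas named in the Results, coding ambiguous bases with IUPAC symbols and trimming reads to an informative sub-tag, can be illustrated with a minimal Python sketch. It is not the authors' R implementation. It assumes per-cycle base probabilities are already available (for example from the clustering step), and the function names, the `min_mass` and `min_ic` thresholds, and the scoring rule (information content of 2 minus the Shannon entropy in bits, summed over a contiguous run) are all hypothetical choices made for illustration.

```python
import math

# IUPAC nucleotide codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def call_base(probs, min_mass=0.9):
    """Return the IUPAC symbol for the smallest set of most-probable bases
    whose total probability reaches min_mass (an assumed threshold)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, mass = [], 0.0
    for base, p in ranked:
        chosen.append(base)
        mass += p
        if mass >= min_mass:
            break
    return IUPAC["".join(sorted(chosen))]


def information_content(probs):
    """Information content in bits: 2 minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - entropy


def best_subtag(per_cycle_probs, min_ic=1.0):
    """Return (start, end), half-open, of the contiguous run of cycles that
    maximises sum(information_content - min_ic). This scoring rule is an
    assumption for illustration, not the score used in the paper."""
    best_score, best_start, best_end = 0.0, 0, 0
    run_score, run_start = 0.0, 0
    for i, probs in enumerate(per_cycle_probs):
        if run_score <= 0:
            # A non-positive running score cannot help; start a new run here.
            run_score, run_start = 0.0, i
        run_score += information_content(probs) - min_ic
        if run_score > best_score:
            best_score, best_start, best_end = run_score, run_start, i + 1
    return best_start, best_end


def call_read(per_cycle_probs, min_mass=0.9, min_ic=1.0):
    """Trim a read to its best sub-tag and call each kept cycle."""
    start, end = best_subtag(per_cycle_probs, min_ic)
    return "".join(call_base(p, min_mass) for p in per_cycle_probs[start:end])
```

For example, a read whose early cycles are confident and whose last cycles are close to uniform would be trimmed to the confident prefix, and a cycle split mainly between A and G would be reported as `R` rather than forced to a single base.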