Fast and Sensitive Mapping of Bisulfite-treated Sequencing Data
Motivation: Cytosine DNA methylation is one of the major epigenetic modifications and influences gene expression, developmental processes, X-chromosome inactivation, and genomic imprinting. Aberrant methylation is furthermore known to be associated with several diseases, including cancer. The gold standard for determining DNA methylation on a genome-wide scale is 'bisulfite sequencing': DNA fragments are treated with sodium bisulfite, which converts unmethylated cytosines into uracils, whereas methylated cytosines remain unchanged. The resulting sequencing reads thus exhibit asymmetric bisulfite-related mismatches and suffer from an effective reduction of the alphabet size in unmethylated regions, which makes mapping bisulfite sequencing reads computationally much more demanding. As a consequence, currently available read mapping software often fails to achieve high sensitivity and in many cases requires unrealistic computational resources to cope with large real-life datasets.
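The conversion described above can be illustrated with a minimal in-silico sketch. It is not part of the published software: the function name `bisulfite_convert`, the toy reference sequence, and the chosen methylated position are assumptions made for illustration. It relies on the fact that uracil is read as thymine after amplification, so an unmethylated C appears as T in the read.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment of one strand (hypothetical helper).

    Unmethylated C becomes T (uracil is read as thymine after PCR);
    positions listed in `methylated` are protected and stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


reference = "ACGTCCGA"          # toy reference, assumed for illustration
read = bisulfite_convert(reference, methylated={5})

# Collect the positions where the converted read differs from the reference.
mismatches = [
    (i, r, q) for i, (r, q) in enumerate(zip(reference, read)) if r != q
]
print(read)        # ATGTTCGA
print(mismatches)  # [(1, 'C', 'T'), (4, 'C', 'T')]
```

Every mismatch has a reference C opposite a read T and never the reverse, which is the asymmetry the text refers to. In a fully unmethylated region the read contains no C at all, so the effective alphabet shrinks to three letters.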
Results: In this study, we present a seed-based approach that uses enhanced suffix arrays in conjunction with Myers' bit-vector algorithm to efficiently extend seeds to optimal semi-global alignments while allowing for bisulfite-related substitutions. It outperforms most current approaches in terms of sensitivity and is competitive in running time when mapping hundreds of millions of sequencing reads to vertebrate genomes.
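To make the notion of a semi-global alignment with bisulfite-related substitutions concrete, the following sketch uses a plain quadratic dynamic program. It is deliberately not Myers' bit-vector algorithm, which computes the same kind of edit-distance matrix with bit-parallel operations, and it is not the segemehl implementation. The names `bs_match` and `semiglobal_bs_distance` and the unit edit costs are assumptions. Only the C-to-T case is modelled; reads derived from the opposite strand would need the analogous G-to-A rule.

```python
def bs_match(ref_base, read_base):
    """Asymmetric match: a reference C may appear as T in the read."""
    return ref_base == read_base or (ref_base == "C" and read_base == "T")


def semiglobal_bs_distance(read, ref):
    """Return (distance, end) of the best semi-global alignment.

    The whole read must be aligned, but it may start and end anywhere
    in `ref` at no cost. `end` is the exclusive end position in `ref`.
    Unit costs for mismatch, insertion, and deletion are assumed.
    """
    # Row 0 is all zeros: skipping a reference prefix is free.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if bs_match(ref[j - 1], read[i - 1]) else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # read base not present in reference
                cur[j - 1] + 1,      # reference base missing from read
            )
        prev = cur
    # Taking the minimum over the last row makes the reference suffix free.
    best_end = min(range(len(ref) + 1), key=prev.__getitem__)
    return prev[best_end], best_end


print(semiglobal_bs_distance("ATGTTCGA", "GGACGTCCGATT"))  # (0, 10)
print(semiglobal_bs_distance("ACGC", "ACGT")[0])           # 1
```

The first call aligns a converted read to its origin without penalty, because both T-for-C positions count as matches. The second call shows the asymmetry: a read C opposite a reference T is an ordinary mismatch.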
Availability: The software segemehl is freely available at http://www.bioinf.uni-leipzig.de/Software/segemehl.