SHREC: a Short-read Error Correction Method
Overview
Motivation: Second-generation sequencing technologies produce massive numbers of short reads in a single experiment. However, sequencing errors can cause major problems when these reads are used for de novo sequencing applications. Moreover, existing error correction methods have been designed and optimized for shotgun sequencing. Therefore, there is an urgent need for fast and accurate computational methods and tools for correcting errors in large amounts of short-read data.
Results: We present SHREC, a new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure. Our results show that the method can identify erroneous reads with sensitivity and specificity of over 99% and 96%, respectively, for simulated data with error rates of up to 3% as well as for real data. Furthermore, it achieves an error correction accuracy of over 80% for simulated data and over 88% for real data. These results are clearly superior to previously published approaches. SHREC is available as an efficient open-source Java implementation that can process 10 million short reads on a standard workstation.
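To make the underlying data structure concrete, the following is a minimal sketch of a depth-limited generalized suffix trie built over a set of reads, with a count stored at each node. It is not SHREC's implementation: the abstract states only that a generalized suffix trie is used. The class name SuffixTrieSketch, the depth limit maxDepth, the fixed threshold minCount, and the method rareWindows are all hypothetical and chosen for illustration. The sketch flags windows of a read that occur unusually rarely across the whole read set, which is one simple way such a trie could point at likely sequencing errors.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative depth-limited generalized suffix trie over reads (not SHREC's code). */
class SuffixTrieSketch {
    private static final String ALPHABET = "ACGT";

    private static final class Node {
        int count;                           // suffix insertions that passed through this node
        final Node[] children = new Node[4]; // one slot per base in ALPHABET
    }

    private final Node root = new Node();
    private final int maxDepth;              // assumption: cap the trie depth to bound memory

    SuffixTrieSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Inserts every suffix of the read, truncated to maxDepth characters. */
    void addRead(String read) {
        for (int start = 0; start < read.length(); start++) {
            Node node = root;
            int end = Math.min(read.length(), start + maxDepth);
            for (int i = start; i < end; i++) {
                int c = ALPHABET.indexOf(read.charAt(i));
                if (c < 0) break;            // stop at ambiguous bases such as N
                if (node.children[c] == null) node.children[c] = new Node();
                node = node.children[c];
                node.count++;
            }
        }
    }

    /** Number of occurrences of s as a substring across all inserted reads. */
    int count(String s) {
        if (s.length() > maxDepth) {
            throw new IllegalArgumentException("query longer than trie depth");
        }
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            int c = ALPHABET.indexOf(s.charAt(i));
            if (c < 0 || node.children[c] == null) return 0;
            node = node.children[c];
        }
        return node.count;
    }

    /** Start positions of maxDepth-long windows seen fewer than minCount times. */
    List<Integer> rareWindows(String read, int minCount) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + maxDepth <= read.length(); i++) {
            if (count(read.substring(i, i + maxDepth)) < minCount) starts.add(i);
        }
        return starts;
    }

    public static void main(String[] args) {
        SuffixTrieSketch trie = new SuffixTrieSketch(4);
        for (int i = 0; i < 3; i++) trie.addRead("ACGTACGT"); // three error-free copies
        trie.addRead("ACGAACGT");                             // one copy with T -> A at index 3
        System.out.println(trie.rareWindows("ACGAACGT", 2));  // prints [0, 1, 2, 3]
        System.out.println(trie.rareWindows("ACGTACGT", 2));  // prints []
    }
}
```

In this toy input every flagged window of the erroneous read overlaps index 3, the substituted base, while the error-free read has no rare windows. A practical method would need a principled threshold rather than a fixed minCount, and would also have to propose a correction; neither is attempted here.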