
Karect: Accurate Correction of Substitution, Insertion and Deletion Errors for Next-generation Sequencing Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2015 Jul 17
PMID 26177965
Citations 40
Abstract

Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but their accuracy is often low.

Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in correcting individual-base errors (up to 10% increase in accuracy gain) and in the quality of the subsequent de novo assembly (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
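
To illustrate the general idea of alignment-based correction, the following minimal sketch takes reads that are assumed to be already aligned to one another (padded with `-` for gaps) and derives a consensus by majority vote in each column. It is not Karect's algorithm, which the abstract does not describe in detail; the function name `consensus_correct` and the toy reads are hypothetical, and a real tool would also have to compute the alignment itself and weigh base qualities and coverage.

```python
from collections import Counter


def consensus_correct(aligned_reads, gap="-"):
    """Majority-vote consensus over pre-aligned, equal-length reads.

    Illustrative assumption: the reads are already multiply aligned and
    padded with `gap`. This is a toy sketch, not Karect's method.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Most frequent symbol in this alignment column.
        base, _count = Counter(column).most_common(1)[0]
        # A majority gap means minority bases here are likely insertions.
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


# Toy alignment: read 3 has a substitution, read 4 an insertion,
# and read 5 a deletion relative to the others.
reads = [
    "ACGTAC-GT",
    "ACGTAC-GT",
    "ACCTAC-GT",
    "ACGTACAGT",
    "ACG-AC-GT",
]
corrected = consensus_correct(reads)
```

In this toy case each of the three error types is outvoted by the other four reads, which is why sufficient coverage matters for approaches of this kind.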

Availability And Implementation: Karect is available at: http://aminallam.github.io/karect.

Contact: amin.allam@kaust.edu.sa

Supplementary Information: Supplementary data are available at Bioinformatics online.
