
Evaluation of the Impact of Illumina Error Correction Tools on De Novo Genome Assembly

Overview
Publisher BioMed Central
Specialty Biology
Date 2017 Aug 20
PMID 28821237
Citations 29
Abstract

Background: Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods.

Results: For twelve recent Illumina error correction tools (EC tools), we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy.
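The abstract does not say how correction accuracy is scored. A convention common in EC-tool benchmarks, assumed here and not taken from this paper, compares each read base by base with its true sequence (e.g. from simulated reads). An erroneous base restored to the true base is a true positive (TP), a correct base changed to a wrong one is a false positive (FP), and an erroneous base left wrong is a false negative (FN). The summary score is gain = (TP - FP) / (TP + FN). The minimal sketch below assumes substitution errors only and equal-length sequences. `ECCounts` and `count_corrections` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ECCounts:
    tp: int = 0  # erroneous base changed to the true base
    fp: int = 0  # correct base changed to a wrong base
    fn: int = 0  # erroneous base not restored to the true base

    def gain(self) -> float:
        """Gain = (TP - FP) / (TP + FN): 1.0 is perfect, below 0 means net harm."""
        total_errors = self.tp + self.fn
        if total_errors == 0:
            return 0.0
        return (self.tp - self.fp) / total_errors


def count_corrections(original: str, corrected: str, truth: str) -> ECCounts:
    """Classify every base of one read (assumes substitution errors only)."""
    if not (len(original) == len(corrected) == len(truth)):
        raise ValueError("sequences must have equal length (substitutions only)")
    counts = ECCounts()
    for o, c, t in zip(original, corrected, truth):
        if o != t:  # the read had an error at this position
            if c == t:
                counts.tp += 1
            else:
                # Left wrong, or changed to another wrong base. Some benchmarks
                # count the latter as an FP instead; this sketch counts it as FN.
                counts.fn += 1
        elif c != t:  # the base was correct but the tool changed it
            counts.fp += 1
    return counts


truth = "ACGTACGTAC"
original = "ACGAACGTTC"   # errors at positions 3 and 8
corrected = "ACGTACCTTC"  # fixes position 3, breaks position 6, misses position 8
counts = count_corrections(original, corrected, truth)
print(counts)         # ECCounts(tp=1, fp=1, fn=1)
print(counts.gain())  # 0.0
```

In this example the tool fixes one error but introduces a new one, so its gain is zero even though it did correct something. This is why a low error count after correction is not the same as a high gain.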

Conclusions: We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools perform poorly in certain sequence contexts, such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often corrected wrongly and inconsistently, leading to breakpoints in the resulting assemblies that are not present in assemblies obtained from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve their applicability.
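The abstract does not describe how the evaluated tools work internally. However, the low-coverage failure mode can be illustrated with one common strategy, assumed here: k-mer-spectrum correction. In this approach, k-mers seen fewer than a fixed threshold number of times are treated as likely errors ("weak"). The minimal sketch below uses hypothetical helper names (`kmer_counts`, `weak_kmers`), forward-strand counting only, and toy reads. It shows that an error-free read from a low-coverage region looks just as suspicious as a read with a real sequencing error.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(read, counts, k, min_count):
    """Return start positions of k-mers in `read` seen fewer than `min_count` times."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


# Toy data: a well-covered region (10 copies), an error-free but poorly
# covered region (2 copies), and one read with a substitution error (T -> A).
reads = ["ACGTTGCA"] * 10 + ["TTGACCAG"] * 2 + ["ACGTAGCA"]
counts = kmer_counts(reads, k=4)

print(weak_kmers("ACGTTGCA", counts, k=4, min_count=3))  # []
print(weak_kmers("ACGTAGCA", counts, k=4, min_count=3))  # [1, 2, 3, 4]
print(weak_kmers("TTGACCAG", counts, k=4, min_count=3))  # [0, 1, 2, 3, 4]
```

With a fixed threshold, the correct low-coverage read is flagged entirely, so a tool relying on this signal may "correct" true sequence. If different reads from that region are altered in different ways, the result matches the inconsistent corrections and assembly breakpoints the conclusions describe.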

Citing Articles

Illumina reads correction: evaluation and improvements.

Dlugosz M, Deorowicz S. Sci Rep. 2024; 14(1):2232.

PMID: 38278837 PMC: 11222498. DOI: 10.1038/s41598-024-52386-9.


An overlooked phenomenon: complex interactions of potential error sources on the quality of bacterial de novo genome assemblies.

Radai Z, Varadi A, Takacs P, Nagy N, Schmitt N, Prepost E. BMC Genomics. 2024; 25(1):45.

PMID: 38195441 PMC: 10777565. DOI: 10.1186/s12864-023-09910-4.


The impact of applying various de novo assembly and correction tools on the identification of genome characterization, drug resistance, and virulence factors of clinical isolates using ONT sequencing.

Safar H, Alatar F, Nasser K, Al-Ajmi R, Alfouzan W, Mustafa A. BMC Biotechnol. 2023; 23(1):26.

PMID: 37525145 PMC: 10391896. DOI: 10.1186/s12896-023-00797-3.


SparkEC: speeding up alignment-based DNA error correction tools.

Exposito R, Martinez-Sanchez M, Tourino J. BMC Bioinformatics. 2022; 23(1):464.

PMID: 36344928 PMC: 9639292. DOI: 10.1186/s12859-022-05013-1.


CARE 2.0: reducing false-positive sequencing error corrections using machine learning.

Kallenborn F, Cascitti J, Schmidt B. BMC Bioinformatics. 2022; 23(1):227.

PMID: 35698033 PMC: 9195321. DOI: 10.1186/s12859-022-04754-3.

