Evaluation of the Impact of Illumina Error Correction Tools on De Novo Genome Assembly

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2017 Aug 20

PMID 28821237

Citations 29

Authors

Mahdi Heydari

Giles Miclotte

Piet Demeester

Yves Van de Peer

Jan Fostier

Abstract

Background: Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods.

Results: For twelve recent Illumina error correction tools (EC tools), we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy.
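The abstract does not say which statistic was used to measure contig size. A widely used summary is the N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below assumes N50 is an acceptable stand-in for "contig size"; the function name `n50` and the example lengths are illustrative and do not come from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    Assumption: N50 is used here only as one common contig-size summary;
    the abstract itself does not name a specific metric.
    """
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths for assemblies of raw and corrected reads.
raw_assembly = [80, 70, 50, 40, 30, 20]
corrected_assembly = [150, 60, 40, 20, 20]
print(n50(raw_assembly), n50(corrected_assembly))
```

Comparing the two values gives a rough indication of whether correction yielded longer contigs, but it says nothing about accuracy: a larger N50 can also result from misassemblies, which is why size and accuracy are evaluated together.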

Conclusions: We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools suffer from poor performance in certain sequence contexts such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often ill-corrected in an inconsistent manner, leading to breakpoints in the resulting assemblies that are not present in assemblies obtained from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve the applicability of such tools.
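The trade-off between removing errors and introducing new ones can be made concrete with the "gain" measure that is common in the error-correction literature: gain = (TP - FP) / (TP + FN), where TP counts errors that were fixed, FP counts correct bases that were changed, and FN counts errors that remain. The sketch below is a simplified illustration, assuming substitution errors only and reads of equal length whose true sequence is known (for example, from simulation or alignment to a reference). The function names and example reads are hypothetical, and the abstract does not state that this exact metric was used.

```python
def correction_counts(true_read, raw_read, corrected_read):
    """Count per-base TP, FP and FN for one read (substitutions only).

    Assumption: all three strings have equal length and no indels.
    """
    if not (len(true_read) == len(raw_read) == len(corrected_read)):
        raise ValueError("reads must have equal length")
    tp = fp = fn = 0
    for t, r, c in zip(true_read, raw_read, corrected_read):
        if r != t and c == t:
            tp += 1  # erroneous base was fixed
        elif r == t and c != t:
            fp += 1  # correct base was changed (newly introduced error)
        elif r != t and c != t:
            fn += 1  # erroneous base is still wrong after correction
    return tp, fp, fn


def gain(tp, fp, fn):
    """Return (TP - FP) / (TP + FN); undefined if the input had no errors."""
    errors_in_input = tp + fn
    if errors_in_input == 0:
        raise ValueError("gain is undefined when the input has no errors")
    return (tp - fp) / errors_in_input


# Hypothetical example: two errors in the raw read; one is fixed,
# one remains, and one correct base is wrongly changed.
counts = correction_counts("ACGTACGT", "ACGAACGA", "TCGTACGA")
print(counts, gain(*counts))
```

A gain of 1 means every error was removed without side effects, while a negative gain means the tool introduced more errors than it fixed. A read-level score like this does not capture the inconsistent corrections across overlapping reads that the conclusions describe, which is one reason to also evaluate the resulting assemblies.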

Citing Articles

Illumina reads correction: evaluation and improvements.

Dlugosz M, Deorowicz S. Sci Rep. 2024; 14(1):2232.

PMID: 38278837 PMC: 11222498. DOI: 10.1038/s41598-024-52386-9.

An overlooked phenomenon: complex interactions of potential error sources on the quality of bacterial de novo genome assemblies.

Radai Z, Varadi A, Takacs P, Nagy N, Schmitt N, Prepost E. BMC Genomics. 2024; 25(1):45.

PMID: 38195441 PMC: 10777565. DOI: 10.1186/s12864-023-09910-4.

The impact of applying various de novo assembly and correction tools on the identification of genome characterization, drug resistance, and virulence factors of clinical isolates using ONT sequencing.

Safar H, Alatar F, Nasser K, Al-Ajmi R, Alfouzan W, Mustafa A. BMC Biotechnol. 2023; 23(1):26.

PMID: 37525145 PMC: 10391896. DOI: 10.1186/s12896-023-00797-3.

SparkEC: speeding up alignment-based DNA error correction tools.

Exposito R, Martinez-Sanchez M, Tourino J. BMC Bioinformatics. 2022; 23(1):464.

PMID: 36344928 PMC: 9639292. DOI: 10.1186/s12859-022-05013-1.

CARE 2.0: reducing false-positive sequencing error corrections using machine learning.

Kallenborn F, Cascitti J, Schmidt B. BMC Bioinformatics. 2022; 23(1):227.

PMID: 35698033 PMC: 9195321. DOI: 10.1186/s12859-022-04754-3.
