Accurate Self-correction of Errors in Long Reads Using De Bruijn Graphs

Overview

Journal Bioinformatics

Publisher Oxford University Press

Specialty Biology

Date 2016 Jun 9

PMID 27273673

Citations 49

Authors

Leena Salmela

Riku Walve

Eric Rivals

Esko Ukkonen


Abstract

Motivation: New long read sequencing technologies, like PacBio SMRT and Oxford Nanopore, can produce sequencing reads up to 50 000 bp long but with an error rate of at least 15%. Reducing the error rate is necessary for subsequent utilization of the reads in, e.g., de novo genome assembly. The error correction problem has been tackled either by aligning the long reads against each other or by a hybrid approach that uses the more accurate short reads produced by second-generation sequencing technologies to correct the long reads.

Results: We present an error correction method that uses long reads only. The method consists of two phases: first, we use an iterative alignment-free correction method based on de Bruijn graphs with increasing k-mer length, and second, the corrected reads are further polished using long-distance dependencies that are found using multiple alignments. According to our experiments, the proposed method is the most accurate one relying on long reads only for read sets with high coverage. Furthermore, when the coverage of the read set is at least 75×, the throughput of the new method is at least 20% higher.
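
The first phase can be illustrated with a minimal Python sketch. It assumes that k-mers occurring at least `min_count` times across the reads are trusted ("solid"), and that a stretch of a read lying between two solid k-mers is corrected by a path between them in the de Bruijn graph of solid k-mers. All function names, the `ks` schedule, the `min_count` threshold and the path-length bound are hypothetical illustration choices, not LoRMA's code or parameters, and the multiple-alignment polishing phase is not shown.

```python
from collections import Counter, deque


def solid_kmers(reads, k, min_count):
    """Count every k-mer in the reads; keep those seen at least min_count times."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def bridge(src, dst, solid, max_len):
    """Breadth-first search in the de Bruijn graph whose nodes are the solid
    k-mers and whose edges join k-mers overlapping by k-1 bases. Returns the
    bases spelled after src on a shortest path ending at dst, or None."""
    queue = deque([(src, "")])
    seen = {src}
    while queue:
        node, spelled = queue.popleft()
        if len(spelled) >= max_len:
            continue
        for base in "ACGT":
            nxt = node[1:] + base
            if nxt not in solid:
                continue
            if nxt == dst:
                return spelled + base
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, spelled + base))
    return None


def correct_read(read, solid, k):
    """Replace each gap between solid k-mers with a graph path, if one exists.
    Read ends before the first / after the last solid k-mer are left as-is."""
    pos = [i for i in range(len(read) - k + 1) if read[i:i + k] in solid]
    if not pos:
        return read
    out = [read[:pos[0] + k]]  # prefix up to the end of the first solid k-mer
    prev = pos[0]
    for p in pos[1:]:
        if p == prev + 1:
            out.append(read[p + k - 1])  # solid run continues by one base
        else:
            gap = p - prev  # hypothetical bound: allow paths up to twice the gap
            path = bridge(read[prev:prev + k], read[p:p + k], solid, max_len=2 * gap)
            # Fall back to the original bases if no path is found.
            out.append(path if path is not None else read[prev + k:p + k])
        prev = p
    out.append(read[prev + k:])  # suffix after the last solid k-mer
    return "".join(out)


def correct_reads(reads, ks=(11, 15, 19), min_count=3):
    """Iterate with increasing k, rebuilding the graph from the reads corrected
    in the previous round (the ks and min_count defaults are hypothetical)."""
    for k in ks:
        solid = solid_kmers(reads, k, min_count)
        reads = [correct_read(r, solid, k) for r in reads]
    return reads
```

Taking the shortest bridging path and leaving read ends untouched are simplifications; a practical corrector would weigh candidate paths against the erroneous read segment. Rebuilding the graph at each larger k from the reads corrected in the previous round means that errors fixed at a small k no longer split the longer k-mers in the next round.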

Availability and Implementation: LoRMA is freely available at http://www.cs.helsinki.fi/u/lmsalmel/LoRMA/.

Contact: leena.salmela@cs.helsinki.fi.

Citing Articles

Repeat and haplotype aware error correction in nanopore sequencing reads with DeChat.

Liu Y, Li Y, Chen E, Xu J, Zhang W, Zeng X Commun Biol. 2024; 7(1):1678.

PMID: 39702496 PMC: 11659559. DOI: 10.1038/s42003-024-07376-y.

Genome assembly in the telomere-to-telomere era.

Li H, Durbin R Nat Rev Genet. 2024; 25(9):658-670.

PMID: 38649458 DOI: 10.1038/s41576-024-00718-w.

Hybrid-hybrid correction of errors in long reads with HERO.

Kang X, Xu J, Luo X, Schonhuth A Genome Biol. 2023; 24(1):275.

PMID: 38041098 PMC: 10690975. DOI: 10.1186/s13059-023-03112-7.

Application of third-generation sequencing in cancer research.

Chen Z, He X Med Rev (2021). 2023; 1(2):150-171.

PMID: 37724303 PMC: 10388785. DOI: 10.1515/mr-2021-0013.

Applications of long-read sequencing to Mendelian genetics.

Mastrorosa F, Miller D, Eichler E Genome Med. 2023; 15(1):42.

PMID: 37316925 PMC: 10266321. DOI: 10.1186/s13073-023-01194-3.
