Sigmoni: Classification of Nanopore Signal with a Compressed Pangenome Index

Journal: bioRxiv
Date: 2023 Aug 30
PMID: 37645873
Abstract

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. However, past methods for signal-based classification do not scale efficiently to large, repetitive references such as pangenomes, which limits them to partial references or individual genomes. We introduce Sigmoni, a rapid, multiclass classification method based on the r-index that scales to references of hundreds of Gbp. Sigmoni quantizes nanopore signal into a discrete alphabet of picoamp ranges. It performs rapid, approximate matching using matching statistics and classifies reads based on the distributions of picoamp matching statistics and co-linearity statistics. In host depletion experiments, Sigmoni is 10-100× faster than previous methods for adaptive sampling while also being more accurate, and it can query reads against large microbial or human pangenomes.
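The quantize-then-match idea described in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not Sigmoni's implementation: the picoamp bin edges and the alphabet are hypothetical, the input is assumed to be already segmented into per-event mean currents, matching statistics are computed with naive substring search instead of an r-index, and a read is simply assigned to the reference with the highest mean matching statistic. That last rule is a simplified stand-in for Sigmoni's comparison of matching-statistic distributions and its co-linearity statistics.

```python
import bisect
import statistics


def quantize(signal, edges, alphabet="ABCDEFGH"):
    """Map each picoamp value to a symbol by the bin it falls in.

    `edges` are sorted, hypothetical bin boundaries in pA; a value x gets
    symbol alphabet[i], where i is the number of edges <= x.
    """
    if len(alphabet) < len(edges) + 1:
        raise ValueError("alphabet needs len(edges) + 1 symbols")
    return "".join(alphabet[bisect.bisect_right(edges, x)] for x in signal)


def matching_statistics(query, ref):
    """MS[i] = length of the longest prefix of query[i:] that occurs in ref.

    Naive substring search, NOT an r-index; only suitable for tiny inputs.
    """
    ms = []
    length = 0
    for i in range(len(query)):
        # If query[i-1 : i-1+L] occurs in ref, so does query[i : i-1+L],
        # hence MS[i] >= MS[i-1] - 1 and the search can resume from there.
        length = max(length - 1, 0)
        while i + length < len(query) and query[i:i + length + 1] in ref:
            length += 1
        ms.append(length)
    return ms


def classify(read, references):
    """Assign the read to the reference with the highest mean MS.

    Simplified, hypothetical decision rule; not Sigmoni's actual classifier.
    """
    if not read:
        raise ValueError("read is empty")
    scores = {name: statistics.mean(matching_statistics(read, ref))
              for name, ref in references.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    edges = [60.0, 80.0, 100.0]  # hypothetical pA boundaries -> 4 symbols
    refs = {"host": "ABCDABCD", "microbe": "DCBADCBA"}  # toy quantized references
    read = quantize([55.0, 70.0, 90.0, 110.0, 58.0], edges)
    print(read)                       # ABCDA
    print(classify(read, refs)[0])    # host
```

The naive search here slows down quickly as the reference grows. Replacing it with an r-index, whose size depends on the number of runs in the Burrows-Wheeler transform rather than on the raw reference length, is what allows the approach to handle repetitive, pangenome-scale references.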
