
The Subread Aligner: Fast, Accurate and Scalable Read Mapping by Seed-and-vote

Overview
Specialty Biochemistry
Date 2013 Apr 6
PMID 23558742
Citations 1579
Abstract

Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
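The voting idea can be made concrete with a minimal Python sketch. This is not Subread's implementation, and all of the following are hypothetical. A plain `dict` from exact k-mers to genome positions stands in for Subread's index. Subreads are 16-mers, four are taken per read, and a location needs at least `min_votes=2` supporting subreads. The names `build_index`, `extract_subreads`, `seed_and_vote` and `count_mismatches` are illustrative only. Each subread that matches exactly casts one vote for the read start it implies (genome position minus the subread's offset in the read), so subreads agree without having to map next to one another. The detailed fill-in step is reduced to counting mismatches at the winning location. Indels and exon junctions are not handled.

```python
import random
from collections import Counter, defaultdict
from typing import Optional


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Toy index (hypothetical): map every k-mer to all positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def extract_subreads(read: str, k: int, n_subreads: int) -> list[tuple[int, str]]:
    """Take up to n_subreads evenly spaced k-mers together with their read offsets.

    When len(read) < n_subreads * k, adjacent subreads overlap.
    """
    span = len(read) - k
    if span < 0:
        return []  # read is shorter than a single subread
    if n_subreads <= 1 or span == 0:
        offsets = [0]
    else:
        offsets = sorted({round(i * span / (n_subreads - 1)) for i in range(n_subreads)})
    return [(off, read[off:off + k]) for off in offsets]


def seed_and_vote(read: str, index: dict[str, list[int]], k: int,
                  n_subreads: int, min_votes: int = 2) -> Optional[tuple[int, int]]:
    """Return (read start, votes) for the best-supported location, or None."""
    votes: Counter = Counter()
    for offset, subread in extract_subreads(read, k, n_subreads):
        # Each exact hit votes for the read start it implies.
        for gpos in index.get(subread, []):
            votes[gpos - offset] += 1
    if not votes:
        return None
    start, count = votes.most_common(1)[0]  # ties: first-encountered candidate wins
    return (start, count) if count >= min_votes else None


def count_mismatches(read: str, genome: str, start: int) -> int:
    """Simplified fill-in step: mismatches only, no indels."""
    return sum(a != b for a, b in zip(read, genome[start:start + len(read)]))


# Usage: a random toy genome and one 60 bp read carrying a single error.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
K, N_SUBREADS = 16, 4
index = build_index(genome, K)

clean = genome[500:560]
error_base = "A" if clean[20] != "A" else "C"
read = clean[:20] + error_base + clean[21:]

hit = seed_and_vote(read, index, K, N_SUBREADS)
if hit is not None:
    start, n_votes = hit
    print(start, n_votes, count_mismatches(read, genome, start))
```

In the usage lines, the 60 bp read is shorter than four 16-mers laid end to end, so the evenly spaced subreads (offsets 0, 15, 29 and 44) overlap. The simulated error at read position 20 breaks only the subread starting at offset 15. The other three still match exactly and each votes for position 500, so the read should be placed there with three votes and one mismatch. Because the location is fixed by the vote before any base-level comparison, the fill-in work is done only once, at the winning location.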

Citing Articles

Deciphering Colorectal Cancer-Hepatocyte Interactions: A Multiomics Platform for Interrogation of Metabolic Crosstalk in the Liver-Tumor Microenvironment.

Nelson A, Reese L, Rono E, Queathem E, Qiu Y, McCluskey B. Int J Mol Sci. 2025; 26(5).

PMID: 40076609. PMC: 11900982. DOI: 10.3390/ijms26051976.


Endometrium-derived organoids from cystic fibrosis patients and mice as new models to study disease-associated endometrial pathobiology.

De Pauw E, Gommers B, Ensinck M, Timmerman S, De Vriendt S, Bueds C. Cell Mol Life Sci. 2025; 82(1):109.

PMID: 40074868. PMC: 11904040. DOI: 10.1007/s00018-025-05627-7.


Temporal Transcriptomic Differences in Stroke Between Diabetic and Non-Diabetic Mice.

Lv Y, Dong X, Xi Y, Zhan F, Mao Y, Wu J. J Mol Neurosci. 2025; 75(1):31.

PMID: 40053254. DOI: 10.1007/s12031-025-02327-6.


TFIIH kinase CDK7 drives cell proliferation through a common core transcription factor network.

Jones T, Feng J, Luyties O, Cozzolino K, Sanford L, Rimel J. Sci Adv. 2025; 11(9):eadr9660.

PMID: 40020069. PMC: 11870056. DOI: 10.1126/sciadv.adr9660.


Ethylene promotes SMAX1 accumulation to inhibit arbuscular mycorrhiza symbiosis.

Das D, Varshney K, Ogawa S, Torabi S, Huttl R, Nelson D. Nat Commun. 2025; 16(1):2025.

PMID: 40016206. PMC: 11868565. DOI: 10.1038/s41467-025-57222-w.

