
BARCOSEL: a Tool for Selecting an Optimal Barcode Set for High-throughput Sequencing

Overview
Publisher Biomed Central
Specialty Biology
Date 2018 Jul 7
PMID 29976145
Citations 22
Abstract

Background: Current high-throughput sequencing platforms provide the capacity to sequence multiple samples in parallel. Samples are labeled by attaching a short, sample-specific nucleotide sequence (a barcode) to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. To tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, due to both nucleotide usage and base-calling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. The number of samples to be mixed in each sequencing run may vary, which raises the problem of how to select, for each run, the best subset of the barcodes available at a sequencing core facility. Plenty of tools are available for de novo barcode design, but they are not suitable for subset selection.
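The two requirements named in the Background (separation in sequence space and per-position nucleotide balance) can be made concrete with a small sketch. It assumes Hamming distance as the separation measure and equal-length barcodes over A, C, G and T; the abstract does not say which distance BARCOSEL uses, and the function names below are illustrative, not taken from the tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of positions at which two equal-length barcodes differ.
 * Assumption: Hamming distance stands in for "distance in sequence space". */
size_t hamming_distance(const char *a, const char *b)
{
    size_t len = strlen(a);
    assert(strlen(b) == len); /* barcodes must have the same length */
    size_t d = 0;
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            d++;
    return d;
}

/* Smallest pairwise Hamming distance in a set of n >= 2 barcodes. */
size_t min_pairwise_distance(const char *const *set, size_t n)
{
    assert(n >= 2);
    size_t best = strlen(set[0]); /* upper bound: the barcode length */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            size_t d = hamming_distance(set[i], set[j]);
            if (d < best)
                best = d;
        }
    return best;
}

/* Count bases at one position; counts[] is indexed A=0, C=1, G=2, T=3.
 * Characters other than A, C, G, T are ignored. */
void count_bases_at(const char *const *set, size_t n, size_t pos,
                    size_t counts[4])
{
    memset(counts, 0, 4 * sizeof counts[0]);
    for (size_t i = 0; i < n; i++) {
        switch (set[i][pos]) {
        case 'A': counts[0]++; break;
        case 'C': counts[1]++; break;
        case 'G': counts[2]++; break;
        case 'T': counts[3]++; break;
        default: break;
        }
    }
}
```

A set such as ACGT, CATG, GTAC, TGCA is well separated (every pair differs at all four positions) and perfectly balanced (each base occurs once at every position). A check of this kind also covers the question of whether libraries with existing barcodes can share a pool.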

Results: We have developed a tool that can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach, the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are generated automatically and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel .
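To illustrate the shape of the selection problem, here is a minimal sketch that picks k barcodes from a small candidate list by exhaustive enumeration. This is not the integer-programming method described above: brute force is swapped in because it is short and easy to verify, and it is only practical for a few dozen candidates at most. The cost function (per-position gap between the most and least frequent base), the Hamming-distance constraint, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hamming distance between two barcodes of length len. */
static size_t hamming(const char *a, const char *b, size_t len)
{
    size_t d = 0;
    for (size_t i = 0; i < len; i++)
        d += (a[i] != b[i]);
    return d;
}

/* Number of set bits, i.e. how many candidates a mask selects. */
static unsigned popcount64(uint64_t m)
{
    unsigned c = 0;
    for (; m; m &= m - 1)
        c++;
    return c;
}

/* Constraint: every selected pair must be at least min_dist apart. */
static int distances_ok(const char *const *cand, size_t n, size_t len,
                        uint64_t mask, size_t min_dist)
{
    for (size_t i = 0; i < n; i++) {
        if (!((mask >> i) & 1))
            continue;
        for (size_t j = i + 1; j < n; j++)
            if (((mask >> j) & 1) && hamming(cand[i], cand[j], len) < min_dist)
                return 0;
    }
    return 1;
}

/* Hypothetical cost: at each position, the gap between the most and the
 * least frequent base among the selected barcodes, summed over positions. */
static unsigned imbalance(const char *const *cand, size_t n, size_t len,
                          uint64_t mask)
{
    static const char bases[] = "ACGT";
    unsigned cost = 0;
    for (size_t pos = 0; pos < len; pos++) {
        unsigned counts[4] = {0, 0, 0, 0};
        for (size_t i = 0; i < n; i++) {
            if (!((mask >> i) & 1))
                continue;
            const char *p = strchr(bases, cand[i][pos]);
            if (p && *p) /* skip characters outside A, C, G, T */
                counts[p - bases]++;
        }
        unsigned lo = counts[0], hi = counts[0];
        for (int b = 1; b < 4; b++) {
            if (counts[b] < lo) lo = counts[b];
            if (counts[b] > hi) hi = counts[b];
        }
        cost += hi - lo;
    }
    return cost;
}

/* Return a bitmask (bit i set means cand[i] is selected) of the k-subset
 * with minimal imbalance that satisfies the distance constraint, or 0 if
 * no feasible subset exists. All candidates are assumed to share the
 * length of cand[0]. Ties go to the numerically smallest mask. */
uint64_t select_barcodes(const char *const *cand, size_t n, unsigned k,
                         size_t min_dist)
{
    assert(n >= 1 && n <= 24); /* 2^n subsets are enumerated */
    size_t len = strlen(cand[0]);
    uint64_t best_mask = 0;
    unsigned best_cost = UINT_MAX;
    for (uint64_t mask = 1; mask < ((uint64_t)1 << n); mask++) {
        if (popcount64(mask) != k)
            continue;
        if (!distances_ok(cand, n, len, mask, min_dist))
            continue;
        unsigned cost = imbalance(cand, n, len, mask);
        if (cost < best_cost) {
            best_cost = cost;
            best_mask = mask;
        }
    }
    return best_mask;
}
```

For the candidates ACGT, ACGA, CATG, GTAC, TGCA with k = 4 and a minimum distance of 3, the call returns the mask 0x1D, which drops ACGA: it is only one mismatch away from ACGT, and the remaining four barcodes are perfectly balanced. An integer-programming formulation replaces the enumeration with one binary variable per candidate, which is what lets the approach scale to realistic candidate sets.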

Conclusions: The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easy to access via a web browser.

Citing Articles

Automated Design of Oligopools and Rapid Analysis of Massively Parallel Barcoded Measurements.

Hossain A, Cetnar D, LaFleur T, McLellan J, Salis H ACS Synth Biol. 2024; 13(12):4218-4232.

PMID: 39641628 PMC: 11669329. DOI: 10.1021/acssynbio.4c00661.


NucBalancer: streamlining barcode sequence selection for optimal sample pooling for sequencing.

Gupta S, Sharma A GigaByte. 2024; 2024:gigabyte138.

PMID: 39430727 PMC: 11488490. DOI: 10.46471/gigabyte.138.


Differential expression analysis identifies a prognostically significant extracellular matrix-enriched gene signature in hyaluronan-positive clear cell renal cell carcinoma.

Jokelainen O, Rintala T, Fortino V, Pasonen-Seppanen S, Sironen R, Nykopp T Sci Rep. 2024; 14(1):10626.

PMID: 38724670 PMC: 11082176. DOI: 10.1038/s41598-024-61426-3.


Highly Multiplexed Reverse-Transcription Loop-Mediated Isothermal Amplification and Nanopore Sequencing (LAMPore) for Wastewater-Based Surveillance.

Kang S, Choi P, Maile-Moskowitz A, Brown C, Gonzalez R, Pruden A ACS ES T Water. 2024; 4(4):1629-1636.

PMID: 38633369 PMC: 11019537. DOI: 10.1021/acsestwater.3c00690.


Intra- a novel system to identify mutations that cause protein misfolding.

Quan N, Eguchi Y, Geiler-Samerotte K Front Genet. 2023; 14:1198203.

PMID: 37745845 PMC: 10512024. DOI: 10.3389/fgene.2023.1198203.

