
DNAsmart: Multiple Attribute Ranking Tool for DNA Data Storage Systems

Overview
Specialty Biotechnology
Date 2023 Feb 28
PMID 36851917
Abstract

With an ever-growing need for data storage capacity, the deoxyribonucleic acid (DNA) molecule is gaining traction as a new storage medium with a larger capacity, higher density, and a longer lifespan than conventional storage media. To use DNA effectively for data storage, it is important to understand the different methods of encoding information in DNA and to compare their effectiveness. This requires evaluating which decoded DNA sequences carry the most encoded information based on various attributes. However, navigating the field of coding theory requires years of experience and domain expertise. For instance, domain experts rely on various mathematical functions and attributes to score and evaluate their encodings. To support such analytical tasks, we provide an interactive, visual analytical framework for multi-attribute ranking in DNA storage systems. Our framework follows a three-step view with user-settable parameters. It enables users to find the optimal en-/de-coding approaches by setting different weights and combining multiple attributes. We assess the validity of our work through a task-specific user study with domain experts comprising three tasks. All participants completed their tasks successfully in under two minutes and then rated the framework on design choices, perceived usefulness, and intuitiveness. In addition, two real-world use cases are shared and analyzed as direct applications of the proposed tool. DNAsmart enables the ranking of decoded sequences based on multiple attributes. In sum, this work makes the evaluation of en-/de-coding approaches accessible and tractable, using visualization and interactivity to solve comparison and ranking tasks.
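The abstract does not specify how DNAsmart combines attributes, so the following is only a minimal sketch of one common way to do multi-attribute ranking with user-set weights: a weighted sum over min-max normalized attributes. The function name `rank_sequences`, the attribute names `gc_balance` and `recovered_fraction`, and the sample values are hypothetical and are not taken from the tool.

```python
from typing import Dict, List, Tuple


def rank_sequences(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank items by a weighted sum of min-max normalized attributes.

    Illustrative only: this is an assumed scoring scheme, not the
    method documented for DNAsmart.
    """
    attributes = list(weights)
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Per-attribute min and max, used to rescale each attribute to [0, 1].
    bounds = {
        a: (
            min(item[a] for item in scores.values()),
            max(item[a] for item in scores.values()),
        )
        for a in attributes
    }

    ranked = []
    for name, attrs in scores.items():
        combined = 0.0
        for a in attributes:
            lo, hi = bounds[a]
            # A constant attribute carries no ranking information.
            normalized = 0.0 if hi == lo else (attrs[a] - lo) / (hi - lo)
            combined += weights[a] * normalized
        ranked.append((name, combined / total_weight))

    # Highest combined score first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical decoded sequences with two made-up attributes.
candidates = {
    "seq_a": {"gc_balance": 0.9, "recovered_fraction": 0.5},
    "seq_b": {"gc_balance": 0.5, "recovered_fraction": 1.0},
    "seq_c": {"gc_balance": 0.1, "recovered_fraction": 0.0},
}

# Emphasizing recovered information versus emphasizing GC balance.
by_recovery = rank_sequences(candidates, {"gc_balance": 1.0, "recovered_fraction": 3.0})
by_gc = rank_sequences(candidates, {"gc_balance": 3.0, "recovered_fraction": 1.0})

print([name for name, _ in by_recovery])
print([name for name, _ in by_gc])
```

Changing the weights can reorder the candidates, which is the kind of comparison the abstract describes users performing interactively. Attributes where lower values are better would need to be inverted before normalization under this scheme.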

Citing Articles

Turbo autoencoders for the DNA data storage channel with Autoturbo-DNA.

Welzel M, Dressler H, Heider D. iScience. 2024; 27(5):109575.

PMID: 38638577 PMC: 11024904. DOI: 10.1016/j.isci.2024.109575.


RepairNatrix: a Snakemake workflow for processing DNA sequencing data for DNA storage.

Schwarz P, Welzel M, Heider D, Freisleben B. Bioinform Adv. 2024; 3(1):vbad117.

PMID: 38496344 PMC: 10941317. DOI: 10.1093/bioadv/vbad117.


DBTRG: De Bruijn Trim rotation graph encoding for reliable DNA storage.

Zhao Y, Cao B, Wang P, Wang K, Wang B. Comput Struct Biotechnol J. 2023; 21:4469-4477.

PMID: 37736298 PMC: 10510065. DOI: 10.1016/j.csbj.2023.09.004.
