
An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values

Overview
Date 2017 Aug 29
PMID 28845445
Citations 5
Abstract

This paper specifies, and provides an initial validation of, an evaluation framework for comparing lossy compressors of genome sequencing quality values. The goal is to define the reference data, test sets, tools, and metrics to be used when evaluating the impact of lossy quality-value compression on human genome variant calling. The framework's functionality is validated using two state-of-the-art genomic compressors. The work was spurred by current activity within the ISO/IEC SC29/WG11 technical committee (a.k.a. MPEG), which is investigating the possibility of starting a standardization activity for genomic information representation.
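The abstract does not name the metrics the framework uses, so the following is only a minimal sketch of one common way to quantify the impact of lossy compression on variant calling: compare the variants called from lossily compressed data against a reference call set and report precision, recall, and F-score. The `Variant` tuple layout, the `compare_calls` function, and the toy call sets are all hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Set, Tuple

# Assumption: a variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class CallMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float
    f_score: float


def compare_calls(truth: Set[Variant], called: Set[Variant]) -> CallMetrics:
    """Compare a call set against a reference ("truth") call set."""
    tp = len(truth & called)   # variants found in both sets
    fp = len(called - truth)   # called, but absent from the reference
    fn = len(truth - called)   # in the reference, but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return CallMetrics(tp, fp, fn, precision, recall, f_score)


# Toy, made-up call sets purely for illustration.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr2", 900, "T", "C"),
}
# Hypothetical calls obtained with the original (losslessly stored) quality values.
lossless_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "A", "C"),
}
# Hypothetical calls obtained after lossy compression of the quality values.
lossy_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 900, "T", "C"),
}

baseline = compare_calls(truth, lossless_calls)
lossy = compare_calls(truth, lossy_calls)
print(f"lossless: P={baseline.precision:.3f} R={baseline.recall:.3f} F={baseline.f_score:.3f}")
print(f"lossy:    P={lossy.precision:.3f} R={lossy.recall:.3f} F={lossy.f_score:.3f}")
```

Scoring the lossless calls against the same reference set gives a baseline, so any change in precision or recall can be attributed to the compressor rather than to the variant caller. In practice the call sets would be read from VCF files and compared with a dedicated benchmarking tool; exact tuple matching as used here ignores variant normalization and genotype.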

Citing Articles

Navigating bottlenecks and trade-offs in genomic data analysis.

Berger B, Yu Y. Nat Rev Genet. 2022; 24(4):235-250.

PMID: 36476810 PMC: 10204111. DOI: 10.1038/s41576-022-00551-z.


CMIC: an efficient quality score compressor with random access functionality.

Chen H, Chen J, Lu Z, Wang R. BMC Bioinformatics. 2022; 23(1):294.

PMID: 35870880 PMC: 9308261. DOI: 10.1186/s12859-022-04837-1.


MZPAQ: a FASTQ data compression tool.

El Allali A, Arshad M. Source Code Biol Med. 2019; 14:3.

PMID: 31171931 PMC: 6547476. DOI: 10.1186/s13029-019-0073-5.


Systematic benchmarking of omics computational tools.

Mangul S, Martin L, Hill B, Lam A, Distler M, Zelikovsky A. Nat Commun. 2019; 10(1):1393.

PMID: 30918265 PMC: 6437167. DOI: 10.1038/s41467-019-09406-4.


CALQ: compression of quality values of aligned sequencing data.

Voges J, Ostermann J, Hernaez M. Bioinformatics. 2017; 34(10):1650-1658.

PMID: 29186284 PMC: 5946873. DOI: 10.1093/bioinformatics/btx737.
