
System for Quality-Assured Data Analysis: Flexible, Reproducible Scientific Workflows

Overview
Journal: Genet Epidemiol
Specialties: Genetics, Public Health
Date: 2018 Dec 20
PMID: 30565316
Citations: 4
Abstract

The reproducibility of scientific processes is one of the paramount problems of bioinformatics, an engineering problem that must be addressed to perform good research. The System for Quality-Assured Data Analysis (SyQADA), described here, seeks to address reproducibility by managing many of the details of procedural bookkeeping in bioinformatics in as simple and transparent a manner as possible. SyQADA has been used by persons with backgrounds ranging from expert programmer to Unix novice to perform and repeat dozens of diverse bioinformatics workflows on tens of thousands of samples, consuming over 80 CPU-months of computing on over 300,000 individual tasks across scores of projects on laptops, computer servers, and computing clusters. SyQADA is especially well suited to the paired-sample analyses found in cancer tumor-normal studies. SyQADA executable source code, documentation, tutorial examples, and workflows used in our lab are available from http://scheet.org/software.html.

Citing Articles

Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use.

Razzaghi H, Greenberg J, Bailey L Learn Health Syst. 2022; 6(1):e10264.

PMID: 35036548 PMC: 8753309. DOI: 10.1002/lrh2.10264.


Evaluation of serverless computing for scalable execution of a joint variant calling workflow.

John A, Muenzen K, Ausmees K PLoS One. 2021; 16(7):e0254363.

PMID: 34242357 PMC: 8270184. DOI: 10.1371/journal.pone.0254363.


Inherited causes of clonal haematopoiesis in 97,691 whole genomes.

Bick A, Weinstock J, Nandakumar S, Fulco C, Bao E, Zekavat S Nature. 2020; 586(7831):763-768.

PMID: 33057201 PMC: 7944936. DOI: 10.1038/s41586-020-2819-2.


Large-scale analysis of acquired chromosomal alterations in non-tumor samples from patients with cancer.

Jakubek Y, Chang K, Sivakumar S, Yu Y, Giordano M, Fowler J Nat Biotechnol. 2019; 38(1):90-96.

PMID: 31685958 PMC: 8082517. DOI: 10.1038/s41587-019-0297-6.
