PMID: 25431634

From FastQ Data to High Confidence Variant Calls: the Genome Analysis Toolkit Best Practices Pipeline

Abstract

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.
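As a rough illustration of the workflow the abstract outlines (mapping with BWA, pre-processing, then variant discovery with the GATK), the sketch below assembles one possible command sequence as argument lists without running anything. It assumes GATK4-style invocations (`gatk ToolName ...`), which postdate this unit and may differ from the commands the unit itself gives. All file names, the sample name, the read-group fields, and the helper `best_practices_commands` are hypothetical. Reference indexing and other prerequisites are omitted.

```python
import shlex


def best_practices_commands(ref, fq1, fq2, sample, known_sites, outdir="out"):
    """Return illustrative pipeline steps as argument lists (nothing is executed).

    Assumption: GATK4-style syntax; the original unit may use older commands.
    """
    sorted_bam = f"{outdir}/{sample}.sorted.bam"
    dedup_bam = f"{outdir}/{sample}.dedup.bam"
    metrics = f"{outdir}/{sample}.dup_metrics.txt"
    recal_table = f"{outdir}/{sample}.recal.table"
    recal_bam = f"{outdir}/{sample}.recal.bam"
    vcf = f"{outdir}/{sample}.vcf.gz"
    # Hypothetical read group; bwa expects the two-character sequence "\t" here.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Steps 1 and 2 are meant to be joined by a pipe: bwa mem ... | samtools sort ...
        ["bwa", "mem", "-R", read_group, ref, fq1, fq2],
        ["samtools", "sort", "-o", sorted_bam, "-"],
        # Pre-processing: mark duplicates, then recalibrate base quality scores.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam, "-M", metrics],
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # Variant discovery on the analysis-ready BAM.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam, "-O", vcf],
    ]


if __name__ == "__main__":
    steps = best_practices_commands(
        "ref.fasta", "reads_1.fastq", "reads_2.fastq", "sample1", "dbsnp.vcf.gz"
    )
    for cmd in steps:
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps quoting explicit and lets a caller inspect or log each step before deciding how to run it.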

Citing Articles

The Impact of SNP Score on Low-Density Lipoprotein Cholesterol Concentration and Coronary Artery Disease.

Cereskevicius D, Ciapiene I, Aldujeli A, Zabiela V, Lesauskaite V, Zubieliene K. Int J Mol Sci. 2025; 26(5).

PMID: 40076958 PMC: 11899937. DOI: 10.3390/ijms26052337.


Whole-Exome Sequencing Identifies Novel GATA5/6 Variants in Right-Sided Congenital Heart Defects.

Zodanu G, Hwang J, Mudery J, Sisniega C, Kang X, Wang L. Int J Mol Sci. 2025; 26(5).

PMID: 40076735 PMC: 11901071. DOI: 10.3390/ijms26052115.


SNUH methylation classifier for CNS tumors.

Lee K, Jeon J, Park J, Yu S, Won J, Kim K. Clin Epigenetics. 2025; 17(1):47.

PMID: 40075518 PMC: 11905536. DOI: 10.1186/s13148-025-01824-0.


Factors underlying migratory timing of a seasonally migrating bird.

Bobowski T, Bossu C, Rueda-Hernandez R, Schweizer T, Tello-Lopez I, Smith T. Sci Rep. 2025; 15(1):8527.

PMID: 40075156 PMC: 11903693. DOI: 10.1038/s41598-025-93442-2.


High-resolution genome assembly and population genetic study of the endangered maple (Sapindaceae): implications for conservation strategies.

Li X, Jiang L, Deng H, Yu Q, Ju W, Chen X. Hortic Res. 2025; 12(4):uhae357.

PMID: 40066161 PMC: 11891484. DOI: 10.1093/hr/uhae357.

