Optimized Detection of Insertions/deletions (INDELs) in Whole-exome Sequencing Data

Overview
Journal PLoS One
Date 2017 Aug 10
PMID 28792971
Citations 14
Abstract

Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common due to decreasing costs, increasing efficiency, and the sensitivity improvements demonstrated by the various sequencing platforms and analytical tools. However, there are still many errors associated with INDEL variant calling, and distinguishing INDELs from errors in NGS remains challenging. To evaluate INDEL calling from whole-exome sequencing (WES) data, we performed Sanger sequencing for all INDELs called by several calling algorithms. We compared the performance of four algorithms (GATK, SAMtools, Dindel, and Freebayes) for INDEL detection in the same sample. We examined the sensitivity and positive predictive value (PPV) of GATK (90.2% and 89.5%, respectively), SAMtools (75.3% and 94.4%, respectively), Dindel (90.1% and 88.6%, respectively), and Freebayes (80.1% and 94.4%, respectively). GATK had the highest sensitivity. Furthermore, we identified sets of INDELs with high PPV: the intersection of all four algorithms (98.7%), the intersection of three algorithms (97.6%), and the intersection of GATK and SAMtools (97.6%). We identified two key sources of difficulty in accurate INDEL detection: 1) the presence of repeats, and 2) heterozygous INDELs. Based on these results, we suggest accessible algorithms that selectively reduce error rates and thereby facilitate INDEL detection. Our study may also serve as a basis for understanding the accuracy and completeness of INDEL detection.
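The two metrics reported above, and the effect of intersecting callers, can be illustrated with a small sketch. This is not the authors' pipeline: the variant keys, caller names, and call sets below are hypothetical toy data, and the sketch assumes that the Sanger-confirmed INDELs serve as the truth set and that "intersection" means an INDEL reported by at least a given number of callers.

```python
from collections import Counter

# A variant is keyed by (chromosome, position, ref allele, alt allele).
# Every variant and caller name here is hypothetical toy data.
A = ("chr1", 1000, "AT", "A")
B = ("chr1", 2000, "G", "GCC")
C = ("chr2", 1500, "CTT", "C")
D = ("chr3", 300, "T", "TA")
E = ("chr4", 4200, "GA", "G")  # called, but not confirmed (a false positive)

# Assumed truth set: INDELs confirmed by validation (e.g. Sanger sequencing).
validated = {A, B, C, D}

# Assumed per-caller call sets.
calls = {
    "caller_a": {A, B, C, E},
    "caller_b": {A, B},
    "caller_c": {A, C, D, E},
}


def evaluate(called, confirmed):
    """Return (sensitivity, PPV) of a call set against the confirmed set."""
    tp = len(called & confirmed)   # called and confirmed
    fp = len(called - confirmed)   # called but not confirmed
    fn = len(confirmed - called)   # confirmed but missed by this call set
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


def consensus(call_sets, min_callers):
    """Return the variants reported by at least `min_callers` callers."""
    counts = Counter(v for called in call_sets.values() for v in called)
    return {v for v, n in counts.items() if n >= min_callers}


for name, called in calls.items():
    print(name, evaluate(called, validated))

for k in (2, 3):
    print(f">= {k} callers", evaluate(consensus(calls, k), validated))
```

In this toy data, requiring agreement among all three callers removes the unconfirmed call and so raises PPV, but it also discards confirmed INDELs that only one or two callers found, which lowers sensitivity. That is the same trade-off the intersection figures in the abstract reflect.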

Citing Articles

Towards accurate indel calling for oncopanel sequencing through an international pipeline competition at precisionFDA.

Gong B, Lababidi S, Kusko R, Bouri K, Prezek S, Thovarai V Sci Rep. 2024; 14(1):8165.

PMID: 38589653 PMC: 11001604. DOI: 10.1038/s41598-024-58573-y.


Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project.

Gong B, Li D, Zhang Y, Kusko R, Lababidi S, Cao Z Sci Rep. 2024; 14(1):7028.

PMID: 38528062 PMC: 10963753. DOI: 10.1038/s41598-024-57439-7.


Identification of grapevine clones via high-throughput amplicon sequencing: a proof-of-concept study.

Urra C, Sanhueza D, Pavez C, Tapia P, Nunez-Lillo G, Minio A G3 (Bethesda). 2023; 13(9).

PMID: 37395733 PMC: 10468313. DOI: 10.1093/g3journal/jkad145.


The Novel Structural Variation in the GHR Gene Is Associated with Growth Traits in Yaks.

Wang F, Wu X, Ma X, Bao Q, Zheng Q, Chu M Animals (Basel). 2023; 13(5).

PMID: 36899708 PMC: 10000137. DOI: 10.3390/ani13050851.


Multi-gene panel testing increases germline predisposing mutations' detection in a cohort of breast/ovarian cancer patients from Southern Italy.

Nunziato M, Di Maggio F, Pensabene M, Esposito M, Starnone F, De Angelis C Front Med (Lausanne). 2022; 9:894358.

PMID: 36035419 PMC: 9403188. DOI: 10.3389/fmed.2022.894358.

