
Targeted Enrichment Beyond the Consensus Coding DNA Sequence Exome Reveals Exons with Higher Variant Densities

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2011 Jul 27
PMID: 21787409
Citations: 151
Abstract

Background: Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) in regions outside of the CCDS, and our ability to interrogate those regions, are consequently less well understood.

Results: We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher-than-expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.
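As a rough illustration of the two quantities discussed in the Results, the sketch below computes the GC fraction of a sequence and the mean coverage of a region expressed relative to the CCDS. The function names, the handling of ambiguous bases and the example depths are assumptions made here for illustration; the abstract does not describe the authors' actual pipeline.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous bases such as N are ignored rather than
    counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def relative_coverage(region_mean_depth: float, ccds_mean_depth: float) -> float:
    """Mean depth of a region expressed as a fraction of the CCDS mean depth."""
    if ccds_mean_depth <= 0:
        raise ValueError("CCDS mean depth must be positive")
    return region_mean_depth / ccds_mean_depth


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    print(f"GC fraction: {gc_fraction('GGCCAATTNN'):.2f}")
    print(f"Relative coverage: {relative_coverage(20.0, 50.0):.2f}")
```

Under this definition, a value below 0.5 corresponds to the "less than 50% relative to the CCDS" coverage described above.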

Conclusions: We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5 times higher than that observed in the CCDS.
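Variant density, as defined in the Background, is the number of variants per base pair, so the comparison in the Conclusions amounts to a ratio of two densities. The following is a minimal sketch of that arithmetic. The function names and all counts are hypothetical placeholders, not data from the study.

```python
def variant_density(n_variants: int, n_bases: int) -> float:
    """Variants per base pair over a set of regions."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_variants / n_bases


def fold_difference(region_density: float, reference_density: float) -> float:
    """How many times higher the region density is than the reference density."""
    if reference_density <= 0:
        raise ValueError("reference density must be positive")
    return region_density / reference_density


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the calculation.
    ccds = variant_density(n_variants=1_000, n_bases=1_000_000)
    predicted = variant_density(n_variants=300, n_bases=100_000)
    print(f"CCDS density: {ccds:.4f} variants/bp")
    print(f"Predicted-exon density: {predicted:.4f} variants/bp")
    print(f"Fold difference: {fold_difference(predicted, ccds):.1f}")
```

Normalising by the number of bases matters because the region sets differ greatly in total length, so raw variant counts are not directly comparable.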
