Combinatorial Algorithms for Structural Variation Detection in High-throughput Sequenced Genomes

Overview
Journal Genome Res
Specialty Genetics
Date 2009 May 19
PMID 19447966
Citations 150
Abstract

Recent studies show that along with single nucleotide polymorphisms and small indels, larger structural variants among human individuals are common. The Human Genome Structural Variation Project aims to identify and classify deletions, insertions, and inversions (>5 Kbp) in a small number of normal individuals with a fosmid-based paired-end sequencing approach using traditional sequencing technologies. The realization of new ultra-high-throughput sequencing platforms now makes it feasible to detect the full spectrum of genomic variation among many individual genomes, including cancer patients and others suffering from diseases of genomic origin. Unfortunately, existing algorithms for identifying structural variation (SV) among individuals have not been designed to handle the short read lengths and the errors implied by the "next-gen" sequencing (NGS) technologies. In this paper, we give combinatorial formulations for the SV detection between a reference genome sequence and a next-gen-based, paired-end, whole genome shotgun-sequenced individual. We describe efficient algorithms for each of the formulations we give, which all turn out to be fast and quite reliable; they are also applicable to all next-gen sequencing methods (Illumina, 454 Life Sciences [Roche], ABI SOLiD, etc.) and traditional capillary sequencing technology. We apply our algorithms to identify SV among individual genomes very recently sequenced by Illumina technology.
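The abstract does not spell out the combinatorial formulations, so what follows is only a minimal illustrative sketch in Python, not the authors' algorithm. It makes several assumptions. It uses a single chromosome and paired-end reads from a +/- library. The library insert size is expected to fall between two hypothetical bounds, min_insert and max_insert.

Under these assumptions:

- A pair whose mapped span exceeds max_insert suggests a deletion in the sequenced individual.
- A span below min_insert suggests an insertion.
- Two reads on the same strand suggest an inversion.

Candidate variants are then selected with a greedy set-cover heuristic. This is a standard approximation for the parsimony idea of explaining all discordant pairs with as few variants as possible. The names PairMapping, classify, greedy_parsimony, and the candidate labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class PairMapping:
    """One mapped placement of a paired-end read (hypothetical record type)."""
    pair_id: str
    left_pos: int      # mapped start of the leftmost read
    left_strand: str   # '+' or '-'
    right_pos: int     # mapped start of the rightmost read
    right_strand: str


def classify(m: PairMapping, min_insert: int, max_insert: int) -> str:
    """Label one mapping by the SV signature it suggests.

    Assumes both reads lie on the same chromosome, left_pos <= right_pos,
    and a +/- library; min_insert/max_insert are assumed insert-size bounds.
    """
    if m.left_strand == m.right_strand:
        return "inversion"      # one read is flipped relative to the reference
    if (m.left_strand, m.right_strand) != ("+", "-"):
        return "other"          # -/+ orientation is not modeled in this sketch
    span = m.right_pos - m.left_pos
    if span > max_insert:
        return "deletion"       # reads map farther apart than the library allows
    if span < min_insert:
        return "insertion"      # reads map closer together than expected
    return "concordant"


def greedy_parsimony(clusters: Dict[str, Set[str]], pairs: Set[str]) -> List[str]:
    """Greedy set cover: pick candidate SVs until every pair is explained.

    clusters maps a hypothetical candidate-SV name to the pair IDs supporting it.
    """
    uncovered = set(pairs)
    chosen: List[str] = []
    while uncovered:
        # Candidate explaining the most still-unexplained pairs (first one wins ties).
        best = max(clusters, key=lambda c: len(clusters[c] & uncovered), default=None)
        if best is None or not clusters[best] & uncovered:
            break  # leftover pairs are not supported by any candidate
        chosen.append(best)
        uncovered -= clusters[best]
    return chosen


if __name__ == "__main__":
    mappings = [
        PairMapping("p1", 1000, "+", 6200, "-"),
        PairMapping("p2", 1050, "+", 6300, "-"),
        PairMapping("p3", 1020, "+", 1400, "-"),
        PairMapping("p4", 8000, "+", 9000, "+"),
    ]
    labels = {m.pair_id: classify(m, 200, 500) for m in mappings}
    print(labels)
    # {'p1': 'deletion', 'p2': 'deletion', 'p3': 'concordant', 'p4': 'inversion'}

    discordant = {pid for pid, lab in labels.items() if lab != "concordant"}
    clusters = {"delA": {"p1", "p2"}, "delB": {"p2"}, "invC": {"p4"}}
    print(greedy_parsimony(clusters, discordant))
    # ['delA', 'invC']
```

In the example, delA is chosen first because it alone explains both deletion-supporting pairs, so delB is never needed. Real short-read data also produces pairs with several possible mappings. This sketch sidesteps that by giving each candidate variant directly as a set of supporting pair IDs.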

Citing Articles

ECOLE: Learning to call copy number variants on whole exome sequencing data.

Mandiracioglu B, Ozden F, Kaynar G, Yilmaz M, Alkan C, Cicek A. Nat Commun. 2024; 15(1):132.

PMID: 38167256 PMC: 10762021. DOI: 10.1038/s41467-023-44116-y.


Potentials and challenges of chromosomal microarray analysis in prenatal diagnosis.

Liu X, Liu S, Wang H, Hu T. Front Genet. 2022; 13:938183.

PMID: 35957681 PMC: 9360565. DOI: 10.3389/fgene.2022.938183.


Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data.

Lavrichenko K, Johansson S, Jonassen I. BMC Genomics. 2021; 22(1):826.

PMID: 34789167 PMC: 8596897. DOI: 10.1186/s12864-021-08082-3.


A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data.

Liu G, Zhang J. Front Genet. 2021; 12:699510.

PMID: 34262604 PMC: 8273656. DOI: 10.3389/fgene.2021.699510.


Detecting inversions with PCA in the presence of population structure.

Nowling R, Manke K, Emrich S. PLoS One. 2020; 15(10):e0240429.

PMID: 33119626 PMC: 7595445. DOI: 10.1371/journal.pone.0240429.

