PMID: 19546169

Sequence and Structural Variation in a Human Genome Uncovered by Short-read, Massively Parallel Ligation Sequencing Using Two-base Encoding

Abstract

We describe the genome sequencing of an anonymous individual of African origin using a novel ligation-based sequencing assay that enables a unique form of error correction that improves the raw accuracy of the aligned reads to >99.9%, allowing us to accurately call SNPs with as few as two reads per allele. We collected several billion mate-paired reads yielding approximately 18x haploid coverage of aligned sequence and close to 300x clone coverage. Over 98% of the reference genome is covered with at least one uniquely placed read, and 99.65% is spanned by at least one uniquely placed mate-paired clone. We identify over 3.8 million SNPs, 19% of which are novel. Mate-paired data are used to physically resolve haplotype phases of nearly two-thirds of the genotypes obtained and produce phased segments of up to 215 kb. We detect 226,529 intra-read indels, 5590 indels between mate-paired reads, 91 inversions, and four gene fusions. We use a novel approach for detecting indels between mate-paired reads that are smaller than the standard deviation of the insert size of the library and discover deletions in common with those detected with our intra-read approach. Dozens of mutations previously described in OMIM and hundreds of nonsynonymous single-nucleotide and structural variants in genes previously implicated in disease are identified in this individual. There is more genetic variation in the human genome still to be uncovered, and we provide guidance for future surveys in populations and cancer biopsies.
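The abstract attributes the error correction to two-base encoding but does not spell out the mechanism. The sketch below is a minimal illustration under the commonly described color-space scheme, in which each "color" is the XOR of the 2-bit codes of two adjacent bases. It is not the authors' pipeline: the function names, the base-to-code mapping, and the classification labels are assumptions made for illustration. Under this scheme, a true single-base substitution changes two adjacent colors in a constrained way, whereas an isolated color mismatch is more likely a measurement error.

```python
# Illustrative sketch of two-base (color-space) encoding.
# Assumption: bases map to 2-bit codes and each color is the XOR of
# adjacent base codes. All names here are hypothetical, not from the paper.

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colors(seq):
    """Return (first_base, colors), where colors[i] encodes seq[i] and seq[i+1]."""
    codes = [BASE_CODE[b] for b in seq]
    return seq[0], [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and a color list."""
    code = BASE_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color
        bases.append(CODE_BASE[code])
    return "".join(bases)


def classify_read(ref_seq, read_colors):
    """Classify a read's color mismatches against the reference colors."""
    _, ref_colors = encode_colors(ref_seq)
    mismatches = [
        i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q
    ]
    if not mismatches:
        return "match"
    if len(mismatches) == 1:
        # A single base change cannot alter only one interior color.
        return "likely measurement error"
    if len(mismatches) == 2 and mismatches[1] == mismatches[0] + 1:
        i, j = mismatches
        # A substitution leaves the XOR of the two flanking colors unchanged.
        if ref_colors[i] ^ ref_colors[j] == read_colors[i] ^ read_colors[j]:
            return "candidate SNP"
    return "other"


if __name__ == "__main__":
    reference = "ATGGCA"
    _, snp_colors = encode_colors("ATCGCA")  # G>C substitution at index 2
    print(classify_read(reference, snp_colors))
    print(classify_read(reference, [3, 1, 0, 2, 1]))  # one color altered
```

In this toy model, requiring two consistent adjacent color changes is what lets a small number of reads support a SNP call, since a random single-color error does not mimic a substitution.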

Citing Articles

Integrated omics profiling of individual variations in intestinal damage to the soybean allergen in piglets.

Mi M, Zheng Y, Fu X, Bao N, Pan L, Qin G Front Vet Sci. 2025; 11:1521544.

PMID: 39881721 PMC: 11774947. DOI: 10.3389/fvets.2024.1521544.


High-throughput sequencing: a breakthrough in molecular diagnosis for precision medicine.

Dongare D, Nishad S, Mastoli S, Saraf S, Srivastava N, Dey A Funct Integr Genomics. 2025; 25(1):22.

PMID: 39838192 DOI: 10.1007/s10142-025-01529-w.


Sequencing and Optical Genome Mapping for the Adventurous Chemist.

Ruppeka Rupeika E, DHuys L, Leen V, Hofkens J Chem Biomed Imaging. 2024; 2(12):784-807.

PMID: 39735829 PMC: 11673194. DOI: 10.1021/cbmi.4c00060.


Advancing pathogen and tumor copy number variation detection through simultaneous metagenomic next-generation sequencing: A comprehensive review.

Xie X, Xi X, Zhao D, Zhao Y, Yi T, Chen D Heliyon. 2024; 10(21):e38826.

PMID: 39568836 PMC: 11577201. DOI: 10.1016/j.heliyon.2024.e38826.


A long journey to treat epilepsy with the gut microbiota.

Li Q, Gu Y, Liang J, Yang Z, Qin J Front Cell Neurosci. 2024; 18:1386205.

PMID: 38988662 PMC: 11233807. DOI: 10.3389/fncel.2024.1386205.

