
Limitations of Next-generation Genome Sequence Assembly

Overview
Journal Nat Methods
Date 2010 Nov 25
PMID 21102452
Citations 386
Abstract

High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de novo assemblies using the short oligonucleotide analysis package (SOAP), generated from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
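The two kinds of figures quoted in the abstract (how much shorter an assembly is than the reference, and what share of validated features is missing) reduce to simple ratios. The sketch below is only an illustration of that arithmetic, not the authors' pipeline. The function names `percent_shorter` and `percent_missing` are hypothetical, and the inputs are toy values that are not taken from the paper.

```python
def percent_shorter(reference_bp, assembly_bp):
    """Percentage by which an assembly is shorter than the reference."""
    if reference_bp <= 0:
        raise ValueError("reference_bp must be positive")
    return 100.0 * (reference_bp - assembly_bp) / reference_bp


def percent_missing(validated, recovered):
    """Percentage of validated features that are absent from the assembly.

    Both arguments are sets of feature identifiers (hypothetical IDs here).
    """
    if not validated:
        raise ValueError("validated must be non-empty")
    return 100.0 * len(validated - recovered) / len(validated)


if __name__ == "__main__":
    # Toy inputs for illustration only; they do not come from the paper.
    print(percent_shorter(reference_bp=1000, assembly_bp=800))
    print(percent_missing({"dup1", "dup2", "dup3", "dup4"}, {"dup1"}))
```

In a real comparison the feature sets would come from interval overlaps against the assembly rather than from identifier matching, which this sketch deliberately leaves out.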

Citing Articles

ViLR: a novel virtual long read method for breakpoint identification and direct SNP haplotyping in de novo PGT-SR carriers without a proband.

Xue J, Xie M, Cai J, Kang K, Gu M, Li M Reprod Biol Endocrinol. 2025; 23(1):34.

PMID: 40038676 PMC: 11881346. DOI: 10.1186/s12958-025-01366-3.


Beyond Low Prevalence: Exploring Antibiotic Resistance and Virulence Profiles in Sri Lankan Helicobacter pylori with Comparative Genomics.

Fauzia K, Rathnayake J, Doohan D, Lamawansa M, Alfaray R, Batsaikhan S Microorganisms. 2025; 13(2).

PMID: 40005785 PMC: 11858055. DOI: 10.3390/microorganisms13020420.


Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection.

Negi S, Stenton S, Berger S, Canigiula P, McNulty B, Violich I Am J Hum Genet. 2025; 112(2):428-449.

PMID: 39862869 PMC: 11866955. DOI: 10.1016/j.ajhg.2025.01.002.


The peculiar characteristics and advancement in diagnostic methodologies of influenza A virus.

Asif Raza M, Ashraf M, Amjad M, Din G, Shen B, Hu Y Front Microbiol. 2025; 15:1435384.

PMID: 39839109 PMC: 11747045. DOI: 10.3389/fmicb.2024.1435384.


Comparative Analysis of the Chloroplast Genomes of the (Styracaceae) Species: Providing Insights into Molecular Evolution and Phylogenetic Relationships.

Dai W, Zheng H, Xu M, Zhu X, Long H, Xu X Int J Mol Sci. 2025; 26(1).

PMID: 39796037 PMC: 11720149. DOI: 10.3390/ijms26010177.

