PMID: 15479945

An Intermediate Grade of Finished Genomic Sequence Suitable for Comparative Analyses

Abstract

Although the cost of generating draft-quality genomic sequence continues to decline, refining that sequence by the process of "sequence finishing" remains expensive. Near-perfect finished sequence is an appropriate goal for the human genome and a small set of reference genomes; however, such a high-quality product cannot be cost-justified for large numbers of additional genomes, at least for the foreseeable future. Here we describe the generation and quality of an intermediate grade of finished genomic sequence (termed comparative-grade finished sequence), which is tailored for use in multispecies sequence comparisons. Our analyses indicate that this sequence is very high quality (with the residual gaps and errors mostly falling within repetitive elements) and reflects 99% of the total sequence. Importantly, comparative-grade sequence finishing requires approximately 40-fold less reagents and approximately 10-fold less personnel effort compared to the generation of near-perfect finished sequence, such as that produced for the human genome. Although applied here to finishing sequence derived from individual bacterial artificial chromosome (BAC) clones, one could envision establishing routines for refining sequences emanating from whole-genome shotgun sequencing projects to a similar quality level. Our experience to date demonstrates that comparative-grade sequence finishing represents a practical and affordable option for sequence refinement en route to comparative analyses.

Citing Articles

Revised eutherian gene collections.

Premzl M BMC Genom Data. 2022; 23(1):56.

PMID: 35870891 PMC: 9308196. DOI: 10.1186/s12863-022-01071-9.


The landscape of nutri-informatics: a review of current resources and challenges for integrative nutrition research.

Chan L, Vasilevsky N, Thessen A, McMurry J, Haendel M Database (Oxford). 2021; 2021.

PMID: 33494105 PMC: 7833928. DOI: 10.1093/database/baab003.


Comparative genomic analysis of eutherian fibroblast growth factor genes.

Premzl M BMC Genomics. 2020; 21(1):542.

PMID: 32758140 PMC: 7430813. DOI: 10.1186/s12864-020-06958-4.


Comparative genomic analysis of eutherian connexin genes.

Premzl M Sci Rep. 2019; 9(1):16938.

PMID: 31729432 PMC: 6858305. DOI: 10.1038/s41598-019-53458-x.


Comparative genomic analysis of eutherian adiponectin genes.

Premzl M Heliyon. 2018; 4(6):e00647.

PMID: 30003153 PMC: 6040601. DOI: 10.1016/j.heliyon.2018.e00647.

