PMID: 24653210

Sequencing and Assembly of the 22-Gb Loblolly Pine Genome

Abstract

Conifers are the predominant gymnosperms. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
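For readers unfamiliar with the N50 statistic quoted at the end of the abstract, the following is a minimal sketch of how it is commonly computed. It assumes the usual definition: the length L such that scaffolds of length L or longer contain at least half of the total assembled bases. The function name `n50` and the toy scaffold lengths are illustrative only and are not taken from the paper; the authors may have used a variant (for example, one normalized to the estimated genome size rather than the assembly size).

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumed definition: the largest length L such that scaffolds of
    length >= L together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy example with made-up lengths (not data from the assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

In the toy example the total is 290 bases; the two longest scaffolds (80 and 70) already cover 150 bases, which is more than half, so the N50 is the length of the second scaffold.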

Citing Articles

Differential microRNA and Target Gene Expression in Scots Pine ( L.) Needles in Response to Methyl Jasmonate Treatment.

Krivmane B, Rungis D Genes (Basel). 2025; 16(1).

PMID: 39858573 PMC: 11765084. DOI: 10.3390/genes16010026.


Unraveling site-specific seed formation abnormalities in Mast. trees via widely metabolomic and transcriptomic analysis.

Li K, Lin J, Fan R, Chen S, Ma Z, Ji W Front Plant Sci. 2024; 15:1495784.

PMID: 39719938 PMC: 11667104. DOI: 10.3389/fpls.2024.1495784.


Lignin biosynthesis pathway repressors in gymnosperms: differential repressor domains as compared to angiosperms.

Ranade S, Garcia-Gil M For Res (Fayettev). 2024; 4:e031.

PMID: 39524426 PMC: 11524278. DOI: 10.48130/forres-0024-0029.


Unveiling Key Genes and Unique Transcription Factors Involved in Secondary Cell Wall Formation in .

Ding W, Tu Z, Gong B, Deng Z, Liu Q, Gu Z Int J Mol Sci. 2024; 25(21).

PMID: 39519356 PMC: 11545933. DOI: 10.3390/ijms252111805.


Conifers Concentrate Large Numbers of NLR Immune Receptor Genes on One Chromosome.

Woudstra Y, Tumas H, van Ghelder C, Hung T, Ilska J, Girardi S Genome Biol Evol. 2024; 16(6).

PMID: 38787537 PMC: 11171428. DOI: 10.1093/gbe/evae113.

