
Optimization of De Novo Transcriptome Assembly from Next-generation Sequencing Data

Overview
Journal: Genome Res
Specialty: Genetics
Date: 2010 Aug 10
PMID: 20693479
Citations: 199
Abstract

Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task that requires algorithmic improvements. We present two methods for substantially improving de novo transcriptome assembly. The first method relies on the observation that the single k-mer length used by current de novo assemblers is suboptimal for assembling transcriptomes in which the sequence coverage of transcripts is highly heterogeneous. We present the Multiple-k method, in which various k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method relies on a reference proteome to improve the de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method, which maps contigs against the closest available reference proteome and scaffolds contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied these two methods to assemble the transcriptome of the non-model catfish Loricaria gr. cataphracta. Using the Multiple-k and STM methods, the assembly increases in contiguity and in gene identification, showing that our methods clearly improve quality and can be widely used. The new methods successfully assembled the transcripts of the core set of genes regulating tooth development in vertebrates, whereas classic de novo assembly failed.
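The scaffolding idea described in the abstract (joining contigs that map onto the same reference protein) can be illustrated with a minimal Python sketch. This is not the authors' STM implementation: the `Hit` record, the `scaffold_by_protein` function, and the fixed run of `N` characters used as a gap are all hypothetical names and choices made for illustration. The sketch also assumes that each contig has a single best hit, that all contigs are already in the same orientation, and that the hits on a protein do not overlap.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one contig-to-protein alignment."""
    contig_id: str
    protein_id: str
    prot_start: int  # start of the alignment on the protein
    prot_end: int    # end of the alignment on the protein


def scaffold_by_protein(contigs, hits, gap="N" * 10):
    """Join contigs that hit the same protein, ordered along that protein.

    contigs: dict mapping contig id to nucleotide sequence.
    hits:    iterable of Hit records (assumed: one best hit per contig).
    gap:     placeholder inserted between joined contigs (an assumption).
    Returns a dict mapping protein id to the scaffold sequence; proteins
    hit by fewer than two contigs produce no scaffold.
    """
    by_protein = defaultdict(list)
    for hit in hits:
        by_protein[hit.protein_id].append(hit)

    scaffolds = {}
    for protein_id, group in by_protein.items():
        if len(group) < 2:
            continue  # nothing to join for this protein
        # Order contigs by where they align on the protein.
        ordered = sorted(group, key=lambda h: h.prot_start)
        scaffolds[protein_id] = gap.join(contigs[h.contig_id] for h in ordered)
    return scaffolds
```

In a real pipeline the hits would come from a translated alignment of contigs against the reference proteome, and strand, overlap, and conflicting hits would need explicit handling; the sketch leaves all of that out.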

Citing Articles

Roast: a tool for reference-free optimization of supertranscriptome assemblies.

Shabbir M, Mithani A. BMC Bioinformatics. 2024; 25(1):2.

PMID: 38166712 PMC: 10763045. DOI: 10.1186/s12859-023-05614-4.


Metabolomics and transcriptomics analyses for characterizing the alkaloid metabolism of Chinese jujube and sour jujube fruits.

Xue X, Zhao A, Wang Y, Ren H, Su W, Li Y. Front Plant Sci. 2023; 14:1267758.

PMID: 37790781 PMC: 10544937. DOI: 10.3389/fpls.2023.1267758.


Optimizing an efficient ensemble approach for high-quality de novo transcriptome assembly of Thymus daenensis.

Ahmadi H, Sheikh-Assadi M, Fatahi R, Zamani Z, Shokrpour M. Sci Rep. 2023; 13(1):12415.

PMID: 37524806 PMC: 10390528. DOI: 10.1038/s41598-023-39620-6.


Elucidating the Mesocarp Drupe Transcriptome of Açai ( Mart.): An Amazonian Tree Palm Producer of Bioactive Compounds.

Darnet E, Teixeira B, Schaller H, Rogez H, Darnet S. Int J Mol Sci. 2023; 24(11).

PMID: 37298279 PMC: 10253617. DOI: 10.3390/ijms24119315.


Normalized Workflow to Optimize Hybrid De Novo Transcriptome Assembly for Non-Model Species: A Case Study in (Baker) Boiss.

Sheikh-Assadi M, Naderi R, Salami S, Kafi M, Fatahi R, Shariati V. Plants (Basel). 2022; 11(18).

PMID: 36145766 PMC: 9503428. DOI: 10.3390/plants11182365.

