
A Consensus-based Ensemble Approach to Improve Transcriptome Assembly

Overview
Publisher Biomed Central
Specialty Biology
Date 2021 Oct 22
PMID 34674629
Citations 5
Abstract

Background: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data, yet assembling high-quality transcriptomes remains a nontrivial problem, especially for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models, and there is no easy way to determine which are more accurate. Alternative splicing events make the assembly problem harder still. Benchmarking transcriptome assemblies is critical, but it is also not trivial because true reference transcriptomes are generally lacking.

Results: In this study, we first provide a pipeline to generate simulated benchmark transcriptomes and the corresponding RNAseq data. Using these simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches, including both de novo and genome-guided methods. The results showed that assembly performance deteriorates significantly when alternative transcripts (isoforms) exist or, for genome-guided methods, when the reference is not available from the same genome. To improve transcriptome assembly performance, we present ConSemble, a new consensus-based ensemble transcriptome assembly approach that leverages the overlapping predictions between different assemblies.
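
The consensus idea described above can be illustrated with a minimal sketch. This is not the ConSemble script itself: the function name `consensus_contigs`, the `min_support` threshold, the assembler labels, and the use of exact sequence identity to decide that two assemblers "agree" are all assumptions made for illustration. The abstract states only that overlapping predictions between assemblies are used.

```python
from collections import defaultdict


def consensus_contigs(assemblies, min_support=3):
    """Return the sequences reported by at least `min_support` assemblers.

    `assemblies` maps a (hypothetical) assembler label to the list of
    sequences it produced. Agreement is assumed to mean exact,
    case-insensitive sequence identity.
    """
    support = defaultdict(set)
    for assembler, contigs in assemblies.items():
        for seq in contigs:
            # A set of assembler names means duplicates within one
            # assembly count only once toward the vote.
            support[seq.upper()].add(assembler)
    return {seq for seq, names in support.items() if len(names) >= min_support}


# Toy input: four hypothetical assemblers and short made-up sequences.
assemblies = {
    "assembler_a": ["ATGGCC", "ATGTTT", "ATGAAA"],
    "assembler_b": ["ATGGCC", "ATGTTT"],
    "assembler_c": ["ATGGCC", "ATGCCC"],
    "assembler_d": ["atggcc", "ATGTTT", "ATGCCC"],
}

print(sorted(consensus_contigs(assemblies)))
```

Raising `min_support` trades recall for precision: fewer sequences survive the vote, but each surviving sequence is backed by more independent assemblies.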

Conclusions: Without using a reference genome, ConSemble with four de novo assemblers achieved an accuracy up to twice as high as that of any individual de novo assembler we compared. When a reference genome is available, ConSemble with four genome-guided assemblies removed many incorrectly assembled contigs with minimal impact on correctly assembled contigs, achieving higher precision and accuracy than individual genome-guided methods. Furthermore, ConSemble using de novo assemblers matched or exceeded the best-performing genome-guided assemblers even when the transcriptomes included isoforms. We thus demonstrated that the ConSemble consensus strategy can improve transcriptome assembly for both de novo and genome-guided assemblers. The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the script to perform the ConSemble assembly are all freely available from http://bioinfolab.unl.edu/emlab/consemble/.
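
To make the comparison against a simulated benchmark concrete, the sketch below scores a predicted set of transcripts against a known benchmark set. The abstract does not define its accuracy metric, so precision, recall, and F1 are used here as assumed stand-ins. The function name `score_assembly` and the transcript identifiers are hypothetical, and a prediction is assumed to count as correct only if it matches a benchmark entry exactly.

```python
def score_assembly(predicted, benchmark):
    """Score predicted transcripts against a benchmark set.

    Both arguments are iterables of hashable items (for example
    sequences or identifiers). Matching is by exact equality.
    """
    predicted, benchmark = set(predicted), set(benchmark)
    tp = len(predicted & benchmark)   # correctly assembled
    fp = len(predicted - benchmark)   # assembled but not in the benchmark
    fn = len(benchmark - predicted)   # in the benchmark but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data: removing false positives while keeping the true positives
# raises precision and leaves recall unchanged.
benchmark = {"t1", "t2", "t3", "t4"}
single_assembler = {"t1", "t2", "t3", "x1", "x2", "x3"}
consensus = {"t1", "t2", "t3", "x1"}

print(score_assembly(single_assembler, benchmark))
print(score_assembly(consensus, benchmark))
```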

Citing Articles

Transcriptome analysis of two isolates of the tomato pathogen Cladosporium fulvum, uncovers genome-wide patterns of alternative splicing during a host infection cycle.

Zaccaron A, Chen L, Stergiopoulos I PLoS Pathog. 2024; 20(12):e1012791.

PMID: 39693392 PMC: 11694984. DOI: 10.1371/journal.ppat.1012791.


Comparative Genomics Uncovers the Evolutionary Dynamics of Detoxification and Insecticide Target Genes Across 11 Phlebotomine Sand Flies.

Charamis J, Balaska S, Ioannidis P, Dvorak V, Mavridis K, McDowell M Genome Biol Evol. 2024; 16(9).

PMID: 39224065 PMC: 11412322. DOI: 10.1093/gbe/evae186.


A cloud-based training module for efficient de novo transcriptome assembly using Nextflow and Google cloud.

Seaman R, Campbell R, Doe V, Yosufzai Z, Graber J Brief Bioinform. 2024; 25(4).

PMID: 38941113 PMC: 11212313. DOI: 10.1093/bib/bbae313.


De novo assembly of transcriptomes and differential gene expression analysis using short-read data from emerging model organisms - a brief guide.

Jackson D, Cerveau N, Posnien N Front Zool. 2024; 21(1):17.

PMID: 38902827 PMC: 11188175. DOI: 10.1186/s12983-024-00538-y.


Normalized Workflow to Optimize Hybrid De Novo Transcriptome Assembly for Non-Model Species: A Case Study in Lilium ledebourii (Baker) Boiss.

Sheikh-Assadi M, Naderi R, Salami S, Kafi M, Fatahi R, Shariati V Plants (Basel). 2022; 11(18).

PMID: 36145766 PMC: 9503428. DOI: 10.3390/plants11182365.
