Comprehensive Evaluation of De Novo Transcriptome Assembly Programs and Their Effects on Differential Gene Expression Analysis

Overview
Journal Bioinformatics
Specialty Biology
Date 2017 Feb 8
PMID 28172640
Citations 38
Abstract

Motivation: With the decreased cost of RNA-Seq, an increasing number of non-model organisms have been sequenced. Due to the lack of reference genomes, de novo transcriptome assembly is required. However, there is limited systematic research evaluating the quality of de novo transcriptome assemblies and how the assembly quality influences downstream analysis.

Results: We used two authentic RNA-Seq datasets from Arabidopsis thaliana and produced transcriptome assemblies using eight programs (BinPacker, Bridger, IDBA-tran, Oases-Velvet, SOAPdenovo-Trans, SSP, Trans-ABySS and Trinity) with a series of k-mer sizes (from 25 to 71). We measured assembly quality in terms of reference genome base and gene coverage, transcriptome assembly base coverage, number of chimeras and number of recovered full-length transcripts. SOAPdenovo-Trans performed best in base coverage, while Trans-ABySS performed best in gene coverage and number of recovered full-length transcripts. In terms of chimeric sequences, BinPacker and Oases-Velvet were the worst, while IDBA-tran, SOAPdenovo-Trans, Trans-ABySS and Trinity produced fewer chimeras across all single k-mer assemblies. In differential gene expression analysis, about 70% of the significantly differentially expressed genes (DEG) were the same using the reference genome and the de novo assemblies. We further identified four reasons for the differences in significant DEG between the reference genome and the de novo transcriptome assemblies: incomplete annotation, exon-level differences, transcript fragmentation and incorrect gene annotation. These findings suggest that de novo assembly is beneficial even when a reference genome is available.

Availability and Implementation: The software used in this study is publicly available at the authors' websites.

Contact: gribskov@purdue.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Comprehensive Analysis of the Influence of Technical and Biological Variations on De Novo Assembly of RNA-Seq Datasets.

Sergio Alberto G, Maximo R, Andres R, Sergio L, Norma P Bioinform Biol Insights. 2024; 18:11779322241274957.

PMID: 39649541 PMC: 11622296. DOI: 10.1177/11779322241274957.


Sensitivity of transcriptomics: Different samples and methodology alter conclusions in Gulf pipefish (Syngnathus scovelli).

Johnson B, Rose E, Jones A J Hered. 2024; 116(2):139-148.

PMID: 39545939 PMC: 11879219. DOI: 10.1093/jhered/esae067.


Reexamining the Mycovirome of spp.

Munoz-Suarez H, Ruiz-Padilla A, Donaire L, Benito E, Ayllon M Viruses. 2024; 16(10).

PMID: 39459972 PMC: 11512270. DOI: 10.3390/v16101640.


First neurotranscriptome of adults Tambaquis (Colossoma macropomum) with characterization and differential expression between males and females.

Miranda J, Veneza I, Ferreira C, Santana P, Lutz I, Furtado C Sci Rep. 2024; 14(1):3130.

PMID: 38326509 PMC: 10850070. DOI: 10.1038/s41598-024-53734-5.


Eyeless cave-dwelling spiders still rely on light.

Wang K, Wang J, Liang B, Chang J, Zhu Y, Chen J Sci Adv. 2023; 9(51):eadj0348.

PMID: 38117895 PMC: 10732526. DOI: 10.1126/sciadv.adj0348.