RSEM: Accurate Transcript Quantification from RNA-Seq Data with or Without a Reference Genome

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2011 Aug 6

PMID 21816040

Citations 9915

Authors

Bo Li

Colin N Dewey

Abstract

Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.

Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
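
The abstract does not describe the estimation procedure, so the following is only a toy illustration of how reads that map to several transcripts can be shared out fractionally with expectation-maximization, rather than discarded. It is not RSEM's actual model or code: the function name `em_read_fractions`, the uniform-position assumption, and the example counts are all hypothetical, and every read is assumed to be compatible with at least one transcript.

```python
def em_read_fractions(compat, lengths, n_iter=200):
    """Toy EM for read-to-transcript assignment (illustrative assumption,
    not RSEM's implementation).

    compat  -- list of lists; compat[r] holds the transcript indices that
               read r aligns to (assumed non-empty for every read)
    lengths -- effective length of each transcript
    Returns theta, the estimated fraction of reads from each transcript.
    """
    n = len(lengths)
    theta = [1.0 / n] * n  # start from a uniform guess
    for _ in range(n_iter):
        expected = [0.0] * n
        for hits in compat:
            # E-step: posterior weight of each compatible transcript,
            # proportional to theta_t / length_t (uniform start position).
            weights = [theta[t] / lengths[t] for t in hits]
            total = sum(weights)
            for t, w in zip(hits, weights):
                expected[t] += w / total
        # M-step: new read fractions are the normalized expected counts.
        theta = [c / len(compat) for c in expected]
    return theta


# Hypothetical data: two equal-length isoforms, 30 reads unique to the
# first, 10 unique to the second, and 60 reads compatible with both.
reads = [[0]] * 30 + [[1]] * 10 + [[0, 1]] * 60
theta = em_read_fractions(reads, lengths=[1000, 1000])
print(theta)  # approaches [0.75, 0.25]
```

In this toy case the 60 ambiguous reads end up split 3:1, following the ratio set by the uniquely mapping reads, which is the intuition behind using ambiguous reads instead of dropping them.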

Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

Citing Articles

A systematic benchmark of Nanopore long-read RNA sequencing for transcript-level analysis in human cell lines.

Chen Y, Davidson N, Kei Wan Y, Yao F, Su Y, Gamaarachchi H. Nat Methods. 2025.

PMID: 40082608 DOI: 10.1038/s41592-025-02623-4.

Profiling hippocampal neuronal populations reveals unique gene expression mosaics reflective of connectivity-based degeneration in the Ts65Dn mouse model of Down syndrome and Alzheimer's disease.

Alldred M, Ibrahim K, Pidikiti H, Lee S, Heguy A, Chiosis G. Front Mol Neurosci. 2025; 18:1546375.

PMID: 40078964 PMC: 11897496. DOI: 10.3389/fnmol.2025.1546375.

Integrated analysis of single-cell and bulk transcriptomes uncovers clinically relevant molecular subtypes in human prostate cancer.

Ding T, He L, Lin G, Xu L, Zhu Y, Wang X. Chin J Cancer Res. 2025; 37(1):90-114.

PMID: 40078560 PMC: 11893346. DOI: 10.21147/j.issn.1000-9604.2025.01.07.

Genomic and Transcriptomic Analysis of Mutant with Enhanced Nattokinase Production via ARTP Mutagenesis.

Guo L, Chen Y, He Z, Wang Z, Chen Q, Chen J. Foods. 2025; 14(5).

PMID: 40077601 PMC: 11899143. DOI: 10.3390/foods14050898.

Transcriptome Analysis of During Seed Germination Reveals GA-Inducible Genes Associated with Phenylpropanoid and Hormone Pathways.

Luo Y, Wang K, Cheng J, Nan L. Int J Mol Sci. 2025; 26(5).

PMID: 40076954 PMC: 11900539. DOI: 10.3390/ijms26052335.

References

1. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman S. De novo assembly and analysis of RNA-seq data. Nat Methods. 2010; 7(11):909-12. DOI: 10.1038/nmeth.1517.

2. Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011; 27(17):2325-9. DOI: 10.1093/bioinformatics/btr355.

3. Pruitt K, Tatusova T, Klimke W, Maglott D. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 2008; 37(Database issue):D32-6. PMC: 2686572. DOI: 10.1093/nar/gkn721.

4. Wu Z, Wang X, Zhang X. Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq. Bioinformatics. 2010; 27(4):502-8. DOI: 10.1093/bioinformatics/btq696.

5. Flicek P, Amode M, Barrell D, Beal K, Brent S, Chen Y. Ensembl 2011. Nucleic Acids Res. 2010; 39(Database issue):D800-6. PMC: 3013672. DOI: 10.1093/nar/gkq1064.

6. Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008; 5(7):621-8. DOI: 10.1038/nmeth.1226.

7. Roberts A, Trapnell C, Donaghey J, Rinn J, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol. 2011; 12(3):R22. PMC: 3129672. DOI: 10.1186/gb-2011-12-3-r22.

8. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25(16):2078-9. PMC: 2723002. DOI: 10.1093/bioinformatics/btp352.

9. Feng J, Li W, Jiang T. Inference of isoforms from short sequence reads. J Comput Biol. 2011; 18(3):305-21. PMC: 3123862. DOI: 10.1089/cmb.2010.0243.

10. Richard H, Schulz M, Sultan M, Nurnberger A, Schrinner S, Balzereit D. Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments. Nucleic Acids Res. 2010; 38(10):e112. PMC: 2879520. DOI: 10.1093/nar/gkq041.

11. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques. 2008; 45(1):81-94. DOI: 10.2144/000112900.

12. Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10(3):R25. PMC: 2690996. DOI: 10.1186/gb-2009-10-3-r25.

13. Trapnell C, Williams B, Pertea G, Mortazavi A, Kwan G, van Baren M. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010; 28(5):511-5. PMC: 3146043. DOI: 10.1038/nbt.1621.

14. Bullard J, Purdom E, Hansen K, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010; 11:94. PMC: 2838869. DOI: 10.1186/1471-2105-11-94.

15. Jiang H, Wong W. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics. 2009; 25(8):1026-32. PMC: 2666817. DOI: 10.1093/bioinformatics/btp113.

16. Li B, Ruotti V, Stewart R, Thomson J, Dewey C. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2009; 26(4):493-500. PMC: 2820677. DOI: 10.1093/bioinformatics/btp692.

17. Fujita P, Rhead B, Zweig A, Hinrichs A, Karolchik D, Cline M. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2010; 39(Database issue):D876-82. PMC: 3242726. DOI: 10.1093/nar/gkq963.

18. Pasaniuc B, Zaitlen N, Halperin E. Accurate estimation of expression levels of homologous genes in RNA-seq experiments. J Comput Biol. 2011; 18(3):459-68. DOI: 10.1089/cmb.2010.0259.

19. Grabherr M, Haas B, Yassour M, Levin J, Thompson D, Amit I. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011; 29(7):644-52. PMC: 3571712. DOI: 10.1038/nbt.1883.

20. Katz Y, Wang E, Airoldi E, Burge C. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010; 7(12):1009-15. PMC: 3037023. DOI: 10.1038/nmeth.1528.