
Estimation of Alternative Splicing Isoform Frequencies from RNA-Seq Data

Overview
Publisher BioMed Central
Date 2011 Apr 21
PMID 21504602
Citations 82
Abstract

Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging.

Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/.
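To illustrate the general idea of expectation-maximization for isoform frequencies, here is a minimal Python sketch. It is not the IsoEM implementation (which is written in Java) and makes several simplifying assumptions: each read arrives with a precomputed compatibility weight for every isoform it could have come from (in the paper such weights would reflect insert sizes, base quality scores, strand and pairing information; here they are simply given as input), and the function name `em_isoform_frequencies` and its parameters are hypothetical.

```python
def em_isoform_frequencies(read_weights, lengths, n_iter=500, tol=1e-9):
    """Estimate isoform frequencies with a basic EM loop (illustrative sketch).

    read_weights: list of dicts, one per read, mapping isoform id -> weight
                  (assumed precomputed; zero/absent means incompatible).
    lengths:      dict mapping isoform id -> isoform length in bases.
    Returns a dict mapping isoform id -> estimated frequency (sums to 1).
    """
    isoforms = list(lengths)
    # Start from a uniform guess.
    freq = {j: 1.0 / len(isoforms) for j in isoforms}

    for _ in range(n_iter):
        # E-step: split each read fractionally among its compatible isoforms,
        # in proportion to current frequency times compatibility weight.
        expected = {j: 0.0 for j in isoforms}
        for weights in read_weights:
            denom = sum(freq[j] * w for j, w in weights.items())
            if denom == 0.0:
                continue  # read is incompatible with every isoform
            for j, w in weights.items():
                expected[j] += freq[j] * w / denom

        # M-step: normalise expected read counts by isoform length,
        # then rescale so the frequencies sum to 1.
        per_base = {j: expected[j] / lengths[j] for j in isoforms}
        total = sum(per_base.values())
        new_freq = {j: per_base[j] / total for j in isoforms}

        # Stop once the estimates no longer change appreciably.
        delta = max(abs(new_freq[j] - freq[j]) for j in isoforms)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy data: 30 reads unique to A, 10 unique to B, 60 ambiguous.
    reads = [{"A": 1.0}] * 30 + [{"B": 1.0}] * 10 + [{"A": 1.0, "B": 1.0}] * 60
    print(em_isoform_frequencies(reads, {"A": 1000, "B": 1000}))
```

In this toy case the ambiguous reads are gradually reassigned toward the isoform with more unique support, and the estimates settle near 0.75 for A and 0.25 for B.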

Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
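The trade-off behind the read-length finding can be made concrete with a small calculation. The sketch below assumes that sequencing cost is proportional to the total number of bases sequenced, which the abstract does not state explicitly, and uses a made-up budget of 900 million bases; the function name is hypothetical. Under that assumption, longer reads mean fewer reads.

```python
def reads_at_fixed_throughput(total_bases, read_length):
    """Number of whole reads obtainable from a fixed budget of sequenced bases.

    Assumes cost scales with total bases, so the budget is expressed in bases.
    """
    return total_bases // read_length


if __name__ == "__main__":
    budget = 900_000_000  # hypothetical budget, in bases
    for read_length in (25, 36, 50, 75, 100):
        n_reads = reads_at_fixed_throughput(budget, read_length)
        print(f"{read_length:>3} bp reads: {n_reads:,} reads")
```

Each longer read is less ambiguous on its own, but the smaller read count gives fewer independent samples of the transcriptome, which is one plausible reason accuracy need not improve with read length.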

Citing Articles

Oarfish: Enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification.

Jousheghani Z, Patro R. bioRxiv. 2024.

PMID: 38464200 PMC: 10925290. DOI: 10.1101/2024.02.28.582591.


RNA-seq data science: From raw data to effective interpretation.

Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, Zhang J. Front Genet. 2023; 14:997383.

PMID: 36999049 PMC: 10043755. DOI: 10.3389/fgene.2023.997383.


Comparison of Metagenomics and Metatranscriptomics Tools: A Guide to Making the Right Choice.

Terron-Camero L, Gordillo-Gonzalez F, Salas-Espejo E, Andres-Leon E. Genes (Basel). 2022; 13(12).

PMID: 36553546 PMC: 9777648. DOI: 10.3390/genes13122280.


Lineage abundance estimation for SARS-CoV-2 in wastewater using transcriptome quantification techniques.

Baaijens J, Zulli A, Ott I, Nika I, van der Lugt M, Petrone M. Genome Biol. 2022; 23(1):236.

PMID: 36348471 PMC: 9643916. DOI: 10.1186/s13059-022-02805-9.


T Cell Epitope Prediction and Its Application to Immunotherapy.

Schaap-Johansen A, Vujovic M, Borch A, Hadrup S, Marcatili P. Front Immunol. 2021; 12:712488.

PMID: 34603286 PMC: 8479193. DOI: 10.3389/fimmu.2021.712488.

