AIDE: Annotation-assisted Isoform Discovery with High Precision

Overview

Journal: Genome Res

Specialty: Genetics

Date: 2019 Nov 8

PMID: 31694868

Citations: 6

Authors

Wei Vivian Li

Shan Li

Xin Tong

Ling Deng

Hubing Shi

Jingyi Jessica Li

Abstract

Genome-wide accurate identification and quantification of full-length mRNA isoforms is crucial for investigating transcriptional and posttranscriptional regulatory mechanisms of biological phenomena. Despite continuing efforts to develop effective computational tools that identify or assemble full-length mRNA isoforms from second-generation RNA-seq data, it remains a challenge to accurately identify mRNA isoforms from short sequence reads owing to the substantial information loss in RNA-seq experiments. Here, we introduce a novel statistical method, annotation-assisted isoform discovery (AIDE), the first approach that directly controls false isoform discoveries by implementing the testing-based model selection principle. Solving the isoform discovery problem in a stepwise and conservative manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. We evaluate the performance of AIDE on multiple simulated and real RNA-seq data sets, followed by PCR-Sanger sequencing validation. Our results show that AIDE effectively leverages the annotation information to compensate for the information loss owing to short read lengths. AIDE achieves the highest precision in isoform discovery and the lowest error rates in isoform abundance estimation, compared with three state-of-the-art methods: Cufflinks, SLIDE, and StringTie. As a robust bioinformatics tool for transcriptome analysis, AIDE enables researchers to discover novel transcripts with high confidence.
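
The stepwise, testing-based selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the read model (a multinomial mixture over exon bins fitted by EM), the chi-square reference distribution with one degree of freedom, the threshold `alpha`, and all names (`fit_weights`, `discover`, the isoform labels) are assumptions made only for illustration. The sketch starts from the annotated isoforms and adds a candidate novel isoform only when a likelihood ratio test suggests that it significantly improves the fit to the observed read counts.

```python
import math

def fit_weights(profiles, counts, n_iter=500):
    """Fit mixture weights by EM; return (weights, log-likelihood).

    profiles[j][b] is the (assumed known) probability that a read
    from isoform j falls in bin b; counts[b] is the observed count.
    """
    k = len(profiles)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        new_w = [0.0] * k
        for b, n in enumerate(counts):
            mix = sum(w[j] * profiles[j][b] for j in range(k))
            if n == 0 or mix == 0.0:
                continue
            for j in range(k):
                new_w[j] += n * w[j] * profiles[j][b] / mix
        s = sum(new_w)
        if s > 0.0:
            w = [x / s for x in new_w]
    loglik = 0.0
    for b, n in enumerate(counts):
        mix = sum(w[j] * profiles[j][b] for j in range(k))
        loglik += n * math.log(max(mix, 1e-300))  # floor avoids log(0)
    return w, loglik

def discover(annotated, candidates, counts, alpha=0.01):
    """Forward selection: keep annotated isoforms, test novel ones."""
    selected = dict(annotated)
    _, ll = fit_weights(list(selected.values()), counts)
    remaining = dict(candidates)
    while remaining:
        best = None
        for name, profile in remaining.items():
            _, ll_new = fit_weights(list(selected.values()) + [profile], counts)
            stat = max(0.0, 2.0 * (ll_new - ll))  # likelihood ratio statistic
            if best is None or stat > best[1]:
                best = (name, stat, ll_new)
        name, stat, ll_new = best
        # Survival function of a chi-square with 1 df (rough reference only).
        p_value = math.erfc(math.sqrt(stat / 2.0))
        if p_value >= alpha:
            break  # conservative: stop when no candidate is significant
        selected[name] = remaining.pop(name)
        ll = ll_new
    return list(selected)

# Hypothetical three-exon gene: bins are exon 1, exon 2, exon 3.
annotated = {"annotated_full": [0.4, 0.2, 0.4]}
candidates = {"skip_exon2": [0.5, 0.0, 0.5], "exon1_only": [1.0, 0.0, 0.0]}

reads_with_skipping = [450, 100, 450]   # exon 2 is under-covered
reads_annotation_only = [400, 200, 400]  # matches the annotated isoform

print(discover(annotated, candidates, reads_with_skipping))
print(discover(annotated, candidates, reads_annotation_only))
```

In this toy setting a novel isoform is reported only when the annotated isoform alone cannot account for the read counts, which mirrors the conservative behavior the abstract describes. A real method would also need to model read generation, fragment lengths, and the boundary behavior of the test far more carefully.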

Citing Articles

TDP-43 loss induces extensive cryptic polyadenylation in ALS/FTD.

Bryce-Smith S, Brown A, Mehta P, Mattedi F, Mikheenko A, Barattucci S. bioRxiv. 2024.

PMID: 38313254 PMC: 10836071. DOI: 10.1101/2024.01.22.576625.

Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq.

Stokes T, Cen H, Kapranov P, Gallagher I, Pitsillides A, Volmar C. Adv Genet (Hoboken). 2023; 4(2):2200024.

PMID: 37288167 PMC: 10242409. DOI: 10.1002/ggn2.202200024.

Partitioning RNAs by length improves transcriptome reconstruction from short-read RNA-seq data.

Ringeling F, Chakraborty S, Vissers C, Reiman D, Patel A, Lee K. Nat Biotechnol. 2022; 40(5):741-750.

PMID: 35013600 PMC: 11332977. DOI: 10.1038/s41587-021-01136-7.

MAAPER: model-based analysis of alternative polyadenylation using 3' end-linked reads.

Li W, Zheng D, Wang R, Tian B. Genome Biol. 2021; 22(1):222.

PMID: 34376236 PMC: 8356463. DOI: 10.1186/s13059-021-02429-5.

Maternal cecal microbiota transfer rescues early-life antibiotic-induced enhancement of type 1 diabetes in mice.

Zhang X, Yin Y, Wang J, Battaglia T, Krautkramer K, Li W. Cell Host Microbe. 2021; 29(8):1249-1265.e9.

PMID: 34289377 PMC: 8370265. DOI: 10.1016/j.chom.2021.06.014.
