
Data-driven AI System for Learning How to Run Transcript Assemblers

Overview
Journal bioRxiv
Date 2024 Nov 18
PMID 39554123
Abstract

We introduce AutoTuneX, a data-driven AI system designed to automatically predict optimal parameters for transcript assemblers, tools that reconstruct expressed transcripts from the reads in a given RNA-seq sample. AutoTuneX learns parameter knowledge from existing RNA-seq samples and transfers this knowledge to unseen samples. On 1588 human RNA-seq samples tested with two transcript assemblers, the parameters predicted by AutoTuneX yielded more accurate transcript assemblies than the default parameter settings for 98% of samples, with some samples showing up to a 600% improvement in AUC. AutoTuneX offers a new strategy for automatically optimizing the use of sequence analysis tools.
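
The two headline figures above (the share of samples that improve and the percent improvement in AUC) can be illustrated with a minimal sketch. The abstract does not say how AUC is computed or how the percentage is defined, so this assumes the percentage is the relative change against the default-parameter run. The function names and the sample values are hypothetical and are not taken from AutoTuneX.

```python
from typing import Iterable, Tuple


def relative_improvement(default_auc: float, tuned_auc: float) -> float:
    """Percent change in AUC relative to the default-parameter run.

    Assumption: "600% improvement" means (tuned - default) / default * 100.
    """
    if default_auc <= 0:
        raise ValueError("default AUC must be positive")
    return 100.0 * (tuned_auc - default_auc) / default_auc


def fraction_improved(pairs: Iterable[Tuple[float, float]]) -> float:
    """Share of samples whose tuned AUC is strictly higher than the default AUC."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one sample")
    improved = sum(1 for default_auc, tuned_auc in pairs if tuned_auc > default_auc)
    return improved / len(pairs)


# Made-up (default_auc, tuned_auc) pairs, for illustration only.
samples = [(100.0, 700.0), (250.0, 300.0), (400.0, 380.0), (500.0, 550.0)]

gains = [relative_improvement(d, t) for d, t in samples]
share = fraction_improved(samples)

for (d, t), g in zip(samples, gains):
    print(f"default={d:.1f} tuned={t:.1f} change={g:+.1f}%")
print(f"fraction of samples improved: {share:.2f}")
```

Under this reading, a 600% improvement corresponds to a tuned AUC seven times the default value, as in the first made-up pair.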
