Readon: a Novel Algorithm to Identify Read-through Transcripts with Long-read Sequencing Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2024 May 29
PMID 38808568
Abstract

Motivation: The human genome contains many clustered, transcriptionally active regions in which the transcription complex fails to terminate at the upstream gene's termination site and instead continues through intergenic regions and into downstream genes, producing read-through transcripts. Several studies have demonstrated regulatory roles for read-through transcripts in tumorigenesis and development. However, the short read length of next-generation sequencing has slowed their discovery. For third-generation sequencing data, which are long but error-prone, this study developed a novel minimizer sketch algorithm to identify read-through transcripts accurately and quickly.

Results: Readon first splits the reference sequence into distinct active regions. Within each region it slides a window along the sequence, calculates minimizers, and constructs specialized structured arrays for query indexing. Candidate read-through transcripts are first screened by their alignment anchors and then pass through further confirmation steps. In comparative assessments against existing software, Readon performs better on both simulated and validated real data. Two downstream tools are also provided: one predicts whether a read-through transcript is likely to undergo nonsense-mediated decay or to encode a protein, and the other visualizes splicing patterns.
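
The sliding-window minimizer step described above can be illustrated with a minimal Python sketch. This is not Readon's implementation: the function names (`minimizers`, `build_index`, `anchor_regions`), the parameters `k` and `w`, the lexicographic k-mer ordering, and the plain dictionary index are all assumptions chosen for readability, whereas practical tools typically order k-mers by a hash and use compact arrays. The sketch only shows the general idea that a read sharing minimizers with two different regions is a candidate for further checking.

```python
from collections import defaultdict


def minimizers(seq, k=5, w=4):
    """Return (kmer, position) minimizers of seq, sorted by position.

    Every window of w consecutive k-mers contributes its lexicographically
    smallest k-mer (leftmost on ties). Illustrative ordering only.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost on ties
        offset = min(range(w), key=lambda j: window[j])
        picked.add((window[offset], start + offset))
    return sorted(picked, key=lambda item: item[1])


def build_index(regions, k=5, w=4):
    """Map each minimizer k-mer to the (region name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in regions.items():
        for kmer, pos in minimizers(seq, k, w):
            index[kmer].append((name, pos))
    return index


def anchor_regions(read, index, k=5, w=4):
    """Count, per region, the index entries matched by the read's minimizers."""
    hits = defaultdict(int)
    for kmer, _ in minimizers(read, k, w):
        for name, _ in index.get(kmer, []):
            hits[name] += 1
    return dict(hits)


# Hypothetical toy regions; real active regions would be far longer.
regions = {
    "geneA": "ACGTTGCAAGCTTAGGCTA",
    "geneB": "TTGACCGGTATCCGATAGC",
}
index = build_index(regions)
# A read covering both regions collects anchors in each of them.
read = regions["geneA"] + regions["geneB"]
print(anchor_regions(read, index))
```

A read whose anchors fall in more than one region would only be a candidate here; the confirmation steps mentioned in the abstract are not modeled.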

Availability And Implementation: Readon is freely available on GitHub (https://github.com/Bulabula45/Readon).
