
SORFPred: A Method Based on Comprehensive Features and Ensemble Learning to Predict the SORFs in Plant LncRNAs

Overview
Journal: Interdiscip Sci
Specialty: Biology
Date: 2023 Jan 27
PMID: 36705893
Abstract

Long non-coding RNAs (lncRNAs) are important regulators of biological processes. It has recently been shown that some lncRNAs contain small open reading frames (sORFs) that can encode small peptides of no more than 100 amino acids. However, existing prediction methods have mostly been applied to human and animal datasets and still suffer from limited feature representation capability. Accurate and credible prediction of sORFs with coding ability in plant lncRNAs is therefore imperative. This paper proposes a new method, termed sORFPred, in which we design a model named MCSEN that combines multi-scale convolution and Squeeze-and-Excitation Networks to fully mine the distinct information embedded in sORFs, integrate and optimize multiple sequence-based and physicochemical feature descriptors, and build a two-layer prediction classifier based on a Bayesian optimization algorithm and Extra Trees. sORFPred has been evaluated on sORF datasets from three species and on an experimentally validated sORF dataset. The results indicate that sORFPred outperforms existing methods, achieving 97.28% accuracy, 97.06% precision, 97.52% recall, and a 97.29% F1-score on Arabidopsis thaliana. This is a significant improvement in prediction performance over various conventional shallow machine learning and deep learning models.
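To make the sORF definition concrete, here is a minimal sketch of scanning a transcript for candidate small ORFs of at most 100 amino acids. It is not sORFPred's own extraction step, which the abstract does not describe. The assumptions are that only the forward strand is scanned, ATG is the only start codon, TAA/TAG/TGA are the stop codons, and the length limit counts codons before the stop. The helper name `find_sorfs` is hypothetical.

```python
# Hypothetical helper, a generic ORF scan (not sORFPred's published procedure).
# Assumptions: forward strand only, ATG start, TAA/TAG/TGA stops,
# length limit counts amino acids (codons) before the stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_AA = 100  # "no more than 100 amino acids", as stated in the abstract


def find_sorfs(seq: str, max_aa: int = MAX_AA, min_aa: int = 1):
    """Return sorted (start, end, aa_length) tuples for ATG...stop ORFs with
    min_aa <= aa_length <= max_aa; `end` is exclusive and includes the stop."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA input
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no downstream stop in this frame, so no more ORFs here
                aa_len = (j - i) // 3
                if min_aa <= aa_len <= max_aa:
                    hits.append((i, j + 3, aa_len))
                # Resume after the stop: nested in-frame ATGs are skipped,
                # so each stop codon yields only its longest ORF.
                i = j + 3
                continue
            i += 3
    return sorted(hits)


print(find_sorfs("CCAUGAAAUUUUAGCC"))  # → [(2, 14, 3)]
```

Keeping only the longest ORF per stop codon is one common convention. Other tools also report nested ORFs or accept near-cognate start codons, so treat the output only as a candidate list.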
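The abstract mentions "multiple sequence-based ... feature descriptors" without listing them. As an assumed, illustrative example of one such descriptor, the sketch below computes normalized k-mer composition and concatenates k = 1..3 into a single vector. The function names and the choice of k values are hypothetical and are not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2, alphabet: str = "ACGT"):
    """Normalized k-mer counts in a fixed lexicographic order (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def feature_vector(seq: str):
    """Concatenate 1-, 2- and 3-mer compositions: 4 + 16 + 64 = 84 values."""
    return [x for k in (1, 2, 3) for x in kmer_frequencies(seq, k)]
```

Fixing the k-mer order via `itertools.product` keeps every sequence's vector aligned column by column, which any downstream classifier requires.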
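MCSEN is described as combining multi-scale convolution with Squeeze-and-Excitation Networks. The abstract gives no layer sizes, so the sketch below shows only the generic Squeeze-and-Excitation recalibration applied to a channels × positions feature map, such as the output of stacked convolutions. Everything here is an assumption: the pure-Python implementation, the bias-free weights, and the name `se_block`. It is not the authors' MCSEN code.

```python
import math


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def se_block(x, w1, w2):
    """Generic Squeeze-and-Excitation on a 1-D feature map (hypothetical helper).

    x  : C channels, each a list of L position values
    w1 : (C // r) x C weights of the reduction layer (biases omitted)
    w2 : C x (C // r) weights of the expansion layer
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over positions, one scalar per channel.
    z = [sum(channel) / len(channel) for channel in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one gate in (0, 1) per channel.
    s = [sigmoid(a) for a in matvec(w2, relu(matvec(w1, z)))]
    # Scale: multiply every position of each channel by that channel's gate.
    return [[v * g for v in channel] for channel, g in zip(x, s)], s


out, gates = se_block([[1, 2, 3], [0, 0, 0]], w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
```

The gates let the network emphasize informative channels and damp others. In a real model, `w1` and `w2` are learned rather than hand-set as in the usage line.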
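The reported metrics follow the standard confusion-matrix definitions. The F1-score is the harmonic mean of precision and recall, so the reported 97.29% can be checked from the reported precision (97.06%) and recall (97.52%). Accuracy cannot be checked this way, because it also depends on true negatives. The helper names below are hypothetical, and the confusion counts in the test are made-up illustration values, not data from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported Arabidopsis thaliana values: precision 97.06%, recall 97.52%.
print(round(100 * f1_from(0.9706, 0.9752), 2))  # → 97.29
```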

Citing Articles

misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.

Li H, Meng J, Wang Z, Luan Y. Interdiscip Sci. 2024; 17(1):114-133.

PMID: 39397199 DOI: 10.1007/s12539-024-00661-8.


LncRNA-encoded peptides in cancer.

Zhang Y. J Hematol Oncol. 2024; 17(1):66.

PMID: 39135098 PMC: 11320871. DOI: 10.1186/s13045-024-01591-0.


sOCP: a framework predicting smORF coding potential based on TIS and in-frame features and effectively applied in the human genome.

Peng Z, Li J, Jiang X, Wan C. Brief Bioinform. 2024; 25(3).

PMID: 38600664 PMC: 11006793. DOI: 10.1093/bib/bbae147.


Long Non-Coding RNAs of Plants in Response to Abiotic Stresses and Their Regulating Roles in Promoting Environmental Adaption.

Yang H, Cui Y, Feng Y, Hu Y, Liu L, Duan L. Cells. 2023; 12(5).

PMID: 36899864 PMC: 10001313. DOI: 10.3390/cells12050729.
