
mRCat: A Novel CatBoost Predictor for the Binary Classification of mRNA Subcellular Localization by Fusing Large Language Model Representation and Sequence Features

Overview
Journal Biomolecules
Publisher MDPI
Date 2024 Jul 27
PMID 39062481
Abstract

The subcellular localization of messenger RNAs (mRNAs) is tightly linked to gene regulation and protein synthesis, and it offers insights into disease diagnosis and drug development in biomedicine. Several computational methods have been proposed to predict mRNA subcellular localization, but their accuracy remains limited. In this study, we propose mRCat, a predictor based on the gradient boosting tree algorithm, to predict whether an mRNA is localized in the nucleus or in the cytoplasm. The predictor first uses large language models to extract hidden information from sequences, then integrates traditional sequence features so that the two jointly characterize each mRNA sequence. Finally, it employs CatBoost as the base classifier for predicting the subcellular localization of mRNAs. On an independent test set, mRCat achieved an accuracy of 0.761, an F1 score of 0.710, a Matthews correlation coefficient (MCC) of 0.511, and an AUROC of 0.751. These results indicate that our method is more accurate and robust than other state-of-the-art methods. It is expected to offer useful insights for biomolecular research.
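
The feature-fusion idea and the threshold-based metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which traditional sequence features or which language model are used, so the k-mer frequencies, the toy embedding values, the label convention (1 = nucleus, 0 = cytoplasm), and the helper names `kmer_frequencies`, `fuse_features`, and `binary_metrics` are all assumptions made for illustration. In a real pipeline the fused vectors would be passed to a gradient boosting classifier such as CatBoost; that step is omitted here to keep the example dependency-free.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=3):
    """Normalized k-mer counts in a fixed order (assumed stand-in for
    the 'traditional sequence features' mentioned in the abstract)."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[m] / total for m in kmers]


def fuse_features(embedding, seq, k=3):
    """Concatenate a language-model embedding (hypothetical values supplied
    by the caller) with the k-mer frequency vector of the same sequence."""
    return list(embedding) + kmer_frequencies(seq, k)


def binary_metrics(y_true, y_pred):
    """Accuracy, F1, and MCC from the binary confusion matrix.
    Assumed label convention: 1 = nucleus, 0 = cytoplasm."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    toy_embedding = [0.1, 0.2]  # placeholder for a language-model vector
    fused = fuse_features(toy_embedding, "AUGGCC", k=2)
    print(len(fused))  # 2 embedding dims + 16 dinucleotide frequencies
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

AUROC is left out because it needs predicted probabilities and not hard labels. MCC is included because, unlike accuracy, it uses all four cells of the confusion matrix.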

Citing Articles

Prediction of circRNA-Disease Associations via Graph Isomorphism Transformer and Dual-Stream Neural Predictor.

Li H, Qian Y, Sun Z, Zhu H. Biomolecules. 2025; 15(2).

PMID: 40001537 PMC: 11853643. DOI: 10.3390/biom15020234.
