
LPI-deepGBDT: a Multiple-layer Deep Framework Based on Gradient Boosting Decision Trees for LncRNA-protein Interaction Identification

Overview
Publisher Biomed Central
Specialty Biology
Date 2021 Oct 5
PMID 34607567
Citations 27
Abstract

Background: Long noncoding RNAs (lncRNAs) play important roles in various biological and pathological processes. Discovering lncRNA-protein interactions (LPIs) helps in understanding the biological functions and mechanisms of lncRNAs. Although wet-lab experiments have identified a few interactions between lncRNAs and proteins, experimental techniques are costly and time-consuming. Computational methods are therefore increasingly used to uncover possible associations. However, existing computational methods have several limitations. First, the majority of them were evaluated on a single dataset, which may result in prediction bias. Second, few of them are applied to identify relevant data for new lncRNAs (or proteins). Finally, they fail to utilize the diverse biological information of lncRNAs and proteins.

Results: This work focuses on classifying unobserved LPIs with a feed-forward deep architecture based on gradient boosting decision trees (LPI-deepGBDT). First, three human LPI datasets and two plant LPI datasets are assembled. Second, the biological features of lncRNAs and proteins are extracted by Pyfeat and BioProt, respectively. Third, the features are reduced in dimension and concatenated into a vector that represents an lncRNA-protein pair. Finally, a deep architecture composed of forward mappings and inverse mappings is developed to predict underlying linkages between lncRNAs and proteins. LPI-deepGBDT is compared with five classical LPI prediction models (LPI-BLS, LPI-CatBoost, PLIPCOM, LPI-SKF, and LPI-HNM) under three cross validations, on lncRNAs, proteins, and lncRNA-protein pairs, respectively. It obtains the best average AUC and AUPR values in the majority of situations, significantly outperforming the other five LPI identification methods. Specifically, the AUCs computed by LPI-deepGBDT are 0.8321, 0.6815, and 0.9073, respectively, and the AUPRs are 0.8095, 0.6771, and 0.8849, respectively. The results demonstrate the powerful classification ability of LPI-deepGBDT. Case study analyses suggest that there may be interactions between GAS5 and Q15717, RAB30-AS1 and O00425, and LINC-01572 and P35637.
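
The three cross validations named above differ in what is held out: whole lncRNAs, whole proteins, or individual lncRNA-protein pairs. The abstract does not describe the splitting procedure, so the following is only a minimal sketch of one common reading, not the authors' implementation. The function name `k_fold_splits`, the identifiers `lnc1` and `protA`, and the fold count are hypothetical.

```python
import random
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]  # (lncRNA id, protein id); the ids below are made up


def k_fold_splits(pairs: List[Pair], mode: str, k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Yield (train, test) lists of pairs for one of three settings.

    mode="lncrna":  hold out every pair of the lncRNAs in the test fold
    mode="protein": hold out every pair of the proteins in the test fold
    mode="pair":    hold out individual lncRNA-protein pairs
    """
    key_funcs = {
        "lncrna": lambda p: p[0],
        "protein": lambda p: p[1],
        "pair": lambda p: p,
    }
    if mode not in key_funcs:
        raise ValueError(f"unknown mode: {mode}")
    key = key_funcs[mode]

    # Shuffle the distinct units (lncRNAs, proteins, or pairs) reproducibly.
    units = sorted({key(p) for p in pairs})
    random.Random(seed).shuffle(units)

    for fold in range(k):
        held_out = set(units[fold::k])  # every k-th unit goes to this fold
        test = [p for p in pairs if key(p) in held_out]
        train = [p for p in pairs if key(p) not in held_out]
        yield train, test


# Toy candidate set: 4 hypothetical lncRNAs x 3 hypothetical proteins.
lncrnas = ["lnc1", "lnc2", "lnc3", "lnc4"]
proteins = ["protA", "protB", "protC"]
all_pairs = [(l, p) for l in lncrnas for p in proteins]

for train, test in k_fold_splits(all_pairs, mode="lncrna", k=2):
    held_out_lncs = sorted({l for l, _ in test})
    print(held_out_lncs, "held out:", len(train), "train pairs,",
          len(test), "test pairs")
```

In the lncRNA and protein settings no test entity appears in training, which mimics prediction for a new lncRNA or protein. That lack of overlap is one plausible reason such settings tend to be harder than splitting on pairs.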

Conclusions: By integrating ensemble learning and hierarchical distributed representations into a multiple-layered deep architecture, this work improves LPI prediction performance and effectively probes interaction data for new lncRNAs/proteins.

Citing Articles

BioPrediction-RPI: Democratizing the prediction of interaction between non-coding RNA and protein with end-to-end machine learning.

Florentino B, Parmezan Bonidia R, Sanches N, da Rocha U, de Carvalho A. Comput Struct Biotechnol J. 2024; 23:2267-2276.

PMID: 38827228 PMC: 11140557. DOI: 10.1016/j.csbj.2024.05.031.


Fusion of multi-source relationships and topology to infer lncRNA-protein interactions.

Zhang X, Liu M, Li Z, Zhuo L, Fu X, Zou Q. Mol Ther Nucleic Acids. 2024; 35(2):102187.

PMID: 38706631 PMC: 11066462. DOI: 10.1016/j.omtn.2024.102187.


Predicting lncRNA-protein interactions through deep learning framework employing multiple features and random forest algorithm.

Liang Y, Yin X, Zhang Y, Guo Y, Wang Y. BMC Bioinformatics. 2024; 25(1):108.

PMID: 38475723 PMC: 10929084. DOI: 10.1186/s12859-024-05727-4.


LncRNA-Top: Controlled deep learning approaches for lncRNA gene regulatory relationship annotations across different platforms.

Xie W, Chen X, Zheng Z, Wang F, Zhu X, Lin Q. iScience. 2023; 26(11):108197.

PMID: 37965148 PMC: 10641498. DOI: 10.1016/j.isci.2023.108197.


Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM.

Su Z, Lu H, Wu Y, Li Z, Duan L. Front Genet. 2023; 14:1238095.

PMID: 37655066 PMC: 10466784. DOI: 10.3389/fgene.2023.1238095.

