Deep Forest Ensemble Learning for Classification of Alignments of Non-coding RNA Sequences Based on Multi-view Structure Representations

Overview
Journal Brief Bioinform
Specialty Biology
Date 2020 Dec 28
PMID 33367506
Citations 2
Abstract

Non-coding RNAs (ncRNAs) play crucial roles in multiple biological processes. However, the functions of only a few ncRNAs have been well studied. Given the significance of ncRNA classification for understanding ncRNA functions, a growing number of computational methods have been introduced to classify ncRNAs automatically and accurately. In this paper, we propose a novel deep fusion learning framework, the GcForest fusion method (GCFM), which builds on a convolutional neural network and a deep forest algorithm, the multi-grained cascade forest (GcForest), to classify alignments of ncRNA sequences for accurate clustering of ncRNAs. GCFM integrates a multi-view structure feature representation comprising sequence-structure alignment encoding, structure image representation and shape alignment encoding of structural subunits, enabling us to capture the potential specificity between ncRNAs. For the classification of pairwise alignments of two ncRNA sequences, the F-value of GCFM is 6% higher than that of an existing alignment-based method. Furthermore, ncRNA families are clustered based on the classification matrix generated by GCFM. Results suggest better performance (a 20% improvement in accuracy) than existing ncRNA clustering methods (RNAclust, Ensembleclust and CNNclust). Additionally, we apply GCFM to construct a phylogenetic tree of ncRNAs and to predict the probability of interactions between RNAs. Most ncRNAs are located correctly in the phylogenetic tree, and the prediction accuracy of RNA interaction is 90.63%. A web server (http://bmbl.sdstate.edu/gcfm/) has been developed to maximize its availability, and the source code and related data are available at the same URL.
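
The abstract mentions two computations that a small example may make more concrete: scoring pairwise classifications with an F-value, and grouping sequences from a pairwise classification matrix. The sketch below is illustrative only and is not the GCFM implementation. It assumes that the F-value is the usual harmonic mean of precision and recall, that the classification matrix holds a hypothetical "same family" probability for each pair of sequences, and that pairs at or above a hypothetical `threshold` are linked into one cluster. The function names `f_value` and `cluster_from_matrix` are invented for this example, and the clustering procedure actually used by the authors may differ.

```python
from typing import List, Sequence


def f_value(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = same family)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cluster_from_matrix(prob: Sequence[Sequence[float]],
                        threshold: float = 0.5) -> List[int]:
    """Group items by linking every pair with prob[i][j] >= threshold.

    Assumption: prob is a symmetric matrix of pairwise "same family"
    scores. This is a simple connected-components grouping, swapped in
    for illustration; it is not the clustering method of the paper.
    """
    n = len(prob)
    parent = list(range(n))  # union-find forest, one node per sequence

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if prob[i][j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    labels = []
    for i in range(n):
        root = find(i)
        labels.append(ids.setdefault(root, len(ids)))
    return labels


if __name__ == "__main__":
    # Toy 4x4 matrix: sequences 0 and 1 score high together, as do 2 and 3.
    toy = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.3, 0.1],
        [0.1, 0.3, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(cluster_from_matrix(toy))
    print(f_value([1, 1, 0, 0], [1, 0, 1, 0]))
```

Thresholded linking is chosen here only because it needs no external libraries. A hierarchical or spectral method applied to the same matrix would be an equally reasonable reading of "clustering based on the classification matrix".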

Citing Articles

AP003352.1/miR-141-3p axis enhances the proliferation of osteosarcoma by LPAR3.

Yu H, Zhang B, Qi L, Han J, Guan M, Li J. PeerJ. 2023; 11:e15937.

PMID: 37727685 PMC: 10506581. DOI: 10.7717/peerj.15937.
Targeting a thrombopoietin-independent strategy in the discovery of a novel inducer of megakaryocytopoiesis, DMAG, for the treatment of thrombocytopenia.

Wang L, Liu S, Luo J, Mo Q, Ran M, Zhang T. Haematologica. 2022; 108(5):1394-1411.

PMID: 36546424 PMC: 10153531. DOI: 10.3324/haematol.2022.282209.
