
A Semi-supervised Machine Learning Framework for MicroRNA Classification

Overview
Journal: Hum Genomics
Publisher: Biomed Central
Specialty: Genetics
Date: 2019 Oct 23
PMID: 31639051
Citations: 3
Abstract

Background: MicroRNAs (miRNAs) are a family of short, non-coding RNAs that have been linked to critical cellular activities, most notably the regulation of gene expression. Identifying miRNAs is a cross-disciplinary task that requires both computational prediction and wet-lab validation experiments, which makes it resource-intensive. Numerous machine learning methods have been developed to increase classification accuracy and thus reduce validation costs, but most use supervised learning and therefore require large labeled training data sets, which are often unavailable for less-sequenced species. At the same time, high-throughput wet-lab procedures such as next-generation sequencing have produced an abundance of unlabeled RNA sequence data.

Results: This paper explores the application of semi-supervised machine learning to miRNA classification in order to maximize the utility of both labeled and unlabeled data. Here we present a novel combination of two semi-supervised approaches: active learning and multi-view co-training. Results across six diverse species show that this multi-stage semi-supervised approach improves classification performance using very small numbers of labeled instances, effectively leveraging the available unlabeled data.

Conclusions: The proposed semi-supervised miRNA classification pipeline has the potential to identify novel miRNAs with high recall and precision while requiring very few previously known miRNAs. Such a method could be highly beneficial when studying miRNAs in newly sequenced genomes of niche species with few known examples.
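
The two stages named in the abstract, active learning and multi-view co-training, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify the classifier, the feature views, the query strategy, or the order of the stages, so everything below is an assumption made for illustration. The sketch uses a toy nearest-centroid classifier, two synthetic feature views, uncertainty sampling for the active learning stage (with an `oracle` function standing in for wet-lab validation), and runs active learning before co-training. All function and variable names are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Toy nearest-centroid classifier: one mean vector per class (0 and 1)."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def prob_pos(cents, x):
    """Crude score for class 1: relative closeness to the class-1 centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0])) ** 0.5
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1])) ** 0.5
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def both(views, i):
    """Concatenate the two feature views of instance i."""
    return views[0][i] + views[1][i]

def active_learning(views, labels, pool, oracle, rounds):
    """Assumed strategy: uncertainty sampling. Each round, the least certain
    unlabeled instance is sent to the oracle and added with its true label."""
    for _ in range(rounds):
        idx = sorted(labels)
        cents = fit_centroids([both(views, i) for i in idx],
                              [labels[i] for i in idx])
        pick = min(pool, key=lambda i: abs(prob_pos(cents, both(views, i)) - 0.5))
        labels[pick] = oracle(pick)
        pool.remove(pick)

def co_training(views, labels, pool, rounds):
    """Each round, one classifier per view is trained on the shared labeled
    set; each then adds its most confident unlabeled instance, with the
    label it predicted, back into that shared set."""
    for _ in range(rounds):
        idx = sorted(labels)
        y = [labels[i] for i in idx]
        for view in views:
            if not pool:
                return
            cents = fit_centroids([view[i] for i in idx], y)
            pick = max(pool, key=lambda i: abs(prob_pos(cents, view[i]) - 0.5))
            labels[pick] = int(prob_pos(cents, view[pick]) >= 0.5)
            pool.remove(pick)

# Synthetic data (not real miRNA features): two well-separated classes,
# each instance described by two independent 2-D views.
rng = random.Random(0)
truth = [i % 2 for i in range(200)]
views = [[[rng.gauss(6.0 * t, 1.0), rng.gauss(6.0 * t, 1.0)] for t in truth]
         for _ in range(2)]

labels = {i: truth[i] for i in range(4)}   # tiny seed set: two per class
pool = list(range(4, 200))                 # indices of unlabeled instances

active_learning(views, labels, pool, oracle=lambda i: truth[i], rounds=6)
co_training(views, labels, pool, rounds=20)

# Final classifier on both views, evaluated on the instances never labeled.
idx = sorted(labels)
final = fit_centroids([both(views, i) for i in idx], [labels[i] for i in idx])
accuracy = sum(int(prob_pos(final, both(views, i)) >= 0.5) == truth[i]
               for i in pool) / len(pool)
print(len(labels), "labeled instances; held-out accuracy:", accuracy)
```

In this sketch only the active learning stage consumes true labels (six oracle queries on top of four seed labels), while co-training grows the labeled set using the classifiers' own confident predictions. That is the sense in which such a scheme can make use of unlabeled data when known examples are scarce.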

Citing Articles

Species-specific microRNA discovery and target prediction in the soybean cyst nematode.

Ajila V, Colley L, Ste-Croix D, Nissan N, Cober E, Mimee B. Sci Rep. 2023; 13(1):17657.

PMID: 37848601 PMC: 10582106. DOI: 10.1038/s41598-023-44469-w.


Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors.

Tran Q, Alom M, Orr B. BMC Bioinformatics. 2022; 23(1):223.

PMID: 35676649 PMC: 9178802. DOI: 10.1186/s12859-022-04764-1.


Semisupervised Deep Learning Techniques for Predicting Acute Respiratory Distress Syndrome From Time-Series Clinical Data: Model Development and Validation Study.

Lam C, Tso C, Green-Saxena A, Pellegrini E, Iqbal Z, Evans D. JMIR Form Res. 2021; 5(9):e28028.

PMID: 34398784 PMC: 8447921. DOI: 10.2196/28028.
