
A Deep Learning Method for LincRNA Detection Using Auto-encoder Algorithm

Overview
Publisher Biomed Central
Specialty Biology
Date 2017 Dec 16
PMID 29244011
Citations 11
Abstract

Background: RNA sequencing (RNA-seq) enables scientists to develop novel data-driven methods for discovering previously unidentified lincRNAs. Meanwhile, knowledge-based technologies are undergoing a potential revolution driven by new deep learning methods. By examining newly available RNA-seq data sets, scientists have found that (1) the expression of lincRNAs appears to be regulated, that is, correlations exist along the DNA sequences; and (2) lincRNAs contain conserved patterns/motifs tethered together by non-conserved regions. These two observations motivate the use of knowledge-based deep learning methods for lincRNA detection. As in the transcription of coding regions, non-coding regions are split at transcriptional sites; however, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in biological processes as regulatory units instead of being translated into proteins. Identifying these transcriptional regions within non-coding regions is the first step towards lincRNA recognition.

Results: The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show that the predictive deep neural network performs favorably against a support vector machine and a traditional neural network on the lincRNA data sets. In addition, the method is validated on a newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has broad capability for lincRNA prediction.

Conclusions: The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection; it adopts the auto-encoder algorithm and uses different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can learn useful features and capture the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences.
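
The abstract does not say which encoding schemes were compared or how the network was configured, so the following is only a minimal sketch of the general idea: encode DNA sequences numerically, then train an auto-encoder to reconstruct them so that its hidden layer learns a compact representation. The one-hot and EIIP encodings, the toy sequences, and all function names and hyperparameters (`encode_one_hot`, `encode_eiip`, `train_autoencoder`, `hidden_codes`, `n_hidden`, `lr`) are illustrative assumptions, not the authors' implementation. The sketch uses a single hidden layer rather than the two-layer network described above.

```python
import numpy as np

# Assumed encodings; the abstract only says "different encoding schemes".
# EIIP (electron-ion interaction pseudopotential) values per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def encode_one_hot(seq):
    """Map each base to 4 binary values and flatten to one vector."""
    return np.array([bit for base in seq.upper() for bit in ONE_HOT[base]],
                    dtype=float)


def encode_eiip(seq):
    """Map each base to its single EIIP value."""
    return np.array([EIIP[base] for base in seq.upper()], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train a one-hidden-layer auto-encoder with plain gradient descent.

    Returns the parameters and the mean squared reconstruction error
    recorded at each epoch (hypothetical helper for illustration).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encoder
        Y = sigmoid(H @ W2 + b2)          # decoder (reconstruction)
        err = Y - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared error through both sigmoid layers.
        dY = err * Y * (1.0 - Y) / n
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses


def hidden_codes(X, params):
    """Return the learned hidden-layer features for each input row."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


# Toy, made-up sequences of equal length (not real lincRNA data).
toy_sequences = ["ACGTACGT", "ACGTTTGA", "GGGTACGT", "ACGAACGT"]
X_toy = np.stack([encode_one_hot(s) for s in toy_sequences])
toy_params, toy_losses = train_autoencoder(X_toy, n_hidden=4)
toy_features = hidden_codes(X_toy, toy_params)
```

In a pipeline of the kind the abstract describes, features such as `toy_features` would be passed to a supervised classifier that labels each sequence window as a transcription site or not; that step is omitted here because the abstract gives no details about it.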

Citing Articles

PredPSP: a novel computational tool to discover pathway-specific photosynthetic proteins in plants.

Meher P, Pradhan U, Sethi P, Naha S, Gupta A, Parsad R Plant Mol Biol. 2024; 114(5):106.

PMID: 39316155 DOI: 10.1007/s11103-024-01500-6.


Current understanding of functional peptides encoded by lncRNA in cancer.

Tian H, Tang L, Yang Z, Xiang Y, Min Q, Yin M Cancer Cell Int. 2024; 24(1):252.

PMID: 39030557 PMC: 11265036. DOI: 10.1186/s12935-024-03446-7.


Deep Learning-Based Protein Features Predict Overall Survival and Chemotherapy Benefit in Gastric Cancer.

Zhao X, Xia X, Wang X, Bai M, Zhan D, Shu K Front Oncol. 2022; 12:847706.

PMID: 35651795 PMC: 9148960. DOI: 10.3389/fonc.2022.847706.


A Systematic Review of Artificial Intelligence Techniques in Cancer Prediction and Diagnosis.

Kumar Y, Gupta S, Singla R, Hu Y Arch Comput Methods Eng. 2021; 29(4):2043-2070.

PMID: 34602811 PMC: 8475374. DOI: 10.1007/s11831-021-09648-w.


Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs.

Asim M, Ibrahim M, Malik M, Dengel A, Ahmed S Int J Mol Sci. 2021; 22(16).

PMID: 34445436 PMC: 8395733. DOI: 10.3390/ijms22168719.

