
LncDC: a Machine Learning-based Tool for Long Non-coding RNA Detection from RNA-Seq Data

Overview
Journal Sci Rep
Specialty Science
Date 2022 Nov 9
PMID 36351980
Abstract

Long non-coding RNAs (lncRNAs) play an essential role in diverse biological processes and disease development. Accurate classification of lncRNAs and mRNAs is important for identifying tissue- or disease-specific lncRNAs. Here, we present LncDC (Long non-coding RNA detection), a tool that accurately predicts lncRNAs with an XGBoost model using features extracted from RNA sequences, secondary structures, and translated proteins. In benchmarking experiments, LncDC consistently outperformed six state-of-the-art tools in distinguishing lncRNAs from mRNAs. Notably, the sequence and secondary structure (SASS) k-mer score features and the flexible ORF features improved the classification capability of LncDC. We anticipate that LncDC will facilitate the discovery of novel disease-specific lncRNAs. LncDC is implemented in Python and freely available at https://github.com/lim74/LncDC.
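To make the feature families mentioned above more concrete, the sketch below computes two simplified stand-ins: a longest-ORF length and ORF coverage, and normalized sequence k-mer frequencies. This is an illustrative assumption, not LncDC's code. It does not reproduce LncDC's "flexible ORF" definition or its SASS k-mer scores (which also use secondary structure), and the function names `longest_orf`, `orf_features`, and `kmer_frequencies` are hypothetical.

```python
from itertools import product

# Canonical stop codons (DNA alphabet; U is converted to T below).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-initiated, stop-terminated ORF on the forward strand.

    Simplifying assumption (not LncDC's definition): only canonical ATG starts,
    only the forward strand, and only complete ORFs that end in a stop codon.
    """
    seq = seq.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def orf_features(seq: str) -> dict:
    """Hypothetical ORF features: longest ORF length and its coverage of the transcript."""
    orf = longest_orf(seq)
    coverage = len(orf) / len(seq) if seq else 0.0
    return {"orf_length": len(orf), "orf_coverage": coverage}


def kmer_frequencies(seq: str, k: int = 3) -> dict:
    """Normalized frequencies of all 4**k k-mers over A, C, G, T (U treated as T).

    k-mers containing other characters (e.g. N) are skipped.
    """
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}
```

For example, `orf_features("GGATGAAATTTTAAGG")` finds the ORF `ATGAAATTTTAA` in the third frame (length 12, coverage 0.75). In a full pipeline, feature vectors like these would be assembled per transcript and passed to a gradient-boosted classifier such as XGBoost's `XGBClassifier`; the exact feature set and training setup used by LncDC are described in the paper and repository, not here.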

Citing Articles

Computational Resources for lncRNA Functions and Targetome.

Thakur A, Kumar M. Methods Mol Biol. 2024; 2883:299-323.

PMID: 39702714 DOI: 10.1007/978-1-0716-4290-0_13.

Simulated Annealing for RNA Design with SIMARD.

Tsang H. Methods Mol Biol. 2024; 2847:95-108.

PMID: 39312138 DOI: 10.1007/978-1-0716-4079-1_6.

Discovering the hidden function in fungal genomes.

Gervais N, Shapiro R. Nat Commun. 2024; 15(1):8219.

PMID: 39300175 PMC: 11413187. DOI: 10.1038/s41467-024-52568-z.

Comparison and benchmark of deep learning methods for non-coding RNA classification.

Creux C, Zehraoui F, Radvanyi F, Tahi F. PLoS Comput Biol. 2024; 20(9):e1012446.

PMID: 39264986 PMC: 11421803. DOI: 10.1371/journal.pcbi.1012446.

Decoding the Non-coding: Tools and Databases Unveiling the Hidden World of "Junk" RNAs for Innovative Therapeutic Exploration.

Chaudhary U, Banerjee S. ACS Pharmacol Transl Sci. 2024; 7(7):1901-1915.

PMID: 39022352 PMC: 11249652. DOI: 10.1021/acsptsci.3c00388.


Zuker M, Stiegler P . Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981; 9(1):133-48. PMC: 326673. DOI: 10.1093/nar/9.1.133. View