RepRNA: a Web Server for Generating Various Feature Vectors of RNA Sequences

Overview

Journal: Mol Genet Genomics

Specialty: Genetics

Date: 2015 Jun 19

PMID: 26085220

Citations: 61

Authors

Bin Liu

Fule Liu

Longyun Fang

Xiaolong Wang

Kuo-Chen Chou

Abstract

With the rapid growth of RNA sequences generated in the postgenomic age, there is a strong need for a flexible method that can generate various kinds of vectors to represent these sequences, each focusing on different features. This is because nearly all existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can handle only vectors, not sequences. To meet the increasing demand and speed up genome analyses, we have developed a new web server called "representations of RNA sequences" (repRNA). Compared with existing methods, repRNA is more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors for users to choose from according to their investigation purposes; (2) it allows users to select features from 22 built-in physicochemical properties as well as properties they define themselves; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/ .
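To make the sequence-to-vector step concrete, here is a minimal sketch of one simple representation, the normalized k-mer (k-tuple nucleotide) composition. It is an illustration only, not repRNA's implementation: the function name `kmer_composition`, its parameters, and the example sequence are all hypothetical, and the abstract does not specify how any of the 11 modes are computed.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector of an RNA sequence.

    Hypothetical helper for illustration; not part of repRNA.
    The vector has len(alphabet) ** k entries, ordered lexicographically
    by the order of characters in `alphabet`.
    """
    # Normalize case and accept DNA-style input by mapping T to U.
    seq = seq.upper().replace("T", "U")

    # Enumerate every possible k-mer so the vector has a fixed layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    # Slide a window of width k; skip windows with unknown symbols (e.g. N).
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1

    # Sequences shorter than k (or with no valid window) give a zero vector.
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Example: a 10-nt sequence becomes a fixed-length 16-dimensional vector.
vector = kmer_composition("GGGAAACUCC", k=2)
```

Because every sequence maps to a vector of the same length regardless of how long the sequence is, the output can be passed directly to a vector-based learner such as an SVM or KNN classifier.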

Citing Articles

Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning.

Krishnan S, Roy A, Gromiha M. Brief Bioinform. 2024; 25(2).

PMID: 38261341 PMC: 10805179. DOI: 10.1093/bib/bbae002.

Hemolytic-Pred: A machine learning-based predictor for hemolytic proteins using position and composition-based features.

Perveen G, Alturise F, Alkhalifah T, Khan Y. Digit Health. 2023; 9:20552076231180739.

PMID: 37434723 PMC: 10331097. DOI: 10.1177/20552076231180739.

iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets.

Chen Z, Liu X, Zhao P, Li C, Wang Y, Li F. Nucleic Acids Res. 2022; 50(W1):W434-W447.

PMID: 35524557 PMC: 9252729. DOI: 10.1093/nar/gkac351.

XGEM: Predicting Essential miRNAs by the Ensembles of Various Sequence-Based Classifiers With XGBoost Algorithm.

Min H, Xin X, Gao C, Wang L, Du P. Front Genet. 2022; 13:877409.

PMID: 35419029 PMC: 8996062. DOI: 10.3389/fgene.2022.877409.

PSSMCOOL: a comprehensive R package for generating evolutionary-based descriptors of protein sequences from PSSM profiles.

Mohammadi A, Zahiri J, Mohammadi S, Khodarahmi M, Arab S. Biol Methods Protoc. 2022; 7(1):bpac008.

PMID: 35388370 PMC: 8977839. DOI: 10.1093/biomethods/bpac008.
