BioSeq-Analysis2.0: an Updated Platform for Analyzing DNA, RNA and Protein Sequences at Sequence Level and Residue Level Based on Machine Learning Approaches

Overview

Journal Nucleic Acids Res

Publisher Oxford University Press

Specialty Biochemistry

Date 2019 Sep 11

PMID 31504851

Citations 120

Authors

Bin Liu

Xin Gao

Hanyu Zhang

Abstract

BioSeq-Analysis was the first web server to analyze various biological sequences at the sequence level based on machine learning approaches, and many powerful predictors in computational biology have been developed with its assistance. However, BioSeq-Analysis can only be applied to sequence-level analysis tasks, which prevents its use for residue-level analysis tasks. An intelligent tool that can automatically generate predictors for biological sequence analysis at both the residue level and the sequence level is therefore highly desirable. To meet this need, we present an updated server, BioSeq-Analysis2.0 (http://bliulab.net/BioSeq-Analysis2.0/), which covers a total of 26 features at the residue level and 90 features at the sequence level. Users only need to upload a benchmark dataset, and BioSeq-Analysis2.0 generates predictors for both residue-level and sequence-level analysis tasks. The corresponding stand-alone tool is also provided and can be downloaded from http://bliulab.net/BioSeq-Analysis2.0/download/. To the best of our knowledge, BioSeq-Analysis2.0 is the first tool for generating predictors for biological sequence analysis tasks at the residue level. Experimental results indicate that predictors developed with BioSeq-Analysis2.0 achieve comparable or even better performance than existing state-of-the-art predictors.
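The distinction between sequence-level and residue-level features can be illustrated with a minimal Python sketch. This is not the BioSeq-Analysis2.0 API or any of its 116 feature methods; the function names `kmer_composition` and `residue_windows`, the DNA alphabet, and the window size are all assumptions chosen for illustration. A sequence-level feature yields one vector per sequence, whereas a residue-level feature yields one vector per position.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Sequence-level sketch: one fixed-length frequency vector per sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {km: 0 for km in kmers}
    total = max(len(seq) - k + 1, 0)  # number of overlapping k-mers
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # skip k-mers with non-alphabet symbols
            counts[window] += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def residue_windows(seq, flank=1, alphabet="ACGT", pad="-"):
    """Residue-level sketch: one one-hot window vector per position."""
    padded = pad * flank + seq + pad * flank
    index = {ch: i for i, ch in enumerate(alphabet)}
    features = []
    for i in range(len(seq)):
        window = padded[i:i + 2 * flank + 1]  # residue i plus its neighbours
        vec = []
        for ch in window:
            one_hot = [0] * len(alphabet)
            if ch in index:  # padding stays all-zero
                one_hot[index[ch]] = 1
            vec.extend(one_hot)
        features.append(vec)
    return features


seq = "ACGTAC"  # hypothetical example sequence
seq_vec = kmer_composition(seq)
res_vecs = residue_windows(seq)
print(len(seq_vec))                     # 16
print(len(res_vecs), len(res_vecs[0]))  # 6 12
```

In a sequence-level task (for example, classifying a whole sequence) the single vector would be passed to a classifier, while in a residue-level task (for example, predicting a site label at each position) each per-position vector would be classified separately.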

Citing Articles

Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics.

Li R, Yu J, Ye D, Liu S, Zhang H, Lin H Toxins (Basel). 2025; 17(2).

PMID: 39998095 PMC: 11860864. DOI: 10.3390/toxins17020078.

Overview and Prospects of DNA Sequence Visualization.

Wu Y, Xie X, Zhu J, Guan L, Li M Int J Mol Sci. 2025; 26(2).

PMID: 39859192 PMC: 11764684. DOI: 10.3390/ijms26020477.

Identify potential drug candidates within a high-quality compound search space.

Ru X, Zhao S, Zou Q, Xu L Brief Bioinform. 2025; 26(1).

PMID: 39853109 PMC: 11758506. DOI: 10.1093/bib/bbaf024.

Empirical Comparison and Analysis of Artificial Intelligence-Based Methods for Identifying Phosphorylation Sites of SARS-CoV-2 Infection.

Lai H, Zhu T, Xie S, Luo X, Hong F, Luo D Int J Mol Sci. 2025; 25(24).

PMID: 39769436 PMC: 11678915. DOI: 10.3390/ijms252413674.

Annotating protein functions via fusing multiple biological modalities.

Ma W, Bi X, Jiang H, Wei Z, Zhang S Commun Biol. 2024; 7(1):1705.

PMID: 39730886 PMC: 11681170. DOI: 10.1038/s42003-024-07411-y.
