
BioSeq-BLM: a Platform for Analyzing DNA, RNA and Protein Sequences Based on Biological Language Models

Overview
Specialty: Biochemistry
Date: 2021 Sep 28
PMID: 34581805
Citations: 65
Abstract

To uncover the meaning of the 'book of life', this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which are able to extract the linguistic properties of the 'book of life'. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve performance comparable to, or clearly better than, that of the existing state-of-the-art predictors published in the literature. This indicates that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.

Citing Articles

Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics.

Li R, Yu J, Ye D, Liu S, Zhang H, Lin H. Toxins (Basel). 2025; 17(2).

PMID: 39998095 PMC: 11860864. DOI: 10.3390/toxins17020078.


FORAlign: accelerating gap-affine DNA pairwise sequence alignment using FOR-blocks based on Four Russians approach with linear space complexity.

Wei Y, Zhou T, Zhai Y, Yu L, Zou Q. Brief Bioinform. 2025; 26(1).

PMID: 39987460 PMC: 11846685. DOI: 10.1093/bib/bbaf061.


SpaCcLink: exploring downstream signaling regulations with graph attention network for systematic inference of spatial cell-cell communication.

Liu J, Ma L, Ju F, Zhao C, Yu L. BMC Biol. 2025; 23(1):44.

PMID: 39939849 PMC: 11823213. DOI: 10.1186/s12915-025-02141-x.


Prediction of hemolytic peptides and their hemolytic concentration.

Rathore A, Kumar N, Choudhury S, Mehta N, Raghava G. Commun Biol. 2025; 8(1):176.

PMID: 39905233 PMC: 11794569. DOI: 10.1038/s42003-025-07615-w.


RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models.

Asim M, Ibrahim M, Asif T, Dengel A. Heliyon. 2025; 11(2):e41488.

PMID: 39897847 PMC: 11783440. DOI: 10.1016/j.heliyon.2024.e41488.

