Transformer-based Deep Learning for Predicting Protein Properties in the Life Sciences

Overview
Journal Elife
Specialty Biology
Date 2023 Jan 18
PMID 36651724
Abstract

Recent developments in deep learning, coupled with an increasing number of sequenced proteins, have led to a breakthrough in life science applications, in particular in protein property prediction. There is hope that deep learning can close the gap between the number of sequenced proteins and the number of proteins whose properties are known from lab experiments. Language models from the field of natural language processing have gained popularity for protein property prediction and have led to a new computational revolution in biology, in which earlier prediction results are regularly being improved. Such models can learn useful multipurpose representations of proteins from large open repositories of protein sequences, and these representations can be used, for instance, to predict protein properties. The field of natural language processing is growing quickly because of developments in a class of models based on one particular model, the Transformer. We review recent developments and the use of large-scale Transformer models in applications for predicting protein characteristics, and how such models can be used to predict, for example, post-translational modifications. We also review shortcomings of other deep learning models and explain how Transformer models have quickly proven to be a very promising way to unravel information hidden in sequences of amino acids.

Citing Articles

Discordance between a deep learning model and clinical-grade variant pathogenicity classification in a rare disease cohort.

Kong S, Lee I, Collen L, Field M, Manrai A, Snapper S. NPJ Genom Med. 2025; 10(1):17.

PMID: 40021654 PMC: 11871343. DOI: 10.1038/s41525-025-00480-w.


Molecular Dynamics (MD)-Derived Features for Canonical and Noncanonical Amino Acids.

Hui T, Secor M, Ho M, Bayaraa N, Lin Y. J Chem Inf Model. 2025; 65(4):1837-1849.

PMID: 39895111 PMC: 11863381. DOI: 10.1021/acs.jcim.4c02102.


Predicting gene sequences with AI to study codon usage patterns.

Sidi T, Bahiri-Elitzur S, Tuller T, Kolodny R. Proc Natl Acad Sci U S A. 2024; 122(1):e2410003121.

PMID: 39739812 PMC: 11725940. DOI: 10.1073/pnas.2410003121.


Scaling down for efficiency: Medium-sized protein language models perform well at transfer learning on realistic datasets.

Vieira L, Handojo M, Wilke C. bioRxiv. 2024.

PMID: 39605589 PMC: 11601519. DOI: 10.1101/2024.11.22.624936.


Exploring the hidden world of RNA viruses with a transformer-based tool.

Nakagawa S, Sakaguchi S. Patterns (N Y). 2024; 5(11):101095.

PMID: 39568477 PMC: 11573883. DOI: 10.1016/j.patter.2024.101095.

