Evolutionary-scale Prediction of Atomic-level Protein Structure with a Language Model

Overview
Journal Science
Specialty Science
Date 2023 Mar 17
PMID 36927031
Abstract

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.

Citing Articles

A versatile toolbox for determining IRES activity in cells and embryonic tissues.

Koch P, Zhang Z, Genuth N, Susanto T, Haimann M, Khmelinskaia A EMBO J. 2025.

PMID: 40082722 DOI: 10.1038/s44318-025-00404-5.


Seq2Topt: a sequence-based deep learning predictor of enzyme optimal temperature.

Qiu S, Hu B, Zhao J, Xu W, Yang A Brief Bioinform. 2025; 26(2).

PMID: 40079266 PMC: 11904407. DOI: 10.1093/bib/bbaf114.


Development of a multi-epitope vaccine candidate to combat SARS-CoV-2 and dengue virus co-infection through an immunoinformatic approach.

Mandal S, Chanu W, Natarajaseenivasan K Front Immunol. 2025; 16:1442101.

PMID: 40079004 PMC: 11897530. DOI: 10.3389/fimmu.2025.1442101.


Foundation models in bioinformatics.

Guo F, Guan R, Li Y, Liu Q, Wang X, Yang C Natl Sci Rev. 2025; 12(4):nwaf028.

PMID: 40078374 PMC: 11900445. DOI: 10.1093/nsr/nwaf028.


De Novo Design of Large Polypeptides Using a Lightweight Diffusion Model Integrating LSTM and Attention Mechanism Under Per-Residue Secondary Structure Constraints.

Liao S, Xu G, Jin L, Ma J Molecules. 2025; 30(5).

PMID: 40076339 PMC: 11902264. DOI: 10.3390/molecules30051116.