Unexplored Regions of the Protein Sequence-structure Map Revealed at Scale by a Library of Foldtuned Language Models

Overview
Journal: bioRxiv
Date: 2024 Jan 8
PMID: 38187750
Abstract

The combinatorial scale of amino-acid sequence-space has traditionally precluded substantive study of the full protein sequence-structure map. It remains unknown, for instance, how much of the vast uncharted landscape of far-from-natural sequences encodes the familiar ensemble of natural folds in a fashion consistent with the laws of biophysics but seemingly untouched by evolution on Earth. The scale of sequence perturbations required to access these spaces exceeds the reach of even gold-standard experimental approaches such as directed evolution. We surpass this limitation, guided by the innate capacity of protein language models (pLMs) to explore sequences outside their natural training data through generation and self-feedback. We recast pLMs as probes that explore regions of protein "deep space" possessing little-to-no detectable homology to natural examples, while enforcing core structural constraints, in a novel sequence design approach that we term "foldtuning." We build a library of foldtuned pLMs for >700 natural folds in the SCOP database, covering numerous high-priority targets for synthetic biology, including GPCRs and small GTPases, composable cell-surface-receptor and DNA-binding domains, and small signaling/regulatory domains. Candidate proteins generated by foldtuned pLMs reflect distinctive new "rules of language" for sequence innovation beyond detectable homology to any known protein, and sample subtle structural alterations in a manner reminiscent of natural structural evolution and diversification. Experimental validation of two markedly different fold targets, the tyrosine-kinase- and small-GTPase-regulating SH3 domain and the bacterial RNase inhibitor barstar, demonstrates that foldtuning proposes protein variants that express and fold stably and function.
Foldtuning reveals protein sequence-structure information at scale outside of the context of evolution and promises to push forward the redesign and reconstitution of novel-to-nature synthetic biological systems for applications in health and catalysis.
