PMID: 34853475

De Novo Protein Design by Deep Network Hallucination

Abstract

There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences. Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
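The sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function `toy_predict` is a hypothetical, deterministic stand-in for the trRosetta network, the background distribution is assumed to be uniform rather than averaged over proteins, and the sequence length, step count, temperature, and single-residue mutation move are all illustrative choices that are not taken from the source.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 8  # assumption: number of distance bins in the toy model


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_predict(seq):
    """Hypothetical stand-in for a structure-prediction network (NOT trRosetta).

    Returns a dict mapping each residue pair (i, j), i < j, to a
    distribution over distance bins that depends on the two residues.
    """
    dists = {}
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a = AMINO_ACIDS.index(seq[i])
            b = AMINO_ACIDS.index(seq[j])
            peak = (a * 7 + b * 3 + (j - i)) % N_BINS
            sharpness = ((a + b) % 5) * 0.5  # 0 gives a flat distribution
            logits = [-sharpness * abs(k - peak) for k in range(N_BINS)]
            dists[(i, j)] = softmax(logits)
    return dists


def mean_kl(pred, background):
    """Average KL(pred || background) over all residue pairs."""
    total = 0.0
    for p in pred.values():
        total += sum(pk * math.log(pk / qk)
                     for pk, qk in zip(p, background) if pk > 0)
    return total / len(pred)


def hallucinate(length=12, steps=300, temperature=0.05, seed=0):
    """Metropolis Monte Carlo in sequence space, maximizing mean KL."""
    rng = random.Random(seed)
    background = [1.0 / N_BINS] * N_BINS  # assumption: uniform background
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = mean_kl(toy_predict(seq), background)
    start_score = score
    best_seq, best_score = seq.copy(), score
    for _ in range(steps):
        candidate = seq.copy()
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)
        cand_score = mean_kl(toy_predict(candidate), background)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            seq, score = candidate, cand_score
            if score > best_score:
                best_seq, best_score = seq.copy(), score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    sequence, start, best = hallucinate()
    print(sequence, round(start, 3), round(best, 3))
```

In this toy setting a random starting sequence gives relatively flat pairwise distributions, and accepted mutations tend to sharpen them, which mirrors the abstract's description of moving from featureless distance maps toward high-contrast ones. With a real network, `toy_predict` would be replaced by a call to the trained model.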

Citing Articles

AI-assisted protein design to rapidly convert antibody sequences to intrabodies targeting diverse peptides and histone modifications.

Galindo G, Maejima D, DeRoo J, Burlingham S, Fixen G, Morisaki T. bioRxiv. 2025.

PMID: 39975170 PMC: 11839053. DOI: 10.1101/2025.02.06.636921.


De novo design of transmembrane fluorescence-activating proteins.

Zhu J, Liang M, Sun K, Wei Y, Guo R, Zhang L. Nature. 2025.

PMID: 39972138 DOI: 10.1038/s41586-025-08598-8.


Leveraging large language models for peptide antibiotic design.

Guan C, Fernandes F, Franco O, de la Fuente-Nunez C. Cell Rep Phys Sci. 2025; 6(1).

PMID: 39949833 PMC: 11823563. DOI: 10.1016/j.xcrp.2024.102359.


Structural prediction of chimeric immunogen candidates to elicit targeted antibodies against betacoronaviruses.

Simpson J, Kasson P. PLoS Comput Biol. 2025; 21(2):e1012812.

PMID: 39908344 PMC: 11809852. DOI: 10.1371/journal.pcbi.1012812.


When synthetic biology meets medicine.

Feng Y, Su C, Mao G, Sun B, Cai Y, Dai J. Life Med. 2025; 3(1):lnae010.

PMID: 39872399 PMC: 11749639. DOI: 10.1093/lifemedi/lnae010.

