Guaranteed Discrete Energy Optimization on Large Protein Design Problems
In Computational Protein Design (CPD), assuming a rigid backbone and an amino-acid rotamer library, the problem of finding a sequence with an optimal conformation is NP-hard. In this paper, using Dunbrack's rotamer library and the Talaris2014 decomposable energy function, we use an exact deterministic method combining branch and bound, arc consistency, and tree decomposition to provably identify the global minimum energy sequence-conformation on full-redesign problems, defining search spaces of size up to 10^234. This is achieved on a single core of a standard computing server, requiring a maximum of 66 GB of RAM. A variant of the algorithm is able to exhaustively enumerate all sequence-conformations within an energy threshold of the optimum. These proven optimal solutions are then used to evaluate the frequencies and amplitudes, in energy and sequence, at which an existing CPD-dedicated simulated annealing implementation may miss the optimum on these full-redesign problems. The probability of finding an optimum drops close to 0 very quickly. In the worst case, despite 1,000 repeats, the annealing algorithm remained more than 1 Rosetta unit away from the optimum, leading to designed sequences that could differ from the optimal sequence by more than 30% of their amino acids.
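To make the idea of exact optimization over a decomposable energy concrete, the following is a minimal sketch of depth-first branch and bound on a toy pairwise model, followed by enumeration of every assignment within an energy threshold of the optimum. It is not the method used in the paper: it omits arc consistency and tree decomposition and relies on a deliberately naive lower bound. The function names (`lower_bound`, `find_optimum`, `enumerate_within`), the data layout, and the energy values are all hypothetical and chosen only for illustration.

```python
# Toy pairwise-decomposable energy (hypothetical values, not Talaris2014):
# unary[i][r]            : energy of rotamer r at position i
# pairwise[(i, j)][r][s] : interaction energy of rotamers r and s, with i < j
unary = [[1.0, 0.0], [0.5, 2.0, 0.0], [0.0, 1.0]]
pairwise = {
    (0, 1): [[0.0, 1.0, 3.0], [2.0, 0.0, 0.5]],
    (1, 2): [[0.0, 2.0], [1.0, 0.0], [1.5, 0.5]],
}


def lower_bound(partial, unary, pairwise):
    """Naive bound: exact terms for assigned positions, independent minima
    for every term that still involves an unassigned position.
    Positions are assigned in order 0, 1, 2, ... and keys satisfy i < j."""
    k = len(partial)
    lb = sum(
        unary[i][partial[i]] if i < k else min(unary[i])
        for i in range(len(unary))
    )
    for (i, j), table in pairwise.items():
        if j < k:  # both positions assigned: exact term
            lb += table[partial[i]][partial[j]]
        elif i < k:  # only i assigned: best partner rotamer at j
            lb += min(table[partial[i]])
        else:  # neither assigned: best entry in the table
            lb += min(min(row) for row in table)
    return lb


def find_optimum(unary, pairwise):
    """Depth-first branch and bound; returns (energy, assignment)."""
    n = len(unary)
    best = {"energy": float("inf"), "assign": None}

    def dfs(partial):
        lb = lower_bound(partial, unary, pairwise)
        if lb >= best["energy"]:  # cannot improve on the incumbent: prune
            return
        if len(partial) == n:  # complete assignment: the bound is exact
            best["energy"], best["assign"] = lb, tuple(partial)
            return
        for r in range(len(unary[len(partial)])):
            dfs(partial + [r])

    dfs([])
    return best["energy"], best["assign"]


def enumerate_within(unary, pairwise, optimum, threshold):
    """All assignments with energy <= optimum + threshold, sorted by energy."""
    n = len(unary)
    found = []

    def dfs(partial):
        lb = lower_bound(partial, unary, pairwise)
        if lb > optimum + threshold:  # every completion is too high: prune
            return
        if len(partial) == n:
            found.append((lb, tuple(partial)))
            return
        for r in range(len(unary[len(partial)])):
            dfs(partial + [r])

    dfs([])
    return sorted(found)


opt_energy, opt_assign = find_optimum(unary, pairwise)
near_optimal = enumerate_within(unary, pairwise, opt_energy, 0.5)
print(opt_energy, opt_assign)
print(near_optimal)
```

Because the bound never overestimates the energy of any completion, pruning cannot discard the optimum, which is what makes the result a proof of optimality rather than a heuristic answer. On realistic problems the quality of the bound dominates running time; a stronger bound than this one would be needed for anything beyond toy sizes.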