
Modeling Sequence-Space Exploration and Emergence of Epistatic Signals in Protein Evolution

Overview
Journal Mol Biol Evol
Specialty Biology
Date 2021 Nov 9
PMID 34751386
Citations 21
Abstract

During their evolution, proteins explore sequence space through an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins to propose stochastic models of experimental protein evolution. These models quantitatively predict important features of experimentally evolved sequence libraries, such as fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters, such as sequence divergence, selection strength, and library size. We showcase the potential of the approach by reanalyzing two recent experiments that sought to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for the different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, thereby opening a way to computationally optimize experimental protocols.
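
The abstract does not spell out the stochastic model, so the following is only a minimal sketch of the general idea of mutation followed by selection on a fitness landscape. It assumes a Potts-like energy with random (not data-driven) parameters and a Metropolis acceptance rule; the names `random_landscape`, `evolve`, and `simulate_library`, and the parameters `beta` (selection strength), `n_steps` (a proxy for sequence divergence), and `library_size` are all hypothetical and are not taken from the paper.

```python
import math
import random

Q = 20  # alphabet size (20 amino acids); an assumption of this toy model


def random_landscape(length, rng, coupling_scale=0.5):
    """Hypothetical Potts-like landscape with random, not learned, parameters."""
    fields = [[rng.gauss(0.0, 1.0) for _ in range(Q)] for _ in range(length)]
    couplings = {
        (i, j): [[rng.gauss(0.0, coupling_scale) for _ in range(Q)] for _ in range(Q)]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return fields, couplings


def energy(seq, fields, couplings):
    """Lower energy stands in for higher fitness in this sketch."""
    e = -sum(fields[i][a] for i, a in enumerate(seq))
    for (i, j), mat in couplings.items():
        e -= mat[seq[i]][seq[j]]  # pairwise terms are the source of epistasis
    return e


def evolve(wild_type, fields, couplings, n_steps, beta, rng):
    """Propose single-site mutations; accept each with the Metropolis rule."""
    seq = list(wild_type)
    e = energy(seq, fields, couplings)
    for _ in range(n_steps):
        pos, new = rng.randrange(len(seq)), rng.randrange(Q)
        old = seq[pos]
        if new == old:
            continue
        seq[pos] = new
        e_new = energy(seq, fields, couplings)
        delta = e_new - e
        # Always accept improvements; accept deleterious moves with prob exp(-beta * delta).
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            e = e_new
        else:
            seq[pos] = old  # selection rejects the mutation
    return seq, e


def simulate_library(wild_type, fields, couplings, library_size, n_steps, beta, seed=0):
    """Independent trajectories from one wild type form the simulated library."""
    rng = random.Random(seed)
    return [evolve(wild_type, fields, couplings, n_steps, beta, rng)
            for _ in range(library_size)]


def mean_hamming(library, wild_type):
    """Average number of positions that differ from the wild type."""
    return sum(sum(a != b for a, b in zip(seq, wild_type))
               for seq, _ in library) / len(library)


if __name__ == "__main__":
    rng = random.Random(1)
    fields, couplings = random_landscape(12, rng)
    wild_type = [rng.randrange(Q) for _ in range(12)]
    library = simulate_library(wild_type, fields, couplings,
                               library_size=50, n_steps=200, beta=1.0)
    print("mean divergence:", mean_hamming(library, wild_type))
    print("mean energy:", sum(e for _, e in library) / len(library))
```

Varying `beta`, `n_steps`, and `library_size` in such a sketch mimics scanning selection strength, divergence, and library size. A landscape inferred from homologous sequences would replace `random_landscape` in any realistic use.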

Citing Articles

Entrenchment and contingency in neutral protein evolution with epistasis.

Schmelkin L, Carnevale V, Haldane A, Townsend J, Chung S, Levy R bioRxiv. 2025.

PMID: 39868204 PMC: 11761135. DOI: 10.1101/2025.01.09.632266.


Combination of Coevolutionary Information and Supervised Learning Enables Generation of Cyclic Peptide Inhibitors with Enhanced Potency from a Small Data Set.

Mazzocato Y, Frasson N, Sample M, Fregonese C, Pavan A, Caregnato A ACS Cent Sci. 2024; 10(12):2242-2252.

PMID: 39735311 PMC: 11672547. DOI: 10.1021/acscentsci.4c01428.


Understanding epistatic networks in the B1 β-lactamases through coevolutionary statistical modeling and deep mutational scanning.

Chen J, Bisardi M, Lee D, Cotogno S, Zamponi F, Weigt M Nat Commun. 2024; 15(1):8441.

PMID: 39349467 PMC: 11442494. DOI: 10.1038/s41467-024-52614-w.


Emergent time scales of epistasis in protein evolution.

Di Bari L, Bisardi M, Cotogno S, Weigt M, Zamponi F Proc Natl Acad Sci U S A. 2024; 121(40):e2406807121.

PMID: 39325427 PMC: 11459137. DOI: 10.1073/pnas.2406807121.


Machine learning in biological physics: From biomolecular prediction to design.

Martin J, Lequerica Mateos M, Onuchic J, Coluzza I, Morcos F Proc Natl Acad Sci U S A. 2024; 121(27):e2311807121.

PMID: 38913893 PMC: 11228481. DOI: 10.1073/pnas.2311807121.

