
Direct Coupling Analysis and the Attention Mechanism

Publisher: BioMed Central
Date: 2025 Feb 6
PMID: 39915710
Abstract

Proteins are involved in nearly all cellular functions, including transport, signaling, and enzymatic activity. Their function depends crucially on their complex three-dimensional structure. For this reason, predicting protein structure from the amino acid sequence has long been a formidable computational challenge, one that AlphaFold solved with unprecedented accuracy. However, the inherent complexity of AlphaFold's architecture makes it difficult to understand the rules that ultimately shape the predicted structure. This study investigates a single-layer unsupervised model based on the attention mechanism. More precisely, we explore a Direct Coupling Analysis (DCA) method that mimics the attention mechanism of several popular Transformer architectures, including AlphaFold itself. The model has notably fewer parameters than standard DCA-based algorithms, and these parameters can be used directly to extract structural determinants such as the contact map of the protein family under study. Additionally, the functional form of the model's energy function enables a multi-family learning strategy that effectively integrates information across multiple protein families, whereas standard DCA algorithms are typically limited to a single protein family. Finally, we implemented a generative version of the model using an autoregressive architecture, capable of efficiently generating new proteins in silico.
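To make the idea of "DCA that mimics attention" concrete, here is a minimal sketch in pure Python. It assumes a factored-attention parameterization that is not spelled out in the abstract: each head h has position-wise query and key matrices Q^h and K^h (L × d) and a q × q "value" matrix V^h over amino-acid states, so that the pairwise couplings are J_ij(a,b) = Σ_h A^h_ij V^h(a,b) with A^h = softmax(Q^h K^hᵀ / √d). The function names, the masking of the diagonal, the 1/√d scaling, the field-free energy, and the contact score (symmetrized Frobenius norm plus the average product correction, the usual DCA recipe, with gauge fixing omitted) are illustrative assumptions rather than the authors' implementation. Training, for example by pseudo-likelihood maximization, is not shown.

```python
import math
import random

def softmax_rows(scores):
    """Row-wise softmax; entries equal to -inf receive zero weight."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def attention_maps(Q, K):
    """A[h][i][j] = softmax_j(Q[h][i] . K[h][j] / sqrt(d)), with i == j masked out (assumption)."""
    maps = []
    for Qh, Kh in zip(Q, K):
        L, d = len(Qh), len(Qh[0])
        scores = [[-math.inf if i == j
                   else sum(x * y for x, y in zip(Qh[i], Kh[j])) / math.sqrt(d)
                   for j in range(L)] for i in range(L)]
        maps.append(softmax_rows(scores))
    return maps

def couplings(A, V):
    """Factored couplings J[i][j][a][b] = sum_h A[h][i][j] * V[h][a][b]."""
    H, L, q = len(A), len(A[0]), len(V[0])
    return [[[[sum(A[h][i][j] * V[h][a][b] for h in range(H))
               for b in range(q)] for a in range(q)]
             for j in range(L)] for i in range(L)]

def energy(seq, J):
    """Field-free Potts-like energy E(s) = -sum over ordered pairs i != j of J[i][j][s_i][s_j]."""
    L = len(seq)
    return -sum(J[i][j][seq[i]][seq[j]] for i in range(L) for j in range(L) if i != j)

def contact_scores(J):
    """Symmetrized Frobenius norms of the q x q blocks J_ij, then average product correction."""
    L = len(J)
    F = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i != j:
                fij = math.sqrt(sum(x * x for row in J[i][j] for x in row))
                fji = math.sqrt(sum(x * x for row in J[j][i] for x in row))
                F[i][j] = 0.5 * (fij + fji)
    row_mean = [sum(F[i]) / (L - 1) for i in range(L)]  # F is symmetric, so row mean = column mean
    total_mean = sum(row_mean) / L                      # mean over all off-diagonal entries
    return [[0.0 if i == j else F[i][j] - row_mean[i] * row_mean[j] / total_mean
             for j in range(L)] for i in range(L)]

def parameter_counts(L, q, H, d):
    """Coupling parameters: factored attention vs. a full pairwise Potts model."""
    factored = H * (2 * L * d + q * q)
    potts = (L * (L - 1) // 2) * q * q
    return factored, potts

if __name__ == "__main__":
    rng = random.Random(0)
    L, q, H, d = 6, 3, 2, 4  # toy sizes; real alignments use q = 21 states
    Q = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)] for _ in range(H)]
    K = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)] for _ in range(H)]
    V = [[[rng.gauss(0, 1) for _ in range(q)] for _ in range(q)] for _ in range(H)]
    J = couplings(attention_maps(Q, K), V)
    seq = [rng.randrange(q) for _ in range(L)]
    print("energy:", energy(seq, J))
    print("top contact score:", max(max(row) for row in contact_scores(J)))
    print(parameter_counts(50, 21, 4, 8))  # (4964, 540225)
```

The last line illustrates the parameter saving under these assumptions: for L = 50, q = 21, four heads and d = 8, the factored couplings need 4,964 numbers, whereas a full pairwise model needs 540,225. Because each head's V^h acts on amino-acid states rather than on positions, it is a natural candidate for sharing across families, which is one way to read the multi-family strategy mentioned above.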
