PMID: 36751926

Impact of Phylogeny on Structural Contact Inference from Protein Sequence Data

Abstract

Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
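As a rough illustration of what a local method computes, the sketch below scores every pair of alignment columns by their mutual information and ranks the pairs. It is not the authors' implementation: the function names and the four-sequence toy alignment are made up for this example, and it applies no phylogenetic correction, pseudocount, or average product correction.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    f_i = Counter(col_i)                # single-site counts at site i
    f_j = Counter(col_j)                # single-site counts at site j
    f_ij = Counter(zip(col_i, col_j))   # pair counts at sites (i, j)
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log(p_ab * n * n / (f_i[a] * f_j[b]))
    return mi


def rank_site_pairs(msa):
    """Return all column pairs sorted by decreasing mutual information."""
    columns = list(zip(*msa))           # transpose: sequences -> columns
    scores = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is fully conserved, column 3 varies independently of 0 and 1.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
ranked = rank_site_pairs(toy_msa)
top_pair, top_score = ranked[0]
```

In this toy case the top-ranked pair is the one that covaries. In real alignments, shared ancestry can also raise the score of pairs that are not in contact, which is the effect the abstract investigates.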

Citing Articles

Identification of coevolving positions by ancestral reconstruction.

Nelson M, Talavera D. Commun Biol. 2025; 8(1):329.

PMID: 40021815 PMC: 11871020. DOI: 10.1038/s42003-025-07676-x.


Impact of phylogeny on the inference of functional sectors from protein sequence data.

Dietler N, Abbara A, Choudhury S, Bitbol A. PLoS Comput Biol. 2024; 20(9):e1012091.

PMID: 39312591 PMC: 11449291. DOI: 10.1371/journal.pcbi.1012091.


Impact of phylogeny on structural contact inference from protein sequence data.

Dietler N, Lupo U, Bitbol A. J R Soc Interface. 2023; 20(199):20220707.

PMID: 36751926 PMC: 9905998. DOI: 10.1098/rsif.2022.0707.


Generative power of a protein language model trained on multiple sequence alignments.

Sgarbossa D, Lupo U, Bitbol A. Elife. 2023; 12.

PMID: 36734516 PMC: 10038667. DOI: 10.7554/eLife.79854.
