More Than Just Pattern Recognition: Prediction of Uncommon Protein Structure Features by AI Methods

Overview
Specialty Science
Date 2023 Jul 3
PMID 37399411
Abstract

The CASP14 experiment demonstrated the extraordinary structure modeling capabilities of artificial intelligence (AI) methods. That result has ignited a fierce debate about what these methods are actually doing. One of the criticisms has been that the AI does not have any sense of the underlying physics but is merely performing pattern recognition. Here, we address that issue by analyzing the extent to which the methods identify rare structural motifs. The rationale underlying the approach is that a pattern recognition machine tends to choose the more frequently occurring motifs, whereas some sense of subtle energetic factors is required to choose infrequently occurring ones. To reduce the possibility of bias from related experimental structures and to minimize the effect of experimental errors, we examined only CASP14 target protein crystal structures determined to a resolution limit better than 2 Å, which lacked significant amino acid sequence homology to proteins of known structure. In those experimental structures and in the corresponding models, we track cis peptides, π-helices, 3₁₀-helices, and other small 3D motifs that occur in the PDB database at a frequency of lower than 1% of total amino acid residues. The best-performing AI method, AlphaFold2, captured these uncommon structural elements exquisitely well. All discrepancies appeared to be a consequence of crystal environment effects. We propose that the neural network learned a protein structure potential of mean force, enabling it to correctly identify situations where unusual structural features represent the lowest local free energy because of subtle influences from the atomic environment.
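One rare motif of this kind is the cis peptide bond, which is conventionally identified from the omega torsion angle defined by the backbone atoms CA(i), C(i), N(i+1), CA(i+1). The following is a minimal sketch of how such a motif could be flagged in a model or an experimental structure; it is not the authors' procedure. The function names, the 30° tolerance, and the toy coordinates are assumptions chosen for illustration only.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees for four points (0 = cis, +/-180 = trans)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def is_cis_peptide(ca_i, c_i, n_next, ca_next, tolerance_deg=30.0):
    """Flag a peptide bond as cis when |omega| is within the tolerance of 0.

    The 30 degree tolerance is an assumed, illustrative cutoff.
    """
    omega = dihedral_deg(ca_i, c_i, n_next, ca_next)
    return abs(omega) < tolerance_deg

# Toy planar coordinates (hypothetical, not taken from any structure):
# both CA atoms on the same side of the C-N bond gives cis,
# opposite sides gives trans.
cis_example = is_cis_peptide((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans_example = is_cis_peptide((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

Applying the same check to each consecutive residue pair in both the model and the crystal structure would give two sets of flagged positions that can be compared directly.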

Citing Articles

AlphaFold 2, but not AlphaFold 3, predicts confident but unrealistic β-solenoid structures for repeat proteins.

Pratt O, Elliott L, Haon M, Mesdaghi S, Price R, Simpkin A. Comput Struct Biotechnol J. 2025; 27:467-477.

PMID: 39911842 PMC: 11795689. DOI: 10.1016/j.csbj.2025.01.016.


δ-Conotoxin Structure Prediction and Analysis through Large-Scale Comparative and Deep Learning Modeling Approaches.

McCarthy S, Gonen S. Adv Sci (Weinh). 2024; 11(35):e2404786.

PMID: 39033537 PMC: 11425241. DOI: 10.1002/advs.202404786.


Rosetta Energy Analysis of AlphaFold2 models: Point Mutations and Conformational Ensembles.

Stein R, Mchaourab H. bioRxiv. 2023.

PMID: 37732281 PMC: 10508732. DOI: 10.1101/2023.09.05.556364.
