PMID: 34293799

Highly Accurate Protein Structure Prediction for the Human Proteome

Abstract

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
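
The coverage figures quoted in the abstract are residue-level fractions: the share of all residues whose per-residue confidence clears a cutoff. A minimal sketch of that calculation follows. It assumes a per-residue confidence score on a 0-100 scale with cutoffs of 70 (confident) and 90 (very high confidence). The abstract does not state the scale or cutoffs, so those values, the function name `coverage_fractions`, and the toy scores are illustrative assumptions only.

```python
from typing import Dict, Iterable, List


def coverage_fractions(
    proteins: Iterable[List[float]],
    confident: float = 70.0,   # assumed cutoff, not given in the abstract
    very_high: float = 90.0,   # assumed cutoff, not given in the abstract
) -> Dict[str, float]:
    """Return the fraction of all residues at or above each cutoff.

    `proteins` holds one list of per-residue confidence scores per protein.
    Fractions are pooled over residues, not averaged per protein, so long
    proteins contribute more than short ones.
    """
    total = 0
    n_confident = 0
    n_very_high = 0
    for scores in proteins:
        total += len(scores)
        n_confident += sum(1 for s in scores if s >= confident)
        n_very_high += sum(1 for s in scores if s >= very_high)
    if total == 0:
        raise ValueError("no residues supplied")
    return {
        "confident": n_confident / total,
        "very_high": n_very_high / total,
    }


# Toy data: two hypothetical proteins with 4 and 6 residues.
toy = [
    [95.0, 92.0, 71.0, 40.0],
    [88.0, 30.0, 91.0, 65.0, 75.0, 50.0],
]
result = coverage_fractions(toy)
print(result)
```

Because the very-high-confidence residues are a subset of the confident ones, the second fraction can never exceed the first, which matches how the abstract reports 36% as a subset of the 58%.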

Citing Articles

Receptor mechanism producing a sweet taste from plant aroma compounds.

Horie F, Sanematsu K, Yasumatsu K, Hirokawa T, Shigemura N, Yamashita A. Sci Rep. 2025; 15(1):6795.

PMID: 40075099 PMC: 11904223. DOI: 10.1038/s41598-025-89711-9.


The proteotranscriptomic characterization of venom in the white seafan elucidates the evolution of Octocorallia arsenal.

Modica M, Leone S, Gerdol M, Greco S, Aurelle D, Oliverio M. Open Biol. 2025; 15(3):250015.

PMID: 40068811 PMC: 11896702. DOI: 10.1098/rsob.250015.


Identification of potential drug targets for pelvic organ prolapse using a proteome-wide Mendelian randomization approach.

Xie Z, Feng Y, He Y, Lin Y, Wang X. Sci Rep. 2025; 15(1):8291.

PMID: 40064973 PMC: 11893898. DOI: 10.1038/s41598-025-92800-4.


A bivalent spike-targeting nanobody with anti-sarbecovirus activity.

Swart I, Debski-Antoniak O, Zegar A, de Bouter T, Chatziandreou M, van den Berg M. J Nanobiotechnology. 2025; 23(1):196.

PMID: 40059135 PMC: 11892322. DOI: 10.1186/s12951-025-03243-y.


Computational insights into the allosteric behavior of mini proinsulin driven by C peptide mobility.

Ayan E. Sci Rep. 2025; 15(1):8065.

PMID: 40055446 PMC: 11889264. DOI: 10.1038/s41598-025-92799-8.

