Site-saturation Mutagenesis of 500 Human Protein Domains

Overview
Journal: Nature
Date: 2025 Jan 8
PMID: 39779847
Abstract

Missense variants that change the amino acid sequences of proteins cause one-third of human genetic diseases. Tens of millions of missense variants exist in the current human population, and the vast majority of these have unknown functional consequences. Here we present a large-scale experimental analysis of human missense variants across many different proteins. Using DNA synthesis and cellular selection experiments, we quantify the effects of more than 500,000 variants on the abundance of more than 500 human protein domains. This dataset reveals that 60% of pathogenic missense variants reduce protein stability. The contribution of stability to protein fitness varies across proteins and diseases and is particularly important in recessive disorders. We combine stability measurements with protein language models to annotate functional sites across proteins. Mutational effects on stability are largely conserved in homologous domains, enabling accurate stability prediction across entire protein families using energy models. Our data demonstrate the feasibility of assaying human protein variants at scale and provide a large, consistent reference dataset for clinical variant interpretation and for training and benchmarking computational methods.

Citing Articles

MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays.

Rubin A, Stone J, Bianchi A, Capodanno B, Da E, Dias M, et al. Genome Biol. 2025; 26(1):13.

PMID: 39838450 PMC: 11753097. DOI: 10.1186/s13059-025-03476-y.


Site-saturation mutagenesis of 500 human protein domains.

Beltran A, Jiang X, Shen Y, Lehner B. Nature. 2025; 637(8047):885-894.

PMID: 39779847 PMC: 11754108. DOI: 10.1038/s41586-024-08370-4.
