From Mechanistic Interpretability to Mechanistic Biology: Training, Evaluating, and Interpreting Sparse Autoencoders on Protein Language Models

Overview

Journal bioRxiv

Date 2025 Feb 20

PMID 39975216

Authors

Etowah Adams

Liam Bai

Minji Lee

Yiyang Yu

Mohammed AlQuraishi


Abstract

Protein language models (pLMs) are powerful predictors of protein structure and function, learning through unsupervised training on millions of protein sequences. pLMs are thought to capture common motifs in protein sequences, but the specifics of pLM features are not well understood. Identifying these features would not only shed light on how pLMs work, but potentially uncover novel protein biology: studying the model to study the biology. Motivated by this, we train sparse autoencoders (SAEs) on the residual stream of a pLM, ESM-2. By characterizing SAE features, we determine that pLMs use a combination of generic features and family-specific features to represent a protein. In addition, we demonstrate how known sequence determinants of properties such as thermostability and subcellular localization can be identified by linear probing of SAE features. For predictive features without known functional associations, we hypothesize their role in unknown mechanisms and provide visualization tools to aid their interpretation. Our study gives a better understanding of the limitations of pLMs, and demonstrates how SAE features can be used to help generate hypotheses for biological mechanisms. We release our code, model weights and feature visualizer.
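
For readers unfamiliar with the term, the following is a minimal sketch of what a sparse autoencoder computes on a single residual-stream vector. It is not the authors' released code. It assumes a generic formulation (ReLU encoder, linear decoder, mean squared reconstruction error plus an L1 sparsity penalty); the abstract does not state which SAE variant or hyperparameters were used. The class name, the dimensions, the `l1_coeff` value, and the random stand-in for an ESM-2 activation are all hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class SparseAutoencoder:
    """Generic ReLU + L1 sparse autoencoder (an assumed, illustrative variant)."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / d_model ** 0.5
        # Encoder maps d_model -> d_hidden; decoder maps d_hidden -> d_model.
        self.W_enc = [[rng.gauss(0, scale) for _ in range(d_model)]
                      for _ in range(d_hidden)]
        self.b_enc = [0.0] * d_hidden
        self.W_dec = [[rng.gauss(0, scale) for _ in range(d_hidden)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Subtract the decoder bias, apply the encoder, then ReLU so that
        # feature activations are non-negative and many can be exactly zero.
        centered = [xi - b for xi, b in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, f):
        # Reconstruct the activation as a linear combination of feature directions.
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        l1 = sum(f)  # features are non-negative, so the sum is the L1 norm
        return mse + l1_coeff * l1


# Hypothetical sizes; a random vector stands in for one residue's activation.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(8)]
sae = SparseAutoencoder(d_model=8, d_hidden=32)
features = sae.encode(x)
reconstruction = sae.decode(features)
total_loss = sae.loss(x)
```

In this sketch the hidden layer is wider than the input, and the sparsity penalty pushes most entries of `features` to zero for any given input. Under these assumptions, the linear probing mentioned in the abstract would amount to fitting a linear model on `features` to predict a property label.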
