
Discovering Molecular Features of Intrinsically Disordered Regions by Using Evolution for Contrastive Learning

Overview
Specialty Biology
Date 2022 Jun 29
PMID 35767567
Abstract

A major challenge in characterizing intrinsically disordered regions (IDRs), which are widespread in the proteome but relatively poorly understood, is identifying the molecular features that mediate the functions of these regions, such as short motifs, amino acid repeats, and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features such as charge or amino acid propensities. We also show that our model can be used to visualize which residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue for gaining systematic insight into poorly understood protein sequences.
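The contrastive task described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `embed` function is a hypothetical stand-in for the neural network (here just a normalised amino acid composition vector), and the averaging of the homolog set, the dot-product scoring, the `temperature` parameter, and the toy sequences are all assumptions made for illustration. The sketch only shows the shape of the objective: the held-out homolog should score higher against the homolog set than randomly sampled IDRs do.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def embed(seq):
    """Hypothetical stand-in encoder: L2-normalised amino acid composition."""
    counts = Counter(seq)
    vec = [counts.get(a, 0) / max(len(seq), 1) for a in AMINO_ACIDS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def reverse_homology_loss(homolog_set, held_out, random_idrs, temperature=0.1):
    """Softmax cross-entropy for picking the held-out homolog (index 0)
    out of a candidate list that also contains randomly sampled IDRs."""
    query = mean_vector([embed(s) for s in homolog_set])
    candidates = [held_out] + list(random_idrs)
    scores = [
        sum(q * c for q, c in zip(query, embed(s))) / temperature
        for s in candidates
    ]
    # Numerically stable log-sum-exp over the candidate scores.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    loss = log_z - scores[0]
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return loss, predicted

# Toy sequences (made up for illustration, not real IDRs).
homologs = ["DEDEESDDEE", "EEDDSEEDDE"]
held_out = "DDEESEDEDD"
randoms = ["KRKRKGRKKR", "GSGSGAGSGS", "LIVLFVILLV"]

loss, predicted = reverse_homology_loss(homologs, held_out, randoms)
print(f"loss={loss:.4f}, predicted index={predicted}")
```

In a trained model the encoder would be learned by minimising this kind of loss over many homolog sets, so that whatever features it extracts are those conserved among homologs; the fixed composition vector here merely makes the example runnable.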

Citing Articles

Evaluation of predictions of disordered binding regions in the CAID2 experiment.

Zhang F, Kurgan L. Comput Struct Biotechnol J. 2025; 27:78-88.

PMID: 39811792 PMC: 11732247. DOI: 10.1016/j.csbj.2024.12.009.


SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences.

Chow C, Ghosh S, Hadarovich A, Toth-Petroczy A. Proc Natl Acad Sci U S A. 2024; 121(42):e2401622121.

PMID: 39383002 PMC: 11494347. DOI: 10.1073/pnas.2401622121.


Beyond monopole electrostatics in regulating conformations of intrinsically disordered proteins.

Phillips M, Muthukumar M, Ghosh K. PNAS Nexus. 2024; 3(9):pgae367.

PMID: 39253398 PMC: 11382291. DOI: 10.1093/pnasnexus/pgae367.


PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions.

Halpin J, Keating A. bioRxiv. 2024.

PMID: 39091826 PMC: 11291154. DOI: 10.1101/2024.07.23.604860.


Direct prediction of intermolecular interactions driven by disordered regions.

Ginell G, Emenecker R, Lotthammer J, Usher E, Holehouse A. bioRxiv. 2024.

PMID: 38895487 PMC: 11185574. DOI: 10.1101/2024.06.03.597104.

