CELL-E 2: Translating Proteins to Pictures and Back with a Bidirectional Text-to-Image Transformer

Overview
Date 2024 Jul 18
PMID 39021511
Abstract

We present CELL-E 2, a novel bidirectional transformer that can generate images depicting protein subcellular localization from amino acid sequences (and vice versa). Protein localization is a challenging problem that requires integrating sequence and image information, which most existing methods ignore. CELL-E 2 extends the work of CELL-E, not only capturing the spatial complexity of protein localization and producing probability estimates of localization atop a nucleus image, but also generating sequences from images, enabling protein design. We train and finetune CELL-E 2 on two large-scale datasets of human proteins. We also demonstrate how to use CELL-E 2 to create hundreds of novel nuclear localization signals (NLS). Results and interactive demos are featured at https://bohuanglab.github.io/CELL-E_2/.
