
Natural Language Processing Approach to Model the Secretion Signal of Type III Effectors

Overview
Journal Front Plant Sci
Date 2022 Nov 17
PMID 36388586
Abstract

Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic hosts. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system that must "classify" each bacterial protein into one of two categories: those that should be translocated and those that should not. It was previously shown that type III effectors carry a secretion signal within their N-terminus; however, despite numerous efforts, the exact biochemical identity of this signal remains generally unknown. Computational characterization of the secretion signal is important both for identifying novel effectors and for better understanding the molecular mechanism of translocation. In this work, we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in a high-dimensional space using Facebook's protein language model. Classification algorithms were then used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. On this curated dataset, our approach yielded substantially better classification accuracy than previously developed methodologies. We also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the novel approach into Effectidor, a web server for predicting type III effector proteins, leading to more accurate discrimination of effectors from non-effectors.
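The two-step idea described in the abstract (embed each protein as a vector, then classify the vectors) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the embedding is a plain amino-acid frequency vector standing in for the protein language model, the classifier is a simple nearest-centroid rule, the 100-residue N-terminal window is an arbitrary choice, and the sequences are invented toy strings rather than real effectors.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal(seq, length=100):
    """Return the first `length` residues; the window size is an assumption."""
    return seq[:length].upper()


def embed(seq):
    """Stand-in embedding: a 20-dimensional amino-acid frequency vector.

    The abstract uses a protein language model at this step; this
    function is only a placeholder with the same role (sequence -> vector).
    """
    counts = Counter(seq)
    total = max(len(seq), 1)  # avoid division by zero on empty input
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class NearestCentroidClassifier:
    """Assign a sequence to the class whose mean embedding is closest."""

    def fit(self, sequences, labels):
        grouped = {}
        for seq, label in zip(sequences, labels):
            grouped.setdefault(label, []).append(embed(n_terminal(seq)))
        self.centroids = {label: centroid(vs) for label, vs in grouped.items()}
        return self

    def predict(self, seq):
        vector = embed(n_terminal(seq))
        return min(self.centroids,
                   key=lambda label: distance(vector, self.centroids[label]))


# Invented toy sequences (hypothetical, for illustration only).
train_sequences = ["MSSSTSPSNSQS", "MSTSSNPSSTQS", "MKKLLVAAGLLA", "MKLAVLGALLKA"]
train_labels = ["effector", "effector", "non-effector", "non-effector"]

classifier = NearestCentroidClassifier().fit(train_sequences, train_labels)
prediction_a = classifier.predict("MSSNSSTPSQSS")
prediction_b = classifier.predict("MKLLVAALGLLA")
```

Swapping `embed` for a function that returns language-model embeddings, and the nearest-centroid rule for a stronger classifier, would keep the same overall structure; which model and classifier perform best is a question the sketch does not address.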

Citing Articles

Contrastive-learning of language embedding and biological features for cross modality encoding and effector prediction.

Peng Y, Wu J, Sun Y, Zhang Y, Wang Q, Shao S. Nat Commun. 2025; 16(1):1299.

PMID: 39900608 PMC: 11791096. DOI: 10.1038/s41467-025-56526-1.


Effect of tokenization on transformers for biological sequences.

Dotan E, Jaschek G, Pupko T, Belinkov Y. Bioinformatics. 2024; 40(4).

PMID: 38608190 PMC: 11055402. DOI: 10.1093/bioinformatics/btae196.


T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors.

Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q. Comput Struct Biotechnol J. 2024; 23:801-812.

PMID: 38328004 PMC: 10847861. DOI: 10.1016/j.csbj.2024.01.015.


Complete genome sequence of an Israeli isolate of pv. pelargonii strain 305 and novel type III effectors identified in .

Wagner N, Ben-Meir D, Teper D, Pupko T. Front Plant Sci. 2023; 14:1155341.

PMID: 37332699 PMC: 10275491. DOI: 10.3389/fpls.2023.1155341.
