
SortPred: The First Machine Learning Based Predictor to Identify Bacterial Sortases and Their Classes Using Sequence-derived Information

Overview
Specialty Biotechnology
Date 2022 Jan 3
PMID 34976319
Citations 11
Abstract

Sortase enzymes are cysteine transpeptidases that decorate the surface of Gram-positive bacteria with various proteins, thereby allowing these microorganisms to interact with their surrounding environment. Several sortase substrates are known to contribute to pathogenesis, so researchers have focused on developing sortase inhibitors. Currently, six classes of sortases (A-F) are recognized. However, with the widespread use of bacterial genome sequencing, the number of putative sortases in public databases has exploded, making the annotation of these sequences a considerable challenge. Characterizing sortase classes experimentally is laborious and time-consuming. Therefore, this study developed the first machine-learning-based two-layer predictor, called SortPred: the first layer predicts whether a given sequence is a sortase, and the second layer predicts the class of each predicted sortase. To develop SortPred, we constructed an original benchmarking dataset and investigated 31 feature descriptors, based primarily on five feature encoding algorithms. Each of these descriptors was then used to train a random forest classifier, and its robustness was evaluated on an independent dataset. Finally, we selected the final model for each layer independently, based on the consistency of performance between cross-validation and independent evaluation. SortPred is expected to be an effective tool for identifying bacterial sortases, which in turn may aid in designing sortase inhibitors and exploring their functions. The SortPred webserver and a standalone version are freely accessible at: https://procarb.org/sortpred.
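
The two-layer design described in the abstract can be illustrated with a minimal sketch. This is not the SortPred implementation: the amino acid composition encoding is a common sequence descriptor chosen here as an assumption (the abstract does not name the five encoding algorithms), and the function names `aac` and `two_layer_predict`, as well as the two toy classifiers, are hypothetical stand-ins for trained models.

```python
from collections import Counter
from typing import Callable, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue.

    Assumed descriptor for illustration only; non-standard symbols are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def two_layer_predict(
    sequence: str,
    is_sortase: Callable[[List[float]], bool],
    sortase_class: Callable[[List[float]], str],
) -> Optional[str]:
    """Layer 1 decides sortase vs. non-sortase; layer 2 assigns a class.

    Returns None when layer 1 rejects the sequence, so layer 2 only
    ever sees sequences that were predicted to be sortases.
    """
    features = aac(sequence)
    if not is_sortase(features):
        return None
    return sortase_class(features)


# Hypothetical toy classifiers, NOT biological criteria: they only
# demonstrate the control flow that trained models would plug into.
def toy_is_sortase(features: List[float]) -> bool:
    return features[AMINO_ACIDS.index("C")] > 0.0


def toy_sortase_class(features: List[float]) -> str:
    return "A"


accepted = two_layer_predict("MKC", toy_is_sortase, toy_sortase_class)
rejected = two_layer_predict("MKK", toy_is_sortase, toy_sortase_class)
```

In practice, each callable would wrap a trained model, for example the `predict` method of a random forest classifier fitted separately for each layer, which matches the abstract's statement that the final model was selected independently for the two layers.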

Citing Articles

BiGM-lncLoc: Bi-level Multi-Graph Meta-Learning for Predicting Cell-Specific Long Noncoding RNAs Subcellular Localization.

Deng X, Liu L Interdiscip Sci. 2024.

PMID: 39724386 DOI: 10.1007/s12539-024-00679-y.


ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning.

Pham N, Terrance A, Jeon Y, Rakkiyappan R, Manavalan B Mol Ther Nucleic Acids. 2024; 35(2):102192.

PMID: 38779332 PMC: 11108997. DOI: 10.1016/j.omtn.2024.102192.


Comparative genomic assessment of members of genus Tenacibaculum: an exploratory study.

Satyam R, Ahmad S, Raza K Mol Genet Genomics. 2023; 298(5):979-993.

PMID: 37225902 DOI: 10.1007/s00438-023-02031-3.


Genotyping of and machine learning models to predict the heat resistant phenotype based on genotype.

Noh E, Subramaniyam S, Cho S, Kim Y, Park C, Lee J Front Genet. 2023; 14:1151427.

PMID: 37065481 PMC: 10102348. DOI: 10.3389/fgene.2023.1151427.


Tissue engineering modalities in skeletal muscles: focus on angiogenesis and immunomodulation properties.

Namjoo A, Abrbekoh F, Saghati S, Amini H, Saadatlou M, Rahbarghazi R Stem Cell Res Ther. 2023; 14(1):90.

PMID: 37061717 PMC: 10105969. DOI: 10.1186/s13287-023-03310-x.

