TransPTM: a Transformer-based Model for Non-histone Acetylation Site Prediction

Overview
Journal Brief Bioinform
Specialty Biology
Date 2024 May 10
PMID 38725156
Abstract

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) because of its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we contribute to both dataset creation and predictor benchmarking in this study. First, we construct a non-histone acetylation site benchmark dataset, named NHAC, which includes 11 subsets grouped by sequence length, ranging from 11 to 61 amino acids. For each sequence length, there are 886 positive samples and 4707 negative samples. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction compared with three state-of-the-art tools. It improves our understanding of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and is beneficial to the related communities. The source code and data used by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
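To make the idea of fixed-length sequence samples concrete, the following is a minimal Python sketch of how windows of a given length might be cut from a protein sequence. It is not the authors' code. The abstract does not describe the extraction procedure, so several points here are assumptions: that each sample is centered on a lysine (K) candidate site, that the window length is odd, and that positions beyond the sequence ends are padded with `X`. The function name `lysine_windows` and the example sequence are hypothetical.

```python
from typing import List, Tuple


def lysine_windows(sequence: str, window: int, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) pairs for every lysine in `sequence`.

    Assumptions (not stated in the abstract): candidate sites are lysines,
    the window is centered on the site, and missing flanks are padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    flank = window // 2
    # Pad both ends so that sites near the termini still yield full windows.
    padded = pad * flank + sequence + pad * flank
    samples = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In `padded`, the site sits at index i + flank, so the window
            # starts at index i and spans `window` characters.
            samples.append((i + 1, padded[i:i + window]))
    return samples


# Hypothetical toy sequence; 11 and 61 are the shortest and longest
# sequence lengths mentioned in the abstract.
toy = "MKTAYIAKQR"
short_samples = lysine_windows(toy, 11)
long_samples = lysine_windows(toy, 61)
```

Each returned window has the requested length with the candidate lysine at its center, so the same site can be represented at several context sizes, as in the length-specific subsets described above.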

Citing Articles

Artificial Intelligence Transforming Post-Translational Modification Research.

Kim D, Yin T, Zhang T, Im A, Cort J, Rozum J Bioengineering (Basel). 2025; 12(1).

PMID: 39851300 PMC: 11762806. DOI: 10.3390/bioengineering12010026.


Current computational tools for protein lysine acylation site prediction.

Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C Brief Bioinform. 2024; 25(6).

PMID: 39316944 PMC: 11421846. DOI: 10.1093/bib/bbae469.


PTM-Mamba: A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks.

Peng Z, Schussheim B, Chatterjee P bioRxiv. 2024; .

PMID: 38464112 PMC: 10925343. DOI: 10.1101/2024.02.28.581983.
