Enhanced Identification of Membrane Transport Proteins: a Hybrid Approach Combining ProtBERT-BFD and Convolutional Neural Networks

Overview
Specialty: Biology
Date: 2023 Jul 27
PMID: 37497772
Abstract

Transmembrane transport proteins (transporters) play a crucial role in the fundamental cellular processes of all organisms by facilitating the transport of hydrophilic substrates across hydrophobic membranes. Although numerous membrane protein sequences are available, their structures and functions remain largely elusive. Recently, natural language processing (NLP) techniques have shown promise in the analysis of protein sequences. Bidirectional Encoder Representations from Transformers (BERT) is an NLP technique that has been adapted for proteins to learn contextual embeddings of individual amino acids within a protein sequence. Our previous strategy, TooT-BERT-T, differentiated transporters from non-transporters by employing a logistic regression classifier with fine-tuned representations from ProtBERT-BFD. In this study, we expand upon this approach by utilizing representations from ProtBERT, ProtBERT-BFD, and MembraneBERT in combination with classical classifiers. Additionally, we introduce TooT-BERT-CNN-T, a novel method that fine-tunes ProtBERT-BFD and discriminates transporters using a Convolutional Neural Network (CNN). Our experimental results reveal that the CNN surpasses traditional classifiers in discriminating transporters from non-transporters, achieving a Matthews correlation coefficient (MCC) of 0.89 and an accuracy of 95.1% on the independent test set. Compared to TooT-BERT-T, this is an improvement of 0.03 in MCC and 1.11 percentage points in accuracy.
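
The two figures reported above, accuracy and MCC, are both derived from the counts of a binary confusion matrix. The following is a minimal sketch of how they could be computed, assuming labels are encoded as 1 for transporter and 0 for non-transporter. The function names and the toy labels are hypothetical and are not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels: 1 = transporter, 0 = non-transporter."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels for illustration only, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when the two classes are unevenly represented, which is one reason it is commonly reported alongside accuracy for this kind of classifier.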

Citing Articles

Ion channel classification through machine learning and protein language model embeddings.

Ghazikhani H, Butler G. J Integr Bioinform. 2024; 21(4).

PMID: 39572876 PMC: 11698620. DOI: 10.1515/jib-2023-0047.
