
iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach

Overview
Publisher Hindawi
Date 2021 Oct 21
PMID 34671418
Citations 29
Abstract

Membrane proteins are an important class of proteins that play essential roles in several cellular processes. Based on their intramolecular arrangements and positions in a cell, membrane proteins can be divided into several types, and the types of a membrane protein have been reported to be closely related to its functions. Determining membrane protein types has been a hot topic in recent years, and many computational methods have been proposed. Some of them used functional domain information to encode proteins; however, this encoding was still crude. In this study, we designed a novel feature extraction scheme to obtain informative protein features from functional domain information. The scheme treated domains as words and proteins, represented by their domains, as sentences. The natural language processing approach word2vector was applied to obtain domain features, which were further refined into protein features. Based on these features, RAndom k-labELsets (RAkEL) with random forest as the base classifier was employed to build a multilabel classifier, namely, iMPT-FDNPL. Tenfold cross-validation results indicated that this classifier performed well. Furthermore, it outperformed classifiers built on features derived from functional domains via a one-hot scheme or from other protein properties, suggesting the effectiveness of the protein features generated by the proposed scheme.
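The "domains as words, proteins as sentences" idea can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code. It assumes skip-gram with negative sampling as the word2vector variant and mean pooling as the step that refines domain vectors into a protein vector; the function names, toy domain IDs (`dom_A` to `dom_F`), and hyperparameters are all hypothetical. A one-hot encoding over the domain vocabulary is included as the kind of baseline the abstract compares against.

```python
import math
import random


def train_domain_vectors(sentences, dim=8, window=2, negatives=3, epochs=100, lr=0.05, seed=0):
    """Hypothetical skip-gram with negative sampling over 'sentences' of domain IDs."""
    rng = random.Random(seed)
    vocab = sorted({d for s in sentences for d in s})
    index = {d: i for i, d in enumerate(vocab)}
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]  # domain ("word") vectors
    w_out = [[0.0] * dim for _ in vocab]  # context vectors, discarded after training

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for sent in sentences:
            ids = [index[d] for d in sent]
            for pos, center in enumerate(ids):
                v = w_in[center]
                for ctx in range(max(0, pos - window), min(len(ids), pos + window + 1)):
                    if ctx == pos:
                        continue
                    # one observed (positive) neighbour plus a few random negative samples
                    pairs = [(ids[ctx], 1.0)] + [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    grad_v = [0.0] * dim
                    for t, label in pairs:
                        u = w_out[t]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # accumulate gradient with the old context vector
                            u[k] += g * v[k]       # update the context vector
                    for k in range(dim):
                        v[k] += grad_v[k]
    return {d: w_in[index[d]] for d in vocab}


def protein_features(domains, domain_vecs, dim):
    """Assumed refinement step: mean of the protein's domain vectors (zeros if none are known)."""
    known = [domain_vecs[d] for d in domains if d in domain_vecs]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def one_hot_features(domains, vocab):
    """Baseline: binary indicator of each vocabulary domain."""
    present = set(domains)
    return [1.0 if d in present else 0.0 for d in vocab]


# Hypothetical toy data: each protein is a "sentence" of its domain IDs.
proteins = {
    "P1": ["dom_A", "dom_B", "dom_C"],
    "P2": ["dom_A", "dom_B"],
    "P3": ["dom_D", "dom_E"],
    "P4": ["dom_D", "dom_E", "dom_F"],
}
DIM = 8
vecs = train_domain_vectors(list(proteins.values()), dim=DIM)
features = {p: protein_features(ds, vecs, DIM) for p, ds in proteins.items()}
vocab = sorted(vecs)
baseline = {p: one_hot_features(ds, vocab) for p, ds in proteins.items()}
print(len(features["P1"]), len(baseline["P1"]))  # 8 6
```

The embedding length stays fixed (here 8) no matter how many distinct domains exist, whereas the one-hot vector grows with the vocabulary; that is one plausible reason dense domain embeddings can help. In practice a library implementation such as gensim's `Word2Vec` would replace the hand-written trainer, and the resulting protein vectors would be passed to the multilabel classifier (RAkEL with random forest in the paper), which is not sketched here.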

Citing Articles

PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes.

Chen L, Zhang C, Xu J BMC Bioinformatics. 2024; 25(1):50.

PMID: 38291384 PMC: 10829269. DOI: 10.1186/s12859-024-05665-1.


Spinal Cord Injury Affects Gene Expression of Transmembrane Proteins in Tissue and Release of Extracellular Vesicle in Blood: In Silico and Analysis.

Mirzaalikhan Y, Eslami N, Izadi A, Shekari F, Kiani S Cell J. 2023; 25(11):772-782.

PMID: 38071409 PMC: 10711288. DOI: 10.22074/cellj.2023.2004115.1320.


Immune responses of different COVID-19 vaccination strategies by analyzing single-cell RNA sequencing data from multiple tissues using machine learning methods.

Li H, Ma Q, Ren J, Guo W, Feng K, Li Z Front Genet. 2023; 14:1157305.

PMID: 37007947 PMC: 10065150. DOI: 10.3389/fgene.2023.1157305.


Machine learning in computational modelling of membrane protein sequences and structures: From methodologies to applications.

Sun J, Kulandaisamy A, Liu J, Hu K, Gromiha M, Zhang Y Comput Struct Biotechnol J. 2023; 21:1205-1226.

PMID: 36817959 PMC: 9932300. DOI: 10.1016/j.csbj.2023.01.036.


Identification of Transcriptome Biomarkers for Severe COVID-19 with Machine Learning Methods.

Li X, Zhou X, Ding S, Chen L, Feng K, Li H Biomolecules. 2022; 12(12).

PMID: 36551164 PMC: 9775121. DOI: 10.3390/biom12121735.

