
Identifying Functions of Proteins in Mice With Functional Embedding Features

Overview
Journal Front Genet
Date 2022 Jun 2
PMID 35651937
Abstract

In current biology, exploring the biological functions of proteins is important. Given the large number of proteins in some organisms, determining their functions one by one through traditional experiments is impractical. Therefore, quick and reliable methods for identifying protein functions are needed. The considerable accumulation of protein knowledge and recent developments in computer science provide an alternative way to complete this task: designing computational methods. Several efforts have been made in this field. Most previous methods adopted protein sequence features or directly used the linkage from a protein-protein interaction (PPI) network. In this study, we proposed novel multi-label classifiers that adopt new embedding features to represent proteins. These features were derived from functional domains and a PPI network via word embedding and network embedding, respectively. The minimum redundancy maximum relevance method was used to assess the features, generating a ranked feature list. Incremental feature selection, incorporating RAndom k-labELsets to construct multi-label classifiers, used this list to construct two optimum classifiers, corresponding to two key measurements: accuracy and exact match. These two classifiers performed well and were superior to classifiers that used features extracted by traditional methods.
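The two measurements and the incremental feature selection loop named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are assumptions: accuracy is taken as the commonly used example-based form (mean ratio of shared labels to the union of true and predicted labels), and exact match as the fraction of proteins whose predicted label set equals the true set. The function names and the `evaluate` callback are hypothetical; in the study's setting the callback would train and score a multi-label classifier on the given feature subset.

```python
from typing import Callable, List, Sequence, Set, Tuple


def multilabel_accuracy(true_sets: Sequence[Set], pred_sets: Sequence[Set]) -> float:
    """Example-based accuracy: mean of |Y & Z| / |Y | Z| over all samples.

    Assumed definition; a sample with no true and no predicted labels scores 1.0.
    """
    scores = []
    for y, z in zip(true_sets, pred_sets):
        union = y | z
        scores.append(len(y & z) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def exact_match(true_sets: Sequence[Set], pred_sets: Sequence[Set]) -> float:
    """Fraction of samples whose predicted label set equals the true label set."""
    hits = sum(1 for y, z in zip(true_sets, pred_sets) if y == z)
    return hits / len(true_sets)


def incremental_feature_selection(
    ranked_features: List[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Score the top-k features for growing k and keep the best-scoring prefix.

    `ranked_features` is a list ordered from most to least important, such as
    the output of a feature-ranking method. `evaluate` is a hypothetical
    callback that returns a score (higher is better) for a feature subset.
    Ties are resolved in favor of the smaller subset.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score
```

Running the selection loop once with `multilabel_accuracy` and once with `exact_match` inside `evaluate` would yield two feature subsets, one per measurement, which mirrors the two optimum classifiers described above.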

Citing Articles

Identification of key gene expression associated with quality of life after recovery from COVID-19.

Ren J, Gao Q, Zhou X, Chen L, Guo W, Feng K. Med Biol Eng Comput. 2023; 62(4):1031-1048.

PMID: 38123886. DOI: 10.1007/s11517-023-02988-8.


Identification of Colon Immune Cell Marker Genes Using Machine Learning Methods.

Yang Y, Zhang Y, Ren J, Feng K, Li Z, Huang T. Life (Basel). 2023; 13(9).

PMID: 37763280 PMC: 10532943. DOI: 10.3390/life13091876.


Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes.

Ren J, Gao Q, Zhou X, Chen L, Guo W, Feng K. Biology (Basel). 2023; 12(7).

PMID: 37508378 PMC: 10376631. DOI: 10.3390/biology12070947.


Immune responses of different COVID-19 vaccination strategies by analyzing single-cell RNA sequencing data from multiple tissues using machine learning methods.

Li H, Ma Q, Ren J, Guo W, Feng K, Li Z. Front Genet. 2023; 14:1157305.

PMID: 37007947 PMC: 10065150. DOI: 10.3389/fgene.2023.1157305.
