
Predicting Gene Phenotype by Multi-label Multi-class Model Based on Essential Functional Features

Overview
Specialty Genetics
Date 2021 Apr 29
PMID 33914130
Citations 7
Abstract

Phenotype, one of the most important concepts in genetics, describes all observable characteristics of a research object. Because phenotype reflects the combined effects of genotype and environmental factors, phenotypic characteristics are hard to define, and unknown phenotypes are even harder to predict. With current biological techniques, obtaining sufficient structural information on large-scale phenotype-associated genes/proteins remains expensive and time-consuming. Various bioinformatics methods have been proposed to address this problem, and researchers have confirmed the efficacy and accuracy of functional network-based prediction. However, general functional descriptions have highly complicated internal structures, which makes them difficult to use directly for phenotype prediction. To address this issue and improve the efficacy of phenotype prediction across more than ten kinds of phenotypes, we first extract functional enrichment features from GO and KEGG, and then use node2vec to learn functional embedding features of genes from a gene-gene network. All of these features are analyzed with feature selection methods (Boruta and minimum redundancy maximum relevance) to generate a feature list. This list is fed into incremental feature selection, combined with multi-label classifiers built by RAkEL and several classic base classifiers, to build an optimal multi-label multi-class classification model for phenotype prediction. According to recent studies, our method identified many literature-supported genes/proteins and their associated phenotypes, as well as some candidate genes reassigned to new phenotypes, providing a new computational tool for accurate and effective phenotype prediction.

Citing Articles

Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods.

Huang F, Ma Q, Ren J, Li J, Wang F, Huang T. Biomed Res Int. 2023; 2023:5333361.

PMID: 36644165 PMC: 9833906. DOI: 10.1155/2023/5333361.


Computational systems biology in disease modeling and control, review and perspectives.

Yue R, Dutta A. NPJ Syst Biol Appl. 2022; 8(1):37.

PMID: 36192551 PMC: 9528884. DOI: 10.1038/s41540-022-00247-4.


Identification of protein-protein interaction associated functions based on gene ontology and KEGG pathway.

Yang L, Zhang Y, Huang F, Li Z, Huang T, Cai Y. Front Genet. 2022; 13:1011659.

PMID: 36171880 PMC: 9511048. DOI: 10.3389/fgene.2022.1011659.


PseAraUbi: predicting arabidopsis ubiquitination sites by incorporating the physico-chemical and structural features.

Wang W, Zhang Y, Liu D, Zhang H, Wang X, Zhou Y. Plant Mol Biol. 2022; 110(1-2):81-92.

PMID: 35773617 DOI: 10.1007/s11103-022-01288-3.


A New Risk Score Based on Eight Hepatocellular Carcinoma- Immune Gene Expression Can Predict the Prognosis of the Patients.

Ye D, Liu Y, Li G, Sun B, Peng J, Xu Q. Front Oncol. 2021; 11:766072.

PMID: 34868990 PMC: 8639602. DOI: 10.3389/fonc.2021.766072.

