
Molecular Function Prediction Using Neighborhood Features

Overview
Specialty: Biology
Date: 2010 May 1
PMID: 20431141
Citations: 23
Abstract

The recent advent of high-throughput methods has generated large amounts of gene interaction data, which has allowed the construction of genome-wide networks. A significant number of genes in such networks remain uncharacterized, and predicting the molecular function of these genes remains a major challenge. A number of existing techniques assume that genes with similar functions are topologically close in the network. Our hypothesis is that genes with similar functions exhibit similar annotation patterns in their neighborhoods, regardless of the distance between them in the interaction network. We thus predict the molecular functions of uncharacterized genes by comparing their functional neighborhoods to those of genes of known function. We propose a two-phase approach. First, we extract functional neighborhood features of a gene using Random Walks with Restarts. We then employ a k-nearest-neighbor (KNN) classifier to predict the function of uncharacterized genes based on the computed neighborhood features. We perform leave-one-out validation experiments on two S. cerevisiae interaction networks and show significant improvements over previous techniques. Our technique provides natural control over the trade-off between accuracy and coverage of prediction. We further propose and evaluate prediction in sparsely annotated genomes by exploiting features from well-annotated genomes.
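
The two-phase approach described in the abstract can be illustrated with a minimal sketch. The abstract does not give the feature definition, the distance measure, or any parameter values, so everything below is an assumption made for illustration: the restart probability of 0.3, the feature definition (random-walk probability mass landing on genes annotated with each function), the Euclidean distance, the function names `rwr`, `neighborhood_features`, and `predict`, and the toy network itself. It should not be read as the authors' implementation.

```python
import math

def rwr(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Visiting probabilities of a random walk with restart started at `seed`.

    adj maps each gene to the set of its neighbors (undirected, unweighted).
    The restart probability is an assumed value, not taken from the paper.
    """
    p = {g: 0.0 for g in adj}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {g: 0.0 for g in adj}
        nxt[seed] = restart                      # restart mass returns to the seed
        for g, mass in p.items():
            if not adj[g]:                       # isolated gene: send its mass to the seed
                nxt[seed] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(adj[g])
            for nb in adj[g]:                    # spread the rest evenly over neighbors
                nxt[nb] += share
        delta = sum(abs(nxt[g] - p[g]) for g in adj)
        p = nxt
        if delta < tol:
            break
    return p

def neighborhood_features(adj, labels, functions, restart=0.3):
    """One vector per gene: walk probability landing on other genes with each function."""
    feats = {}
    for g in adj:
        p = rwr(adj, g, restart)
        feats[g] = [sum(p[h] for h in adj if h != g and f in labels.get(h, ()))
                    for f in functions]
    return feats

def predict(adj, labels, functions, query, k=3, restart=0.3):
    """Score each function for `query` by voting among its k nearest annotated genes."""
    # Hide the query's own labels, in the spirit of leave-one-out validation.
    hidden = {g: fs for g, fs in labels.items() if g != query}
    feats = neighborhood_features(adj, hidden, functions, restart)
    nearest = sorted(hidden, key=lambda g: math.dist(feats[g], feats[query]))[:k]
    return {f: sum(f in hidden[g] for g in nearest) / len(nearest) for f in functions}

# Hypothetical toy network: g0 and g4 are hubs with function "A", each surrounded
# by function-"B" genes, and the two hubs are three hops apart (g0-g3-g5-g4).
adj = {
    "g0": {"g1", "g2", "g3"},
    "g1": {"g0"}, "g2": {"g0"},
    "g3": {"g0", "g5"},
    "g4": {"g5", "g6", "g7"},
    "g5": {"g3", "g4"},
    "g6": {"g4"}, "g7": {"g4"},
}
labels = {"g0": {"A"}, "g4": {"A"},
          **{g: {"B"} for g in ("g1", "g2", "g3", "g5", "g6", "g7")}}

scores = predict(adj, labels, ["A", "B"], query="g4", k=1)
print(scores)
```

In this toy network every direct neighbor of `g4` carries function "B", so a method that simply copies labels from topologically close genes would favor "B". In the feature space sketched here, the closest annotated gene to `g4` is `g0`, which sits in a similar "B"-rich neighborhood despite being three hops away, so the vote goes to "A". Returning vote fractions rather than a single label is one possible way to expose the accuracy and coverage trade-off mentioned in the abstract, for example by only reporting predictions whose score exceeds a chosen threshold.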
