Analyzing Protein Function on a Genomic Scale: the Importance of Gold-standard Positives and Negatives for Network Prediction

Overview
Specialty Microbiology
Date 2004 Sep 29
PMID 15451510
Citations 65
Abstract

The concept of 'protein function' is rather 'fuzzy' because it is often based on whimsical terms or contradictory nomenclature. This currently presents a challenge for functional genomics because precise definitions are essential for most computational approaches. Addressing this challenge, the notion of networks between biological entities (including molecular and genetic interaction networks as well as transcriptional regulatory relationships) potentially provides a unifying language suitable for the systematic description of protein function. Predicting the edges in protein networks requires reference sets of examples with known outcome (that is, 'gold standards'). Such reference sets should ideally include positive examples - as is now widely appreciated - but also, equally importantly, negative ones. Moreover, it is necessary to consider the expected relative occurrence of positives and negatives because this affects the misclassification rates of experiments and computational predictions. For instance, a reason why genome-wide, experimental protein-protein interaction networks have high inaccuracies is that the prior probability of finding interactions (positives) rather than non-interacting protein pairs (negatives) in unbiased screens is very small. These problems can be addressed by constructing well-defined sets of non-interacting proteins from subcellular localization data, which allows computing the probability of interactions based on evidence from multiple datasets.
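The effect of a small prior on misclassification, and the gain from combining several datasets, can be illustrated with Bayes' rule in odds form. The sketch below is only an illustration under stated assumptions, not the authors' method: the function names, the prior of 1 in 600, and the true-positive and false-positive rates are hypothetical, and the datasets are assumed to be conditionally independent (a naive Bayes simplification that the abstract does not state). In practice, the two rates would be estimated against gold-standard positives and negatives.

```python
from math import prod


def likelihood_ratio(true_positive_rate, false_positive_rate):
    """How much more often a dataset reports a hit for true interactions
    than for non-interacting pairs (both rates come from gold standards)."""
    return true_positive_rate / false_positive_rate


def posterior_probability(prior, likelihood_ratios):
    """Bayes' rule in odds form: posterior odds = prior odds * product of LRs.

    Assumes the datasets are conditionally independent (naive Bayes).
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers, chosen only to show the effect of a small prior.
prior = 1 / 600                      # assumed chance that a random pair interacts
lr = likelihood_ratio(0.5, 0.01)     # assumed screen: 50% sensitivity, 1% false positives

one_dataset = posterior_probability(prior, [lr])
two_datasets = posterior_probability(prior, [lr, lr])

print(f"one dataset:  {one_dataset:.3f}")   # about 0.077
print(f"two datasets: {two_datasets:.3f}")  # about 0.807
```

With these assumed values, a single positive result from a screen with a 1% false-positive rate still leaves the pair more likely to be a non-interaction than an interaction, because the prior is so small. Agreement between two such independent datasets raises the posterior to roughly 0.8.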

Citing Articles

Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information.

Li B, Li X, Li X, Wang L, Lu J, Wang J. PeerJ. 2025; 13:e18863.

PMID: 39897484 PMC: 11787804. DOI: 10.7717/peerj.18863.


Computational approaches for the design of modulators targeting protein-protein interactions.

Rehman A, Khurshid B, Ali Y, Rasheed S, Wadood A, Ng H. Expert Opin Drug Discov. 2023; 18(3):315-333.

PMID: 36715303 PMC: 10149343. DOI: 10.1080/17460441.2023.2171396.


Integration of probabilistic functional networks without an external Gold Standard.

James K, Alsobhe A, Cockell S, Wipat A, Pocock M. BMC Bioinformatics. 2022; 23(1):302.

PMID: 35879662 PMC: 9316706. DOI: 10.1186/s12859-022-04834-4.


In silico predictions of protein interactions between Zika virus and human host.

Pitta J, Dos Santos Vasconcelos C, da Luz Wallau G, Campos T, Rezende A. PeerJ. 2021; 9:e11770.

PMID: 34513323 PMC: 8395582. DOI: 10.7717/peerj.11770.


Improved cytokine-receptor interaction prediction by exploiting the negative sample space.

Nath A, Leier A. BMC Bioinformatics. 2020; 21(1):493.

PMID: 33129275 PMC: 7603689. DOI: 10.1186/s12859-020-03835-5.