Parametric Bayesian Priors and Better Choice of Negative Examples Improve Protein Function Prediction

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2013 Mar 21
PMID: 23511543
Citations: 13
Abstract

Motivation: Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction.

Results: We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested.
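The abstract does not give the formula for the prior, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the prior can be approximated by smoothed co-annotation frequencies: for a target function, estimate how often each other annotation term co-occurs with it, score every protein lacking the target by the average of those conditional estimates, and take the lowest-scoring proteins as negative examples. The function names, the `pseudocount` smoothing parameter, and the toy annotation data are all hypothetical.

```python
from collections import Counter


def cooccurrence_prior(annotations, target, pseudocount=1.0):
    """Estimate P(target | g) for each other term g from co-annotation counts.

    `annotations` maps protein id -> set of annotation terms.
    `pseudocount` is an assumed smoothing hyperparameter, not from the paper.
    """
    term_count = Counter()   # proteins annotated with g
    pair_count = Counter()   # proteins annotated with both g and target
    for terms in annotations.values():
        for g in terms:
            term_count[g] += 1
            if g != target and target in terms:
                pair_count[g] += 1
    return {
        g: (pair_count[g] + pseudocount) / (term_count[g] + 2 * pseudocount)
        for g in term_count
        if g != target
    }


def score_protein(terms, cond_prob):
    """Average the conditional estimates over a protein's known terms."""
    probs = [cond_prob[g] for g in terms if g in cond_prob]
    return sum(probs) / len(probs) if probs else None


def choose_negatives(annotations, target, n, pseudocount=1.0):
    """Return the n proteins least likely, under the prior, to have `target`."""
    cond_prob = cooccurrence_prior(annotations, target, pseudocount)
    scored = []
    for protein, terms in annotations.items():
        if target in terms:
            continue  # known positives are never candidates
        score = score_protein(terms, cond_prob)
        if score is not None:  # skip proteins with no usable annotations
            scored.append((score, protein))
    scored.sort()
    return [protein for _, protein in scored[:n]]


# Toy data (hypothetical): term "A" is the function being predicted.
toy = {
    "P1": {"A", "B"},
    "P2": {"A", "B"},
    "P3": {"B"},
    "P4": {"C"},
    "P5": {"C", "D"},
}
negatives = choose_negatives(toy, target="A", n=2)
```

In this toy set, P3 shares term B with both known positives, so it scores highest and is left unlabeled rather than treated as a negative. The same per-protein scores could in principle also serve as priors during prediction, which is the dual use the abstract describes.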

Availability: Code and data are available at http://bonneaulab.bio.nyu.edu/funcprop.html

Citing Articles

Predicting protein functions using positive-unlabeled ranking with ontology-based priors.

Zhapa-Camacho F, Tang Z, Kulmanov M, Hoehndorf R. Bioinformatics. 2024; 40(Suppl 1):i401-i409.

PMID: 38940168 PMC: 11211813. DOI: 10.1093/bioinformatics/btae237.


Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review.

Liu L, Zhu S. Phenomics. 2023; 1(4):171-185.

PMID: 36939789 PMC: 9590544. DOI: 10.1007/s43657-021-00019-w.


A Literature Review of Gene Function Prediction by Modeling Gene Ontology.

Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. Front Genet. 2020; 11:400.

PMID: 32391061 PMC: 7193026. DOI: 10.3389/fgene.2020.00400.


Supervised learning is an accurate method for network-based gene classification.

Liu R, Mancuso C, Yannakopoulos A, Johnson K, Krishnan A. Bioinformatics. 2020; 36(11):3457-3465.

PMID: 32129827 PMC: 7267831. DOI: 10.1093/bioinformatics/btaa150.


Evaluating the impact of topological protein features on the negative examples selection.

Boldi P, Frasca M, Malchiodi D. BMC Bioinformatics. 2018; 19(Suppl 14):417.

PMID: 30453879 PMC: 6245585. DOI: 10.1186/s12859-018-2385-x.

