Fast Integration of Heterogeneous Data Sources for Predicting Gene Function with Limited Annotation

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2010 May 29
PMID: 20507895
Citations: 50
Abstract

Motivation: Many algorithms that integrate multiple functional association networks for predicting gene function construct a composite network as a weighted sum of the individual networks and then use the composite network to predict gene function. The weight assigned to an individual network represents the usefulness of that network in predicting a given gene function. However, because many categories of gene function have a small number of annotations, the process of assigning these network weights is prone to overfitting.
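
The weighted-sum construction described above can be sketched as follows. This is a minimal illustration, assuming each network is stored as a symmetric gene-by-gene matrix of association strengths over the same gene ordering. The function name `composite_network` and the toy values are hypothetical and are not taken from the authors' code.

```python
def composite_network(networks, weights):
    """Return the weighted sum of equally sized association matrices.

    networks: list of n x n matrices (lists of lists), one per data source.
    weights:  one non-negative number per network (assumed, for illustration).
    """
    if len(networks) != len(weights):
        raise ValueError("need exactly one weight per network")
    n = len(networks[0])
    composite = [[0.0] * n for _ in range(n)]
    for net, w in zip(networks, weights):
        for i in range(n):
            for j in range(n):
                # Each source contributes in proportion to its weight.
                composite[i][j] += w * net[i][j]
    return composite


# Toy example: two hypothetical networks over two genes.
coexpression = [[0.0, 1.0], [1.0, 0.0]]
interaction = [[0.0, 0.5], [0.5, 0.0]]
combined = composite_network([coexpression, interaction], [0.5, 2.0])
```

When the weights are fitted separately for each function category, a category with only a handful of annotated genes supplies very little data for choosing them, which is the overfitting risk the abstract refers to.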

Results: Here, we address this problem by proposing a novel approach to combining multiple functional association networks. In particular, we present a method where network weights are simultaneously optimized on sets of related function categories. The method is simpler and faster than existing approaches. Further, using five example species (yeast, mouse, fly, Escherichia coli and human), we show that it produces composite networks with improved function prediction accuracy.
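
One way to picture fitting a single weight vector for a whole set of related categories is sketched below. The abstract does not state the objective, so everything here is an assumption made for illustration: the weights are fitted by ordinary least squares against a co-annotation target averaged over the categories, and negative weights are then clipped to zero. The names `pooled_target`, `solve_linear` and `shared_weights` are hypothetical and do not come from the authors' released code.

```python
def pooled_target(label_sets, n_genes):
    """Average co-annotation target over several related categories.

    label_sets: list of sets of gene indices annotated to each category.
    Entry (i, j) is the mean over categories of y_i * y_j, where y is +1
    for annotated genes and -1 otherwise (an assumed labelling scheme).
    """
    target = [[0.0] * n_genes for _ in range(n_genes)]
    for positives in label_sets:
        y = [1.0 if g in positives else -1.0 for g in range(n_genes)]
        for i in range(n_genes):
            for j in range(n_genes):
                target[i][j] += y[i] * y[j] / len(label_sets)
    return target


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; networks may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def shared_weights(networks, label_sets):
    """Fit one weight per network, shared across all given categories."""
    n = len(networks[0])
    k = len(networks)
    target = pooled_target(label_sets, n)
    # Use each unordered gene pair once; the diagonal is ignored.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Normal equations (A^T A) w = A^T t, with one column of A per network.
    gram = [[sum(networks[p][i][j] * networks[q][i][j] for i, j in pairs)
             for q in range(k)] for p in range(k)]
    rhs = [sum(networks[p][i][j] * target[i][j] for i, j in pairs)
           for p in range(k)]
    weights = solve_linear(gram, rhs)
    # Clipping is a simplification, not a true non-negative least-squares fit.
    return [max(0.0, w) for w in weights]
```

Because the target pools every category in the set, the fit draws on all of their annotations at once, so a sparsely annotated category borrows strength from its relatives instead of fitting its own weights from a few positives.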

Availability: Networks and code are available from: http://morrislab.med.utoronto.ca/sara/SW

Citing Articles

Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling.

Woicik A, Zhang M, Xu H, Mostafavi S, Wang S. Bioinformatics. 2023; 39(Suppl 1):i504-i512.

PMID: 37387142 PMC: 10311345. DOI: 10.1093/bioinformatics/btad247.


Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review.

Liu L, Zhu S. Phenomics. 2023; 1(4):171-185.

PMID: 36939789 PMC: 9590544. DOI: 10.1007/s43657-021-00019-w.


CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction.

Wu Z, Guo M, Jin X, Chen J, Liu B. Bioinformatics. 2023; 39(3).

PMID: 36883697 PMC: 10032634. DOI: 10.1093/bioinformatics/btad123.


Integration of probabilistic functional networks without an external Gold Standard.

James K, Alsobhe A, Cockell S, Wipat A, Pocock M. BMC Bioinformatics. 2022; 23(1):302.

PMID: 35879662 PMC: 9316706. DOI: 10.1186/s12859-022-04834-4.


Machine learning: its challenges and opportunities in plant system biology.

Hesami M, Alizadeh M, Jones A, Torkamaneh D. Appl Microbiol Biotechnol. 2022; 106(9-10):3507-3530.

PMID: 35575915 DOI: 10.1007/s00253-022-11963-6.

