Improving Protein Function Prediction Using Protein Sequence and GO-term Similarities

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2018 Sep 1
PMID: 30169569
Citations: 10
Abstract

Motivation: Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on the annotations of highly similar proteins. We argue that less similar proteins are still informative. Moreover, despite their simplicity and structure, GO terms seem to be hard for computers to learn, particularly those of the Biological Process ontology, which has the most terms (>29 000). We propose using Label-Space Dimensionality Reduction (LSDR) techniques to exploit the redundancy of GO terms and transform them into a more compact latent representation that is easier to predict.
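To make the idea of a compact latent label representation concrete, here is a minimal sketch of a generic, SVD-based label-space reduction. It is not necessarily one of the LSDR methods evaluated in the paper. The toy annotation matrix, the function names (`fit_label_projection`, `encode`, `decode`) and the 0.5 decoding threshold are all assumptions made for illustration.

```python
import numpy as np

def fit_label_projection(Y, k):
    """Learn a k-dimensional linear projection of the label space.

    Y is an (n_proteins, n_terms) binary annotation matrix.
    Returns the per-term mean and an (n_terms, k) projection matrix.
    """
    mean = Y.mean(axis=0)
    # Rows of Vt are the principal directions of the centered label matrix.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:k].T

def encode(Y, mean, V):
    """Map binary label vectors to the compact latent space."""
    return (Y - mean) @ V

def decode(Z, mean, V, threshold=0.5):
    """Map latent vectors back to binary label vectors (threshold is an assumption)."""
    return ((Z @ V.T + mean) >= threshold).astype(int)

# Hypothetical toy data: 6 proteins, 5 redundant terms
# (columns: root, A, child of A, B, child of B).
Y = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
])

mean, V = fit_label_projection(Y, k=3)
Z = encode(Y, mean, V)       # 3 latent targets instead of 5 terms
Y_hat = decode(Z, mean, V)   # recovered annotations
```

In this toy case the root term is constant and term B is the complement of term A, so three latent dimensions carry all the information in the five terms. A predictor would then be trained on `Z` rather than on `Y`, and its outputs decoded back to GO terms.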

Results: We compare proteins using a sequence similarity profile (SSP) to a set of annotated training proteins. We introduce two new LSDR methods, one based on the structure of the GO and one based on the semantic similarity of terms. We show that these LSDR methods, as well as three existing ones, improve the Critical Assessment of Functional Annotation performance of several function prediction algorithms. Cross-validation experiments on Arabidopsis thaliana proteins show that our GO-aware LSDR outperforms generic LSDR. Our experiments on A. thaliana proteins also show that the SSP representation in combination with a kNN classifier outperforms state-of-the-art and baseline methods in terms of cross-validated F-measure.
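As a rough illustration of the SSP-plus-kNN idea, the sketch below represents each protein by its vector of similarities to the training proteins and scores GO terms from the nearest training profiles. The k-mer Jaccard score is only a toy stand-in for a real sequence-alignment score. The sequences, labels, function names and the cosine-based neighbor search are all assumptions, not the paper's implementation.

```python
import numpy as np

def kmer_set(seq, k=3):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Toy stand-in for an alignment score: Jaccard overlap of k-mers."""
    A, B = kmer_set(a, k), kmer_set(b, k)
    union = A | B
    return len(A & B) / len(union) if union else 0.0

def ssp(seq, train_seqs):
    """Sequence similarity profile: similarity to every training protein."""
    return np.array([similarity(seq, t) for t in train_seqs])

def knn_predict(query_profile, train_profiles, train_labels, n_neighbors=2):
    """Score each GO term as its frequency among the nearest training profiles."""
    norms = np.linalg.norm(train_profiles, axis=1) * np.linalg.norm(query_profile)
    cos = train_profiles @ query_profile / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(-cos)[:n_neighbors]
    return train_labels[nearest].mean(axis=0)

# Hypothetical training set: two sequence families, two GO terms.
train_seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GSHMLEDPVD", "GSHMLEDPVE"]
train_labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

train_profiles = np.array([ssp(s, train_seqs) for s in train_seqs])
query_profile = ssp("MKTAYIAKQK", train_seqs)
scores = knn_predict(query_profile, train_profiles, train_labels)
```

Here the query shares k-mers only with the first family, so both of its nearest neighbors carry the first term and `scores` favors that term. Because neighbors are found by comparing whole profiles rather than single pairwise scores, moderately similar training proteins also shape the result.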

Availability and Implementation: Source code for the experiments is available at https://github.com/stamakro/SSP-LSDR.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

RecGOBD: accurate recognition of gene ontology related brain development protein functions through multi-feature fusion and attention mechanisms.

Xia Z, Ma S, Li J, Guo Y, Jiang L, Tang J. Bioinform Adv. 2024; 4(1):vbae163.

PMID: 39678209 PMC: 11639192. DOI: 10.1093/bioadv/vbae163.


CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction.

Wu Z, Guo M, Jin X, Chen J, Liu B. Bioinformatics. 2023; 39(3).

PMID: 36883697 PMC: 10032634. DOI: 10.1093/bioinformatics/btad123.


Rule-Based Pruning and In Silico Identification of Essential Proteins in Yeast PPIN.

Banik A, Podder S, Saha S, Chatterjee P, Halder A, Nasipuri M. Cells. 2022; 11(17).

PMID: 36078056 PMC: 9454873. DOI: 10.3390/cells11172648.


Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field.

Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva M, Aliseda A, Perez-Escamirosa F, Altamirano-Bustamante N. Front Bioeng Biotechnol. 2022; 10:788300.

PMID: 35875501 PMC: 9301016. DOI: 10.3389/fbioe.2022.788300.


On the influence of several factors on pathway enrichment analysis.

Mubeen S, Tom Kodamullil A, Hofmann-Apitius M, Domingo-Fernandez D. Brief Bioinform. 2022; 23(3).

PMID: 35453140 PMC: 9116215. DOI: 10.1093/bib/bbac143.

