Inferring Gene Ontologies from Pairwise Similarity Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2014 Jun 17
PMID 24932003
Citations 44
Abstract

Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Because an ontology is a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher-level terms. Methods addressing these requirements are just beginning to emerge, and none has been evaluated for GO inference.

Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements and compare them with other methods, including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
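
The core idea stated for CliXO, maximal cliques in a network obtained by progressively thresholding a similarity matrix, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy matrix, the threshold values, and the choice to keep only cliques of two or more genes are all assumptions made for illustration, and the published algorithm involves further steps not described in the abstract.

```python
from itertools import combinations

def threshold_graph(sim, names, t):
    """Adjacency sets for the graph with an edge wherever similarity >= t."""
    adj = {n: set() for n in names}
    for i, j in combinations(range(len(names)), 2):
        if sim[i][j] >= t:
            adj[names[i]].add(names[j])
            adj[names[j]].add(names[i])
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

def cliques_by_threshold(sim, names, thresholds):
    """Lower the threshold step by step and record newly appearing cliques."""
    seen = set()
    levels = []
    for t in sorted(thresholds, reverse=True):
        new = [c for c in maximal_cliques(threshold_graph(sim, names, t))
               if len(c) >= 2 and c not in seen]
        seen.update(new)
        levels.append((t, sorted(sorted(c) for c in new)))
    return levels

# Hypothetical 4-gene similarity matrix (symmetric, values invented).
names = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.6, 0.2],
    [0.9, 1.0, 0.6, 0.2],
    [0.6, 0.6, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]
levels = cliques_by_threshold(sim, names, [0.9, 0.6])
for t, cliques in levels:
    print(t, cliques)
```

In this toy example, gene C ends up in two distinct cliques at the lower threshold, which is the kind of overlapping membership that a strict tree produced by standard hierarchical clustering cannot represent.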

Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall).
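
To make the precision and recall figures concrete, the following sketch computes both for a set of inferred terms against a reference set. It assumes, purely for illustration, that a term is represented by its gene set and that an inferred term counts as correct only when its gene set exactly equals a reference term's; the abstract does not specify the matching criterion used in the study, and the function name and toy data are hypothetical.

```python
def precision_recall(inferred, reference):
    """Precision and recall of inferred terms under exact gene-set matching."""
    inferred = {frozenset(t) for t in inferred}
    reference = {frozenset(t) for t in reference}
    matched = inferred & reference
    precision = len(matched) / len(inferred) if inferred else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical terms, each given as a set of gene names.
inferred = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}, {"B", "D"}]
reference = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}, {"A", "D"},
             {"A", "B", "C", "D"}]

p, r = precision_recall(inferred, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Precision is the fraction of inferred terms that match a reference term, and recall is the fraction of reference terms that were recovered.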

Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.

Citing Articles

A clinical knowledge graph-based framework to prioritize candidate genes for facilitating diagnosis of Mendelian diseases and rare genetic conditions.

Gnanaolivu R, Oliver G, Jenkinson G, Blake E, Chen W, Chia N BMC Bioinformatics. 2025; 26(1):82.

PMID: 40087567 DOI: 10.1186/s12859-025-06096-2.


A graph neural network approach for hierarchical mapping of breast cancer protein communities.

Zhang X, Liu Q BMC Bioinformatics. 2025; 26(1):23.

PMID: 39838298 PMC: 11749236. DOI: 10.1186/s12859-024-06015-x.


Global siRNA Screen Reveals Critical Human Host Factors of SARS-CoV-2 Multicycle Replication.

Yin X, Pu Y, Yuan S, Pache L, Churas C, Weston S bioRxiv. 2024; .

PMID: 39026801 PMC: 11257544. DOI: 10.1101/2024.07.10.602835.


Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes.

Cesnik A, Schaffer L, Gaur I, Jain M, Ideker T, Lundberg E Annu Rev Biomed Data Sci. 2024; 7(1):369-389.

PMID: 38748859 PMC: 11343683. DOI: 10.1146/annurev-biodatasci-102423-113534.


Biology-inspired graph neural network encodes reactome and reveals biochemical reactions of disease.

Burkhart J, Wu G, Song X, Raimondi F, McWeeney S, Wong M Patterns (N Y). 2023; 4(7):100758.

PMID: 37521042 PMC: 10382942. DOI: 10.1016/j.patter.2023.100758.

