Inferring Gene Ontologies from Pairwise Similarity Data
Overview
Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that an ontology is a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher-level terms. Methods addressing these requirements are just beginning to emerge, and none has been evaluated for GO inference.
Methods: We consider two algorithms, Clique Extracted Ontology (CliXO) and LocalFitness, that uniquely satisfy these requirements, and compare them with other methods, including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
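The core idea named above, maximal cliques in a network induced by progressive thresholding, can be illustrated with a minimal sketch. This is not the CliXO implementation: the function names (`maximal_cliques`, `cliques_by_threshold`), the toy similarity matrix, and the threshold list are all assumptions for illustration, and the sketch stops at collecting cliques as candidate terms without building the DAG relations between them.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def cliques_by_threshold(sim, genes, thresholds):
    """Lower the similarity threshold step by step (hypothetical helper).

    At each step, connect genes whose similarity is >= the threshold and
    record every maximal clique of size >= 2 not seen at a stricter step.
    """
    terms, seen = [], set()
    for t in sorted(thresholds, reverse=True):
        adj = {g: set() for g in genes}
        for i, j in combinations(range(len(genes)), 2):
            if sim[i][j] >= t:
                adj[genes[i]].add(genes[j])
                adj[genes[j]].add(genes[i])
        for clique in maximal_cliques(adj):
            if len(clique) >= 2 and clique not in seen:
                seen.add(clique)
                terms.append((t, clique))
    return terms


# Toy data (made up for illustration): four genes, symmetric similarities.
genes = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.8, 0.2],
    [0.8, 0.8, 1.0, 0.5],
    [0.2, 0.2, 0.5, 1.0],
]
terms = cliques_by_threshold(sim, genes, thresholds=[0.9, 0.8, 0.5])
for t, clique in terms:
    print(t, sorted(clique))
```

In this toy run, gene C ends up in two candidate terms ({A, B, C} and {C, D}), which is the kind of overlapping membership that strict hierarchical clustering cannot represent.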
Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall).
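To make the precision and recall figures concrete, the sketch below scores a set of inferred terms against a reference set. It assumes that a term is represented by its gene set and that an inferred term counts as correct only on an exact match; the criterion actually used to align inferred terms with GO terms is not given in this abstract, and `term_precision_recall` is a hypothetical helper.

```python
def term_precision_recall(inferred, reference):
    """Precision and recall of inferred terms under exact gene-set matching.

    Exact matching is an assumption made for this sketch; a real
    evaluation may use a softer alignment between terms.
    """
    inferred = {frozenset(t) for t in inferred}
    reference = {frozenset(t) for t in reference}
    matched = inferred & reference
    precision = len(matched) / len(inferred) if inferred else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


# Toy example (made-up gene sets): 2 of 3 inferred terms match,
# and 2 of 4 reference terms are recovered.
inferred = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}]
reference = [{"A", "B", "C"}, {"C", "D"}, {"A", "B", "C", "D"}, {"B", "D"}]
precision, recall = term_precision_recall(inferred, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```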
Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing the hierarchical and pleiotropic structure embedded in biomolecular data.