
Exact Score Distribution Computation for Ontological Similarity Searches

Overview
Publisher Biomed Central
Specialty Biology
Date 2011 Nov 15
PMID 22078312
Citations 11
Abstract

Background: Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., finding functionally related proteins with the Gene Ontology or phenotypically similar diseases with the Human Phenotype Ontology (HPO). We have recently shown that the performance of semantic similarity searches can be improved by ranking results according to the probability of obtaining a given score at random rather than by the scores themselves. However, to date, there are no algorithms for computing the exact distribution of semantic similarity scores, which is necessary for computing the exact P-value of a given score.

Results: In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the HPO. It is shown that exact P-value calculation improves clinical diagnosis using the HPO compared to approaches based on sampling.
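As an illustration of what an exact score distribution means here, the sketch below applies the naive approach (full enumeration of queries) to a tiny made-up ontology. It is not the authors' collapsing algorithm. The toy terms, the annotations, the averaging of best matches into a query score, and the null model in which every set of q distinct terms is equally likely are all assumptions made for this example; only Resnik's definition (the information content of the most informative common ancestor) is taken from the literature.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy ontology, given as child -> list of parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical direct annotations of four items (e.g. diseases).
ANNOTATIONS = {
    "d1": {"A1"},
    "d2": {"A2"},
    "d3": {"A1", "B1"},
    "d4": {"B"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of items annotated to t or a descendant).

    Annotation propagation: an item annotated to a term counts for all
    ancestors of that term. Every toy term has at least one item.
    """
    n = len(ANNOTATIONS)
    ic = {}
    for term in PARENTS:
        count = sum(
            1
            for terms in ANNOTATIONS.values()
            if any(term in ancestors(a) for a in terms)
        )
        ic[term] = -math.log(count / n)
    return ic


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


def similarity(query, target, ic):
    """Assumed query score: mean over query terms of the best target match."""
    return sum(max(resnik(q, t, ic) for t in target) for q in query) / len(query)


def score_distribution(target, query_size, ic):
    """Exact distribution by enumerating every query of the given size."""
    counts = Counter()
    for query in combinations(sorted(PARENTS), query_size):
        counts[round(similarity(query, target, ic), 10)] += 1
    total = sum(counts.values())
    return {score: c / total for score, c in counts.items()}


def p_value(observed, distribution):
    """Probability of a score at least as large as the observed one."""
    return sum(p for s, p in distribution.items() if s >= round(observed, 10))


ic = information_content()
target = ANNOTATIONS["d3"]
dist = score_distribution(target, 2, ic)
observed = similarity(("A1", "B1"), target, ic)
print(p_value(observed, dist))
```

Enumeration visits every possible query, so its cost grows combinatorially with ontology size and query length; this is the cost that the collapsing algorithm described above is reported to reduce by several orders of magnitude.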

Conclusions: The new algorithm enables, for the first time, exact P-value calculation via exact score distribution computation for ontology similarity searches. The approach is applicable to any ontology for which the annotation-propagation rule holds and can improve any bioinformatic method that uses only the raw similarity scores. The algorithm was implemented in Java, supports any ontology in OBO format, and is available for non-commercial and academic use at: https://compbio.charite.de/svn/hpo/trunk/src/tools/significance/

Citing Articles

Prenatal phenotyping: A community effort to enhance the Human Phenotype Ontology.

Dhombres F, Morgan P, Chaudhari B, Filges I, Sparks T, Lapunzina P Am J Med Genet C Semin Med Genet. 2022; 190(2):231-242.

PMID: 35872606 PMC: 9588534. DOI: 10.1002/ajmg.c.31989.


Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes.

Seaby E, Rehm H, O'Donnell-Luria A Front Genet. 2021; 12:674295.

PMID: 34220947 PMC: 8248347. DOI: 10.3389/fgene.2021.674295.


Modeling seizures in the Human Phenotype Ontology according to contemporary ILAE concepts makes big phenotypic data tractable.

Lewis-Smith D, Galer P, Balagura G, Kearney H, Ganesan S, Cosico M Epilepsia. 2021; 62(6):1293-1305.

PMID: 33949685 PMC: 8272408. DOI: 10.1111/epi.16908.


The case for open science: rare diseases.

Rubinstein Y, Robinson P, Gahl W, Avillach P, Baynam G, Cederroth H JAMIA Open. 2021; 3(3):472-486.

PMID: 33426479 PMC: 7660964. DOI: 10.1093/jamiaopen/ooaa030.


Encoding Clinical Data with the Human Phenotype Ontology for Computational Differential Diagnostics.

Kohler S, Oien N, Buske O, Groza T, Jacobsen J, McNamara C Curr Protoc Hum Genet. 2019; 103(1):e92.

PMID: 31479590 PMC: 6814016. DOI: 10.1002/cphg.92.

