Exploiting Disjointness Axioms to Improve Semantic Similarity Measures
Motivation: Domain knowledge in biology has traditionally been represented as simple class hierarchies with textual annotations. Recently, expressive ontology languages such as the Web Ontology Language (OWL) have become more widely adopted; they support axioms that express logical relationships other than class-subclass, e.g. disjointness. This is improving the coverage and validity of the knowledge contained in biological ontologies. However, current semantic tools still need to adapt to this more expressive information. In this article, we propose a method to integrate disjointness axioms, which are being incorporated into real-world ontologies such as the Gene Ontology and the Chemical Entities of Biological Interest (ChEBI) ontology, into semantic similarity, the measure that estimates the closeness in meaning between classes.
Results: We present a modification of the shared information content measure that extends the base measure to incorporate disjointness information. To evaluate our approach, we applied it to several randomly selected datasets extracted from the Chemical Entities of Biological Interest ontology. In 93.8% of these datasets, our measure performed better than the base measure of shared information content. This supports the idea that semantic similarity is more accurate if it extends beyond the class hierarchy of the ontology.
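To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a structure-based (intrinsic) information content and a tiny made-up hierarchy; the class names, the `PARENTS` and `DISJOINT` tables, and all function names are hypothetical. It shows the base measure (the information content of the most informative common ancestor) and a check for whether a pair of classes is separated by a disjointness axiom. The abstract does not state how the modified measure combines the two, so that step is deliberately left out.

```python
import math

# Hypothetical toy hierarchy (child -> direct superclasses); not taken from ChEBI.
PARENTS = {
    "entity": set(),
    "chemical": {"entity"},
    "role": {"entity"},
    "organic": {"chemical"},
    "inorganic": {"chemical"},
    "alcohol": {"organic"},
    "ethanol": {"alcohol"},
    "methanol": {"alcohol"},
    "salt": {"inorganic"},
}

# Hypothetical disjointness axioms, each an unordered pair of classes.
DISJOINT = {frozenset({"organic", "inorganic"})}


def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(cls):
    """Intrinsic IC (an assumption): -log of the fraction of classes subsumed by cls."""
    subsumed = sum(1 for c in PARENTS if cls in ancestors(c))
    return -math.log(subsumed / len(PARENTS))


def shared_ic(a, b):
    """Base measure: IC of the most informative common ancestor of a and b."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(c) for c in common)


def are_disjoint(a, b):
    """True if some ancestor of a is declared disjoint with some ancestor of b.

    Disjointness is inherited by subclasses, so checking ancestor pairs suffices.
    """
    return any(
        frozenset({x, y}) in DISJOINT
        for x in ancestors(a)
        for y in ancestors(b)
    )
```

In this toy hierarchy, `ethanol` and `salt` share the ancestor `chemical` and therefore have a non-zero shared information content under the base measure, even though `are_disjoint("ethanol", "salt")` reports that they fall under disjoint classes. This is the kind of case in which a hierarchy-only measure ignores information that a disjointness-aware measure could use.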
Contact: joao.ferreira@lasige.di.fc.ul.pt.
Supplementary Information: Supplementary data are available at Bioinformatics online.