
Knowledge-based Biomedical Word Sense Disambiguation: Comparison of Approaches

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2010 Nov 25
PMID: 21092226
Citations: 14
Abstract

Background: Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus that can be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain.

Methods: We present research on existing knowledge-based WSD approaches, which complements the studies performed on statistical learning. We compare four approaches that rely on the UMLS Metathesaurus as the source of knowledge. The first approach measures the overlap between the context of the ambiguous word and a representation of each candidate sense built from its definitions, synonyms and related terms. The second approach collects training data for each candidate sense using queries built from monosemous synonyms and related terms; these queries retrieve MEDLINE citations, and a machine learning classifier is then trained on the resulting corpus. The third approach is a graph-based method that exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD by ranking nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus: the context of the ambiguous word and the semantic types of the candidate concepts are mapped to Journal Descriptors, and these mappings are compared to decide among the candidate concepts. We report accuracy estimates for the different methods on the WSD test collection available from the NLM.
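As an illustration of the first approach only, the overlap idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the sense identifiers, the sense profiles, the stopword list and the example sentence are all invented for the example and are not taken from the UMLS, and plain set overlap stands in for whatever weighting the study actually used.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "by", "to", "in", "is"}

# Hypothetical sense profiles: in the approach described above these would be
# built from Metathesaurus definitions, synonyms and related terms.
SENSE_PROFILES = {
    "common_cold": (
        "viral infection of the upper respiratory tract with cough, "
        "sore throat and runny nose; acute nasopharyngitis"
    ),
    "cold_temperature": (
        "low temperature; absence of heat; freezing, exposure, hypothermia"
    ),
}

CONTEXT = (
    "The patient presented with a cold, cough and sore throat "
    "caused by a viral infection."
)


def content_words(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def disambiguate(context, sense_profiles):
    """Return the sense whose profile shares the most words with the context."""
    ctx = content_words(context)
    scores = {
        sense: len(ctx & content_words(profile))
        for sense, profile in sense_profiles.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first sense listed
    return best, scores


if __name__ == "__main__":
    best, scores = disambiguate(CONTEXT, SENSE_PROFILES)
    print(best, scores)
```

Set overlap is the simplest possible scoring choice; term weighting or vector similarity could be substituted without changing the overall structure.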

Conclusions: The last approach achieves better results than the other methods. The graph-based approach, which uses the structure of the Metathesaurus network to estimate the relevance of Metathesaurus concepts, does not perform well compared to the first two methods. In addition, combining the methods improves performance over the individual approaches. However, performance is still below that of statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions for improving the existing methods and for making the Metathesaurus more effective for WSD.
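For readers unfamiliar with the maximum frequency sense baseline, it is commonly computed by always predicting the sense that occurs most often among the annotated instances of an ambiguous term. The following is a minimal sketch under that assumption; the sense labels and counts are hypothetical and do not come from the NLM test collection.

```python
from collections import Counter


def most_frequent_sense(gold_labels):
    """Return the sense label that occurs most often in the annotations."""
    return Counter(gold_labels).most_common(1)[0][0]


def baseline_accuracy(gold_labels):
    """Accuracy obtained by always predicting the most frequent sense."""
    mfs = most_frequent_sense(gold_labels)
    correct = sum(1 for label in gold_labels if label == mfs)
    return correct / len(gold_labels)


# Hypothetical annotations for one ambiguous term: 7 instances of sense M1
# and 3 instances of sense M2.
GOLD = ["M1"] * 7 + ["M2"] * 3

if __name__ == "__main__":
    print(most_frequent_sense(GOLD), baseline_accuracy(GOLD))
```

Because this baseline needs sense-annotated instances to count frequencies, it is not available to a purely knowledge-based system.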

Citing Articles

Clinical Note Structural Knowledge Improves Word Sense Disambiguation.

Chen F, Zhang G, Chen S, Callahan T, Weng C. AMIA Jt Summits Transl Sci Proc. 2024; 2024:515-524.

PMID: 38827062 PMC: 11141859.


Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health.

Newman-Griffis D, Fosler-Lussier E. Front Digit Health. 2021; 3.

PMID: 33791684 PMC: 8009547. DOI: 10.3389/fdgth.2021.620828.


Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks.

Zhang C, Bis D, Liu X, He Z. BMC Bioinformatics. 2019; 20(Suppl 16):502.

PMID: 31787096 PMC: 6886160. DOI: 10.1186/s12859-019-3079-8.


A semantic-based workflow for biomedical literature annotation.

Sernadela P, Oliveira J. Database (Oxford). 2017; 2017.

PMID: 29220478 PMC: 5691355. DOI: 10.1093/database/bax088.


Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods.

Chasin R, Rumshisky A, Uzuner O, Szolovits P. J Am Med Inform Assoc. 2014; 21(5):842-9.

PMID: 24441986 PMC: 4147600. DOI: 10.1136/amiajnl-2013-002133.

