
Short Text Topic Modelling Using Local and Global Word-context Semantic Correlation

Overview
Date 2023 Feb 7
PMID 36747894
Abstract

Nowadays, people use short texts to express their opinions on social media platforms such as Twitter, Facebook, and YouTube, and on e-commerce websites such as Amazon and Flipkart to share their purchasing experiences. Every day, billions of short texts are created worldwide in the form of tweets, tags, keywords, search queries, and similar content. However, short texts carry little contextual information and can be ambiguous, sparse, and noisy, which remains a major challenge for topic modeling. Standard topic modeling approaches such as Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) are poorly suited to this setting because each document contains only a small number of words, so word co-occurrence patterns are sparse. This work proposes a new model, G_SeaNMF (Gensim_SeaNMF), which improves the word-context semantic relationship by using both local and global word embedding techniques. Word embeddings learned from a large corpus capture general semantic and syntactic information about words, so they can guide topic modeling of short text collections by supplementing their sparse co-occurrence patterns. In the proposed model, SeaNMF (Semantics-assisted Non-negative Matrix Factorization) is combined with the word2vec model from the Gensim library to strengthen the semantic relationships between words. The article also explores short text topic modeling techniques based on DMM (Dirichlet Multinomial Mixture), self-aggregation, and global word co-occurrence. These models are evaluated with several measures of cluster coherence on real-world datasets: Search Snippet, Biomedicine, Pascal Flickr, Tweet, and TagMyNews. The empirical evaluation shows that combining local and global word embeddings yields more appropriate words under each topic and improves results.
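The abstract does not give the G_SeaNMF objective, so the following is only a minimal sketch, assuming a SeaNMF-style formulation: a word-document matrix A is factorized as W H^T while a word-context semantic matrix S is jointly factorized as W Wc^T, with the word factors W shared, i.e. minimizing ||A - W H^T||^2 + alpha * ||S - W Wc^T||^2 under non-negativity. Here S is assumed to mix a "local" shifted-PPMI matrix, built from words that co-occur inside the same short text, with a "global" matrix of clipped cosine similarities between word embeddings. The mixing weight `lam`, the parameter `alpha`, all function names, and the use of Lee-Seung multiplicative updates (rather than whatever solver the original SeaNMF or G_SeaNMF work uses) are illustrative assumptions, not the authors' method. In practice the embedding rows could come from a Gensim `Word2Vec` model (for example `model.wv[word]`); a plain NumPy array is used so the sketch runs without Gensim.

```python
import numpy as np


def build_doc_term(docs):
    """Return (vocab, X) where X[d, v] counts word v in short text d."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc:
            X[d, index[w]] += 1.0
    return vocab, X


def sppmi(cooc, shift=1.0):
    """Shifted positive PMI: max(PMI - log(shift), 0); undefined cells become 0."""
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi - np.log(shift), 0.0)


def local_global_semantic_matrix(X, emb, lam=0.5):
    """Hypothetical mix of local (in-text co-occurrence) and global (embedding) correlation.

    How G_SeaNMF actually combines the two sources is not stated in the abstract.
    """
    B = (X > 0).astype(float)
    cooc = B.T @ B  # words sharing a short text are counted as word-context pairs
    np.fill_diagonal(cooc, 0.0)
    s_local = sppmi(cooc)
    unit = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    s_global = np.maximum(unit @ unit.T, 0.0)  # clip negatives so S stays non-negative
    np.fill_diagonal(s_global, 0.0)
    return (1.0 - lam) * s_local + lam * s_global


def seanmf_mu(A, S, k, alpha=1.0, n_iter=200, seed=0, eps=1e-10):
    """Minimize ||A - W H^T||^2 + alpha ||S - W Wc^T||^2 (W, Wc, H >= 0) by multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_words, n_docs = A.shape
    W = rng.random((n_words, k))   # word latent factors (shared by both terms)
    Wc = rng.random((n_words, k))  # word-context latent factors
    H = rng.random((n_docs, k))    # document latent factors
    for _ in range(n_iter):
        H *= (A.T @ W) / (H @ (W.T @ W) + eps)
        Wc *= (S.T @ W) / (Wc @ (W.T @ W) + eps)
        W *= (A @ H + alpha * S @ Wc) / (W @ (H.T @ H + alpha * Wc.T @ Wc) + eps)
    return W, Wc, H


def objective(A, S, W, Wc, H, alpha=1.0):
    """Value of the joint factorization loss (Frobenius norms)."""
    return np.linalg.norm(A - W @ H.T) ** 2 + alpha * np.linalg.norm(S - W @ Wc.T) ** 2


def top_words(W, vocab, n=3):
    """Highest-weighted words for each topic column of W."""
    return [[vocab[i] for i in np.argsort(-W[:, t])[:n]] for t in range(W.shape[1])]


if __name__ == "__main__":
    docs = [["apple", "banana", "fruit"], ["banana", "fruit", "juice"],
            ["python", "code", "bug"], ["code", "bug", "fix"]]
    vocab, X = build_doc_term(docs)
    # Placeholder for global embeddings (e.g. Gensim word2vec vectors in practice).
    emb = np.random.default_rng(1).normal(size=(len(vocab), 16))
    S = local_global_semantic_matrix(X, emb, lam=0.5)
    A = X.T  # word-document matrix: words x documents
    W, Wc, H = seanmf_mu(A, S, k=2)
    for t, words in enumerate(top_words(W, vocab)):
        print(f"topic {t}: {words}")
```

Because the demo uses random vectors as stand-in embeddings, its printed topics say nothing about the article's results; the point is only to show where a global embedding matrix can enter a SeaNMF-style factorization alongside local co-occurrence.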
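The abstract reports evaluation with several cluster-coherence measures but does not name them. As an illustration only, the sketch below computes normalized PMI (NPMI) coherence, one commonly used measure, averaged over all pairs of a topic's top words, with probabilities taken as document frequencies in a reference corpus. Whether NPMI is among the measures used in the article is an assumption, and the function name `npmi_coherence` is hypothetical. Pairs that never co-occur get the NPMI lower bound of -1, and pairs that co-occur in every document get the upper bound of 1.

```python
import math
from itertools import combinations
from typing import Sequence


def npmi_coherence(top_words: Sequence[str], docs: Sequence[Sequence[str]]) -> float:
    """Average NPMI over all pairs of a topic's top words (illustrative coherence measure).

    A word "occurs" in a document if it appears there at least once.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    doc_sets = [set(d) for d in docs]
    n_docs = len(doc_sets)

    def prob(*words: str) -> float:
        # Fraction of reference documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p_ij == 1.0:
            scores.append(1.0)   # co-occur in every document: NPMI upper bound
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))  # normalize PMI into [-1, 1]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    docs = [["apple", "banana", "fruit"], ["banana", "fruit", "juice"],
            ["python", "code", "bug"], ["code", "bug", "fix"]]
    print(npmi_coherence(["banana", "fruit"], docs))  # -> 1.0
    print(npmi_coherence(["apple", "python"], docs))  # -> -1.0
```

Higher average NPMI means a topic's top words tend to appear together in the reference documents, which is the intuition behind judging whether a model places "appropriate words under each topic".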

Citing Articles

Novel Approach to Personalized Physician Recommendations Using Semantic Features and Response Metrics: Model Evaluation Study.

Zheng Y, Cai Y, Yan Y, Chen S, Gong K. JMIR Hum Factors. 2024; 11:e57670. PMID: 39146009. PMC: 11362707. DOI: 10.2196/57670.
