Clustering and Topic Modeling over Tweets: A Comparison over a Health Dataset

Overview

Journal: Proceedings (IEEE Int Conf Bioinformatics Biomed)

Date: 2022 Apr 25

PMID: 35463811

Authors

Juan Antonio Lossio-Ventura

Juandiego Morzan

Hugo Alatrista-Salas

Tina Hernandez-Boussard

Jiang Bian

Abstract

Twitter has become the most popular form of social interaction in the healthcare domain. Various teams have therefore evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets, showing that the performance of these applications is negatively affected by the nature and characteristics of tweets. Moreover, progress in Twitter health research has become difficult to measure because the existing applications have not been compared against one another. In this paper, we use internal indexes to evaluate different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA achieve better performance for extracting topics and grouping tweets. We provide this comparison to help health practitioners select the most suitable application for their tasks.
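
An internal index scores a grouping using only the data and the assigned labels, with no gold-standard annotation. The abstract does not name the indexes used, so the following is only a minimal sketch that assumes the silhouette coefficient, one widely used internal index, computed with cosine distance over bag-of-words vectors. The toy tweets, the two labelings, and all function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def silhouette(vectors, labels, dist=cosine_distance):
    """Mean silhouette coefficient over all items (range -1 to 1)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist(vectors[i], vectors[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy tweets: two about flu shots, two about diet.
tweets = [
    "flu shot today at the clinic",
    "got my flu shot this morning",
    "new diet plan low sugar",
    "low sugar diet is working",
]
vectors = [bow(t) for t in tweets]

coherent_labels = [0, 0, 1, 1]  # groups tweets by shared vocabulary
mixed_labels = [0, 1, 0, 1]     # splits each topic across both groups

coherent_score = silhouette(vectors, coherent_labels)
mixed_score = silhouette(vectors, mixed_labels)
print(f"coherent grouping: {coherent_score:.3f}")
print(f"mixed grouping:    {mixed_score:.3f}")
```

A higher score indicates tighter, better-separated groups, so the coherent labeling scores above the mixed one. Short texts such as tweets share few terms, which makes pairwise distances sparse; this is one reason the nature of tweets can hurt clustering quality.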

Citing Articles

A topic modeling approach for analyzing and categorizing electronic healthcare documents in Afaan Oromo without label information.

Dinsa E, Das M, Abebe T. Sci Rep. 2024; 14(1):32051.

PMID: 39738682 PMC: 11686009. DOI: 10.1038/s41598-024-83743-3.

An integrated clustering and BERT framework for improved topic modeling.

George L, Sumathy P. Int J Inf Technol. 2023; 15(4):2187-2195.

PMID: 37256029 PMC: 10163298. DOI: 10.1007/s41870-023-01268-w.

Evaluation of clustering and topic modeling methods over health-related tweets and emails.

Lossio-Ventura J, Gonzales S, Morzan J, Alatrista-Salas H, Hernandez-Boussard T, Bian J. Artif Intell Med. 2021; 117:102096.

PMID: 34127235 PMC: 9040385. DOI: 10.1016/j.artmed.2021.102096.
