
A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research-An International Collaboration

Overview
Publisher MDPI
Specialty Public Health
Date 2022 Nov 23
PMID 36417228
Abstract

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetic, and epidemiological research. The unparalleled rate at which research groups around the world are releasing data and publications on the ongoing pandemic allows other scientists to learn from local experiences and from data generated on the front lines. However, there is a need to integrate additional data sources that map and measure the social dynamics of such a unique worldwide event for use in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets related to COVID-19 chatter, generated from 1 January 2020 to 27 June 2021 at the time of writing and growing daily. The dataset is freely available to researchers worldwide and supports a wide range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, the identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.

Citing Articles

Leveraging Large Language Models for Infectious Disease Surveillance-Using a Web Service for Monitoring COVID-19 Patterns From Self-Reporting Tweets: Content Analysis.

Xie J, Zhang Z, Zeng S, Hilliard J, An G, Tang X J Med Internet Res. 2025; 27:e63190.

PMID: 39977859 PMC: 11888100. DOI: 10.2196/63190.


Signals of propaganda-Detecting and estimating political influences in information spread in social networks.

Sela A, Neter O, Lohr V, Cihelka P, Wang F, Zwilling M PLoS One. 2025; 20(1):e0309688.

PMID: 39883667 PMC: 11781619. DOI: 10.1371/journal.pone.0309688.


MGLEP: Multimodal Graph Learning for Modeling Emerging Pandemics with Big Data.

Tran K, Hy T, Jiang L, Vu X Sci Rep. 2024; 14(1):16377.

PMID: 39013976 PMC: 11252387. DOI: 10.1038/s41598-024-67146-y.


First public dataset to study 2023 Turkish general election.

Najafi A, Mugurtay N, Zouzou Y, Demirci E, Demirkiran S, Karadeniz H Sci Rep. 2024; 14(1):8794.

PMID: 38627434 PMC: 11021468. DOI: 10.1038/s41598-024-58006-w.


Tracking collective emotions in 16 countries during COVID-19: a novel methodology for identifying major emotional events using Twitter.

Chauhan A, Belhekar V, Sehgal S, Singh H, Prakash J Front Psychol. 2024; 14:1105875.

PMID: 38591070 PMC: 11000126. DOI: 10.3389/fpsyg.2023.1105875.

