
Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda

Overview
Publisher ACM
Date 2017 Aug 8
PMID 28782059
Citations 10
Abstract

Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of whether these approaches work remains open. We addressed this question using Wikipedia access logs and category links. Our experiments, replicable and extensible using our open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. We found that our minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that our approach is relatively insensitive to the amount and age of training data. We also found, in contrast to prior work, very little forecasting value, and we argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead us to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.
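Besides article filtering, the abstract varies three training-setup knobs: amount of training data, forecast horizon, and model staleness. The Python sketch below shows one way these could interact when regressing official case counts on weekly Wikipedia article access counts. It is not the authors' released code. Ridge regression, weekly granularity, and the exact definitions of `window`, `horizon`, and `staleness` are assumptions made for illustration, and `solve`, `fit_ridge`, and `forecast` are hypothetical names.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(xs: Sequence[Sequence[float]], ys: Sequence[float], alpha: float) -> List[float]:
    """Ridge regression with an unpenalized intercept; returns [b0, w1, ..., wk]."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    # Normal equations: (X^T X + alpha * D) w = X^T y, where D is the identity
    # matrix with a zero in the intercept position.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += alpha
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def forecast(views, incidence, t, horizon, window, staleness, alpha=1.0):
    """Hypothetical helper: predict incidence[t + horizon] from views[t].

    views[i]: weekly access counts of the selected articles in week i.
    incidence[i]: official case count for week i.
    Assumed definitions: the newest training target is `staleness` weeks older
    than week t (staleness >= 1, so no target from week t onward is used), and
    at most `window` pairs (views[i], incidence[i + horizon]) are used.
    """
    if staleness < 1:
        raise ValueError("staleness must be >= 1 to avoid training on unknown targets")
    last_i = t - staleness - horizon  # newest feature week whose target is known
    first_i = last_i - window + 1     # oldest feature week in the training window
    if first_i < 0:
        raise ValueError("not enough history for this window, horizon and staleness")
    xs = views[first_i:last_i + 1]
    ys = [incidence[i + horizon] for i in range(first_i, last_i + 1)]
    w = fit_ridge(xs, ys, alpha)
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], views[t]))


# Synthetic data (hypothetical): two "articles", and a case count that is an
# exact linear function of their views two weeks earlier.
h = 2
views = [[float(i), float(i * i % 7)] for i in range(30)]
incidence = [0.0] * h + [3 + 2 * v[0] + 0.5 * v[1] for v in views[:-h]]
pred = forecast(views, incidence, t=25, horizon=h, window=10, staleness=1, alpha=1e-9)
print(round(pred, 3))  # → 54.0
```

Setting `horizon=0` gives a nowcast of the current week; larger values ask the model to predict weeks whose counts are not yet reported, the setting in which the abstract reports very little value. Real access counts would be noisy and far from this exact linear relationship.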

Citing Articles

A general method for estimating the prevalence of influenza-like-symptoms with Wikipedia data.

Toni G, Consonni C, Montresor A. PLoS One. 2021; 16(8):e0256858.

PMID: 34464416 PMC: 8407583. DOI: 10.1371/journal.pone.0256858.


Surveilling Influenza Incidence With Centers for Disease Control and Prevention Web Traffic Data: Demonstration Using a Novel Dataset.

Caldwell W, Fairchild G, Del Valle S. J Med Internet Res. 2020; 22(7):e14337.

PMID: 32437327 PMC: 7367534. DOI: 10.2196/14337.


Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study.

Daughton A, Chunara R, Paul M. JMIR Public Health Surveill. 2020; 6(2):e14986.

PMID: 32329741 PMC: 7210500. DOI: 10.2196/14986.


Google Health Trends performance reflecting dengue incidence for the Brazilian states.

Romero-Alvarez D, Parikh N, Osthus D, Martinez K, Generous N, Del Valle S. BMC Infect Dis. 2020; 20(1):252.

PMID: 32228508 PMC: 7104526. DOI: 10.1186/s12879-020-04957-0.


The Application of Internet-Based Sources for Public Health Surveillance (Infoveillance): Systematic Review.

Barros J, Duggan J, Rebholz-Schuhmann D. J Med Internet Res. 2020; 22(3):e13680.

PMID: 32167477 PMC: 7101503. DOI: 10.2196/13680.

