Practical Approaches in Evaluating Validation and Biases of Machine Learning Applied to Mobile Health Studies

Overview

Journal Commun Med (Lond)

Publisher Nature Portfolio

Specialty General Medicine

Date 2024 Apr 22

PMID 38649784

Authors

Johannes Allgaier

Rudiger Pryss

Abstract

Background: Machine learning (ML) models are evaluated on a test set to estimate model performance after deployment. The design of the test set therefore matters: if the data distribution after deployment differs too much from that of the test set, model performance decreases. At the same time, the data often contain undetected groups. For example, multiple assessments from one user may constitute a group, which is usually the case in mHealth scenarios.

Methods: In this work, we evaluate a model's performance using several cross-validation train-test-split approaches, in some cases deliberately ignoring the groups. By sorting the groups (in our case: users) by time, we additionally simulate a concept drift scenario for better external validity. For this evaluation, we use 7 longitudinal mHealth datasets, all containing Ecological Momentary Assessments (EMA). Further, we compare the model performance with baseline heuristics, questioning the essential utility of a complex ML model.
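The split strategies described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`user`, `t`, `y`), the function names, and the toy data are assumptions made for illustration, and only the Python standard library is used. It contrasts a record-wise split that ignores groups with a user-wise split, and adds a time-ordered user split as one possible way to mimic concept drift.

```python
import random
from collections import defaultdict

def record_wise_folds(records, k, seed=0):
    """Shuffle individual assessments into k folds, ignoring users."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def user_wise_folds(records, k, seed=0):
    """Assign whole users to folds so no user spans train and test."""
    users = sorted({r["user"] for r in records})
    random.Random(seed).shuffle(users)
    fold_of = {u: i % k for i, u in enumerate(users)}
    folds = [[] for _ in range(k)]
    for i, r in enumerate(records):
        folds[fold_of[r["user"]]].append(i)
    return folds

def time_ordered_user_split(records, test_fraction=0.2):
    """Train on users who joined early, test on users who joined late."""
    first_seen = {}
    for r in records:
        first_seen[r["user"]] = min(r["t"], first_seen.get(r["user"], r["t"]))
    users = sorted(first_seen, key=lambda u: (first_seen[u], u))
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(users[-n_test:])
    train = [i for i, r in enumerate(records) if r["user"] not in test_users]
    test = [i for i, r in enumerate(records) if r["user"] in test_users]
    return train, test

def leaked_users(records, folds):
    """Count users whose assessments appear in more than one fold."""
    seen = defaultdict(set)
    for f, fold in enumerate(folds):
        for i in fold:
            seen[records[i]["user"]].add(f)
    return sum(1 for fs in seen.values() if len(fs) > 1)

# Hypothetical toy data: 6 users, 5 assessments each; user u joins at time u.
records = [{"user": u, "t": u + 10 * j, "y": u % 3}
           for u in range(6) for j in range(5)]

print("users leaked, record-wise:", leaked_users(records, record_wise_folds(records, 3)))
print("users leaked, user-wise:  ", leaked_users(records, user_wise_folds(records, 3)))
```

In the record-wise variant, assessments from the same user can land in both training and test folds, which is the situation the abstract associates with overestimated performance. The user-wise variant keeps each user in exactly one fold.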

Results: Hidden groups in the dataset lead to overestimation of ML performance after deployment. For prediction, a user's last completed questionnaire is a reasonable heuristic for the next response, and potentially outperforms a complex ML model. Because we included 7 studies, low variance appears to be a more fundamental phenomenon of mHealth datasets.
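The last-response heuristic mentioned in the Results can be sketched as follows. This is an illustrative baseline, not the authors' code: the tuple layout `(user, timestamp, value)`, the function names, the choice of mean absolute error, and the toy values are all assumptions.

```python
from collections import defaultdict

def last_value_predictions(records):
    """Predict each response as the same user's previous response.

    Records are (user, timestamp, value) tuples. A user's first
    assessment has no history, so it yields no prediction.
    """
    by_user = defaultdict(list)
    for user, t, value in records:
        by_user[user].append((t, value))
    pairs = []  # (predicted, actual)
    for history in by_user.values():
        history.sort()  # chronological order within each user
        for (_, prev), (_, curr) in zip(history, history[1:]):
            pairs.append((prev, curr))
    return pairs

def mean_absolute_error(pairs):
    """Average absolute gap between predicted and actual responses."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Hypothetical toy data: two users whose answers change little over time.
records = [
    ("a", 1, 3), ("a", 2, 3), ("a", 3, 4),
    ("b", 1, 7), ("b", 2, 7), ("b", 3, 7),
]
pairs = last_value_predictions(records)
print(mean_absolute_error(pairs))  # 0.25
```

When a user's responses vary little between assessments, as in this toy data, the baseline error stays small, so any ML model would need to beat this number on the same split to justify its added complexity.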

Conclusions: The way mHealth data are generated by EMA raises questions about the user and assessment levels and about the appropriate validation of ML models. Our analysis shows that further research is needed to obtain robust ML models. In addition, simple heuristics can be considered as an alternative to ML. Domain experts should be consulted to find potentially hidden groups in the data.

Citing Articles

Exploring the predictive power of antinuclear antibodies and Rheumatoid factor correlations in anticipating therapeutic outcomes for female patients with coexisting Sjögren's syndrome and Rheumatoid arthritis.

Krishnan Pandarathodiyil A, Shree K H, Ramani P, Sivapathasundharam B, Ramadoss R J Oral Biol Craniofac Res. 2025; 15(2):288-296.

PMID: 40027855 PMC: 11869106. DOI: 10.1016/j.jobcr.2025.01.012.

Physical health and ecological momentary assessments during COVID-19: Data from the 'Corona Health' app users.

Allgaier J, Eichner F, Stork S, Heuschmann P, Pryss R Data Brief. 2025; 59:111289.

PMID: 39925386 PMC: 11802365. DOI: 10.1016/j.dib.2025.111289.

Process mining in mHealth data analysis.

Winter M, Langguth B, Schlee W, Pryss R NPJ Digit Med. 2024; 7(1):299.

PMID: 39443677 PMC: 11499602. DOI: 10.1038/s41746-024-01297-0.
