PMID: 35094685

Seek COVER: Using a Disease Proxy to Rapidly Develop and Validate a Personalized Risk Calculator for COVID-19 Outcomes in an International Network

Abstract

Background: We investigated whether influenza data could be used to develop prediction models for COVID-19, so that such models can be developed and validated quickly and reliably early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis. The scores were developed using historical data from patients with influenza or flu-like symptoms and then tested in COVID-19 patients.

Methods: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries, containing data collected on or before April 27, 2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms at any time prior to 2020. In the first step, we created a data-driven model using LASSO-regularized logistic regression. The covariates of that model were then used to develop aggregate covariates for the second step, in which the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after the index date.
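
As a rough illustration of the first step, the sketch below fits an L1-penalized (LASSO) logistic regression by proximal gradient descent and lists the covariates that keep a nonzero coefficient. It is a minimal standard-library sketch, not the authors' pipeline: the function names, the toy data, the step size, and the penalty values are all assumptions made for illustration, and the clinical aggregation of covariates in the second step is not modeled.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam, step=0.5, iters=2000):
    """Minimize mean log-loss + lam * sum(|w|); the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        # Residual = predicted probability minus observed outcome, per patient.
        residuals = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                     for row, yi in zip(X, y)]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                  for j in range(d)]
        b -= step * grad_b
        # Gradient step followed by soft-thresholding yields exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

# Hypothetical toy data: column 0 tracks the outcome, column 1 is balanced noise.
X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w_sparse, _ = fit_lasso_logistic(X, y, lam=10.0)   # heavy penalty
w_light, _ = fit_lasso_logistic(X, y, lam=0.01)    # light penalty
selected = [j for j, wj in enumerate(w_light) if wj != 0.0]
print(w_sparse, w_light, selected)
```

A larger penalty drives more coefficients to exactly zero, which is what makes the fitted model usable as a covariate filter before a smaller score is built.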

Results: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which, combined with age and sex, discriminated which patients would experience any of our three outcomes. The models achieved good performance in the influenza and COVID-19 cohorts. For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations, with some of the COVID-19 validations being less well calibrated than the influenza validations.
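
For readers less familiar with the two performance measures reported here, the following sketch shows how discrimination (AUC) and a simple calibration check could be computed from predicted risks and observed outcomes. The function names and the four-patient example are hypothetical, and the abstract does not state which calibration measure the authors used; calibration-in-the-large is shown only as one common choice.

```python
def auc(scores, labels):
    """Probability that a random patient with the outcome has a higher
    predicted risk than a random patient without it (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_in_the_large(probs, labels):
    """Observed outcome rate minus mean predicted risk.
    Positive values mean the model underestimates risk on average."""
    return sum(labels) / len(labels) - sum(probs) / len(probs)

# Hypothetical predicted risks and outcomes for four patients.
risks = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
print(auc(risks, outcomes))  # 0.75
print(calibration_in_the_large(risks, outcomes))
```

A model can rank patients well (high AUC) while still over- or underestimating absolute risk, which is why the two measures are reported separately.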

Conclusions: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models, each with 9 predictors and developed using influenza data, perform well in COVID-19 patients for predicting hospitalization, need for intensive services, and fatality. The scores showed good discriminatory performance, which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, potentially due to the difference in symptom severity between the two diseases. A possible solution is to recalibrate the models in each location before use.
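
The abstract does not say how local recalibration would be done. One simple and widely used option is an intercept update: shift every prediction by a constant on the logit scale so that the mean predicted risk matches the outcome rate observed locally. The sketch below illustrates that idea under this assumption; the function names and the example values are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def recalibrate_intercept(probs, labels, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the logit-scale shift that makes the mean predicted risk equal
    the observed outcome rate. Assumes every prob lies strictly in (0, 1)."""
    target = sum(labels) / len(labels)
    if not 0.0 < target < 1.0:
        raise ValueError("need both outcome classes to recalibrate")
    logits = [logit(p) for p in probs]

    def mean_pred(delta):
        return sum(sigmoid(z + delta) for z in logits) / len(logits)

    # mean_pred increases monotonically with delta, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_pred(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def apply_shift(probs, delta):
    """Apply the learned shift to original predictions."""
    return [sigmoid(logit(p) + delta) for p in probs]

# Hypothetical local data where the model underestimates risk.
original = [0.1, 0.2, 0.3, 0.4]
observed = [0, 1, 1, 1]
delta = recalibrate_intercept(original, observed)
updated = apply_shift(original, delta)
print(delta, updated)
```

Because the shift is the same for every patient, the ranking of patients, and therefore the AUC, is unchanged; only the absolute risk levels move.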

Citing Articles

Development and validation of a patient-level model to predict dementia across a network of observational databases.

John L, Fridgeirsson E, Kors J, Reps J, Williams R, Ryan P BMC Med. 2024; 22(1):308.

PMID: 39075527 PMC: 11288076. DOI: 10.1186/s12916-024-03530-9.


Rapid Development of a Registry to Accelerate COVID-19 Vaccine Clinical Trials.

Abernethy N, McCloskey K, Trahey M, Rinn L, Broder G, Andrasik M Res Sq. 2024.

PMID: 38947011 PMC: 11213164. DOI: 10.21203/rs.3.rs-4397271/v1.


Prediction of 30-day, 90-day, and 1-year mortality after colorectal cancer surgery using a data-driven approach.

Brauner K, Tsouchnika A, Mashkoor M, Williams R, Rosen A, Hartwig M Int J Colorectal Dis. 2024; 39(1):31.

PMID: 38421482 PMC: 10904562. DOI: 10.1007/s00384-024-04607-w.


Health-Analytics Data to Evidence Suite (HADES): Open-Source Software for Observational Research.

Schuemie M, Reps J, Black A, DeFalco F, Evans L, Fridgeirsson E Stud Health Technol Inform. 2024; 310:966-970.

PMID: 38269952 PMC: 10868467. DOI: 10.3233/SHTI231108.


Scalable Infrastructure Supporting Reproducible Nationwide Healthcare Data Analysis toward FAIR Stewardship.

Kim J, Kim C, Kim K, Lee Y, Yu D, Yun J Sci Data. 2023; 10(1):674.

PMID: 37794003 PMC: 10550904. DOI: 10.1038/s41597-023-02580-7.

