
Identifying Multi-resolution Clusters of Diseases in Ten Million Patients with Multimorbidity in Primary Care in England

Overview
Publisher: Nature Portfolio
Specialty: General Medicine
Date: 2024 May 29
PMID: 38811835
Abstract

Background: Identifying clusters of diseases may aid understanding of shared aetiology, management of co-morbidities, and the discovery of new disease associations. Our study aims to identify disease clusters using a large set of long-term conditions, and to compare methods that use the co-occurrence of diseases with methods that use the sequence in which diseases develop in a person over time.

Methods: We use electronic health records from over ten million people with multimorbidity registered to primary care in England. First, we extract data-driven representations of 212 diseases from patient records employing (i) co-occurrence-based methods and (ii) sequence-based natural language processing methods. Second, we apply the graph-based Markov Multiscale Community Detection (MMCD) to identify clusters based on disease similarity at multiple resolutions. We evaluate the representations and clusters using a clinically curated set of 253 known disease association pairs, and qualitatively assess the interpretability of the clusters.
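The two kinds of input described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the patient records, disease names, window size and the helper names (`cooc_vector`, `cosine`, `skipgram_pairs`) are all invented for illustration. It shows how the same records yield unordered co-occurrence counts on the one hand, and ordered (target, context) pairs of the kind a skip-gram model is trained on on the other.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Toy patient records: each list is one person's diagnoses in date order.
# Disease names and records are invented for illustration only.
patients = [
    ["hypertension", "type2_diabetes", "ckd"],
    ["hypertension", "type2_diabetes"],
    ["asthma", "eczema"],
    ["asthma", "eczema", "hypertension"],
]

diseases = sorted({d for record in patients for d in record})

# (i) Co-occurrence view: count unordered disease pairs within each patient.
cooc = Counter()
for record in patients:
    for a, b in combinations(sorted(set(record)), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1


def cooc_vector(disease):
    """Row of the co-occurrence matrix for one disease."""
    return [cooc[(disease, other)] for other in diseases]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# (ii) Sequence view: (target, context) pairs within a window, i.e. the
# training examples a skip-gram model would consume. The window size is
# an assumption, not a value taken from the study.
def skipgram_pairs(record, window=1):
    pairs = []
    for i, target in enumerate(record):
        lo, hi = max(0, i - window), min(len(record), i + window + 1)
        pairs.extend((target, record[j]) for j in range(lo, hi) if j != i)
    return pairs


similarity = cosine(cooc_vector("asthma"), cooc_vector("eczema"))
```

The co-occurrence view discards the order of diagnoses, whereas the pairs from `skipgram_pairs` depend on which diseases are adjacent in a person's sequence, which is the distinction the two method families rest on.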

Results: Both co-occurrence and sequence-based algorithms generate interpretable disease representations, with the best performance from the skip-gram algorithm. MMCD outperforms k-means and hierarchical clustering in explaining known disease associations. We find that diseases display an almost-hierarchical structure across resolutions, ranging from closely similar to more loosely similar co-occurrence patterns, and we identify interpretable clusters corresponding to both established and novel patterns.
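As a rough illustration of what "explaining known disease associations" and "almost-hierarchical" could mean in practice, the sketch below scores two toy partitions against a small list of known pairs and checks whether the finer partition nests inside the coarser one. The metric (fraction of known pairs placed in the same cluster), the partitions and the pairs are assumptions made for this example; the abstract does not specify the evaluation measure the study used.

```python
def pairs_explained(partition, known_pairs):
    """Fraction of known pairs whose two diseases share a cluster."""
    if not known_pairs:
        return 0.0
    label = {d: k for k, cluster in enumerate(partition) for d in cluster}
    hits = sum(
        1
        for a, b in known_pairs
        if a in label and b in label and label[a] == label[b]
    )
    return hits / len(known_pairs)


def is_refinement(fine, coarse):
    """True if every fine cluster lies entirely inside one coarse cluster."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)


# Two hypothetical resolutions of the same toy disease set.
fine = [{"hypertension", "type2_diabetes"}, {"ckd"}, {"asthma", "eczema"}]
coarse = [{"hypertension", "type2_diabetes", "ckd"}, {"asthma", "eczema"}]

# Hypothetical "known association" pairs, invented for illustration.
known_pairs = [
    ("hypertension", "ckd"),
    ("asthma", "eczema"),
    ("type2_diabetes", "eczema"),
]

fine_score = pairs_explained(fine, known_pairs)
coarse_score = pairs_explained(coarse, known_pairs)
nested = is_refinement(fine, coarse)
```

Coarser partitions place more pairs together by construction (a single cluster would place every pair together), so a score like this is only meaningful when methods are compared at a matched number of clusters. A strictly hierarchical structure would make `is_refinement` true at every pair of adjacent resolutions; "almost-hierarchical" suggests it holds for most, but not all, clusters.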

Conclusions: Our method provides a tool for clustering diseases at different levels of resolution from co-occurrence patterns in high-dimensional electronic health records, which could be used to facilitate discovery of associations between diseases in the future.

Citing Articles

Comparing natural language processing representations of coded disease sequences for prediction in electronic health records.

Beaney T, Jha S, Alaa A, Smith A, Clarke J, Woodcock T. J Am Med Inform Assoc. 2024; 31(7):1451-1462.

PMID: 38719204 PMC: 11187492. DOI: 10.1093/jamia/ocae091.


Assigning disease clusters to people: A cohort study of the implications for understanding health outcomes in people with multiple long-term conditions.

Beaney T, Clarke J, Salman D, Woodcock T, Majeed A, Barahona M. J Multimorb Comorb. 2024; 14:26335565241247430.

PMID: 38638408 PMC: 11025432. DOI: 10.1177/26335565241247430.
