
Care Home Resident Identification: A Comparison of Address Matching Methods with Natural Language Processing

Overview
Journal PLoS One
Date 2024 Dec 5
PMID 39637177
Abstract

Background: Care home residents are a highly vulnerable group, but identifying them in routine data is challenging. This study aimed to develop and validate Natural Language Processing (NLP) methods to identify care home residents from primary care address records.

Methods: The proposed system applies sequential NLP filtering and text preprocessing, then calculates similarity scores between general practice (GP) addresses and care home registered addresses. Performance was evaluated in a diagnostic test study comparing NLP predictions with independent, gold-standard manual identification of care home addresses. The analysis used population data comprising 771,588 uniquely written addresses for 819,911 people in two NHS Scotland health board regions. The source code is publicly available at https://github.com/vsuarezpaniagua/NLPcarehome.
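The preprocess-then-score idea described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes character trigram counts as the address representation, cosine similarity as the score, and a cutoff of 0.8. The function names `normalise`, `char_ngrams`, `cosine_similarity` and `best_match`, the threshold value, and the example addresses are all hypothetical.

```python
import math
import re
from collections import Counter


def normalise(address: str) -> str:
    """Illustrative preprocessing: lowercase, drop punctuation, collapse spaces."""
    address = address.lower()
    address = re.sub(r"[^a-z0-9\s]", " ", address)
    return re.sub(r"\s+", " ", address).strip()


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (assumed representation)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(gp_address, care_home_addresses, threshold=0.8):
    """Return (matched care home address or None, best score).

    The threshold is an arbitrary illustrative value, not one from the study.
    """
    query = char_ngrams(normalise(gp_address))
    scored = [
        (cosine_similarity(query, char_ngrams(normalise(c))), c)
        for c in care_home_addresses
    ]
    score, match = max(scored, default=(0.0, None))
    return (match, score) if score >= threshold else (None, score)


# Hypothetical register of care home addresses.
care_homes = ["Rose Court Care Home, 12 High St., Dundee"]
print(best_match("rose court care home 12 high st dundee", care_homes))
print(best_match("5 Main Road, Cupar", care_homes))
```

Other vector distances could be substituted for `cosine_similarity` without changing the surrounding structure; the choice of representation and cutoff would need to be validated against manually labelled addresses, as the study does.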

Results: Overall, NLP methods identified care home residents better in Fife than in Tayside, and better in people aged ≥65 years than in the whole population. In the whole population, the best-performing methods were Correlation (sensitivity 90.2%, PPV 92.0%) for Fife and Cosine (sensitivity 90.4%, PPV 93.7%) for Tayside. For people aged ≥65 years, the best methods were Jensen-Shannon (sensitivity 91.5%, PPV 98.7%) for Fife and City Block (sensitivity 94.4%, PPV 98.3%) for Tayside. These results show that applying NLP methods to real data is feasible and that computing address similarities outperforms previously reported approaches.
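For readers less familiar with the two metrics reported here, sensitivity is the share of true care home residents that the method flags, and positive predictive value (PPV) is the share of flagged people who are truly care home residents. The sketch below computes both from paired predictions and gold-standard labels. The function name `sensitivity_and_ppv` and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_and_ppv(predicted, actual):
    """Compute sensitivity and PPV from paired boolean labels.

    predicted: method's decision per person (True = care home resident)
    actual:    gold-standard label per person
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy example: 2 true positives, 1 false positive, 2 false negatives.
predicted = [True, True, True, False, False, False]
actual = [True, True, False, True, True, False]
sens, ppv = sensitivity_and_ppv(predicted, actual)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```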

Conclusions: Address-matching techniques using NLP methods can determine with reasonable accuracy whether individuals live in a care home based on their GP-registered addresses. The system's performance exceeds previously reported results for methods such as postcode matching, Markov score, and Phonics score.

Santos F, Conti S, Wolters A . A novel method for identifying care home residents in England: a validation study. Int J Popul Data Sci. 2021; 5(4):1666. PMC: 8441962. DOI: 10.23889/ijpds.v5i4.1666. View