Demystifying Probabilistic Linkage: Common Myths and Misconceptions

Overview
Specialty: Public Health
Date: 2018 Dec 12
PMID: 30533534
Citations: 17
Abstract

Many of the distinctions made between probabilistic and deterministic linkage are misleading. While these two approaches to record linkage operate in different ways and can produce different outputs, the distinctions between them are more a result of how they are implemented than of any intrinsic differences. In the way they are generally applied, probabilistic and deterministic procedures can be little more than alternative means to similar ends, or they can arrive at very different ends depending on choices that are made during implementation. Misconceptions about probabilistic linkage contribute to reluctance to implement it and to mistrust of its outputs. We aim to explain how the outputs of either approach can be tailored to suit the intended application, but also to highlight the ways in which probabilistic linkage is generally more flexible, more powerful and more informed by the data. This is accomplished by examining common misconceptions about probabilistic linkage and its difference from deterministic linkage, highlighting the potential impact of design choices on the outputs of either approach. We hope that a better understanding of linkage designs will help to allay concerns about probabilistic linkage, and help data linkers to select and tailor procedures to produce outputs that are appropriate for their intended use.
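To make the point about implementation choices concrete, the following is a minimal sketch and not the authors' method. It assumes a simple exact-match deterministic rule and a Fellegi-Sunter style match weight for the probabilistic side. The field names, the m- and u-probabilities, the two records and the thresholds are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical agreement probabilities per field (illustrative values only):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
M_U = {
    "surname": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "postcode": (0.90, 0.02),
}


def deterministic_link(a, b):
    """Assumed deterministic rule: link only if every field agrees exactly."""
    return all(a[field] == b[field] for field in M_U)


def match_weight(a, b):
    """Sum of log2 likelihood ratios over fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts weight
    return total


def probabilistic_link(a, b, threshold):
    """Link if the match weight reaches the chosen threshold."""
    return match_weight(a, b) >= threshold


# Two hypothetical records that differ only in postcode (e.g. a house move).
rec_a = {"surname": "smith", "birth_date": "1980-02-01", "postcode": "AB1 2CD"}
rec_b = {"surname": "smith", "birth_date": "1980-02-01", "postcode": "EF3 4GH"}

for threshold in (10.0, 20.0):
    print(threshold, probabilistic_link(rec_a, rec_b, threshold))
print("deterministic:", deterministic_link(rec_a, rec_b))
```

With these assumed values, the pair is rejected by the exact-match rule and by the probabilistic rule at a threshold of 20, but accepted at a threshold of 10. At the higher threshold only full agreement on all three fields scores high enough, so the two procedures make the same decisions; the difference in output comes from the choice of threshold rather than from the method itself.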

Citing Articles

Challenges and Opportunities in Big Data Science to Address Health Inequities and Focus the HIV Response.

Rucinski K, Knight J, Willis K, Wang L, Rao A, Roach M Curr HIV/AIDS Rep. 2024; 21(4):208-219.

PMID: 38916675 PMC: 11283392. DOI: 10.1007/s11904-024-00702-3.


Underreporting of unfavorable outcomes of congenital syphilis on the Notifiable Health Conditions Information System in the state of São Paulo, Brazil, 2007-2018.

Festa L, Prado M, Jesuino A, Balda R, Tayra A, Sanudo A Epidemiol Serv Saude. 2023; 32(2):e2022664.

PMID: 37466564 PMC: 10355990. DOI: 10.1590/S2237-96222023000200007.


Development of Indirect Health Data Linkage on Health Product Use and Care Trajectories in France: Systematic Review.

Ranchon F, Chanoine S, Lambert-Lacroix S, Bosson J, Moreau-Gaudry A, Bedouch P J Med Internet Res. 2023; 25:e41048.

PMID: 37200084 PMC: 10236279. DOI: 10.2196/41048.


Data linkage in medical research.

Harron K BMJ Med. 2023; 1(1):e000087.

PMID: 36936588 PMC: 9951373. DOI: 10.1136/bmjmed-2021-000087.


Linkage of multiple electronic health record datasets using a 'spine linkage' approach compared with all 'pairwise linkages'.

Blake H, Sharples L, Harron K, van der Meulen J, Walker K Int J Epidemiol. 2022; 52(1):214-226.

PMID: 35748342 PMC: 9908066. DOI: 10.1093/ije/dyac130.

