Some Methods for Blindfolded Record Linkage

Overview
Publisher Biomed Central
Date 2004 Jun 30
PMID 15222890
Citations 19
Abstract

Background: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects - a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors.
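The exact-comparison approach described above can be illustrated with a minimal sketch. It is not the code of Dusserre, Quantin, Bouzelat and colleagues: the keyed HMAC-SHA256 construction, the `blind` helper, the normalisation rule and the sample names are all assumptions made for illustration only.

```python
import hashlib
import hmac


def blind(value, key):
    """Return a keyed one-way digest of a normalised identifier.

    Normalisation (lower-casing, collapsing whitespace) is an assumption
    of this sketch, not a step taken from the paper.
    """
    normalised = " ".join(value.lower().split())
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret shared by the two data custodians only.
shared_key = b"hypothetical key agreed by the two data custodians"

# Each custodian blinds its own identifiers locally.
custodian_a = {blind(name, shared_key) for name in ["Jane Smith", "Peter Jones"]}
custodian_b = {blind(name, shared_key) for name in ["jane  smith", "Peter Jonse"]}

# The linking party sees digests only and can test them for equality.
matches = custodian_a & custodian_b
print(len(matches))  # 1: "Jane Smith" links, the misspelt "Peter Jonse" does not
```

The transposed letters in "Jonse" produce an unrelated digest, which is the limitation the abstract points out: a hash comparison can only say "identical" or "different".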

Methods: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage.
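As a rough illustration of how a similarity score can be computed on blinded values, the sketch below hashes each character bigram separately and computes a Dice coefficient on the digests. This is a deliberate simplification and should not be read as the protocol described in the paper: it assumes the n-gram score is a Dice coefficient over bigrams, the function names and key are hypothetical, and hashing individual bigrams in this way would be open to frequency analysis in practice.

```python
import hashlib
import hmac


def bigrams(text):
    """Return the set of character bigrams of a lower-cased string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def blind_bigrams(text, key):
    """Replace each bigram with a keyed one-way digest (assumed scheme)."""
    return {
        hmac.new(key, gram.encode("utf-8"), hashlib.sha256).hexdigest()
        for gram in bigrams(text)
    }


def dice(a, b):
    """Dice coefficient of two sets: 2 * |a & b| / (|a| + |b|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical secret known to the data custodians but not the comparing party.
key = b"hypothetical shared secret"

# The comparing party works on digests only.
score = dice(blind_bigrams("jones", key), blind_bigrams("jonse", key))
print(score)  # 0.5: the bigrams "jo" and "on" are shared, out of 4 + 4
```

Because equal bigrams map to equal digests under the same key, the score on blinded sets equals the score on the plain bigram sets, so a near match such as "jones" and "jonse" is graded rather than rejected outright.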

Results: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between, or simultaneous compromise of, two or more parties involved in the linkage operation. To reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol.

Conclusion: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high speed communication networks, of a general purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy.

Citing Articles

Accuracy of an Electronic Health Record Patient Linkage Module Evaluated between Neighboring Academic Health Care Centers.

Ross M, Sanz J, Tep B, Follett R, Soohoo S, Bell D Appl Clin Inform. 2020; 11(5):725-732.

PMID: 33147645 PMC: 7641664. DOI: 10.1055/s-0040-1718374.


Methodology for linking Ryan White HIV/AIDS Program Services Report (RSR) client level data over multiple years.

Zhu J, Fanning M, Sheehan L, Morrissey K, Legum S, Hermansen S PLoS One. 2020; 15(8):e0237635.

PMID: 32823269 PMC: 7442495. DOI: 10.1371/journal.pone.0237635.


Design and implementation of a privacy preserving electronic health record linkage tool in Chicago.

Kho A, Cashy J, Jackson K, Pah A, Goel S, Boehnke J J Am Med Inform Assoc. 2015; 22(5):1072-80.

PMID: 26104741 PMC: 5009931. DOI: 10.1093/jamia/ocv038.


Privacy preserving probabilistic record linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality.

Schmidlin K, Clough-Gorr K, Spoerri A BMC Med Res Methodol. 2015; 15:46.

PMID: 26024886 PMC: 4460842. DOI: 10.1186/s12874-015-0038-6.


SOEMPI: A Secure Open Enterprise Master Patient Index Software Toolkit for Private Record Linkage.

Toth C, Durham E, Kantarcioglu M, Xue Y, Malin B AMIA Annu Symp Proc. 2015; 2014:1105-14.

PMID: 25954421 PMC: 4419976.

