The Disclosure of Diagnosis Codes Can Breach Research Participants' Privacy

Overview
Date 2010 May 6
PMID 20442151
Citations 55
Abstract

Objective: De-identified clinical data in standardized form (eg, diagnosis codes), derived from electronic medical records, are increasingly combined with research data (eg, DNA sequences) and disseminated to enable scientific investigations. This study examines whether released data can be linked with identified clinical records that are accessible via various resources to jeopardize patients' anonymity, and the ability of popular privacy protection methodologies to prevent such an attack.

Design: The study experimentally evaluates the re-identification risk of a de-identified sample of Vanderbilt's patient records involved in a genome-wide association study. It also measures the level of protection from re-identification, and data utility, provided by suppression and generalization.

Measurement: Privacy protection is quantified using the probability of re-identifying a patient in a larger population through diagnosis codes. Data utility is measured at a dataset level, using the percentage of retained information, as well as its description, and at a patient level, using two metrics based on the difference between the distribution of International Classification of Diseases (ICD) version 9 codes before and after applying privacy protection.
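
As an illustration only, the re-identification probability described above might be sketched as follows. This is a minimal sketch under a simplifying assumption that is not taken from the study: a released record matches a population record only when their sets of diagnosis codes are identical, and the risk is one over the number of matching population records. The function name `reidentification_risk`, the patient identifiers, and the toy code sets are all hypothetical.

```python
from collections import Counter

def reidentification_risk(sample, population):
    """Map each sample patient to 1 / (number of population records
    that carry exactly the same set of diagnosis codes).

    Assumption for illustration: matching means an identical code set.
    """
    # Count how many population records share each distinct code set.
    counts = Counter(frozenset(codes) for codes in population.values())
    risks = {}
    for pid, codes in sample.items():
        matches = counts[frozenset(codes)]
        # A sample record with no match in the population cannot be linked.
        risks[pid] = 1.0 / matches if matches else 0.0
    return risks

# Hypothetical toy data: ICD-9 codes per patient.
population = {
    "p1": {"250.00", "401.9"},
    "p2": {"250.00", "401.9"},
    "p3": {"250.01", "401.9"},
    "p4": {"493.90"},
}
sample = {"s1": {"250.00", "401.9"}, "s2": {"493.90"}}

risks = reidentification_risk(sample, population)
# Share of sample records that map to exactly one population record.
unique_share = sum(r == 1.0 for r in risks.values()) / len(risks)
```

In this toy data, `s1` shares its code set with two population records and `s2` with one, so only `s2` counts as uniquely identified.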

Results: More than 96% of 2800 patients' records are shown to be uniquely identified by their diagnosis codes with respect to a population of 1.2 million patients. Generalization is shown to reduce the percentage of re-identified records by less than 2%, and over 99% of the three-digit ICD-9 codes need to be suppressed to prevent re-identification.
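
The two protection methods evaluated here, generalization and suppression, can be illustrated with a short sketch. It rests on assumptions not stated in the abstract: generalization is modeled as truncating each ICD-9 code to the category before the decimal point, suppression as deleting chosen codes from every record, and uniqueness is checked within a single toy dataset rather than against a larger population. The helper names `generalize`, `suppress`, and `unique_fraction` and the data are hypothetical.

```python
from collections import Counter

def generalize(codes):
    """Truncate each ICD-9 code to its category, e.g. '250.01' -> '250'."""
    return frozenset(code.split(".")[0] for code in codes)

def suppress(codes, suppressed):
    """Remove every code listed in `suppressed` from a record."""
    return frozenset(c for c in codes if c not in suppressed)

def unique_fraction(records):
    """Fraction of records whose code set is shared with no other record."""
    counts = Counter(records.values())
    return sum(counts[r] == 1 for r in records.values()) / len(records)

# Hypothetical toy data: ICD-9 codes per patient.
records = {
    "a": {"250.00", "401.9"},
    "b": {"250.01", "401.9"},
    "c": {"250.00", "493.90"},
    "d": {"493.90"},
}

raw = {pid: frozenset(c) for pid, c in records.items()}
generalized = {pid: generalize(c) for pid, c in records.items()}
suppressed = {pid: suppress(c, {"250"}) for pid, c in generalized.items()}

frac_raw = unique_fraction(raw)                  # every record is distinct
frac_generalized = unique_fraction(generalized)  # 'a' and 'b' now coincide
frac_suppressed = unique_fraction(suppressed)    # no record remains unique
```

In this example generalization alone leaves half of the records unique, and uniqueness disappears only after a code is suppressed, which echoes the trade-off between protection and retained information that the study reports at much larger scale.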

Conclusions: Popular privacy protection methods are inadequate to deliver a sufficiently protected and useful result when sharing data derived from complex clinical systems. The development of alternative privacy protection models is thus required.

Citing Articles

Privacy protection of sexually transmitted infections information from Chinese electronic medical records.

Gong M, Yu Y, Ouyang Z, Shi W, Liu C, Wang Q Sci Rep. 2025; 15(1):1296.

PMID: 39779720 PMC: 11711325. DOI: 10.1038/s41598-024-84658-9.


Distributed non-disclosive validation of predictive models by a modified ROC-GLM.

Schalk D, Rehms R, Hoffmann V, Bischl B, Mansmann U BMC Med Res Methodol. 2024; 24(1):190.

PMID: 39210301 PMC: 11363434. DOI: 10.1186/s12874-024-02312-4.


An Equity-Based Scoring System for Evaluating Surveillance-Related Harm in Public Health Crises.

Amani B, McAndrew B, Sharif M, Garcia J, Nwankwo E, Cabral A Ethn Dis. 2024; 33(1):63-75.

PMID: 38846262 PMC: 11152151. DOI: 10.18865/2022-2022.


Who owns (or controls) health data?.

Kahn S, Terry S Sci Data. 2024; 11(1):156.

PMID: 38302466 PMC: 10834592. DOI: 10.1038/s41597-024-02982-1.


[Re-identification potential of structured health data].

Drechsler J, Pauly H Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2024; 67(2):164-170.

PMID: 38231225 PMC: 10834562. DOI: 10.1007/s00103-023-03820-2.

