PMID: 21508311

Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium

Abstract

Clinical data in electronic medical records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network (eMERGE) investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73 to 98% and negative predictive values of 98 to 100%. Most EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
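The positive and negative predictive values quoted above can be illustrated with a short calculation. This is a minimal sketch assuming the standard definitions (PPV = true positives / all algorithm-identified cases; NPV = true negatives / all algorithm-identified controls) with manual chart review as the reference; the abstract does not describe the validation procedure, and the class name, function names, and counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Chart-review outcomes for one phenotype algorithm (hypothetical)."""
    true_pos: int   # algorithm says case, reviewer agrees
    false_pos: int  # algorithm says case, reviewer disagrees
    true_neg: int   # algorithm says control, reviewer agrees
    false_neg: int  # algorithm says control, reviewer disagrees


def ppv(c: ValidationCounts) -> float:
    """Fraction of algorithm-identified cases confirmed by review."""
    flagged = c.true_pos + c.false_pos
    if flagged == 0:
        raise ValueError("no algorithm-identified cases were reviewed")
    return c.true_pos / flagged


def npv(c: ValidationCounts) -> float:
    """Fraction of algorithm-identified controls confirmed by review."""
    cleared = c.true_neg + c.false_neg
    if cleared == 0:
        raise ValueError("no algorithm-identified controls were reviewed")
    return c.true_neg / cleared


# Illustrative counts only; these are not data from the eMERGE study.
counts = ValidationCounts(true_pos=90, false_pos=10, true_neg=99, false_neg=1)
print(f"PPV = {ppv(counts):.2f}")  # PPV = 0.90
print(f"NPV = {npv(counts):.2f}")  # NPV = 0.99
```

Both values depend on how common the phenotype is in the reviewed sample, so figures from one EMR system do not transfer automatically to another with a different case mix.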

Citing Articles

Instability of high polygenic risk classification and mitigation by integrative scoring.

Misra A, Truong B, Urbut S, Sui Y, Fahed A, Smoller J Nat Commun. 2025; 16(1):1584.

PMID: 39939586 PMC: 11822040. DOI: 10.1038/s41467-025-56945-0.


Evaluating dimensionality reduction of comorbidities for predictive modeling in individuals with neurofibromatosis type 1.

Gupta A, Hillis E, Oh I, Morris S, Abrams Z, Foraker R JAMIA Open. 2025; 8(1):ooae157.

PMID: 39845289 PMC: 11752863. DOI: 10.1093/jamiaopen/ooae157.


A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis.

Li R, Benz L, Duan R, Denny J, Hakonarson H, Mosley J medRxiv. 2024.

PMID: 38260403 PMC: 10802662. DOI: 10.1101/2024.01.09.24301073.


A methodology of phenotyping ICU patients from EHR data: High-fidelity, personalized, and interpretable phenotypes estimation.

Wang Y, Stroh J, Hripcsak G, Low Wang C, Bennett T, Wrobel J J Biomed Inform. 2023; 148:104547.

PMID: 37984547 PMC: 10802138. DOI: 10.1016/j.jbi.2023.104547.


Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data.

Cai T, Liu M, Xia Y J Am Stat Assoc. 2023; 117(540):2105-2119.

PMID: 37975021 PMC: 10653033. DOI: 10.1080/01621459.2021.1904958.

