PMID: 38264714

LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic Health Records

Abstract

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose LATTE, a label-efficient incident phenotyping algorithm that accurately annotates the timing of clinical events from longitudinal EHR data. By leveraging pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE then models the sequential dependency between the target event and the visit embeddings to derive event timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients for semi-supervised training. LATTE is evaluated on identifying the onset of type 2 diabetes and heart failure and the relapses of multiple sclerosis. It consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The derived event timings are also shown to help discover risk factors for heart failure among patients with rheumatoid arthritis.
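The feature-selection and visit-attention steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the LATTE implementation. The toy embeddings, the code names (`ICD:E11`, `LAB:HbA1c`, and so on), the cosine-similarity threshold used to select features, the zero vector for visits with no selected codes, and the use of the target-concept embedding as a fixed attention query (LATTE learns its visit attention during training) are all hypothetical assumptions made for illustration.

```python
import math

# Hypothetical 3-dimensional "pre-trained" code embeddings (assumption); real
# semantic embeddings are learned from large corpora and are higher dimensional.
CODE_EMB = {
    "ICD:E11": [0.9, 0.1, 0.0],       # type 2 diabetes diagnosis
    "LAB:HbA1c": [0.8, 0.2, 0.1],     # glycated hemoglobin test
    "RX:metformin": [0.7, 0.3, 0.0],  # first-line diabetes drug
    "CPT:flu_shot": [0.0, 0.1, 0.9],  # unrelated preventive care
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def select_features(target, threshold):
    """Keep codes whose embedding is close to the target concept (assumed rule)."""
    return {code for code, emb in CODE_EMB.items() if cosine(emb, target) >= threshold}


def visit_embedding(codes, query):
    """Softmax-attention pooling of the code embeddings recorded at one visit."""
    if not codes:
        return [0.0] * len(query)  # assumption: empty visits map to a zero vector
    scores = [dot(CODE_EMB[c], query) for c in codes]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    total = sum(weights)
    weights = [w / total for w in weights]  # attention weights sum to 1
    return [
        sum(w * CODE_EMB[c][k] for w, c in zip(weights, codes))
        for k in range(len(query))
    ]


# Usage: three visits for one hypothetical patient, in chronological order.
target = CODE_EMB["ICD:E11"]
selected = select_features(target, threshold=0.8)
visits = [["CPT:flu_shot"], ["LAB:HbA1c", "CPT:flu_shot"], ["ICD:E11", "RX:metformin"]]
visit_embs = [visit_embedding([c for c in v if c in selected], query=target) for v in visits]
for i, emb in enumerate(visit_embs):
    print(i, [round(x, 3) for x in emb])
```

Here the unrelated flu-shot code is filtered out, so the first visit contributes no signal, while the last visit becomes a weighted mix of the diabetes diagnosis and metformin embeddings. According to the abstract, LATTE then models the sequential dependency between the target event and these visit embeddings to derive event timings; that step and the semi-supervised training on silver-standard labels are not shown in this sketch.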

Citing Articles

With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de-identified electronic health record data for research.

Olaker V, Fry S, Terebuh P, Davis P, Tisch D, Xu R. Clin Transl Sci. 2024; 18(1):e70093.

PMID: 39740190 PMC: 11685181. DOI: 10.1111/cts.70093.


LATTE: Label-efficient incident phenotyping from longitudinal electronic health records.

Wen J, Hou J, Bonzel C, Zhao Y, Castro V, Gainer V. Patterns (N Y). 2024; 5(1):100906.

PMID: 38264714 PMC: 10801250. DOI: 10.1016/j.patter.2023.100906.
