
Classifying the Lifestyle Status for Alzheimer's Disease from Clinical Notes Using Deep Learning with Weak Supervision

Overview
Publisher Biomed Central
Date 2022 Jul 7
PMID 35799294
Abstract

Background: Since no effective therapies exist for Alzheimer's disease (AD), prevention through lifestyle changes and interventions has become more critical. Analyzing electronic health records (EHRs) of patients with AD can help us better understand the effect of lifestyle on AD. However, lifestyle information is typically stored in free-text clinical narratives. Thus, the objective of the study was to compare different natural language processing (NLP) models for classifying lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English.

Methods: Based on the collected concept unique identifiers (CUIs) associated with lifestyle status, we extracted all related EHRs for patients with AD from the Clinical Data Repository (CDR) of the University of Minnesota (UMN). We automatically generated labels for the training data using a rule-based NLP algorithm. We applied weak supervision to pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and to three traditional machine learning models used as baselines, training them on the weakly labeled corpus. The models were the BERT base model, PubMedBERT (abstracts + full text), PubMedBERT (abstracts only), Unified Medical Language System (UMLS) BERT, BioBERT, Bio-clinical BERT, logistic regression, support vector machine, and random forest. We performed two case studies, physical activity and excessive diet, to validate the effectiveness of BERT models in classifying lifestyle status. All models were evaluated and compared on the developed Gold Standard Corpus (GSC) for the two case studies. The rule-based model used for weak supervision was also tested on the GSC for comparison.
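The weak-labeling step described in the Methods can be illustrated with a minimal sketch. The abstract does not give the actual rules or CUIs, so the keyword patterns, the label names (`active`, `inactive`), and the helper names `weak_label` and `build_weak_corpus` below are all hypothetical. The sketch only shows the general idea: a rule-based function assigns a label or abstains, and the labeled sentences become training data for a downstream classifier.

```python
import re

# Hypothetical rules for illustration only; the study's real rule set
# is not described in the abstract.
ACTIVITY = re.compile(r"\b(exercis\w*|walk\w*|jog\w*|physically active)\b")
NEGATION = re.compile(r"\b(no|not|denies|unable to)\b")
SEDENTARY = re.compile(r"\bsedentary\b")


def weak_label(sentence):
    """Return 'active', 'inactive', or None when no rule fires (abstain)."""
    text = sentence.lower()
    if SEDENTARY.search(text):
        return "inactive"
    if ACTIVITY.search(text):
        # A negation cue anywhere in the sentence flips the label.
        return "inactive" if NEGATION.search(text) else "active"
    return None


def build_weak_corpus(sentences):
    """Keep only sentences that received a weak label."""
    corpus = []
    for sentence in sentences:
        label = weak_label(sentence)
        if label is not None:
            corpus.append((sentence, label))
    return corpus


notes = [
    "Patient walks 30 minutes daily.",
    "Patient denies regular exercise.",
    "Sedentary lifestyle noted.",
    "Blood pressure stable.",
]
weak_corpus = build_weak_corpus(notes)
```

Here `weak_corpus` holds three labeled sentences, and the blood pressure note is dropped because no rule fires. Labels produced this way are noisy, which is why the Methods evaluate every model, including the rule-based one, against a separately developed Gold Standard Corpus.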

Results: The UMLS BERT model achieved the best performance for classifying physical activity status, with precision, recall, and F-1 scores of 0.93, 0.93, and 0.92, respectively. For classifying excessive diet, the Bio-clinical BERT model performed best, with precision, recall, and F-1 scores of 0.93, 0.93, and 0.93, respectively.
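For readers less familiar with these metrics, the sketch below computes per-class precision, recall, and F-1 and then macro-averages them. The abstract does not state which averaging scheme the study used, so macro averaging is an assumption here, and the function names and toy labels are hypothetical. One point the sketch makes concrete is that an averaged F-1 is the mean of the per-class F-1 scores, so it need not equal the harmonic mean of the averaged precision and recall.

```python
def per_class_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F-1 across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy gold and predicted labels, invented for illustration.
gold = ["active", "active", "inactive", "inactive"]
pred = ["active", "inactive", "inactive", "inactive"]

macro_p, macro_r, macro_f1 = macro_average(per_class_scores(gold, pred))
```

On this toy data the macro F-1 is lower than the harmonic mean of the macro precision and macro recall, which shows how the three reported numbers can differ slightly from one another under class-wise averaging.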

Conclusion: The proposed approach, which leverages weak supervision, could significantly increase the sample size available for training the deep learning models. In comparison with the traditional machine learning models, the study also demonstrates the high performance of BERT models for classifying lifestyle status for Alzheimer's disease in clinical notes.

Citing Articles

Leveraging large language models for knowledge-free weak supervision in clinical natural language processing.

Hsu E, Roberts K. Sci Rep. 2025; 15(1):8241.

PMID: 40064991 PMC: 11893743. DOI: 10.1038/s41598-024-68168-2.


Natural language processing in Alzheimer's disease research: Systematic review of methods, data, and efficacy.

Shakeri A, Farmanbar M. Alzheimers Dement (Amst). 2025; 17(1):e70082.

PMID: 39935888 PMC: 11812127. DOI: 10.1002/dad2.70082.


Leveraging Large Language Models for Knowledge-free Weak Supervision in Clinical Natural Language Processing.

Hsu E, Roberts K. Res Sq. 2024; .

PMID: 38978609 PMC: 11230489. DOI: 10.21203/rs.3.rs-4559971/v1.


The Role of the Neural Exposome as a Novel Strategy to Identify and Mitigate Health Inequities in Alzheimer's Disease and Related Dementias.

Granov R, Vedad S, Wang S, Durham A, Shah D, Pasinetti G. Mol Neurobiol. 2024; 62(1):1205-1224.

PMID: 38967905 PMC: 11711138. DOI: 10.1007/s12035-024-04339-6.


Retrieval-Based Diagnostic Decision Support: Mixed Methods Study.

Abdullahi T, Mercurio L, Singh R, Eickhoff C. JMIR Med Inform. 2024; 12:e50209.

PMID: 38896468 PMC: 11222760. DOI: 10.2196/50209.

