PMID: 37001040

Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center

Abstract

Purpose: This study documents the creation of an automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. The platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system.

Methods: Data from six hospital information systems and one external data source were integrated on a nightly basis by automated extract/transform/load jobs. Free-text clinical documentation was processed using a commercial NLP engine.
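The nightly extract/transform/load pattern described in the Methods can be illustrated with a minimal sketch. The abstract does not describe the platform's actual schema, tooling, or source systems, so everything here is an assumption made for illustration: the `data_element` table, the source names, the row fields, and the functions `transform`, `load`, and `run_nightly_job` are all hypothetical.

```python
import sqlite3

def transform(source_name, row):
    # Normalise a hypothetical source row into a common shape:
    # (patient_id, element, value, source).
    return (
        str(row["patient_id"]).strip(),
        row["element"].lower(),
        str(row["value"]),
        source_name,
    )

def load(conn, records):
    # Hypothetical warehouse table: one row per patient and data element.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_element ("
        "patient_id TEXT, element TEXT, value TEXT, source TEXT, "
        "PRIMARY KEY (patient_id, element))"
    )
    # INSERT OR REPLACE keeps the job idempotent: rerunning it refreshes
    # existing rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO data_element VALUES (?, ?, ?, ?)", records
    )
    conn.commit()

def run_nightly_job(conn, sources):
    # 'sources' maps a source-system name to the rows extracted from it.
    records = [
        transform(name, row)
        for name, rows in sources.items()
        for row in rows
    ]
    load(conn, records)
    return len(records)
```

Keying the table on patient and data element is one possible design choice; it lets a scheduled job be rerun safely, which suits an automated nightly refresh.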

Results: The platform contains 141 data elements for 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the tuning of NLP jobs found overall high performance: 19 variables had an F1 score of 1.0, and a further 16 variables had an F1 score > 0.95.
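For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard calculation for a single binary variable. It is not the study's evaluation code; the function name `f1_score` and the boolean label encoding are assumptions made for illustration.

```python
from typing import Sequence

def f1_score(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    # gold: reference labels (for example, from manual chart review)
    # predicted: labels produced by the NLP extraction for the same records
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # With no true positives, precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

An F1 of 1.0 means the extracted values matched the reference labels with no false positives and no false negatives for that variable.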

Conclusion: This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform to enable a learning health system. Although the upfront time investment required to create the platform was considerable, daily data processing now completes automatically in less than an hour.

Citing Articles

Breast cancer learning health system: Patient information from a data and analytics platform characterizes care provided.

Levine M, Kemppainen J, Rosenberg M, Pettengell C, Bogach J, Whelan T. Learn Health Syst. 2024; 8(3):e10409.

PMID: 39036532 PMC: 11257056. DOI: 10.1002/lrh2.10409.


StatiCAL: an interactive tool for statistical analysis of biomedical data and scientific valorization.

Pace-Loscos T, Gal J, Contu S, Schiappa R, Chamorey E, Culie D. BMC Bioinformatics. 2024; 25(1):210.

PMID: 38867185 PMC: 11167775. DOI: 10.1186/s12859-024-05829-z.


Real-World Treatment Patterns and Clinical Outcomes among Patients Receiving CDK4/6 Inhibitors for Metastatic Breast Cancer in a Canadian Setting Using AI-Extracted Data.

Moulson R, Feugere G, Moreira-Lucas T, Dequen F, Weiss J, Smith J. Curr Oncol. 2024; 31(4):2172-2184.

PMID: 38668064 PMC: 11049664. DOI: 10.3390/curroncol31040161.


Real-World Outcomes of Patients with Advanced Epidermal Growth Factor Receptor-Mutated Non-Small Cell Lung Cancer in Canada Using Data Extracted by Large Language Model-Based Artificial Intelligence.

Moulson R, Law J, Sacher A, Liu G, Shepherd F, Bradbury P. Curr Oncol. 2024; 31(4):1947-1960.

PMID: 38668049 PMC: 11049467. DOI: 10.3390/curroncol31040146.
