Clinical Annotations for Prostate Cancer Research: Defining Data Elements, Creating a Reproducible Analytical Pipeline, and Assessing Data Quality
Background: Routine clinical data from clinical charts are indispensable for retrospective and prospective observational studies and clinical trials. Their reproducibility is often not assessed. We developed a prostate cancer-specific database for clinical annotations and evaluated data reproducibility.
Methods: For men with prostate cancer who had clinical-grade paired tumor-normal sequencing at a comprehensive cancer center, we performed team-based retrospective data collection from the electronic medical record using a defined source hierarchy. We developed an open-source R package for data processing. With blinded repeat annotation by a reference medical oncologist, we assessed data completeness, reproducibility of team-based annotations, and impact of measurement error on bias in survival analyses.
Results: Data elements on demographics, diagnosis and staging, disease state at the time of procuring a genomically characterized sample, and clinical outcomes were piloted and then abstracted for 2261 patients (with 2631 samples). Completeness of data elements was generally high. Compared with the repeat annotation by a medical oncologist blinded to the database (100 patients/samples), reproducibility of the team-based annotations was high overall; T stage, metastasis date, and the presence and date of castration resistance had lower reproducibility. The impact of measurement error on estimates for strong prognostic factors was modest.
Conclusions: With a prostate cancer-specific data dictionary and quality control measures, manual clinical annotations by a multidisciplinary team can be scalable and reproducible. The data dictionary and the R package for reproducible data processing are freely available to increase data quality and efficiency in clinical prostate cancer research.
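As an illustration of how the reproducibility of a categorical data element might be quantified, the following is a minimal sketch that compares team-based annotations with a blinded repeat annotation using percent agreement and Cohen's kappa. The abstract does not state which agreement statistics were used, so the choice of metrics, the function names, and the example T stage labels are all assumptions made for illustration. The sketch is in Python and is not part of the R package described above.

```python
from collections import Counter


def percent_agreement(team, repeat):
    """Fraction of records on which both annotations give the same value."""
    if len(team) != len(repeat) or not team:
        raise ValueError("annotation lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(team, repeat)) / len(team)


def cohens_kappa(team, repeat):
    """Chance-corrected agreement between two annotation sources."""
    n = len(team)
    p_observed = percent_agreement(team, repeat)
    team_counts, repeat_counts = Counter(team), Counter(repeat)
    categories = team_counts.keys() | repeat_counts.keys()
    # Expected agreement if the two sources assigned categories independently
    p_expected = sum(team_counts[c] * repeat_counts[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both sources used a single identical category; kappa is undefined,
        # so report perfect agreement by convention (an assumption of this sketch).
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical T stage labels for four patients (not data from the study)
team_t_stage = ["T2", "T2", "T3", "T3"]
repeat_t_stage = ["T2", "T3", "T3", "T3"]

agreement = percent_agreement(team_t_stage, repeat_t_stage)
kappa = cohens_kappa(team_t_stage, repeat_t_stage)
```

Percent agreement is easy to interpret but does not account for agreement expected by chance, which is why a chance-corrected statistic is often reported alongside it. Date-valued elements such as metastasis date would need a different comparison, for example the difference in days between the two annotations.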