
Evaluation of a Generalizable Approach to Clinical Information Retrieval Using the Automated Retrieval Console (ARC)

Overview
Date 2010 Jul 3
PMID 20595303
Citations 31
Abstract

Reducing custom software development effort is an important goal in information retrieval (IR). This study evaluated a generalizable approach involving no custom software or rules development. The study used documents "consistent with cancer" to evaluate system performance in the domains of colorectal (CRC), prostate (PC), and lung (LC) cancer. Using an end-user-supplied reference set, the Automated Retrieval Console (ARC) iteratively calculated the performance of combinations of natural language processing-derived features and supervised classification algorithms. Training and testing involved 10-fold cross-validation for three sets of 500 documents each. Performance metrics included recall, precision, and F-measure. Annotation time for five physicians was also measured. Top-performing algorithms had recall, precision, and F-measure values as follows: for CRC, 0.90, 0.92, and 0.89, respectively; for PC, 0.97, 0.95, and 0.94; and for LC, 0.76, 0.80, and 0.75. In all but one case, conditional random fields outperformed maximum entropy-based classifiers. Algorithms had good performance without custom code or rules development, but performance varied by specific application.
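
The evaluation protocol named in the abstract (10-fold cross-validation scored by recall, precision, and F-measure) can be illustrated with a minimal sketch. This is not ARC's code: the function names `precision_recall_f1` and `k_fold_indices`, the interleaved fold assignment, and the use of the balanced F-measure (F1) are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(gold: Sequence[bool],
                        predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Score binary document labels (hypothetical helper, not part of ARC).

    gold[i] is the reference annotation for document i and predicted[i]
    is the classifier's output for the same document.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-measure: harmonic mean of precision and recall (assumed F1).
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; each item is tested exactly once.

    Folds are interleaved (item i goes to fold i % k); this assignment
    scheme is an assumption, not something the abstract describes.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # One reference set of 500 documents, as in the study design.
    splits = k_fold_indices(500, k=10)
    print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

With 500 documents and 10 folds, each split trains on 450 documents and tests on the remaining 50. Note that an F-measure averaged over folds need not equal the harmonic mean of the averaged precision and recall.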

Citing Articles

Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research.

Newman-Griffis D, Lehman J, Rose C, Hochheiser H Proc Conf. 2021; 2021:4125-4138.

PMID: 34179899 PMC: 8223521.


Automated NLP Extraction of Clinical Rationale for Treatment Discontinuation in Breast Cancer.

Alkaitis M, Agrawal M, Riely G, Razavi P, Sontag D JCO Clin Cancer Inform. 2021; 5:550-560.

PMID: 33989016 PMC: 8462597. DOI: 10.1200/CCI.20.00139.


Big data in IBD: big progress for clinical practice.

Sadat Seyed Tabib N, Madgwick M, Sudhakar P, Verstockt B, Korcsmaros T, Vermeire S Gut. 2020; 69(8):1520-1532.

PMID: 32111636 PMC: 7398484. DOI: 10.1136/gutjnl-2019-320065.


Test collections for electronic health record-based clinical information retrieval.

Wang Y, Wen A, Liu S, Hersh W, Bedrick S, Liu H JAMIA Open. 2019; 2(3):360-368.

PMID: 31709390 PMC: 6824517. DOI: 10.1093/jamiaopen/ooz016.


Extracting Healthcare Quality Information from Unstructured Data.

Malmasi S, Hosomura N, Chang L, Brown C, Skentzos S, Turchin A AMIA Annu Symp Proc. 2018; 2017:1243-1252.

PMID: 29854193 PMC: 5977624.

