A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature

Overview
Date 2018 Oct 12
PMID 30305770
Citations 52
Abstract

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.
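The multi-level annotation described in the abstract (a coarse PICO span containing finer-grained spans, some of which are mapped to a vocabulary entry) can be pictured as a nested data structure. The following is a minimal sketch only, not the corpus's actual file format: the class names `PicoSpan` and `FineSpan`, the element label `"interventions"`, the `"drug"` label, the example sentence, and the identifier `EXAMPLE:0001` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FineSpan:
    """A granular annotation inside a coarse span (hypothetical schema)."""
    start: int                      # character offset, inclusive
    end: int                        # character offset, exclusive
    label: str                      # e.g. an intervention subtype
    vocab_id: Optional[str] = None  # placeholder for a structured-vocabulary entry


@dataclass
class PicoSpan:
    """A coarse span marking one PICO element (hypothetical schema)."""
    start: int
    end: int
    element: str
    children: List[FineSpan] = field(default_factory=list)

    def add_child(self, child: FineSpan) -> None:
        # A fine-grained span must lie entirely within its coarse parent.
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError("fine span must be nested inside the coarse span")
        self.children.append(child)


def span_text(text: str, span) -> str:
    """Return the text covered by a span."""
    return text[span.start:span.end]


# Invented example sentence; offsets are computed rather than hand-counted.
text = "Adults with asthma received budesonide or placebo for 12 weeks."
phrase = "budesonide or placebo"
begin = text.index(phrase)
coarse = PicoSpan(begin, begin + len(phrase), element="interventions")

drug = "budesonide"
drug_begin = text.index(drug)
coarse.add_child(
    FineSpan(drug_begin, drug_begin + len(drug), label="drug", vocab_id="EXAMPLE:0001")
)

print(span_text(text, coarse))
for child in coarse.children:
    print(" ", span_text(text, child), child.label, child.vocab_id)
```

Storing character offsets rather than copied strings keeps both annotation levels aligned to the same abstract text, and the nesting check in `add_child` enforces that granular annotations stay inside the span they refine.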

Citing Articles

SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications.

Jiang L, Vorland C, Ying X, Brown A, Menke J, Hong G. Sci Data. 2025; 12(1):355.

PMID: 40021657 PMC: 11871027. DOI: 10.1038/s41597-025-04629-1.


SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications.

Jiang L, Vorland C, Ying X, Brown A, Menke J, Hong G. medRxiv. 2025.

PMID: 39867389 PMC: 11759256. DOI: 10.1101/2025.01.14.25320543.


Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition.

Chen F, Zhang G, Fang Y, Peng Y, Weng C. J Am Med Inform Assoc. 2025; 32(3):555-565.

PMID: 39823371 PMC: 11833487. DOI: 10.1093/jamia/ocae326.


Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations.

Wang L, Otmakhova Y, DeYoung J, Truong T, Kuehl B, Bransom E. Proc Conf Assoc Comput Linguist Meet. 2024; 2023:9871-9889.

PMID: 39629493 PMC: 11613456. DOI: 10.18653/v1/2023.acl-long.549.


Automated Mass Extraction of Over 680,000 PICOs from Clinical Study Abstracts Using Generative AI: A Proof-of-Concept Study.

Reason T, Langham J, Gimblett A. Pharmaceut Med. 2024; 38(5):365-372.

PMID: 39327389 PMC: 11473607. DOI: 10.1007/s40290-024-00539-6.

