
The CLEF Corpus: Semantic Annotation of Clinical Text

Overview
Date 2008 Aug 13
PMID 18693911
Citations 26
Abstract

The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information, in support of clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human-annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20,000 patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships. We describe an annotation methodology and report encouraging initial results of inter-annotator agreement. Comparisons are made between different text sub-genres, and between annotators with different skills.
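The abstract does not say how inter-annotator agreement was measured. As a minimal sketch, assuming agreement is scored as an exact-match F-measure between two annotators' entity sets (a common choice for this kind of task, not confirmed by the text above), the calculation could look like the following. The `Annotation` tuple layout, the `pairwise_f1` function, the character offsets and the entity type labels are all hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: (start_offset, end_offset, entity_type).
Annotation = Tuple[int, int, str]


def pairwise_f1(a: Set[Annotation], b: Set[Annotation]) -> float:
    """Exact-match F1 between two annotators' entity sets.

    Annotator `a` is treated as the reference and `b` as the response;
    F1 is symmetric, so swapping them gives the same score.
    """
    if not a and not b:
        return 1.0  # Both annotated nothing: treated here as full agreement.
    matches = len(a & b)  # Same span and same type.
    if matches == 0:
        return 0.0
    precision = matches / len(b)
    recall = matches / len(a)
    return 2 * precision * recall / (precision + recall)


# Illustrative annotations over one document (invented data).
annotator_1 = {(0, 9, "Condition"), (15, 24, "Drug"), (30, 38, "Locus")}
annotator_2 = {(0, 9, "Condition"), (15, 24, "Drug"), (40, 45, "Locus")}

print(round(pairwise_f1(annotator_1, annotator_2), 3))
```

Under this assumption, two annotators who agree on two of three entities each score about 0.667. A relaxed variant that counts overlapping spans as matches would need a different matching step than set intersection.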

Citing Articles

Annotation of epilepsy clinic letters for natural language processing.

Fonferko-Shadrach B, Strafford H, Jones C, Khan R, Brown S, Edwards J. J Biomed Semantics. 2024; 15(1):17.

PMID: 39277770 PMC: 11402197. DOI: 10.1186/s13326-024-00316-z.


Text mining for disease surveillance in veterinary clinical data: part one, the language of veterinary clinical records and searching for words.

Davies H, Nenadic G, Alfattni G, Arguello Casteleiro M, Al Moubayed N, Farrell S. Front Vet Sci. 2024; 11:1352239.

PMID: 38322169 PMC: 10844486. DOI: 10.3389/fvets.2024.1352239.


The h-ANN Model: Comprehensive Colonoscopy Concept Compilation Using Combined Contextual Embeddings.

Syed S, Angel A, Syeda H, Jennings C, VanScoy J, Syed M. Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022; 5:189-200.

PMID: 35373222 PMC: 8970464. DOI: 10.5220/0010903300003123.


Reducing Physicians' Cognitive Load During Chart Review: A Problem-Oriented Summary of the Patient Electronic Record.

Liang J, Tsou C, Dandala B, Poddar A, Joopudi V, Mahajan D. AMIA Annu Symp Proc. 2022; 2021:763-772.

PMID: 35308927 PMC: 8861663.


TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Syed S, Angel A, Syeda H, Jennings C, VanScoy J, Syed M. Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap. 2022; 2022:162-169.

PMID: 35300321 PMC: 8926426. DOI: 10.5220/0010876100003123.

