Community Challenges in Biomedical Text Mining over 10 Years: Success, Failure and the Future

Overview

Journal Brief Bioinform

Publisher Oxford University Press

Specialty Biology

Date 2015 May 4

PMID 25935162

Citations 83

Authors

Chung-Chi Huang

Zhiyong Lu

Abstract

One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.

Citing Articles

Automatic extraction of transcriptional regulatory interactions of bacteria from biomedical literature using a BERT-based approach.

Varela-Vega A, Posada-Reyes A, Mendez-Cruz C. Database (Oxford). 2024; 2024.

PMID: 39213391 PMC: 11363960. DOI: 10.1093/database/baae094.

Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes.

Jonker R, Almeida T, Antunes R, Almeida J, Matos S. Database (Oxford). 2024; 2024.

PMID: 39083461 PMC: 11290360. DOI: 10.1093/database/baae068.

Identifying symptom etiologies using syntactic patterns and large language models.

Taub-Tabib H, Shamay Y, Shlain M, Pinhasov M, Polak M, Tiktinsky A. Sci Rep. 2024; 14(1):16190.

PMID: 39003296 PMC: 11246441. DOI: 10.1038/s41598-024-65645-6.

DUVEL: an active-learning annotated biomedical corpus for the recognition of oligogenic combinations.

Nachtegael C, De Stefani J, Cnudde A, Lenaerts T. Database (Oxford). 2024; 2024.

PMID: 38805753 PMC: 11131422. DOI: 10.1093/database/baae039.

MetaTron: advancing biomedical annotation empowering relation annotation and collaboration.

Irrera O, Marchesin S, Silvello G. BMC Bioinformatics. 2024; 25(1):112.

PMID: 38486137 PMC: 10941452. DOI: 10.1186/s12859-024-05730-9.
