
Identification of Research Hypotheses and New Knowledge from Scientific Literature

Overview
Publisher BioMed Central
Date 2018 Jun 27
PMID 29940927
Citations 14
Abstract

Background: Text mining (TM) methods have been used extensively to extract relations and events from the literature. TM techniques have also been used to extract various dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events; examples of MK dimensions include negation, speculation, certainty and knowledge type. However, most existing methods have focussed on extracting individual MK dimensions without investigating how they can be combined to obtain richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions.
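To make the idea of combining simple MK dimensions concrete, the following Python sketch attaches the dimensions named above (negation, speculation, certainty and knowledge type) to an event, together with the two new Research Hypothesis and New Knowledge attributes, and one-hot encodes the simple dimensions into a single feature vector. The names `AnnotatedEvent` and `mk_features`, and the value sets in `CERTAINTY_LEVELS` and `KNOWLEDGE_TYPES`, are assumptions made for illustration; they are not taken from the GENIA-MK or EU-ADR annotation guidelines.

```python
from dataclasses import dataclass
from typing import List

# Illustrative value sets ASSUMED for this sketch; the real annotation
# schemes may use different labels.
CERTAINTY_LEVELS = ("L1", "L2", "L3")
KNOWLEDGE_TYPES = ("Investigation", "Observation", "Analysis", "Method", "Fact", "Other")


@dataclass
class AnnotatedEvent:
    """A relation or event with simple MK dimensions and the two new attributes."""
    trigger: str
    negated: bool
    speculated: bool
    certainty: str
    knowledge_type: str
    research_hypothesis: bool = False  # new dimension: author's intended knowledge gain
    new_knowledge: bool = False        # new dimension: author's finding


def mk_features(event: AnnotatedEvent) -> List[int]:
    """Combine the simple MK dimensions into one binary (one-hot) feature vector.

    The two new attributes are left out because they are the prediction targets.
    """
    if event.certainty not in CERTAINTY_LEVELS:
        raise ValueError(f"unknown certainty level: {event.certainty}")
    if event.knowledge_type not in KNOWLEDGE_TYPES:
        raise ValueError(f"unknown knowledge type: {event.knowledge_type}")
    features = [int(event.negated), int(event.speculated)]
    features += [int(event.certainty == level) for level in CERTAINTY_LEVELS]
    features += [int(event.knowledge_type == kt) for kt in KNOWLEDGE_TYPES]
    return features


event = AnnotatedEvent("activates", negated=False, speculated=False,
                       certainty="L3", knowledge_type="Observation", new_knowledge=True)
print(mk_features(event))  # [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
```

Keeping the new attributes on the same record as the simple dimensions mirrors how the corpora were enriched (extra attributes on existing events), while the feature function deliberately reads only the simple dimensions.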

Methods: We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated.
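The modelling step above can be sketched as follows. This is a minimal, dependency-free illustration assuming binary features: two simple MK flags (speculation, negation) plus a handful of hypothetical cue-phrase features stand in for the paper's linguistic features, which the abstract does not enumerate. Each tree is grown on a bootstrap sample and considers only a random subset of features at each split, and the forest predicts by majority vote. The function names, the cue list `CUES` and the toy sentences in `TOY` are all invented for this example, and the target here is a single binary New Knowledge attribute.

```python
import random
from collections import Counter

# Hypothetical cue phrases standing in for the paper's linguistic features.
CUES = ("hypothes", "whether", "we show", "demonstrate", "we found")

# Toy training data invented for illustration (not drawn from GENIA-MK or EU-ADR):
# (sentence, speculated, negated, is_new_knowledge)
TOY = [
    ("We hypothesize that p53 activates MDM2.", True, False, False),
    ("We investigated whether IL-2 induces NF-kB.", False, False, False),
    ("We tested whether the kinase is required for growth.", False, False, False),
    ("We show that STAT3 binds the promoter.", False, False, True),
    ("Our results demonstrate that TNF does not induce apoptosis.", False, True, True),
    ("We found that the mutant fails to bind DNA.", False, True, True),
]


def featurize(sentence, speculated, negated):
    """Two simple MK flags followed by one binary feature per cue phrase."""
    text = sentence.lower()
    return [int(speculated), int(negated)] + [int(cue in text) for cue in CUES]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(rows, labels, n_sub, depth, rng):
    """Grow a small tree on binary features. Internal nodes are
    (feature, subtree_if_0, subtree_if_1); leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    # Random-forest twist: only a random subset of features is tried per split.
    for f in rng.sample(range(len(rows[0])), n_sub):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    lo = [(x, y) for x, y in zip(rows, labels) if x[f] == 0]
    hi = [(x, y) for x, y in zip(rows, labels) if x[f] == 1]
    return (
        f,
        build_tree([x for x, _ in lo], [y for _, y in lo], n_sub, depth - 1, rng),
        build_tree([x for x, _ in hi], [y for _, y in hi], n_sub, depth - 1, rng),
    )


def predict_tree(node, x):
    """Follow feature tests from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, if_zero, if_one = node
        node = if_one if x[f] else if_zero
    return node


def train_forest(rows, labels, n_trees=25, depth=3, seed=0):
    """Bagging plus per-split feature subsampling (about sqrt(#features))."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_sub, depth, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    X = [featurize(s, spec, neg) for s, spec, neg, _ in TOY]
    y = [label for *_, label in TOY]
    forest = train_forest(X, y)
    query = featurize("These data demonstrate that Akt is required.", False, False)
    print("New Knowledge?", predict_forest(forest, query))
```

Writing the forest by hand keeps the sketch self-contained; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be the usual choice, and one plausible setup is to train a separate classifier for each new dimension.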

Results: We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method improves upon the previously reported state-of-the-art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836).
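The F1-scores above summarise precision and recall in a single number. For a binary attribute such as Research Hypothesis or New Knowledge, the standard definition in terms of true positives (TP), false positives (FP) and false negatives (FN) is given below; the abstract does not say how scores are averaged across classes or evaluation folds, so this shows only the basic per-class form.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```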

Conclusion: We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.

Citing Articles

SATS: simplification aware text summarization of scientific documents.

Zaman F, Kamiran F, Shardlow M, Hassan S, Karim A, Aljohani N. Front Artif Intell. 2024; 7:1375419.

PMID: 39049961 PMC: 11266102. DOI: 10.3389/frai.2024.1375419.


Dissecting Through the Literature: A Review of the Critical Appraisal Process.

Almutairi R, Alsarraf A, Alkandari D, Ashkanani H, Albazali A. Cureus. 2024; 16(5):e59658.

PMID: 38836144 PMC: 11148477. DOI: 10.7759/cureus.59658.


Creating an ignorance-base: Exploring known unknowns in the scientific literature.

Boguslav M, Salem N, White E, Sullivan K, Bada M, Hernandez T. J Biomed Inform. 2023; 143:104405.

PMID: 37270143 PMC: 10528083. DOI: 10.1016/j.jbi.2023.104405.


A survey on clinical natural language processing in the United Kingdom from 2007 to 2022.

Wu H, Wang M, Wu J, Francis F, Chang Y, Shavick A. NPJ Digit Med. 2022; 5(1):186.

PMID: 36544046 PMC: 9770568. DOI: 10.1038/s41746-022-00730-6.


Contexts and contradictions: a roadmap for computational drug repurposing with knowledge inference.

Sosa D, Altman R. Brief Bioinform. 2022; 23(4).

PMID: 35817308 PMC: 9294417. DOI: 10.1093/bib/bbac268.

