Strategies Towards Digital and Semi-automated Curation in RegulonDB

Overview
Specialty Biology
Date 2017 Apr 3
PMID 28365731
Citations 3
Abstract

Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, new information is being published at a rate that manual curation is increasingly unable to match. Devising new curation strategies that leverage data mining and text analysis is, therefore, a promising avenue to help life science databases cope with the deluge of novel information. In this article, we describe the integration of text mining technologies into the curation pipeline of the RegulonDB database, and discuss how the process can enhance the productivity of the curators. Specifically, a named entity recognition approach is used to pre-annotate terms referring to a set of domain entities that are potentially relevant for the curation process. The annotated documents are presented to the curator, who, thanks to a custom-designed interface, can select sentences containing specific types of entities, thus restricting the amount of text that needs to be inspected. Additionally, a module capable of computing semantic similarity between sentences across the entire collection of articles to be curated is being integrated into the system. We tested the module using three sets of scientific articles and six domain experts. Together, these improvements are gradually enabling us to obtain a high-throughput curation process with the same quality as manual curation.
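
The three steps named in the abstract (pre-annotating entities, selecting sentences by entity type, and scoring sentence similarity) can be illustrated with a minimal Python sketch. This is not the RegulonDB pipeline: the term dictionary, the function names, and the example text are all hypothetical, the entity recognizer is a plain dictionary lookup, and bag-of-words cosine similarity stands in for the semantic similarity module, whose actual method the abstract does not describe.

```python
import math
import re
from collections import Counter

# Hypothetical term dictionary (entity type -> surface forms); illustrative only.
TERMS = {
    "transcription_factor": {"OxyR", "SoxS", "Fur"},
    "gene": {"fur", "katG"},
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def annotate(sentence):
    """Return the entity types whose terms occur as whole, case-sensitive tokens."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    return {etype for etype, terms in TERMS.items() if tokens & terms}

def select(sentences, entity_type):
    """Keep only the sentences that mention at least one entity of the given type."""
    return [s for s in sentences if entity_type in annotate(s)]

def cosine(a, b):
    """Bag-of-words cosine similarity; a simple stand-in for semantic similarity."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    dot = sum(count * vb[token] for token, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Made-up example text, not taken from any curated article.
text = ("OxyR activates katG under oxidative stress. "
        "The cells were grown in LB medium. "
        "Fur represses iron uptake genes.")
sentences = split_sentences(text)
tf_sentences = select(sentences, "transcription_factor")
```

Here `tf_sentences` holds only the first and third sentences, which is the sense in which entity-based selection reduces the text a curator has to read. A production system would replace the dictionary lookup with a trained or ontology-backed recognizer and the word-count vectors with a semantic representation.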

Citing Articles

BioWordVec, improving biomedical word embeddings with subword information and MeSH.

Zhang Y, Chen Q, Yang Z, Lin H, Lu Z Sci Data. 2019; 6(1):52.

PMID: 31076572 PMC: 6510737. DOI: 10.1038/s41597-019-0055-0.


Accelerating annotation of articles via automated approaches: evaluation of the neXtA5 curation-support tool by neXtProt.

Britan A, Cusin I, Hinard V, Mottin L, Pasche E, Gobeill J Database (Oxford). 2018; 2018.

PMID: 30576492 PMC: 6301339. DOI: 10.1093/database/bay129.


Textpresso Central: a customizable platform for searching, text mining, viewing, and curating biomedical literature.

Muller H, Van Auken K, Li Y, Sternberg P BMC Bioinformatics. 2018; 19(1):94.

PMID: 29523070 PMC: 5845379. DOI: 10.1186/s12859-018-2103-8.
