
RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information

Overview
Specialty Biology
Date 2015 Sep 11
PMID 26357075
Citations 27
Abstract

We introduce RLIMS-P version 2.0, an enhanced rule-based information extraction (IE) system for mining kinase, substrate, and phosphorylation site information from scientific literature. Consisting of natural language processing and IE modules, the system integrates several new features, including the capability to process full-text articles and generalizability to different post-translational modifications (PTMs). To evaluate the system, sets of abstracts and full-text articles containing a variety of textual expressions were annotated. On the abstract corpus, the system achieved F-scores of 0.91, 0.92, and 0.95 for kinases, substrates, and sites, respectively. The corresponding scores on the full-text corpus were 0.88, 0.91, and 0.92. The system was additionally evaluated on the corpus of the 2013 BioNLP-ST GE task, where it achieved an F-score of 0.87 for the phosphorylation core task, improving upon the results previously reported on that corpus. Full-scale processing of all abstracts in MEDLINE and all articles in the PubMed Central Open Access Subset has demonstrated scalability for mining rich information in the literature, enabling its adoption for biocuration and knowledge discovery. The new system is generalizable and will be adapted to tackle other major PTM types. The RLIMS-P 2.0 system is available online (http://proteininformationresource.org/rlimsp/), and the developed corpora are available from iProLINK (http://proteininformationresource.org/iprolink/).
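
To make the abstract's terms concrete, the following is a minimal Python sketch of what a rule-based extractor for kinase, substrate, and site mentions might look like, together with the F-score used in the evaluation. It is not RLIMS-P's implementation: the two regular-expression rules, the `PhosphoEvent` type, and the `extract_events` and `f_score` names are all hypothetical and chosen only for illustration. A real system relies on full natural language processing rather than surface patterns.

```python
import re
from typing import List, NamedTuple, Optional


class PhosphoEvent(NamedTuple):
    """One extracted phosphorylation event (hypothetical structure)."""
    kinase: Optional[str]
    substrate: str
    site: Optional[str]


# Assumed, deliberately simple token shapes; real protein names are far messier.
_PROTEIN = r"[A-Za-z][A-Za-z0-9-]*"
_SITE = r"(?:Ser|Thr|Tyr|S|T|Y)-?\d+"

# Toy rule 1: "<kinase> phosphorylates <substrate> [at|on <site>]"
ACTIVE = re.compile(
    rf"(?P<kinase>{_PROTEIN}) phosphorylates (?P<substrate>{_PROTEIN})"
    rf"(?: (?:at|on) (?P<site>{_SITE}))?"
)

# Toy rule 2: "<substrate> is phosphorylated [at|on <site>] [by <kinase>]"
PASSIVE = re.compile(
    rf"(?P<substrate>{_PROTEIN}) is phosphorylated"
    rf"(?: (?:at|on) (?P<site>{_SITE}))?"
    rf"(?: by (?P<kinase>{_PROTEIN}))?"
)


def extract_events(sentence: str) -> List[PhosphoEvent]:
    """Apply each toy rule to the sentence and collect every match."""
    events = []
    for rule in (ACTIVE, PASSIVE):
        for m in rule.finditer(sentence):
            events.append(
                PhosphoEvent(m.group("kinase"), m.group("substrate"), m.group("site"))
            )
    return events


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(extract_events("AKT1 phosphorylates GSK3B at Ser9."))
    print(extract_events("p53 is phosphorylated on Ser-15 by ATM."))
```

The example sentences are invented for the sketch. Each argument type (kinase, substrate, site) would be scored separately, which is why the abstract reports three F-scores per corpus.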

Citing Articles

Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity.

Raciti D, Van Auken K, Arnaboldi V, Tabone C, Muller H, Sternberg P bioRxiv. 2025.

PMID: 39829858 PMC: 11741306. DOI: 10.1101/2025.01.06.631539.


KSFinder-a knowledge graph model for link prediction of novel phosphorylated substrates of kinases.

Anandakrishnan M, Ross K, Chen C, Shanker V, Cowart J, Wu C PeerJ. 2023; 11:e16164.

PMID: 37818330 PMC: 10561642. DOI: 10.7717/peerj.16164.


Automated assembly of molecular mechanisms at scale from text mining and curated databases.

Bachman J, Gyori B, Sorger P Mol Syst Biol. 2023; 19(5):e11325.

PMID: 36938926 PMC: 10167483. DOI: 10.15252/msb.202211325.


Identification of Novel Kinases of Tau Using Fluorescence Complementation Mass Spectrometry (FCMS).

Kao D, Du Y, DeMarco A, Min S, Hall M, Rochet J Mol Cell Proteomics. 2022; 21(12):100441.

PMID: 36379402 PMC: 9755369. DOI: 10.1016/j.mcpro.2022.100441.


Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts.

Nicholson D, Himmelstein D, Greene C BioData Min. 2022; 15(1):26.

PMID: 36258252 PMC: 9578183. DOI: 10.1186/s13040-022-00311-z.

