Feature Generation and Representations for Protein-protein Interaction Classification

Overview
Journal J Biomed Inform
Publisher Elsevier
Date 2009 Jul 21
PMID 19616641
Citations 8
Abstract

Automatically detecting articles relevant to protein-protein interaction (PPI) is a crucial step for large-scale biological database curation. Previous work adopted POS tagging, shallow parsing and sentence splitting techniques, but achieved worse performance than the simple bag-of-words representation. In this paper, we generated and investigated multiple types of feature representations in order to further improve the performance of the PPI text classification task. Besides the traditional domain-independent bag-of-words approach and term weighting methods, we also explored domain-dependent features, i.e. protein-protein interaction trigger keywords, protein named entities and advanced ways of incorporating Natural Language Processing (NLP) output. The integration of these multiple features was evaluated on the BioCreAtIvE II corpus. The experimental results showed that both the advanced way of using NLP output and the integration of bag-of-words and NLP output improved the performance of text classification. Specifically, in comparison with the best performance achieved in the BioCreAtIvE II IAS, the feature-level and classifier-level integration of multiple features improved classification performance by 2.71% and 3.95%, respectively.
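The two integration strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trigger keyword list, the TF-IDF weighting variant, the function names and the score-averaging rule are all assumptions chosen for illustration, since the abstract does not specify them. Feature-level integration is shown as concatenating a weighted bag-of-words vector with trigger-keyword features; classifier-level integration is shown as averaging the scores of separately trained classifiers.

```python
import math
import re
from collections import Counter

# Hypothetical trigger keywords; the abstract does not give the actual list.
TRIGGER_KEYWORDS = {"interact", "interacts", "binds", "binding", "complex"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents.

    Uses raw term frequency times (log(N / df) + 1), one common weighting
    variant; this choice is an assumption, not the paper's scheme.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def trigger_features(doc):
    """Domain-dependent features: trigger-keyword count and presence flag."""
    count = sum(1 for tok in doc if tok in TRIGGER_KEYWORDS)
    return [float(count), 1.0 if count > 0 else 0.0]


def feature_level_integration(texts):
    """Concatenate bag-of-words and trigger features into one vector per text."""
    docs = [tokenize(t) for t in texts]
    vocab, bow = tfidf_vectors(docs)
    combined = [b + trigger_features(d) for b, d in zip(bow, docs)]
    names = vocab + ["trigger_count", "trigger_present"]
    return names, combined


def classifier_level_integration(scores_per_classifier):
    """Average per-document scores from several classifiers.

    Each inner list holds one classifier's scores, one score per document.
    Averaging is an assumed combination rule.
    """
    n_classifiers = len(scores_per_classifier)
    return [sum(col) / n_classifiers for col in zip(*scores_per_classifier)]


if __name__ == "__main__":
    names, X = feature_level_integration(
        ["RAD51 binds BRCA2", "The weather was cold"]
    )
    print(names[-2:], X[0][-2:], X[1][-2:])
    print(classifier_level_integration([[1.0, 0.0], [0.5, 0.5]]))
```

In the feature-level variant a single classifier would be trained on the concatenated vectors, whereas in the classifier-level variant each feature type gets its own classifier and only their output scores are combined.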

Citing Articles

Improvements in viral gene annotation using large language models and soft alignments.

Harrigan W, Ferrell B, Wommack K, Polson S, Schreiber Z, Belcaid M BMC Bioinformatics. 2024; 25(1):165.

PMID: 38664627 PMC: 11046836. DOI: 10.1186/s12859-024-05779-6.


Document triage for identifying protein-protein interactions affected by mutations: a neural network ensemble approach.

Luo L, Yang Z, Lin H, Wang J Database (Oxford). 2018; 2018.

PMID: 30295718 PMC: 6147215. DOI: 10.1093/database/bay097.


Exploiting graph kernels for high performance biomedical relation extraction.

Panyam N, Verspoor K, Cohn T, Ramamohanarao K J Biomed Semantics. 2018; 9(1):7.

PMID: 29382397 PMC: 5791373. DOI: 10.1186/s13326-017-0168-3.


Protein-Protein Interaction Article Classification Using a Convolutional Recurrent Neural Network with Pre-trained Word Embeddings.

Matos S, Antunes R J Integr Bioinform. 2017; 14(4).

PMID: 29236678 PMC: 6042813. DOI: 10.1515/jib-2017-0055.


Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features.

Thuy Phan T, Ohkawa T BMC Bioinformatics. 2016; 17 Suppl 7:246.

PMID: 27454611 PMC: 4965725. DOI: 10.1186/s12859-016-1100-z.