
A Hybrid Approach to Extract Protein-protein Interactions

Overview
Journal Bioinformatics
Specialty Biology
Date 2010 Nov 11
PMID 21062765
Citations 29
Abstract

Motivation: Protein-protein interactions (PPIs) play an important role in understanding biological processes. Although recent research in text mining has made significant progress in automatic PPI extraction from the literature, the performance of existing systems still needs to be improved.

Results: In this study, we propose a novel two-phase algorithm for extracting PPIs from the literature. First, we automatically categorize the data into subsets based on their semantic properties and extract candidate PPI pairs from these subsets. Second, we apply support vector machines (SVMs) to classify the candidate PPI pairs, using features specific to each subset. We obtain promising results on five benchmark datasets (AIMed, BioInfer, HPRD50, IEPA and LLL), with F-scores ranging from 60% to 84%, which are comparable to those of state-of-the-art PPI extraction systems. Furthermore, our system achieves the best performance in cross-corpora evaluation and comparable performance in terms of computational efficiency.
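The two phases described in the Results can be illustrated with a minimal Python sketch. It is not the authors' released code and makes several loud assumptions: the trigger-word lists used to categorize sentences (`VERB_TRIGGERS`, `NOUN_TRIGGERS`), the "words between the two proteins" features and the toy training data are all hypothetical; protein mentions are assumed to be already tagged as single tokens; every subset uses the same feature template here, whereas the paper uses subset-specific features; and a small deterministic Pegasos-style linear SVM written from scratch stands in for whatever SVM implementation the paper actually used. The `f_score` helper computes the standard balanced F-measure, 2PR/(P+R).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical trigger lexicons (assumption; not the paper's actual word lists).
VERB_TRIGGERS = {"binds", "interacts", "activates", "phosphorylates", "inhibits"}
NOUN_TRIGGERS = {"binding", "interaction", "complex", "association"}


def categorize(tokens):
    """Phase 1 (sketch): assign a sentence to a subset by the kind of trigger it contains."""
    words = {t.lower() for t in tokens}
    if words & VERB_TRIGGERS:
        return "verb"
    if words & NOUN_TRIGGERS:
        return "noun"
    return "other"


def candidate_pairs(proteins):
    """Phase 1 (sketch): every unordered pair of distinct protein mentions is a candidate."""
    return list(combinations(sorted(set(proteins)), 2))


def features(tokens, pair):
    """Hypothetical features: a bias term plus the words between the two proteins.
    Assumes both protein names occur verbatim as tokens."""
    i, j = tokens.index(pair[0]), tokens.index(pair[1])
    lo, hi = min(i, j), max(i, j)
    feats = {"bias": 1.0}
    for t in tokens[lo + 1:hi]:
        feats["between=" + t.lower()] = 1.0
    return feats


class LinearSVM:
    """Linear SVM on sparse dict features, trained by Pegasos-style
    subgradient descent on the L2-regularised hinge loss (cyclic, not random, order)."""

    def __init__(self, lam=0.01, epochs=50):
        self.lam, self.epochs = lam, epochs
        self.w = defaultdict(float)

    def score(self, x):
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def fit(self, X, y):
        """X: list of feature dicts; y: labels in {-1, +1}."""
        t = 0
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                violated = label * self.score(x) < 1  # hinge-loss margin check
                for k in self.w:  # L2 shrinkage step
                    self.w[k] *= 1.0 - eta * self.lam
                if violated:
                    for k, v in x.items():
                        self.w[k] += eta * label * v
        return self

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1


def train(examples):
    """Phase 2 (sketch): one SVM per subset. examples: (tokens, pair, label) triples."""
    data = defaultdict(lambda: ([], []))
    for tokens, pair, label in examples:
        X, y = data[categorize(tokens)]
        X.append(features(tokens, pair))
        y.append(label)
    return {subset: LinearSVM().fit(X, y) for subset, (X, y) in data.items()}


def extract(models, tokens, proteins):
    """Route a sentence to its subset's model and keep pairs classified as interacting."""
    model = models.get(categorize(tokens))
    if model is None:
        return []
    return [p for p in candidate_pairs(proteins)
            if model.predict(features(tokens, p)) == 1]


def f_score(predicted, gold):
    """Balanced F-score, 2PR / (P + R), over sets of predicted and gold pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy = [  # hypothetical toy data, not from any benchmark corpus
        (["P1", "binds", "P2"], ("P1", "P2"), 1),
        (["P1", "with", "P2", "binds", "P3"], ("P1", "P2"), -1),
    ]
    models = train(toy)
    print(extract(models, ["P4", "binds", "P5"], ["P4", "P5"]))  # [('P4', 'P5')]
```

Routing each sentence to a per-subset model is what lets each classifier specialize; a cross-corpus evaluation in this style would call `train` on examples from one corpus and compute `f_score` on the output of `extract` for sentences from another.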

Availability: The source code and scripts used in this article are available for academic use at http://staff.science.uva.nl/~bui/PPIs.zip

Contact: bqchinh@gmail.com.
