Deep Semi-supervised Learning Ensemble Framework for Classifying Co-mentions of Human Proteins and Phenotypes

Overview
Publisher Biomed Central
Specialty Biology
Date 2021 Oct 17
PMID 34656098
Citations 1
Abstract

Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing because of its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitively costly, automated tools that can accurately extract these associations from biomedical text are in high demand. However, manually annotating the protein-phenotype co-mentions needed to train such models is highly resource-intensive, whereas extracting millions of unlabeled co-mentions is straightforward.

Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning to classify human protein-phenotype co-mentions with the help of unlabeled data. The framework can incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of the proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming its supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.
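
The abstract does not say which semi-supervised strategy PPPredSS uses, so the following is only a minimal sketch of one common way to combine a small labeled set, a large unlabeled pool, and an ensemble: self-training (pseudo-labeling). It is not the authors' implementation. The toy one-feature centroid learner stands in for the language-model, convolutional, and recurrent members, and every name here (`fit_centroid_model`, `ensemble_proba`, `self_train`, the `threshold` parameter, the two-number feature vectors) is a hypothetical chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]  # returns an estimate of P(label == 1)


def fit_centroid_model(X: List[Vector], y: List[int], feature: int) -> Model:
    """Toy base learner (assumption): compares one feature to the class means."""
    pos = [x[feature] for x, label in zip(X, y) if label == 1]
    neg = [x[feature] for x, label in zip(X, y) if label == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)

    def predict_proba(x: Vector) -> float:
        # Being closer to the positive mean pushes the probability above 0.5.
        margin = abs(x[feature] - mu_neg) - abs(x[feature] - mu_pos)
        return 1.0 / (1.0 + math.exp(-margin))

    return predict_proba


def ensemble_proba(models: List[Model], x: Vector) -> float:
    """Ensemble step: average the member probabilities."""
    return sum(m(x) for m in models) / len(models)


def self_train(
    X_lab: List[Vector],
    y_lab: List[int],
    X_unlab: List[Vector],
    n_features: int,
    threshold: float = 0.8,
    rounds: int = 5,
) -> Tuple[List[Model], List[Vector]]:
    """Self-training: move confidently predicted unlabeled items into the training set."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        models = [fit_centroid_model(X, y, f) for f in range(n_features)]
        remaining = []
        for x in pool:
            p = ensemble_proba(models, x)
            if p >= threshold:          # confident positive pseudo-label
                X.append(x)
                y.append(1)
            elif p <= 1.0 - threshold:  # confident negative pseudo-label
                X.append(x)
                y.append(0)
            else:                       # too uncertain, keep unlabeled
                remaining.append(x)
        if len(remaining) == len(pool):
            break                       # nothing new was pseudo-labeled
        pool = remaining
    # Refit once more so the returned ensemble sees every pseudo-label.
    final_models = [fit_centroid_model(X, y, f) for f in range(n_features)]
    return final_models, pool


# Made-up feature vectors standing in for encoded co-mention sentences.
labeled_X = [(4.0, 4.0), (5.0, 5.0), (0.0, 0.0), (1.0, 1.0)]
labeled_y = [1, 1, 0, 0]
unlabeled_X = [(4.5, 5.0), (0.5, 0.0), (2.5, 2.5)]

models, still_unlabeled = self_train(labeled_X, labeled_y, unlabeled_X, n_features=2)
```

In this sketch the confidence threshold controls the trade-off between using more unlabeled data and admitting noisy pseudo-labels; a real system would tune it on held-out labeled co-mentions.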

Conclusions: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

Citing Articles

SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data.

Bi X, Liang W, Zhao Q, Wang J Bioinformatics. 2023; 39(11).

PMID: 37941450 PMC: 10666204. DOI: 10.1093/bioinformatics/btad662.
