Deep Semi-supervised Learning Ensemble Framework for Classifying Co-mentions of Human Proteins and Phenotypes

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2021 Oct 17

PMID 34656098

Citations 1

Authors

Morteza Pourreza Shahri

Indika Kahanda

Abstract

Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing because of its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitively costly, automated tools that can accurately extract these associations from biomedical text are in high demand. However, while manually annotating the protein-phenotype co-mentions required for training such models is highly resource-intensive, extracting millions of unlabeled co-mentions is straightforward.

Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning to classify human protein-phenotype co-mentions with the help of unlabeled data. The framework can incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of the proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming its supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.
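
The abstract does not say how the unlabeled co-mentions are used during training. As a hedged illustration only, the sketch below shows one common way to combine semi-supervised and ensemble learning: self-training (pseudo-labeling) with a soft-voting ensemble. It is not the PPPredSS implementation. The toy centroid classifiers stand in for the language-model, convolutional, and recurrent components, and every name here (`CentroidClassifier`, `self_train`, the `threshold` parameter, the `PROT`/`PHEN` placeholder tokens) is a hypothetical choice made for this example.

```python
import math
from collections import Counter


def word_features(text):
    """Bag of lower-cased word counts."""
    return Counter(text.lower().split())


def char_trigram_features(text):
    """Bag of character trigram counts."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Toy ensemble member (assumption): compares a sentence to per-class centroids."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.centroids = {0: Counter(), 1: Counter()}

    def fit(self, texts, labels):
        self.centroids = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.centroids[label].update(self.featurize(text))
        return self

    def predict_proba(self, text):
        """Return a score in [0, 1] for the positive class."""
        feats = self.featurize(text)
        neg = cosine(feats, self.centroids[0])
        pos = cosine(feats, self.centroids[1])
        return 0.5 if pos + neg == 0 else pos / (pos + neg)


def ensemble_proba(models, text):
    """Soft voting: average the members' positive-class scores."""
    return sum(m.predict_proba(text) for m in models) / len(models)


def fit_ensemble(texts, labels):
    featurizers = (word_features, char_trigram_features)
    return [CentroidClassifier(f).fit(texts, labels) for f in featurizers]


def self_train(labeled, unlabeled, threshold=0.7, max_rounds=3):
    """Grow the training set with confidently pseudo-labeled sentences."""
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        models = fit_ensemble(texts, labels)
        remaining, added = [], 0
        for text in pool:
            p = ensemble_proba(models, text)
            if p >= threshold or p <= 1 - threshold:
                texts.append(text)
                labels.append(1 if p >= threshold else 0)
                added += 1
            else:
                remaining.append(text)  # not confident enough yet
        pool = remaining
        if not added:
            break
    # Refit once more so the returned models see every pseudo-label.
    return fit_ensemble(texts, labels), len(texts)


# Toy data: PROT and PHEN are placeholder tokens for the two entity mentions.
labeled = [
    ("PROT mutations cause PHEN in affected patients", 1),
    ("PROT levels were measured and PHEN was not observed", 0),
]
unlabeled = [
    "PROT mutations cause severe PHEN",
    "PHEN was not observed while PROT levels were measured",
    "PROT and PHEN",
]
models, n_train = self_train(labeled, unlabeled)
print(n_train, ensemble_proba(models, "mutations in PROT cause PHEN"))
```

In this sketch the confidence threshold controls how aggressively unlabeled sentences enter the training set: a higher value admits fewer but cleaner pseudo-labels, while a lower value admits more data at the risk of reinforcing early mistakes.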

Conclusions: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.

Citing Articles

SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data.

Bi X, Liang W, Zhao Q, Wang J. Bioinformatics. 2023; 39(11).

PMID: 37941450. PMC: 10666204. DOI: 10.1093/bioinformatics/btad662.
