PPIntegrator: Semantic Integrative System for Protein-protein Interaction and Application for Host-pathogen Datasets

Overview
Journal Bioinform Adv
Specialty Biology
Date 2023 Jun 26
PMID 37359724
Abstract

Summary: Over the last 20 years, semantic web standards have proven important for formalizing data and interlinking existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged for the biological domain, such as the widely used Gene Ontology, which provides metadata for annotating gene function and subcellular location. Another important subject in biology is protein-protein interactions (PPIs), which have applications such as protein function inference. Current PPI databases use heterogeneous export methods, which complicates their integration and analysis. Several ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, efforts to establish guidelines for automatic semantic data integration and analysis of PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module that organizes data from three reference databases, and a triplification and data fusion module that describes the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries for analyzing this kind of data and highlight the importance and usage of the semantic data generated by our system.

Availability And Implementation: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.
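
As an illustration of what triplification with provenance can look like in general, the following is a minimal sketch and not the PPIntegrator implementation. The namespace, the predicate names (`participant`, `fromSource`), the function names, the protein identifiers and the source labels (`db1`, `db2`) are all hypothetical placeholders chosen for this example. Each interaction record becomes a set of subject-predicate-object triples, and records from different sources are fused by collecting the triples into a set.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, for illustration only.
EX = "http://example.org/ppi/"


def triplify(protein_a: str, protein_b: str, source_db: str) -> List[Triple]:
    """Describe one interaction record as triples, including its source."""
    # Sort the pair so that A-B and B-A map to the same interaction node.
    first, second = sorted((protein_a, protein_b))
    interaction = f"{EX}interaction/{first}_{second}"
    return [
        Triple(interaction, f"{EX}participant", f"{EX}protein/{first}"),
        Triple(interaction, f"{EX}participant", f"{EX}protein/{second}"),
        # Provenance: which source reported this interaction.
        Triple(interaction, f"{EX}fromSource", source_db),
    ]


def fuse(records: Iterable[Tuple[str, str, str]]) -> Set[Triple]:
    """Merge records from several sources; the set drops duplicate triples."""
    merged: Set[Triple] = set()
    for protein_a, protein_b, source_db in records:
        merged.update(triplify(protein_a, protein_b, source_db))
    return merged


if __name__ == "__main__":
    # The same pair reported by two hypothetical sources, in opposite order.
    example = [("P12345", "Q67890", "db1"), ("Q67890", "P12345", "db2")]
    for triple in sorted(fuse(example)):
        print(triple)
```

In this sketch the two records collapse into a single interaction node that keeps one provenance triple per source, which is the kind of structure that lets a query ask which sources support a given interaction.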
