
HITS-PR-HHblits: Protein Remote Homology Detection by Combining PageRank and Hyperlink-Induced Topic Search

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2018 Nov 8
PMID: 30403770
Citations: 21
Abstract

Protein remote homology detection is one of the fundamental problems in protein sequence analysis and is critical both for theoretical research (protein structure and function studies) and for real-world applications (drug design). Although several computational predictors have been proposed, their detection performance is still limited. In this study, we treat protein remote homology detection as a document retrieval task: proteins are considered documents, and the aim is to find the documents in a database that are most closely related to the query documents. A protein similarity network was constructed based on the true labels of the proteins in the database, and the query proteins were then connected to the network based on similarity scores calculated by three ranking methods: PSI-BLAST, HMMER and HHblits. The PageRank algorithm and the Hyperlink-Induced Topic Search (HITS) algorithm were each run on this network to move the homologs of the query proteins into the neighborhood of the query proteins. Finally, the PageRank and HITS algorithms were combined in a predictor called HITS-PR-HHblits to further improve predictive performance. On the SCOP and SCOPe benchmark datasets, the proposed protocols outperformed other state-of-the-art methods. For the convenience of experimental scientists, a web server for HITS-PR-HHblits was established at http://bioinformatics.hitsz.edu.cn/HITS-PR-HHblits, where users can obtain results without working through the mathematical details. HITS-PR-HHblits is a protocol for protein remote homology detection that uses different sets of programs, and it is expected to become a useful computational tool for proteome analysis.
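
The network-based re-ranking idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the protein names, labels, similarity scores, the use of a query-personalized PageRank, and the equal-weight combination of the two scores (`alpha=0.5`) are all assumptions made for illustration, and the hypothetical `query_scores` merely stand in for scores that a tool such as HHblits would produce.

```python
# Toy sketch only: labels, scores, and the 50/50 combination are assumptions,
# not values or formulas taken from the paper.

def build_network(labels, query_scores, query="query"):
    """Weighted undirected graph as {node: {neighbor: weight}}."""
    adj = {name: {} for name in labels}
    adj[query] = {}
    names = list(labels)
    # Database proteins that share a label (e.g. a superfamily) are linked.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if labels[a] == labels[b]:
                adj[a][b] = adj[b][a] = 1.0
    # The query is linked to its hits, weighted by the search tool's score.
    for name, score in query_scores.items():
        if score > 0:
            adj[query][name] = adj[name][query] = score
    return adj


def personalized_pagerank(adj, source, damping=0.85, iters=100):
    """Power iteration for PageRank with all restarts going to `source`."""
    rank = {n: 1.0 / len(adj) for n in adj}
    for _ in range(iters):
        new = {n: 0.0 for n in adj}
        for n, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0:
                # Isolated node: send its mass back to the source.
                new[source] += damping * rank[n]
                continue
            for m, w in nbrs.items():
                new[m] += damping * rank[n] * w / total
        new[source] += 1.0 - damping
        rank = new
    return rank


def hits_authority(adj, iters=100):
    """Authority scores from the standard HITS hub/authority updates."""
    hub = {n: 1.0 for n in adj}
    auth = dict(hub)
    for _ in range(iters):
        auth = {n: sum(w * hub[m] for m, w in adj[n].items()) for n in adj}
        norm = sum(auth.values()) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: sum(w * auth[m] for m, w in adj[n].items()) for n in adj}
        norm = sum(hub.values()) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth


def rank_database(adj, query="query", alpha=0.5):
    """Rank database proteins by a weighted sum of both scores (assumed rule)."""
    pr = personalized_pagerank(adj, query)
    auth = hits_authority(adj)
    db = [n for n in adj if n != query]

    def normalized(scores):
        total = sum(scores[n] for n in db) or 1.0
        return {n: scores[n] / total for n in db}

    pr, auth = normalized(pr), normalized(auth)
    combined = {n: alpha * pr[n] + (1 - alpha) * auth[n] for n in db}
    return sorted(db, key=combined.get, reverse=True)


# Hypothetical database: three proteins in family A, two in family B.
labels = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
# Hypothetical direct-search scores; a3 and b2 are not hit at all.
query_scores = {"a1": 0.9, "a2": 0.1, "b1": 0.2}
ranking = rank_database(build_network(labels, query_scores))
print(ranking)
```

In this toy network, `a3` has no direct hit to the query, yet it ranks above `b1` because it is tightly linked to `a1` and `a2`. That is the kind of reordering the abstract describes, where likely homologs are moved closer to the query.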
