Remote Homology Detection: a Motif Based Approach

Overview
Journal Bioinformatics
Specialty Biology
Date 2003 Jul 12
PMID 12855434
Citations 40
Abstract

Motivation: Remote homology detection is the problem of detecting homology in cases of low sequence similarity. It is a hard computational problem with no approach that works well in all cases.

Results: We present a method for detecting remote homology that is based on the presence of discrete sequence motifs. The motif content of a pair of sequences is used to define a similarity measure, which serves as the kernel for a Support Vector Machine (SVM) classifier. We test the method on two remote homology detection tasks: prediction of a previously unseen SCOP family, and prediction of an enzyme class given other enzymes that have a similar function on other substrates. We find that it performs significantly better than an SVM method that uses BLAST or Smith-Waterman similarity scores as features.
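
To make the idea of a motif-based kernel concrete, here is a minimal sketch in Python. It is not the authors' implementation. The three motifs, the choice of counting matches, and the cosine normalization are all assumptions made for illustration; the abstract only says that the motif content of two sequences defines their similarity.

```python
import math
import re

# Hypothetical toy motifs written as regular expressions.
# These are illustrative only and are NOT taken from the paper.
MOTIFS = [r"C..C", r"G.GK[ST]", r"H.H"]


def motif_counts(seq):
    """Count occurrences of each motif in seq (overlapping matches included)."""
    # A zero-width lookahead lets re.findall report overlapping matches.
    return [len(re.findall(f"(?={m})", seq)) for m in MOTIFS]


def motif_kernel(a, b):
    """Cosine-normalized dot product of two motif-count vectors.

    Returns 0.0 when either sequence contains none of the motifs.
    The normalization is an assumption of this sketch.
    """
    x, y = motif_counts(a), motif_counts(b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
    return dot / norm if norm else 0.0


def gram_matrix(seqs):
    """Pairwise kernel values for a list of sequences."""
    return [[motif_kernel(a, b) for b in seqs] for a in seqs]


# Made-up sequences: the first two share motifs, the third shares none with them.
seqs = ["ACAACGHAH", "CKKCMMHQH", "GAGKTLLLL"]
K = gram_matrix(seqs)
```

A Gram matrix like `K` can be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Because the kernel is a dot product of explicit feature vectors, it is positive semidefinite by construction.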
