Exploiting Maximal Dependence Decomposition to Identify Conserved Motifs from a Group of Aligned Signal Sequences

Overview

Journal Bioinformatics

Publisher Oxford University Press

Specialty Biology

Date 2011 May 10

PMID 21551145

Citations 50

Authors

Tzong-Yi Lee

Zong-Qing Lin

Sheng-Jen Hsieh

Neil Arvin Bretana

Cheng-Tsung Lu

Abstract

Bioinformatics research often requires conservation analysis of a group of sequences associated with a specific biological function (e.g. transcription factor binding sites, microRNA target sites or protein post-translational modification sites). Because conserved motifs are difficult to explore in large-scale sequence data involving various signals, a new method, MDDLogo, was developed. MDDLogo applies maximal dependence decomposition (MDD) to cluster a group of aligned signal sequences into subgroups containing statistically significant motifs. To extract motifs that reflect a conserved biochemical property of amino acids in protein sequences, the set of 20 amino acids is further categorized according to physicochemical properties, e.g. hydrophobicity, charge or molecular size. MDDLogo has been demonstrated to accurately identify the kinase-specific substrate motifs in 1221 human phosphorylation sites associated with seven well-known kinase families from Phospho.ELM. Moreover, in a set of plant phosphorylation data lacking kinase information, MDDLogo has been applied to help investigate the substrate motifs of potential kinases and to improve the identification of plant phosphorylation sites with various substrate specificities. In this study, MDDLogo performs comparably to another well-known motif discovery tool, Motif-X.
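The clustering idea in the abstract can be illustrated with a minimal Python sketch of one MDD split, in the general form of the technique: score each alignment column by the summed chi-square dependence between "matches the column consensus" and the symbols at every other column, then divide the sequences on the highest-scoring column. This is not the MDDLogo implementation. The function names (`consensus`, `chi_square`, `mdd_split`, `recode`), the charge-based grouping table, and the toy sequences are all assumptions made for illustration; the abstract does not specify the property groups, the significance threshold, or the stopping rule, so the sketch performs a single split with no significance test.

```python
from collections import Counter

# Hypothetical charge-based grouping, for illustration only; the abstract
# names hydrophobicity, charge and size but does not give the actual groups.
CHARGE_GROUPS = {"K": "+", "R": "+", "H": "+", "D": "-", "E": "-"}


def recode(seq, groups=CHARGE_GROUPS, other="0"):
    """Replace each residue by its property-group symbol."""
    return "".join(groups.get(residue, other) for residue in seq)


def consensus(seqs, i):
    """Most frequent symbol in column i (ties go to the first one seen)."""
    return Counter(s[i] for s in seqs).most_common(1)[0][0]


def chi_square(seqs, i, j):
    """Chi-square statistic of the 2 x k contingency table with
    rows = (column i matches its consensus, does not match) and
    columns = symbols observed in column j."""
    c = consensus(seqs, i)
    table = Counter((s[i] == c, s[j]) for s in seqs)
    row, col = Counter(), Counter()
    for (matches, symbol), n in table.items():
        row[matches] += n
        col[symbol] += n
    total = len(seqs)
    stat = 0.0
    for matches in row:
        for symbol in col:
            expected = row[matches] * col[symbol] / total
            observed = table.get((matches, symbol), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mdd_split(seqs):
    """One MDD step: pick the column with maximal summed dependence on all
    other columns and split the sequences on its consensus symbol."""
    length = len(seqs[0])
    scores = [
        sum(chi_square(seqs, i, j) for j in range(length) if j != i)
        for i in range(length)
    ]
    best = max(range(length), key=scores.__getitem__)
    c = consensus(seqs, best)
    with_c = [s for s in seqs if s[best] == c]
    without_c = [s for s in seqs if s[best] != c]
    return best, c, with_c, without_c


# Toy aligned sequences (made up): columns 0 and 2 vary together.
toy = ["RAS", "RCS", "RAS", "RCS", "KAT", "KCT"]
position, symbol, group_a, group_b = mdd_split(toy)
print(position, symbol, group_a, group_b)
```

A fuller version would apply `mdd_split` recursively to each subgroup and stop when no column shows significant dependence or a subgroup becomes too small; sequences could be passed through `recode` first so that the split operates on property groups rather than individual residues.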

Contact: francis@saturn.yzu.edu.tw

Citing Articles

PredIL13: Stacking a variety of machine and deep learning methods with ESM-2 language model for identifying IL13-inducing peptides.

Kurata H, Harun-Or-Roshid M, Tsukiyama S, Maeda K. PLoS One. 2024; 19(8):e0309078.

PMID: 39172871 PMC: 11340954. DOI: 10.1371/journal.pone.0309078.

ENCAP: Computational prediction of tumor T cell antigens with ensemble classifiers and diverse sequence features.

Yu J, Ni K, Chen C. PLoS One. 2024; 19(7):e0307176.

PMID: 39024250 PMC: 11257298. DOI: 10.1371/journal.pone.0307176.

Improved prediction of anti-angiogenic peptides based on machine learning models and comprehensive features from peptide sequences.

Lee Y, Yu J, Ni K, Lin Y, Chen C. Sci Rep. 2024; 14(1):14387.

PMID: 38909149 PMC: 11193773. DOI: 10.1038/s41598-024-65062-9.

AMPActiPred: A three-stage framework for predicting antibacterial peptides and activity levels with deep forest.

Yao L, Guan J, Xie P, Chung C, Deng J, Huang Y. Protein Sci. 2024; 33(6):e5006.

PMID: 38723168 PMC: 11081525. DOI: 10.1002/pro.5006.

StackDPP: a stacking ensemble based DNA-binding protein prediction model.

Ahmed S, Bose D, Khandoker R, Rahman M. BMC Bioinformatics. 2024; 25(1):111.

PMID: 38486135 PMC: 10941422. DOI: 10.1186/s12859-024-05714-9.