
Dirichlet Mixtures: a Method for Improved Detection of Weak but Significant Protein Sequence Homology

Overview
Date 1996 Aug 1
PMID 8902360
Citations 103
Abstract

We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model or other statistical model. These estimates give a statistical model greater generalization capacity, so that remotely related family members can be more reliably recognized by the model. This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.
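The combination step described in the abstract can be sketched as follows. This is a minimal sketch assuming the standard Bayesian posterior-mean estimate under a Dirichlet mixture prior. Each component j has a mixture coefficient q_j and a parameter vector alpha_j. The posterior weight of each component given the observed count vector n comes from the Dirichlet-multinomial likelihood Prob(n | alpha_j). The estimate is then p_i = sum_j Prob(alpha_j | n) (n_i + alpha_j,i) / (|n| + |alpha_j|). The function names and the two-component, three-letter toy mixture are hypothetical illustrations, not the published mixtures or the authors' code. Real use would have 20 amino acids per vector.

```python
import math


def log_prob_counts(n, alpha):
    """log Prob(n | alpha): Dirichlet-multinomial probability of counts n."""
    n_tot, a_tot = sum(n), sum(alpha)
    out = math.lgamma(n_tot + 1) + math.lgamma(a_tot) - math.lgamma(n_tot + a_tot)
    for n_i, a_i in zip(n, alpha):
        out += math.lgamma(n_i + a_i) - math.lgamma(n_i + 1) - math.lgamma(a_i)
    return out


def mixture_estimate(n, q, alphas):
    """Posterior-mean residue probabilities for counts n under a Dirichlet mixture.

    n      -- observed counts at one column (hypothetical toy alphabet)
    q      -- mixture coefficients, each > 0, summing to 1
    alphas -- one Dirichlet parameter vector per mixture component
    """
    # Posterior component weights Prob(alpha_j | n), computed in log space
    # and shifted by the maximum to avoid underflow.
    logs = [math.log(q_j) + log_prob_counts(n, a_j) for q_j, a_j in zip(q, alphas)]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Blend each component's pseudocount-smoothed estimate by its weight.
    n_tot = sum(n)
    p = [0.0] * len(n)
    for w_j, a_j in zip(weights, alphas):
        a_tot = sum(a_j)
        for i in range(len(n)):
            p[i] += w_j * (n[i] + a_j[i]) / (n_tot + a_tot)
    return p


if __name__ == "__main__":
    # Hypothetical mixture: one component favors letter 0, the other letter 2.
    q = [0.5, 0.5]
    alphas = [[5.0, 0.5, 0.5], [0.5, 0.5, 5.0]]
    print(mixture_estimate([3, 0, 0], q, alphas))
```

With only three observations of letter 0, the first component dominates the posterior weights, yet letters 1 and 2 still receive nonzero probability. This is the generalization effect the abstract describes. With a single component, the estimate reduces to ordinary pseudocounts (n_i + alpha_i) / (|n| + |alpha|).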

Citing Articles

An FPGA-based hardware accelerator supporting sensitive sequence homology filtering with profile hidden Markov models.

Anderson T, Wheeler T. BMC Bioinformatics. 2024; 25(1):247.

PMID: 39075359 PMC: 11285124. DOI: 10.1186/s12859-024-05879-3.


WAS IT A MATch I SAW? Approximate palindromes lead to overstated false match rates in benchmarks using reversed sequences.

Glidden-Handgis G, Wheeler T. Bioinform Adv. 2024; 4(1):vbae052.

PMID: 38764475 PMC: 11099658. DOI: 10.1093/bioadv/vbae052.


learnMSA: learning and aligning large protein families.

Becker F, Stanke M. Gigascience. 2022; 11.

PMID: 36399060 PMC: 9673500. DOI: 10.1093/gigascience/giac104.


Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering.

Horne J, Shukla D. Ind Eng Chem Res. 2022; 61(19):6235-6245.

PMID: 36051311 PMC: 9432854. DOI: 10.1021/acs.iecr.1c04943.


Methylome decoding of RdDM-mediated reprogramming effects in the Arabidopsis MSH1 system.

Kundariya H, Sanchez R, Yang X, Hafner A, Mackenzie S. Genome Biol. 2022; 23(1):167.

PMID: 35927734 PMC: 9351182. DOI: 10.1186/s13059-022-02731-w.