
Bayesian Classifiers for Detecting HGT Using Fixed and Variable Order Markov Models of Genomic Signatures

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2006 Jan 13
PMID: 16403797
Citations: 15
Abstract

Motivation: Analyses of genomic signatures are gaining attention because they allow studies of species-specific relationships without requiring alignments of homologous sequences. A naïve Bayesian classifier was previously built to discriminate between bacteria based on their compositions of short oligomers, also known as DNA words. That classifier has proven successful in identifying foreign genes in Neisseria meningitidis. In this study we extend the classifier approach using either a fixed higher order Markov model (Mk) or a variable length Markov model (VLMk).
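To make the fixed-order variant concrete, here is a minimal sketch of a Bayesian classifier that scores a DNA fragment under one order-k Markov model per candidate genome and picks the highest posterior. It is an illustration under stated assumptions, not the authors' software: the function names (`train_markov`, `log_likelihood`, `classify`), the add-one pseudocount, and the uniform default prior are all hypothetical choices, and sequences are assumed to contain only the upper-case bases A, C, G and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these upper-case bases


def train_markov(seq, k, pseudocount=1.0):
    """Estimate log P(next base | previous k bases) from one training sequence.

    Returns a function log_prob(context, base). The pseudocount is an
    illustrative smoothing choice, not taken from the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood(seq, k, log_prob):
    """Sum the conditional log-probabilities of every base after the first k."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))


def classify(seq, models, k, log_priors=None):
    """Return the best-scoring label and all log posterior scores (up to a constant)."""
    scores = {}
    for name, log_prob in models.items():
        prior = log_priors[name] if log_priors else 0.0  # uniform prior by default
        scores[name] = prior + log_likelihood(seq, k, log_prob)
    return max(scores, key=scores.get), scores
```

With `k = 0` the model reduces to independent base frequencies; larger `k` conditions each base on more preceding bases at the cost of more parameters to estimate. A fragment whose best score comes from a genome other than its host would be a candidate for foreign origin in this toy setting.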

Results: We propose a simple algorithm to lock a variable length Markov model to a certain number of parameters and show that the use of Markov models greatly increases the flexibility and accuracy of prediction compared with a naïve model. We also test the integrity of the classifiers in terms of false negatives and give estimates of the minimal sizes of training data. We end the report by proposing a method to reject a false hypothesis of horizontal gene transfer.
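The abstract does not spell out how the variable length model is locked to a parameter count, so the following is only one plausible sketch of the general idea, not the authors' algorithm. It assumes a context tree built up to a maximum order, ranks leaf contexts by a count-weighted Kullback-Leibler divergence from their parent (the context shortened by its oldest base), and removes the least informative leaf until a budget of contexts is met. All names (`context_counts`, `prune_to_budget`, `longest_context`), the pseudocount, and the ranking criterion are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these upper-case bases


def context_counts(seq, max_order):
    """Count next-base occurrences for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for k in range(min(max_order, i) + 1):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def probs(counter, pseudocount=1.0):
    """Smoothed next-base distribution for one context."""
    total = sum(counter.values()) + pseudocount * len(ALPHABET)
    return {b: (counter[b] + pseudocount) / total for b in ALPHABET}


def divergence(counts, context):
    """Count-weighted KL divergence between a context and its parent suffix."""
    child = probs(counts[context])
    parent = probs(counts[context[1:]])
    n = sum(counts[context].values())
    return n * sum(child[b] * math.log(child[b] / parent[b]) for b in ALPHABET)


def prune_to_budget(counts, max_contexts):
    """Drop the least informative leaf context until at most max_contexts remain."""
    kept = set(counts)
    while len(kept) > max_contexts:
        # A leaf is a non-root context that no kept context extends by one older base.
        leaves = [c for c in kept
                  if c and not any(b + c in kept for b in ALPHABET)]
        if not leaves:
            break
        kept.remove(min(leaves, key=lambda c: divergence(counts, c)))
    return kept


def longest_context(kept, history):
    """Return the longest suffix of history that is a kept context."""
    for k in range(len(history), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in kept:
            return ctx
```

Each kept context carries three free parameters (four base probabilities that sum to one), so a budget of `max_contexts` corresponds to `3 * max_contexts` parameters in this sketch. Fixing the budget makes models trained on genomes of different sizes directly comparable in complexity.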

Availability: Software and supplementary information are available at www.cs.chalmers.se/~dalevi/genetic_sign_classifiers/.

Citing Articles

Evolution shapes and conserves genomic signatures in viruses.

Holmudden M, Gustafsson J, Bertrand Y, Schliep A, Norberg P Commun Biol. 2024; 7(1):1412.

PMID: 39478059 PMC: 11526014. DOI: 10.1038/s42003-024-07098-1.


Fast parallel construction of variable-length Markov chains.

Gustafsson J, Norberg P, Qvick-Wester J, Schliep A BMC Bioinformatics. 2021; 22(1):487.

PMID: 34627154 PMC: 8501649. DOI: 10.1186/s12859-021-04387-y.


Comparison of metatranscriptomic samples based on k-tuple frequencies.

Wang Y, Liu L, Chen L, Chen T, Sun F PLoS One. 2014; 9(1):e84348.

PMID: 24392128 PMC: 3879298. DOI: 10.1371/journal.pone.0084348.


Comparison of metagenomic samples using sequence signatures.

Jiang B, Song K, Ren J, Deng M, Sun F, Zhang X BMC Genomics. 2012; 13:730.

PMID: 23268604 PMC: 3549735. DOI: 10.1186/1471-2164-13-730.


Normal and compound Poisson approximations for pattern occurrences in NGS reads.

Zhai Z, Reinert G, Song K, Waterman M, Luan Y, Sun F J Comput Biol. 2012; 19(6):839-54.

PMID: 22697250 PMC: 3375642. DOI: 10.1089/cmb.2012.0029.