
An Agglomerative Hierarchical Approach to Visualization in Bayesian Clustering Problems

Overview
Specialty Genetics
Date 2009 Apr 2
PMID 19337306
Citations 1
Abstract

Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals, called the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Because the number of possible partitions grows very rapidly with the sample size, we cannot visualize this probability distribution in its entirety unless the sample is very small. As a solution to this visualization problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously, and it is now implemented in our software package PartitionView. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities.
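
The abstract does not spell out the merge rule, so the following Python fragment is only a minimal sketch of how such a forest might be built, not the PartitionView implementation. It assumes that the posterior is available as a list of sampled partitions (for example from MCMC output), that the co-assignment probability of a set is estimated as the fraction of samples placing all of its members in one cluster, and that the algorithm greedily merges the two clusters whose union has the highest such probability. The function names and the toy `samples` data are hypothetical.

```python
from itertools import combinations


def coassignment_probability(samples, members):
    """Fraction of sampled partitions that place all `members` in one cluster."""
    members = list(members)
    hits = sum(1 for labels in samples
               if len({labels[i] for i in members}) == 1)
    return hits / len(samples)


def exact_linkage_sketch(samples):
    """Greedy agglomeration (assumed merge rule, see lead-in).

    Returns (merges, roots): merges is a list of (left, right, height)
    tuples, and roots lists the member sets of the trees in the forest.
    """
    n = len(samples[0])
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Score each candidate merge by the co-assignment probability
        # of the union of the two clusters.
        height, a, b = max(
            ((coassignment_probability(samples, x | y), x, y)
             for x, y in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if height == 0.0:
            break  # no remaining union is ever co-assigned: keep a forest
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a), sorted(b), height))
    return merges, [sorted(c) for c in clusters]


# Hypothetical posterior sample: each row labels 4 individuals by cluster.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
merges, roots = exact_linkage_sketch(samples)
print(merges)  # [([0], [1], 0.75), ([2], [3], 0.5)]
print(roots)   # [[0, 1], [2, 3]]
```

In this toy example individuals 0 and 1 join at height 0.75 and individuals 2 and 3 at height 0.5. No sampled partition places all four individuals together, so the output is a forest of two trees rather than a single rooted tree.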

Citing Articles

A spatial Dirichlet process mixture model for clustering population genetics data.

Reich B, Bondell H. Biometrics. 2010; 67(2):381-90.

PMID: 20825394 PMC: 3043140. DOI: 10.1111/j.1541-0420.2010.01484.x.
