140 likes | 170 Views
This paper introduces an improved VB algorithm for DP mixture models, comparing stick-breaking vs. symmetric Dirichlet approximations and emphasizing optimal cluster label ordering in VB.
E N D
Collapsed Variational Dirichlet Process Mixture Models Kenichi Kurihara, Max Welling and Yee W. Teh Published on IJCAI 07 Discussion led by Qi An
Outline • Introduction • Four approximations to DP • Variational Bayesian Inference • Optimal cluster label reordering • Experimental results • Discussion
Introduction • DP is suitable for many density estimation and data clustering applications. • Gibbs sampling solution is not efficient enough to scale up to the large scale problems. • Truncated stick-breaking approximation is formulated in the space of explicit, non-exchangeable cluster labels.
Introduction • This paper • propose an improved VB algorithm based on integrating out mixture weights • compare the stick-breaking representation against the finite symmetric Dirichlet approximation • maintain optimal ordering of cluster labels in the stick-breaking VB algorithm
Approximations to DP • Truncated stick-breaking representation The joint distribution can be expressed as:
Approximations to DP • Finite symmetric Dirichlet approximation The joint distribution can be expressed as: The essential difference from TSB representation is that the cluster labels remain interchangeable under this formulation.
Dirichlet process is most naturally defined on a partition space while both TSB and FSD are defined over the cluster label space. Moreover, TSB and FSD also live in different spaces
Marginalization • In variational Bayesian approximation, we assume a factorized form for the posterior distribution. • However it is not a good assumption since changes in π will have a considerable impact on z. If we can integrate out π, the joint distribution is given by For the TSB representation: For the FSD representation: α
VB inference • We can then apply the VB inference on the four approximations The lower bound is given by The approximated posterior distribution for TSB and FSD are Depending on marginalization or not, v and π may be integrated out.
Gaussian approximation • For collapsed approximations, the computation for q(zij) seems intractable due to the exponentially large space of assignments for all other {zij}. With central limit theory and Taylor expansion, the expectation over zij will be approximated with those expectations
Optimal cluster label reordering • For FSB representation, the prior assumes a certain ordering of the clusters. • The authors claims the optimal relabelling of the clusters is given by ordering the cluster sizes in decreasing order.
Discussion • There is very little difference between variational Bayesian inference in the reordered stick-breaking representation and the finite mixture model with symmetric Dirichlet priors. • Label reordering is important for the stick-breaking representation • Variational approximation are much more efficient computationally than Gibbs sampling, with almost no loss in accuracy