
Hierarchical Dirichlet Processes


Presentation Transcript


  1. Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes Y. W. Teh, M. I. Jordan, M. J. Beal & D. M. Blei NIPS 2004 Presented by Yuting Qi ECE Dept., Duke Univ. 08/26/05

  2. Overview • Motivation • Dirichlet Processes • Hierarchical Dirichlet Processes • Inference • Experimental results • Conclusions

  3. Motivation • Multi-task learning: clustering • Goal: • Share clusters among multiple related clustering problems (model-based). • Approach: • Hierarchical; • Nonparametric Bayesian; • DP Mixture Model: learn a generative model over the data, treating the classes as hidden variables;

  4. Dirichlet Processes • Let (Θ, B) be a measurable space, G0 be a probability measure on the space, and α be a positive real number. • A Dirichlet process is the distribution of a random probability measure G over (Θ, B) such that, for all finite partitions (A1,…,Ar) of Θ, (G(A1),…,G(Ar)) ~ Dir(αG0(A1),…,αG0(Ar)). • We write G ~ DP(α, G0) if G is a random probability measure with distribution given by the Dirichlet process. • A draw G from a DP is discrete with probability one: G = Σk βk δӨk, where Өk ~ G0 i.i.d. and the weights βk are random and depend on α; draws from G are therefore generally not distinct. • Properties:
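A minimal stick-breaking sketch of a truncated draw G ~ DP(α, G0); the Gaussian choice of G0, the truncation level, and the function name are illustrative assumptions, not from the slides:

```python
import numpy as np

def stick_breaking_draw(alpha, num_atoms, rng=None):
    """Truncated draw G = sum_k beta_k * delta_{theta_k} from DP(alpha, G0)."""
    rng = rng or np.random.default_rng(0)
    v = rng.beta(1.0, alpha, size=num_atoms)                      # stick proportions v_k ~ Beta(1, alpha)
    beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))  # beta_k = v_k * prod_{l<k}(1 - v_l)
    theta = rng.normal(0.0, 1.0, size=num_atoms)                  # atoms theta_k ~ G0 (here G0 = N(0, 1))
    return beta, theta

beta, theta = stick_breaking_draw(alpha=1.0, num_atoms=100)
print(beta.sum())   # approaches 1 as the truncation level grows
```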

  5. Chinese Restaurant Processes • CRP (the Pólya urn scheme) • Let Φ1,…,Φi-1 be i.i.d. random variables distributed according to G; let Ө1,…,ӨK be the distinct values taken on by Φ1,…,Φi-1, and nk be the number of Φi’ = Өk, 0 < i’ < i. Integrating out G gives the predictive distribution Φi | Φ1,…,Φi-1 ~ Σk nk/(i-1+α) δӨk + α/(i-1+α) G0. (This slide is from “Chinese Restaurants and Stick-Breaking: An Introduction to the Dirichlet Process”, NLP Group, Stanford, Feb. 2005.)
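A short sketch of sampling table assignments from a CRP; the function name, seed, and defaults are assumptions for illustration:

```python
import numpy as np

def sample_crp(n, alpha, rng=None):
    """Seat n customers; customer i joins table k w.p. n_k/(i+alpha) or a new table w.p. alpha/(i+alpha)."""
    rng = rng or np.random.default_rng(0)
    counts, assignment = [], []
    for i in range(n):                                   # i customers are already seated
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                             # open a new table
        else:
            counts[k] += 1
        assignment.append(k)
    return assignment, counts

assignment, counts = sample_crp(n=100, alpha=1.0)
print(len(counts))   # occupied tables grow roughly as alpha * log(n)
```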

  6. DP Mixture Model • One of the most important applications of the DP: a nonparametric prior distribution on the components of a mixture model, G ~ DP(α, G0), Φi | G ~ G, xi | Φi ~ F(Φi). • Why not use the DP directly for density estimation? Because a draw G is discrete with probability one; convolving its atoms with the kernel F yields a smooth density.
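A generative sketch of a DP mixture under a finite truncation; the truncation level and the Gaussian choices for G0 and F are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, K, n = 1.0, 50, 500
v = rng.beta(1.0, alpha, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))   # weights of the truncated draw G
beta /= beta.sum()                                              # renormalize so they sum to 1
theta = rng.normal(0.0, 5.0, size=K)                            # component parameters theta_k ~ G0 = N(0, 25)
z = rng.choice(K, size=n, p=beta)                               # index of the atom phi_i drawn from G
x = rng.normal(theta[z], 1.0)                                   # x_i ~ F(phi_i) = N(phi_i, 1)
```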

  7. HDP – Problem statement • We have J groups of data, {Xj}, j=1,…,J. For each group, Xj={xji}, i=1,…,nj. • In each group, Xj={xji} are modeled with a mixture model. The mixing proportions are specific to the group. • Different groups share the same set of mixture components (underlying clusters Өk), but each group uses its own combination of the mixture components. • Goal: • Discover the distribution of components within a group; • Discover the distribution of components across groups;

  8. HDP - General representation • G0: the global probability measure, G0 ~ DP(r, H); r is the concentration parameter and H is the base measure. • Gj: the probability measure for group j, Gj | G0 ~ DP(α, G0). • Φji: the hidden parameter of the distribution F(Φji) corresponding to xji. • The overall model is: G0 ~ DP(r, H); Gj | G0 ~ DP(α, G0); Φji | Gj ~ Gj; xji | Φji ~ F(Φji). • Two-level DPs.
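Under a finite truncation the two-level structure can be sketched as below; the concentration values, truncation level, and Gaussian H are assumptions, and the Dirichlet step is only exact once G0 is approximated by finitely many atoms:

```python
import numpy as np

rng = np.random.default_rng(0)
K, r, alpha, J = 20, 1.0, 5.0, 3                       # truncation level, concentrations, number of groups

# G0 = sum_k beta_k * delta_{theta_k}: a truncated stick-breaking draw from DP(r, H)
v = rng.beta(1.0, r, size=K)
beta = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
beta /= beta.sum()
theta = rng.normal(0.0, 1.0, size=K)                   # atoms theta_k ~ H (here H = N(0, 1))

# Each group measure Gj ~ DP(alpha, G0) reuses the SAME atoms theta_k with its own weights pi_j
pi = rng.dirichlet(alpha * beta, size=J)               # pi_j ~ Dir(alpha*beta_1, ..., alpha*beta_K)
```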

  9. HDP - General representation • G0 places non-zero mass only on the atoms {Өk}; thus each Gj = Σk πjk δӨk places its mass on the same atoms, • where Ө1, Ө2, … are i.i.d. random variables distributed according to H.

  10. HDP-CR franchise • First level: within each group, a DP mixture • Φj1,…,Φj,i-1 are i.i.d. random variables distributed according to Gj; let Ѱj1,…,ѰjTj be the distinct values taken on by Φj1,…,Φj,i-1, and njt be the number of Φji’ = Ѱjt, 0 < i’ < i. • Second level: across groups, sharing components • The base measure of each group is a draw from a DP: Ѱjt | G0 ~ G0, G0 ~ DP(r, H). • Let Ө1,…,ӨK be the distinct values taken on by the Ѱjt, and mk be the number of Ѱjt = Өk over all j, t.

  11. HDP-CR franchise • Integrating out G0, the table-level values are drawn from Ѱjt | Ѱ11,…,Ѱj,t-1, r, H ~ Σk mk/(Σk’ mk’ + r) δӨk + r/(Σk’ mk’ + r) H. • Since tables in all groups choose among the same dishes Өk, values of Φji are shared among groups.
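A generative sketch of the Chinese restaurant franchise (customers choose tables within a group, tables choose shared dishes); function and variable names, the seed, and the return values are assumptions for illustration:

```python
import numpy as np

def sample_cr_franchise(group_sizes, alpha, r, rng=None):
    """Return, for each group, the dish index k_{j t_{ji}} assigned to every customer x_ji."""
    rng = rng or np.random.default_rng(0)
    m_k = []                                         # m_k: tables serving dish k, across all groups
    dish_of_customer = []
    for n_j in group_sizes:
        n_jt, k_jt, group = [], [], []               # table counts, dish per table, per-customer dish
        for _ in range(n_j):
            # first level: choose a table within the group, CRP(alpha)
            p = np.array(n_jt + [alpha], dtype=float)
            t = rng.choice(len(p), p=p / p.sum())
            if t == len(n_jt):                       # a new table is opened ...
                n_jt.append(1)
                # ... and picks a dish from the franchise-wide menu, CRP(r) over dishes
                q = np.array(m_k + [r], dtype=float)
                k = rng.choice(len(q), p=q / q.sum())
                if k == len(m_k):
                    m_k.append(1)
                else:
                    m_k[k] += 1
                k_jt.append(k)
            else:
                n_jt[t] += 1
            group.append(k_jt[t])
        dish_of_customer.append(group)
    return dish_of_customer, m_k

dishes, m_k = sample_cr_franchise([200, 200, 200], alpha=1.0, r=1.0)
print(len(m_k))   # number of shared components (dishes) used across the three groups
```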

  12. Inference - MCMC • Gibbs sampling the posterior in the CR franchise: • Instead of directly dealing with Φji & Ѱjt to get p(Φ, Ѱ|X), p(t, k, Ө|X) is obtained by sampling t, k, Ө, where • t = {tji}: tji is the index of the table that Φji is associated with, Φji = Ѱj,tji. • k = {kjt}: kjt is the index of the value Өk that Ѱjt takes on, Ѱjt = Өkjt. • Using the priors given by the CRP franchise, the posterior is sampled iteratively: • Sampling t; • Sampling k; • Sampling Ө.
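A hedged sketch of the conditional used when resampling one table assignment tji; the likelihood f and prior predictive f_new are assumed to be supplied, and the helper name is illustrative rather than the paper's notation:

```python
import numpy as np

def table_conditional(x_ji, n_jt, k_jt, m_k, alpha, r, f, f_new):
    """Normalized p(t_ji = t | rest) over the existing tables of group j plus one new table.
    n_jt: customer counts per table (with x_ji removed); k_jt: dish index of each table;
    m_k:  table counts per dish across all groups; f(x, k): likelihood of x under dish k;
    f_new(x): prior predictive of x under the base measure H."""
    existing = np.array([n * f(x_ji, k) for n, k in zip(n_jt, k_jt)])
    # a new table would serve dish k w.p. prop. to m_k, or a brand-new dish w.p. prop. to r
    p_x_new = (sum(m * f(x_ji, k) for k, m in enumerate(m_k)) + r * f_new(x_ji)) / (sum(m_k) + r)
    probs = np.append(existing, alpha * p_x_new)
    return probs / probs.sum()
```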

  13. Experiments on the synthetic data • [Figure: scatter plots of the three groups, points labeled by cluster index] Group 1: [1, 2, 3, 7], Group 2: [3, 4, 5, 7], Group 3: [5, 6, 1, 7] • Data description: • We have three groups of data; • Each group is a Gaussian mixture; • Different groups can share the same clusters; • Each cluster has 50 2-D data points, and the features are independent.
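A minimal sketch of generating synthetic data matching this description; the cluster means, variances, and seed are assumptions, not the values used in the presentation:

```python
import numpy as np

rng = np.random.default_rng(0)
cluster_means = {k: rng.uniform(-10, 10, size=2) for k in range(1, 8)}   # 7 shared 2-D clusters (assumed means)
groups = {1: [1, 2, 3, 7], 2: [3, 4, 5, 7], 3: [5, 6, 1, 7]}

data = {}
for j, clusters in groups.items():
    # 50 points per cluster, independent features (unit-variance diagonal Gaussians assumed)
    blocks = [rng.normal(cluster_means[k], 1.0, size=(50, 2)) for k in clusters]
    data[j] = np.vstack(blocks)                                           # 200 points in group j
```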

  14. Experiments on the synthetic data • HDP definition: here, F(xji|φji) is a Gaussian distribution with φji = {μji, σji}; each φji takes one of the values θk = {μk, σk}, k = 1, 2, …. The base measure is μ ~ N(m, σ/β), σ-1 ~ Gamma(a, b), i.e., H is the Normal-Gamma joint distribution; m, β, a, b are given hyperparameters. • Goal: • Model each group as a Gaussian mixture; • Model the cluster distribution over groups.
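A small sketch of drawing one component θk = (μk, σk) from this Normal-Gamma base measure H; the hyperparameter values are placeholders, and Gamma(a, b) is read with b as a rate parameter (an assumption):

```python
import numpy as np

def sample_from_H(m=0.0, beta=1.0, a=2.0, b=1.0, rng=None):
    """One draw theta_k = (mu_k, sigma_k) from the Normal-Gamma base measure H."""
    rng = rng or np.random.default_rng(0)
    precision = rng.gamma(shape=a, scale=1.0 / b)   # sigma^{-1} ~ Gamma(a, b), b treated as a rate
    sigma = 1.0 / precision
    mu = rng.normal(m, np.sqrt(sigma / beta))       # mu ~ N(m, sigma / beta), sigma read as a variance
    return mu, sigma

mu_k, sigma_k = sample_from_H()
```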

  15. Experiments on the synthetic data • Results on synthetic data • Global distribution: the components estimated over all groups and the corresponding mixing proportions. The number of components is open-ended; only a partial view is shown here.

  16. Experiments on the synthetic data • Mixture within each group: the number of components in each group is also open-ended; only a partial view is shown here.

  17. Conclusions & discussions • This hierarchical Bayesian method can automatically determine the appropriate number of mixture components. • A set of DPs are coupled via a shared base measure to achieve component sharing among groups. • The priors are nonparametric; this is not nonparametric density estimation.
