Hierarchical Dirichlet Process and Infinite Hidden Markov Model • Paper by Y. W. Teh, M. I. Jordan, M. J. Beal & D. M. Blei, NIPS 2004 • Duke University Machine Learning Group • Presented by Kai Ni, February 17, 2006
Outline • Motivation • Dirichlet Processes (DP) • Hierarchical Dirichlet Processes (HDP) • Infinite Hidden Markov Model (iHMM) • Results & Conclusions
Motivation • Problem – “multi-task learning” in which the “tasks” are clustering problems. • Goal – Share clusters among multiple, related clustering problems. The number of clusters is open-ended and inferred automatically by the model. • Applications • Genome pattern analysis • Information retrieval from text corpora
Hierarchical Model • A single clustering problem can be analyzed with a Dirichlet process (DP). • A draw G from a DP is discrete, so values sampled from G are generally not distinct. • For J groups, we give each group its own DP draw Gj, j = 1,…,J. • To share information, we must link the group-specific DPs. • If the common base measure G0 is continuous, the draws Gj have no atoms in common with probability one. • HDP solution: let G0 itself be a draw from DP(γ, H).
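Written out, the HDP generative model (in the notation of the Teh et al. paper, where F is the observation distribution) is:

```latex
\begin{aligned}
G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H) \\
G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0), \quad j = 1,\dots,J \\
\phi_{ji} \mid G_j &\sim G_j, \qquad x_{ji} \mid \phi_{ji} \sim F(\phi_{ji})
\end{aligned}
```

Because G0 is itself discrete, every Gj places its mass on the same countable set of atoms, which is what makes sharing possible.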
Dirichlet Process & Hierarchical Dirichlet Process • Three different perspectives • Stick-breaking • Chinese restaurant • Infinite mixture models • Setup • Properties of DP
Stick-breaking View • An explicit mathematical construction of the DP; it makes clear that draws from a DP are discrete. • In the DP: G0 = Σk βk δθk, with θk ~ H and βk = β′k Πl<k (1 − β′l), β′k ~ Beta(1, γ). • In the HDP: Gj = Σk πjk δθk, where the group-specific weights πj ~ DP(α0, β) are defined over the same atoms θk as G0.
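The stick-breaking construction is easy to simulate. Below is a minimal sketch (function name and truncation level are my own choices, not from the slides) that draws the weights βk for G ~ DP(γ, H), truncated at a finite number of atoms:

```python
import numpy as np

def stick_breaking(gamma, num_atoms, rng):
    """Truncated stick-breaking weights for G ~ DP(gamma, H):
    beta'_k ~ Beta(1, gamma), beta_k = beta'_k * prod_{l<k} (1 - beta'_l)."""
    fractions = rng.beta(1.0, gamma, size=num_atoms)          # beta'_k
    leftover = np.concatenate([[1.0], np.cumprod(1.0 - fractions[:-1])])
    return fractions * leftover                                # beta_k

rng = np.random.default_rng(0)
beta = stick_breaking(gamma=1.0, num_atoms=1000, rng=rng)
theta = rng.normal(size=1000)  # atom locations drawn i.i.d. from H = N(0, 1)
# G = sum_k beta_k * delta_{theta_k}; with 1000 sticks the leftover mass is negligible
```

The weights decay geometrically in expectation, so a modest truncation captures essentially all the mass.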
DP – Chinese Restaurant Process • Exhibits the clustering property of the DP. • Let Φ1,…,Φi−1 be i.i.d. draws from G; let θ1,…,θK be the distinct values they take on, and nk the number of Φi′ equal to θk, 0 < i′ < i. Integrating out G gives the predictive distribution Φi | Φ1,…,Φi−1 ~ Σk nk/(i−1+α0) δθk + α0/(i−1+α0) G0.
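The CRP can be simulated directly, since G is integrated out. A minimal sketch (function name is illustrative): customer i joins table k with probability nk/(i−1+α0) or opens a new table with probability α0/(i−1+α0):

```python
import numpy as np

def chinese_restaurant_process(num_customers, alpha, rng):
    """Sample a random partition of customers into tables via the CRP."""
    table_counts = []   # n_k: number of customers at each occupied table
    assignments = []    # table index chosen by each customer
    for i in range(num_customers):
        # probability n_k / (i + alpha) for each table, alpha / (i + alpha) for a new one
        probs = np.array(table_counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(table_counts):
            table_counts.append(1)     # open a new table
        else:
            table_counts[k] += 1       # join an existing table
        assignments.append(k)
    return assignments, table_counts

rng = np.random.default_rng(1)
seats, counts = chinese_restaurant_process(100, alpha=2.0, rng=rng)
```

The rich-get-richer effect is visible in `counts`: a few tables accumulate most customers, and the number of occupied tables grows only logarithmically in the number of customers.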
HDP – Chinese Restaurant Franchise • First level: within each group, a DP mixture. Let Φj1,…,Φj(i−1) be i.i.d. draws from Gj; let ψj1,…,ψjTj be the table values they take on, and njt the number of Φji′ equal to ψjt, 0 < i′ < i. • Second level: across groups, clusters are shared because the base measure of each group is a draw from a common DP. Let θ1,…,θK be the distinct dish values taken on by the ψjt, and mk the number of tables ψjt equal to θk over all j, t.
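The two-level seating scheme can be simulated as well. In this sketch (names and parameter values are illustrative), each group runs its own CRP over tables with concentration α0, and each newly opened table picks its dish from a franchise-wide CRP over dishes with concentration γ:

```python
import numpy as np

def chinese_restaurant_franchise(group_sizes, alpha0, gamma, rng):
    """Two-level CRP: customers -> tables within each group,
    tables -> shared dishes across all groups."""
    dish_table_counts = []      # m_k: tables serving dish k, across all groups
    dishes_per_group = []       # distinct dish indices used by each group
    for n_j in group_sizes:
        table_counts, table_dish = [], []
        for i in range(n_j):
            # sit at table t w.p. n_jt/(i+alpha0), or open a new table
            probs = np.array(table_counts + [alpha0], dtype=float) / (i + alpha0)
            t = rng.choice(len(probs), p=probs)
            if t == len(table_counts):
                # new table: choose its dish w.p. m_k/(m.+gamma), new dish w.p. gamma/(m.+gamma)
                dprobs = np.array(dish_table_counts + [gamma], dtype=float)
                dprobs /= dprobs.sum()
                k = rng.choice(len(dprobs), p=dprobs)
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_counts.append(1)
                table_dish.append(k)
            else:
                table_counts[t] += 1
        dishes_per_group.append(sorted(set(table_dish)))
    return dishes_per_group, dish_table_counts

rng = np.random.default_rng(2)
group_dishes, m = chinese_restaurant_franchise([50, 50, 50], alpha0=1.0, gamma=1.0, rng=rng)
```

Because all groups draw dishes from the same top-level menu, the same dish index typically appears in several groups: this is exactly the cluster sharing the HDP is designed for.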
HDP – CRF graph • The dish values θ are shared between groups as well as within groups; this is a key property of the HDP. • The CRF representation is obtained by integrating out G0.
DP Mixture Model • One of the most important applications of the DP: a nonparametric prior distribution on the components of a mixture model. • A draw G can be viewed as defining an infinite mixture model.
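For a single group, the DP mixture model reads:

```latex
\begin{aligned}
G \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0) \\
\phi_i \mid G &\sim G, \qquad x_i \mid \phi_i \sim F(\phi_i)
\end{aligned}
```

By the stick-breaking construction, G = Σk βk δθk, so marginalising out φi gives xi ~ Σk βk F(θk): a mixture with a countably infinite number of components, of which only finitely many are used by any finite dataset.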
HDP mixture model • The HDP can be used as the prior distribution over the factors for nested group data. • We consider two levels of DPs: G0 links the child DPs Gj and forces them to share components; the Gj are conditionally independent given G0.
Infinite Hidden Markov Model • The number of hidden states is allowed to be countably infinite. • The transition probabilities in the ith row of the transition matrix A can be interpreted as mixing proportions: ai = (ai1, ai2, …, aik, …). • Thus each row of A in an HMM is a draw from a DP. These DPs must also be linked, because they should share the same set of “next states”. The HDP provides the natural framework for the infinite HMM.
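The linkage between rows can be sketched with a finite truncation (the truncation level and parameter values below are my own illustrative choices; the Dirichlet approximation ai ~ Dirichlet(α0·β) stands in for the exact DP draw DP(α0, β)):

```python
import numpy as np

def stick_breaking(gamma, num_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration gamma."""
    b = rng.beta(1.0, gamma, size=num_atoms)
    return b * np.concatenate([[1.0], np.cumprod(1.0 - b[:-1])])

rng = np.random.default_rng(3)
K = 20                                   # truncation of the "infinite" state set
gamma, alpha0 = 2.0, 5.0

beta = stick_breaking(gamma, K, rng)     # global weights over the shared states
beta /= beta.sum()                       # renormalise after truncation

# Each row a_i of the transition matrix is a DP draw centred on beta,
# approximated here by a finite Dirichlet: a_i ~ Dirichlet(alpha0 * beta).
A = np.vstack([rng.dirichlet(alpha0 * beta) for _ in range(K)])
```

Every row concentrates its mass on the same few states (those with large βk), so all rows agree on which “next states” exist, while α0 controls how closely each row tracks β.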
iHMM via HDP • Assign observations to groups, where the groups are indexed by the value of the previous state variable in the sequence. The current state and emission distribution then define a group-specific mixture model. • Multiple iHMMs can be linked by adding another level of Bayesian hierarchy: a master DP couples the iHMMs, each of which is itself a set of DPs.
Non-trivialities in the iHMM • The HDP assumes a fixed partition of the data into groups, whereas the HMM deals with time-series data in which the definition of the groups is itself random. • In the CRF view of the HDP, the number of restaurants is infinite. Moreover, in the sampling scheme, changing one state st may affect the group assignments of all subsequent data. • The CRF is natural for describing the iHMM but awkward for sampling; sampling algorithms based on other representations of the HDP are needed.
Conclusion • The HDP is a hierarchical, nonparametric model for clustering problems involving multiple groups of data. • Mixture components are shared across groups, and the appropriate number of components is determined automatically by the HDP. • The HDP extends to the infinite HMM, with effective inference algorithms.
References • Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, “Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes”, NIPS 2004. • M. J. Beal, Z. Ghahramani and C. E. Rasmussen, “The Infinite Hidden Markov Model”, NIPS 2002. • Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, “Hierarchical Dirichlet Processes”, revised version to appear in JASA, 2006.