Infinite Hierarchical Hidden Markov Models
AISTATS 2009
Katherine A. Heller, Yee Whye Teh and Dilan Görür
Lu Ren, ECE@Duke University, Nov 23, 2009
Outline • Hierarchical structure learning for sequential data • Hierarchical hidden Markov model (HHMM) • Infinite hierarchical hidden Markov model (IHHMM) • Inference and learning • Experimental results and demonstrations • Related work and extensions
Multi-scale Structure
[Figure: sequential data generated, and the sampled "states" used to generate the data]
• The goal is to infer correlations among observations over long ranges of the observation sequence.
• Potential applications: multi-resolution structure learning for language, video structure discovery, activity detection, etc.
Hierarchical HMM (HHMM)
1. Hierarchical Hidden Markov Models (HHMMs) are multi-scale models of sequences in which each level of the model is a separate HMM that emits lower-level HMMs in a recursive manner.
[Figure: the generative process of one HHMM example [2]]
Hierarchical HMM (HHMM)
2. The entire set of parameters. With a fixed model structure, the model is characterized by the following parameters [1]:
• horizontal transition probabilities $A^{q} = \{a^{q}_{ij}\}$ with $a^{q}_{ij} = P(q_j \mid q_i)$, the probability of moving between the children $q_i, q_j$ of internal state $q$;
• vertical (initial) probabilities $\Pi^{q} = \{\pi^{q}(q_i)\}$ with $\pi^{q}(q_i) = P(q_i \mid q)$, the probability that internal state $q$ first activates child $q_i$;
• emission probabilities $B^{q} = \{b^{q}(k)\}$ with $b^{q}(k) = P(\sigma_k \mid q)$ for each production state $q$ and observation symbol $\sigma_k$
(a minimal sketch of these parameter containers follows the figure below).
3. Representing the HHMM as a DBN [2]
• Simply assume all production states are at the bottom; the state of the HMM at level $d$ and time $t$ is represented by $q_t^d$.
• The vector $q_t^{1:D}$ specifies the complete "path" from the root to the leaf state.
• The indicator variable $F_t^d$ controls completion of the HMM at level $d$ and time $t$.
Hierarchical HMM (HHMM)
[Figure: an HHMM represented as a DBN [2]]
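To make the parameter set above concrete, here is a minimal Python sketch of the containers; the dictionary layout and names are assumptions for illustration only, not the representation used in [1] or [2].

```python
import numpy as np

# A minimal sketch, assuming a dictionary layout for the HHMM parameter set
# {A^q, Pi^q, B^q}; the container names are illustrative, not from the paper.
rng = np.random.default_rng(0)

def random_stochastic(rows, cols, rng):
    """Row-stochastic matrix drawn from a flat Dirichlet prior."""
    return rng.dirichlet(np.ones(cols), size=rows)

n_children, n_symbols = 3, 5
# Horizontal transitions A^q among the children of internal state q at level d.
horizontal_trans = {(1, 0): random_stochastic(n_children, n_children, rng)}
# Vertical (initial) probabilities Pi^q: which child internal state q activates first.
initial_dist = {(1, 0): rng.dirichlet(np.ones(n_children))}
# Emission probabilities B^q for each production (bottom-level) state.
emission = {s: rng.dirichlet(np.ones(n_symbols)) for s in range(n_children)}
```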
Infinite Hierarchical HMM (IHHMM)
IHHMM: allows the HHMM hierarchy to have a potentially infinite number of levels.
• Observation at time $t$: $y_t$;  state at level $l$ and time $t$: $s_t^l$.
• A state-transition indicator variable $z_t^l$ is also introduced:
 • $z_t^l = 1$ indicates that there is a completion of the HHMM at level $l$ right before time $t$;
 • equivalently, it indicates the presence of a state transition from $s_{t-1}^l$ to $s_t^l$.
• The conditional probability of $z_t^l$ is $P(z_t^l = 1 \mid z_t^{l-1}) = q_l\, z_t^{l-1}$, where $q_l$ is a per-level transition probability and $z_t^0 \equiv 1$ (the bottom level transitions at every time step).
• There is an opportunity to transition at level $l$ only if there was a transition at level $l-1$.
Infinite Hierarchical HMM (IHHMM)
• The properties implied by the structure:
 • The number of transitions at level $l-1$ before a transition at level $l$ occurs is geometrically distributed with mean $1/q_l$.
 • This implies that the expected number of time steps for which a state at level $l$ persists in its current value is $\prod_{l' \le l} 1/q_{l'}$.
 • The states at higher levels persist longer.
 • The first non-transitioning level at time $t$, $L_t = \min\{l : z_t^l = 0\}$, has the distribution $P(L_t = l) = (1 - q_l)\prod_{l'=1}^{l-1} q_{l'}$; $L_t$ is geometrically distributed with parameter $1-q$ if all $q_l = q$.
 • The IHHMM allows for a potentially infinite number of levels (see the sketch below).
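A minimal sketch of how the transition indicators could be sampled and how the geometric persistence arises; the per-level transition probabilities q[l] are an assumed parameterization used only for this illustration.

```python
import numpy as np

def sample_indicators(T, q, rng):
    """Sample transition indicators z[t, l] for levels l = 0..L-1.

    z[t, 0] = 1 at every step (the bottom level always transitions);
    a transition at level l is possible only if level l-1 transitioned,
    and then occurs with probability q[l].
    """
    L = len(q)
    z = np.zeros((T, L), dtype=int)
    z[:, 0] = 1
    for t in range(T):
        for l in range(1, L):
            if z[t, l - 1] == 1:
                z[t, l] = rng.random() < q[l]
    return z

rng = np.random.default_rng(1)
q = [1.0, 0.5, 0.5, 0.5]      # assumed per-level transition probabilities; q[0] unused
z = sample_indicators(10000, q, rng)
# A state at level l persists for roughly prod_{l' <= l} 1/q[l'] time steps,
# so the empirical transition rates per level decay with the level index.
print(z.mean(axis=0))
```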
Infinite Hierarchical HMM (IHHMM)
The generative process for the states at time $t$ given those at time $t-1$ is similar to the HHMM:
• For the levels $l \ge L_t$ the state keeps its previous value, $s_t^l = s_{t-1}^l$.
• For the levels $l = L_t - 1$ down to $1$, the state is generated according to the transition probabilities at level $l$, conditioned on the previous state $s_{t-1}^l$ and the parent state $s_t^{l+1}$.
• The emission matrix then generates $y_t$ from the bottom-level state $s_t^1$.
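A sketch of one generative time step under these rules; the array layout (parent- and previous-state-indexed transition tensors, a single bottom-level emission matrix) is an assumption for illustration, not the paper's exact parameterization.

```python
import numpy as np

def generate_step(prev_states, z_t, trans, emit, rng):
    """One IHHMM-style time step given the transition indicators.

    prev_states : list of states at time t-1, bottom level first
    z_t         : transition indicators at time t, one per level
    trans[l]    : array of shape (n_parent, n_state, n_state); the row
                  trans[l][parent, prev] is the distribution over the new state
    emit        : array of shape (n_state, n_symbols); row = bottom-level state
    """
    L = len(prev_states)
    states = list(prev_states)
    # Levels at and above the first non-transitioning level keep their value;
    # below it we resample top-down so each level conditions on its parent.
    for l in reversed(range(L)):
        if z_t[l] == 1:
            parent = states[l + 1] if l + 1 < L else 0   # dummy parent above the top
            probs = trans[l][parent, prev_states[l]]
            states[l] = rng.choice(len(probs), p=probs)
    y = rng.choice(emit.shape[1], p=emit[states[0]])      # emission from the bottom state
    return states, y
```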
Inference and Learning
• Inference in the IHHMM is performed using Gibbs sampling with a modified forward-backtrack algorithm.
• It iterates between the following two steps:
1. Sampling the state values at each level, with the parameters and the other levels fixed:
 • Compute the forward messages $\alpha_t(s_t^l, z_t^l)$ from $t = 1$ to $T$;
 • Resample $s_t^l$ and $z_t^l$ along the backward pass from $t = T$ to $1$, conditioned on the newly sampled $s_{t+1}^l$ and $z_{t+1}^l$.
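The slides do not spell the messages out, so here is a generic forward-filtering backward-sampling pass for a single level with fixed parameters, as a hedged stand-in for the modified forward-backtrack step; the conditioning on the other levels is abstracted into the evidence term obs_lik, which is an assumption of this sketch.

```python
import numpy as np

def ffbs(obs_lik, trans, init, rng):
    """Sample a state path from p(states | evidence) for one HMM level.

    obs_lik[t, s] : likelihood of the evidence at time t given state s
                    (in the IHHMM this would also absorb the neighboring levels)
    trans[i, j]   : p(s_t = j | s_{t-1} = i);  init[s] : p(s_1 = s)
    """
    T, S = obs_lik.shape
    alpha = np.zeros((T, S))
    alpha[0] = init * obs_lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                       # forward messages
        alpha[t] = (alpha[t - 1] @ trans) * obs_lik[t]
        alpha[t] /= alpha[t].sum()
    states = np.zeros(T, dtype=int)
    states[-1] = rng.choice(S, p=alpha[-1])
    for t in range(T - 2, -1, -1):              # backward sampling pass
        w = alpha[t] * trans[:, states[t + 1]]
        states[t] = rng.choice(S, p=w / w.sum())
    return states
```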
Inference and Learning
• When the top level is reached, a new level is created above it by setting all of its states to 1;
• If the level below the current top level has no state transitions, it becomes the new top level.
2. Sampling the parameters given the current states:
 • Parameters are initialized as draws from their Dirichlet priors;
 • Posteriors are computed from the counts of state transitions and emissions obtained in the previous step (a minimal sketch follows).
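A minimal sketch of the parameter-resampling step, assuming flat Dirichlet priors and simple count matrices; the variable names and the toy counts are illustrative.

```python
import numpy as np

def resample_rows(counts, prior, rng):
    """Draw each row of a stochastic matrix from its Dirichlet posterior."""
    return np.vstack([rng.dirichlet(prior + row) for row in counts])

rng = np.random.default_rng(2)
trans_counts = np.array([[8, 2], [1, 9]])        # state-transition counts from the sampled paths
emit_counts = np.array([[5, 0, 5], [0, 10, 0]])  # emission counts from the sampled paths
trans = resample_rows(trans_counts, np.ones(2), rng)
emit = resample_rows(emit_counts, np.ones(3), rng)
```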
Inference and Learning
Predicting new observations given the current state of the IHHMM:
1. Assume the top level learned by the IHHMM is $L$; compute the predictive distributions over states recursively from level $L$ down to level $1$, each level conditioned on the predictive distribution of the level above.
2. Compute the probability of observing $y_{T+1}$ from the predictive distribution over the bottom-level state:
$P(y_{T+1} \mid y_{1:T}) = \sum_{s_{T+1}^1} P(y_{T+1} \mid s_{T+1}^1)\, P(s_{T+1}^1 \mid y_{1:T})$.
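A sketch of step 2, assuming the top-down recursion has already produced a predictive distribution over the bottom-level state; the numbers below are placeholders.

```python
import numpy as np

# state_pred[s]: p(s_{T+1}^1 = s | y_{1:T}) from the top-down recursion above
# emit[s, y]:    p(y | s) emission probabilities for the bottom-level states
state_pred = np.array([0.7, 0.3])
emit = np.array([[0.8, 0.1, 0.1],
                 [0.2, 0.2, 0.6]])
pred_obs = state_pred @ emit      # p(y_{T+1} | y_{1:T}) for each symbol
print(pred_obs)
```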
Experiment Results
1. Data generated from the model:
[Figure: samples of sequential data generated by the model, together with the sampled "states" used to generate them]
Experiment Results
2. Demonstrate that the model can capture hierarchical structure:
• The first data set consists of repeats of integers increasing from 1 to 7, followed by repetitions of integers decreasing from 5 to 1, with the whole pattern repeated twice (a construction sketch follows this list).
• The second data set is the first one concatenated with another series of repeated increasing and decreasing integer sequences.
• 7 states are used in the model at all levels.
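A sketch of how the first toy sequence could be constructed; the repeat counts n_repeat and n_cycles are assumptions, since the slides specify only the pattern.

```python
import numpy as np

def build_sequence(n_repeat=3, n_cycles=2):
    """Repeats of 1..7 followed by repeats of 5..1, the whole pattern repeated."""
    up = np.repeat(np.arange(1, 8), n_repeat)        # 1,1,1,2,2,2,...,7,7,7
    down = np.repeat(np.arange(5, 0, -1), n_repeat)  # 5,5,5,4,4,4,...,1,1,1
    return np.tile(np.concatenate([up, down]), n_cycles)

seq1 = build_sequence()
# The second data set is seq1 concatenated with another run of the same kind of pattern.
print(seq1)
```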
Experiment Results
The predictive log probability of the next integer:
• HMM: 0.25
• IHHMM: 0.31
• HHMM: 0.30 (for 2–4 levels)
3. Spectral data from Handel's Hallelujah chorus
Experiment Results
4. Alice in Wonderland letters data set.
[Figures: the difference in log predictive likelihood between the IHHMM and a one-level HMM learned by Gibbs sampling, and between the IHHMM and an HMM learned by EM]
• The mean differences in both plots are positive, demonstrating that the IHHMM gives superior performance on this data.
• The long tails signify that there are letters which can be better predicted using the higher hierarchical levels.
Final Discussions
Relation to the HHMM:
• The IHHMM is a nonparametric extension of the HHMM to an unbounded hierarchy depth;
• The completion of an internal HHMM is governed by an independent process.
Other related work:
• Probabilistic context-free grammars with multi-scale structure learning;
• The infinite HMM and the infinite factorial HMM.
Future work:
• Make the number of states at each level infinite as well, as in the infinite HMM;
• Higher-order Markov chains;
• More efficient inference algorithms.
Cited References
[1] S. Fine, Y. Singer, and N. Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning, 32:41–62, 1998.
[2] K. Murphy and M. A. Paskin. Linear time inference in hierarchical HMMs. In Neural Information Processing Systems, 2001.