An HDP-HMM for Systems with State Persistence. Emily B. Fox, Erik B. Sudderth, Michael I. Jordan and Alan S. Willsky. 25th International Conference on Machine Learning. Presented by Lu Ren, ECE Dept., Duke University, July 11, 2008.
Outline
1. Limitations of the HDP-HMM formulation
• Temporal persistence of states versus unrealistically fast dynamics
• The Bayesian bias towards simpler models is insufficient
• Sometimes models of varying complexity are averaged effectively (e.g., prediction with the posterior integrated out)
• However, speaker diarization is problematic for the HDP-HMM, because it requires inferring the number of speakers as well as the transitions among speakers
Outline
2. Contribution:
• Formulate a general solution to state persistence in the HDP-HMM with nonparametric Bayesian inference
• Allow more flexible, nonparametric emission distributions
• Develop a blocked Gibbs sampler to jointly resample the state and emission assignments
Background
1. Dirichlet process (DP)
2. Hierarchical Dirichlet process (HDP)
[Figure panels: DP Mixture; HDP Mixture]
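For reference, the DP mixture and HDP mixture named above are usually written as follows. This is the standard formulation rather than a transcription of the slide; the symbols $H$ (base measure), $\gamma$ and $\alpha$ (concentration parameters), and $F$ (emission family) are assumed notation.

```latex
% DP mixture: one random measure G shared by all observations
G \sim \mathrm{DP}(\gamma, H), \qquad
\theta_i \mid G \sim G, \qquad
y_i \mid \theta_i \sim F(\theta_i)

% HDP mixture: group-specific measures G_j share the atoms of a global G_0
G_0 \sim \mathrm{DP}(\gamma, H), \qquad
G_j \mid G_0 \sim \mathrm{DP}(\alpha, G_0), \qquad
\theta_{ji} \mid G_j \sim G_j, \qquad
y_{ji} \mid \theta_{ji} \sim F(\theta_{ji})
```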
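Background
3. Chinese restaurant franchise (CRF)
An alternative representation via indicator variables:
• Table assignment: $t_{ji} \sim \tilde{\pi}_j$
• Dish assignment: $k_{jt} \sim \beta$
• Observation generation: $y_{ji} \sim F(\theta_{k_{jt_{ji}}})$
4. An alternative weak limit approximation
$\beta \sim \mathrm{Dir}(\gamma/L, \ldots, \gamma/L)$, $\pi_j \sim \mathrm{Dir}(\alpha\beta_1, \ldots, \alpha\beta_L)$
As $L \to \infty$, the finite hierarchical mixture model converges in distribution to the HDP.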
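The weak limit approximation can be illustrated with a minimal sketch, assuming the usual finite construction in which the global weights are drawn from a symmetric Dirichlet with parameters `gamma / L` and each group's weights from a Dirichlet centred on the global weights. The function names, the truncation level `L`, and the small positive floor on the Dirichlet parameters are illustrative choices, not taken from the slides.

```python
import random

def sample_dirichlet(shapes, rng):
    """Draw from Dirichlet(shapes) by normalising independent Gamma draws."""
    # A tiny floor keeps every shape strictly positive, as gammavariate requires.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in shapes]
    total = sum(draws)
    return [d / total for d in draws]

def weak_limit_hdp(L, gamma, alpha, num_groups, rng):
    """Finite (order-L) approximation to the HDP weights.

    beta : global weights, Dirichlet(gamma/L, ..., gamma/L)
    pis  : one weight vector per group, Dirichlet(alpha * beta)
    """
    beta = sample_dirichlet([gamma / L] * L, rng)
    pis = [sample_dirichlet([alpha * b for b in beta], rng)
           for _ in range(num_groups)]
    return beta, pis

rng = random.Random(0)
beta, pis = weak_limit_hdp(L=10, gamma=2.0, alpha=5.0, num_groups=3, rng=rng)
```

Larger values of `alpha` make each group's weights track the shared `beta` more closely, which is the sharing behaviour the hierarchical model is meant to capture.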
Sticky HDP-HMM
1. Problem of the HDP-HMM
• Sampling $\pi_j \sim \mathrm{DP}(\alpha, \beta)$ encourages similar transition distributions for all states
• This allows state sequences with unrealistically fast dynamics
• Example: a block of observations is divided between two small-variance states with slightly different means
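Sticky HDP-HMM
2. Potential issues:
• Redundant states make it harder to explain the observations
• Slow mixing rates (the alternating pattern is reinforced by the properties of the CRF)
• Poor predictive performance with redundant states for high-dimensional observations
3. Proposed solution:
• Add a self-transition parameter $\kappa$: $\pi_j \sim \mathrm{DP}\left(\alpha + \kappa, \frac{\alpha\beta + \kappa\delta_j}{\alpha + \kappa}\right)$
• Positive values of $\kappa$ increase the prior probability of self-transitions
• When $\kappa = 0$, the original HDP-HMM is recovered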
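The effect of the proposed solution can be sketched under a finite truncation, assuming the sticky prior adds an extra mass `kappa` to the self-transition entry of each row, so that row `j` is drawn from a Dirichlet with parameters `alpha * beta[k] + kappa * (k == j)`. The names `kappa`, `sticky_transition_row`, and `expected_self_transition` are illustrative, and `beta` is assumed to sum to one.

```python
import random

def sample_dirichlet(shapes, rng):
    """Draw from Dirichlet(shapes) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in shapes]
    total = sum(draws)
    return [d / total for d in draws]

def sticky_transition_row(j, beta, alpha, kappa, rng):
    """Sample the transition distribution out of state j under the sticky prior."""
    shapes = [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]
    return sample_dirichlet(shapes, rng)

def expected_self_transition(j, beta, alpha, kappa):
    """Prior mean of the self-transition probability of state j."""
    return (alpha * beta[j] + kappa) / (alpha + kappa)

beta = [0.25, 0.25, 0.25, 0.25]
plain = expected_self_transition(0, beta, alpha=4.0, kappa=0.0)    # 0.25
sticky = expected_self_transition(0, beta, alpha=4.0, kappa=12.0)  # 0.8125
row = sticky_transition_row(0, beta, alpha=4.0, kappa=12.0, rng=random.Random(1))
```

With `kappa = 0` the prior mean of the self-transition equals `beta[j]`, which matches the non-sticky model; a positive `kappa` shifts mass onto the diagonal without changing the relative weights of the other states.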
Sticky HDP-HMM 4. Graphical Model representation: CRF with loyal customers
Model Sampling
5. Sampling methods:
√ A CRF with Loyal Customers
• Each restaurant has a specialty dish
• Children are more likely to eat in the same restaurant as their parents, and to eat the specialty dish
• This keeps many generations eating in the same restaurant
• $\bar{k}_{jt}$ represents the considered dish, $w_{jt}$ is the override variable, and $k_{jt}$ represents the served dish
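The loyal-customer metaphor above can be sketched as a small generative step, assuming a table in restaurant `j` first considers a dish drawn from the global weights and then, with probability `rho`, overrides it with the restaurant's specialty dish `j`. The function name `serve_table` and the parameter `rho` (which would correspond to the fraction of prior mass given to self-transitions) are hypothetical labels for illustration.

```python
import random

def serve_table(j, beta, rho, rng):
    """Return (considered dish, override flag, served dish) for one table
    in restaurant j."""
    # Considered dish: drawn from the global dish weights beta.
    considered = rng.choices(range(len(beta)), weights=beta)[0]
    # Override variable: 1 means the specialty dish replaces the considered dish.
    override = 1 if rng.random() < rho else 0
    served = j if override == 1 else considered
    return considered, override, served

rng = random.Random(0)
tables = [serve_table(2, [0.2, 0.3, 0.5], rho=0.7, rng=rng) for _ in range(5)]
```

Keeping the considered dish separate from the served dish matters for inference: only the considered dishes carry information about the global weights, since overridden tables would have served the specialty regardless.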
Model Sampling
√ Sampling via Direct Assignments
• Sampling $z_t$
• Sampling $\beta$: requires sampling $m_{jk}$ (the number of tables in restaurant $j$ that were served dish $k$) and $w_{jt}$ (the override variable)
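Model Sampling
• $m_{jk}$: simulate from a DP with concentration parameter $\alpha\beta_k + \kappa\delta(j,k)$
• [Update equations for the override variables and the resulting table counts are not recoverable from the slide text]
Sampling Hyper-parameters
• Place gamma priors on $(\alpha + \kappa)$ and $\gamma$, and a Beta prior on $\rho = \kappa/(\alpha + \kappa)$
√ Blocked Sampling of State Sequences
• The direct assignment sampler exhibits slow mixing rates, since global changes to the state sequence are forced to occur coordinate by coordinate
• Example: two contiguous, temporally separated sets of observations of a given state may be grouped into two states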
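Model Sampling
• A variant of the HMM forward-backward procedure
• Requires a truncated model, obtained with the weak limit approximation of the DP
• Uses the posterior distributions of $\beta$ and $\pi_j$ under this approximation
• Block sample $z_{1:T}$:
a. compute backward messages
b. sample the states $z_t$ sequentially, forward in time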
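The two steps of the block sampler can be sketched as standard backward-filtering, forward-sampling on a truncated model with `L` states. This is a minimal sketch, not the authors' implementation: the initial distribution `pi0`, the table `lik[t][k]` of emission likelihoods, and all function names are assumptions, and the transition matrix `pi` is taken as already sampled.

```python
import math
import random

def gaussian_pdf(y, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def backward_messages(pi, lik):
    """back[t][j] is proportional to p(y_{t+1:T} | z_t = j)."""
    T, L = len(lik), len(pi)
    back = [[1.0] * L for _ in range(T)]
    for t in range(T - 2, -1, -1):
        raw = [sum(pi[j][k] * lik[t + 1][k] * back[t + 1][k] for k in range(L))
               for j in range(L)]
        scale = sum(raw)  # normalise each message to avoid underflow
        back[t] = [r / scale for r in raw]
    return back

def sample_states(pi0, pi, lik, rng):
    """Jointly sample z_{1:T} by moving forward in time with the messages."""
    back = backward_messages(pi, lik)
    states, prev = [], None
    for t in range(len(lik)):
        row = pi0 if prev is None else pi[prev]
        weights = [row[k] * lik[t][k] * back[t][k] for k in range(len(row))]
        prev = rng.choices(range(len(row)), weights=weights)[0]
        states.append(prev)
    return states

means, var = [50.0, 0.0, -50.0], 10.0
ys = [50.0, 50.0, 0.0, 0.0, -50.0]
lik = [[gaussian_pdf(y, m, var) for m in means] for y in ys]
pi = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
states = sample_states([1 / 3] * 3, pi, lik, random.Random(0))
```

Because the whole sequence is drawn in one pass, a segment can switch state as a unit, which is what a coordinate-by-coordinate sampler struggles to do.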
Multimodal Emission Distributions
• Approximate each emission distribution using an infinite DP mixture of Gaussians
• The bias towards self-transitions allows us to distinguish between the underlying HDP-HMM states (identifiability)
• $s_t$: index of the mixture component of the emission density
• For each state $j$, a unique stick-breaking distribution $\psi_j$ defines the mixture weights, so that $s_t \sim \psi_{z_t}$
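Multimodal Emission Distributions
• Blocked resampling of $(z_t, s_t)$
• Use weak limit approximations to both the HDP-HMM and the DP emissions
• The backward message from $(z_t, s_t)$ to $(z_{t-1}, s_{t-1})$ is solely a function of $z_{t-1}$:
$m_{t,t-1}(z_{t-1}) \propto \sum_{z_t} \sum_{s_t} \pi_{z_{t-1}}(z_t)\, \psi_{z_t}(s_t)\, f(y_t \mid \theta_{z_t, s_t})\, m_{t+1,t}(z_t)$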
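One way to see why the message depends only on the previous state is that the component index can be summed out of the emission term before the message is passed. The sketch below assumes truncated mixture weights for one state and univariate Gaussian components; `marginal_likelihood` and `sample_component` are illustrative names.

```python
import math
import random

def gaussian_pdf(y, mean, var):
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood(y, weights, means, variances):
    """p(y | z = k) with the component index summed out."""
    return sum(w * gaussian_pdf(y, m, v)
               for w, m, v in zip(weights, means, variances))

def sample_component(y, weights, means, variances, rng):
    """Sample the component index given the state and the observation."""
    post = [w * gaussian_pdf(y, m, v)
            for w, m, v in zip(weights, means, variances)]
    return rng.choices(range(len(post)), weights=post)[0]

# One state with two equally weighted components (means 0 and 10, variance 10).
weights, means, variances = [0.5, 0.5], [0.0, 10.0], [10.0, 10.0]
lik = marginal_likelihood(3.0, weights, means, variances)
comp = sample_component(3.0, weights, means, variances, random.Random(0))
```

The marginal likelihoods can be fed to an ordinary backward recursion over states; the component index is then drawn afterwards, conditioned on the sampled state.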
Experiment Results
1. Synthetic Data
• Generated from a three-state Gaussian emission HMM: 0.97 self-transition probability; means 50, 0, -50; variances 50, 10, 50
• The blocked sampler uses a truncation level $L$
• Hamming distance between the true and estimated state sequences over 100 iterations, with the median and the 10th and 90th quantiles computed over 200 initializations
Note: the direct assignment sampler's slower convergence can be attributed to the sampler splitting temporally separated segments of a true state into multiple redundant states.
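A minimal sketch of this setup is given below, using the self-transition probability, means, and variances listed above. Several details are assumptions: the remaining transition mass is split evenly among the other states, the initial state is uniform, the Hamming distance is normalised by sequence length, and no relabelling of estimated states is performed before comparison.

```python
import math
import random

def generate_hmm(T, self_prob, means, variances, rng):
    """Simulate states and observations from a Gaussian-emission HMM."""
    K = len(means)
    off = (1.0 - self_prob) / (K - 1)  # assumed: uniform over the other states
    states, obs = [], []
    z = rng.randrange(K)               # assumed: uniform initial state
    for _ in range(T):
        states.append(z)
        obs.append(rng.gauss(means[z], math.sqrt(variances[z])))
        weights = [self_prob if k == z else off for k in range(K)]
        z = rng.choices(range(K), weights=weights)[0]
    return states, obs

def hamming_distance(true_states, est_states):
    """Fraction of time steps at which the two label sequences disagree."""
    if len(true_states) != len(est_states):
        raise ValueError("sequences must have equal length")
    mismatches = sum(a != b for a, b in zip(true_states, est_states))
    return mismatches / len(true_states)

states, obs = generate_hmm(
    T=1000, self_prob=0.97,
    means=[50.0, 0.0, -50.0], variances=[50.0, 10.0, 50.0],
    rng=random.Random(0),
)
```

In practice the estimated labels are arbitrary, so a mapping from estimated to true labels would be needed before computing the distance; that step is omitted here.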
Experiment Results
[Figure panels: sticky HDP-HMM with the blocked sampler and the direct assignment sampler; original HDP-HMM with the blocked sampler and the direct assignment sampler]
Experiment Results
• Data generated from a two-state HMM with multimodal emissions
• Each state has two Gaussian components with equal weights
• Means: (0, 10) and (-7, 7); variance: 10
• Self-transition probability: 0.98
[Figure panels: observation sequence; true state sequence; estimated state sequence of the sticky HDP-HMM with a single Gaussian emission component per state; estimated state sequence with infinite Gaussian mixture emissions]
Experiment Results
Hamming distance error between the true state sequence and the estimated state sequence (blue: median; red: 10th and 90th quantiles).
(e): infinite Gaussian emission mixture with sticky HDP-HMM
(f): infinite Gaussian emission mixture with HDP-HMM
(g): single Gaussian emission component with sticky HDP-HMM
(h): single Gaussian emission component with HDP-HMM
Experiment Results
2. Speaker Diarization Data
• Segment an audio recording into speaker-homogeneous regions
• Features: averaged 19 MFCCs, computed over a 250 ms window every 10 ms
• A minimum speaker duration of 500 ms is set
[Figure: true state sequence and state sequence estimate for a meeting, using the sticky HDP-HMM]
Experiment Results
[Figure: true state sequence and state sequence estimate for a meeting; red marks incorrect labels]
Experiment Results DER: Diarization error rate
Experiment Results As a further comparison, the best performance of other methods: Overall DER: 18.37%, best and worst DER: 4.39% and 32.33% Results of sticky HDP-HMM: Overall DER: 19.04%, best and worst DER: 1.26% and 31.42%
Conclusions • Extend HDP-HMM with a separate parameter capturing state persistence • A fully nonparametric treatment of multimodal emissions • Present efficient sampling methods • Results on both synthetic data and a real data set