Fast Inference and Learning in Large-State-Space HMMs

Sajid M. Siddiqi
Andrew W. Moore
The Auton Lab, Carnegie Mellon University
www.autonlab.org
[Photos: "Sajid Siddiqi: Happy" / "Sajid Siddiqi: Discontented"]
Hidden Markov Models

[Figure: the hidden state sequence q0 → q1 → q2 → q3 → q4; each arrow carries the same transition probability table, with entries such as 1/3 and 1.]

Notation: qt is the hidden state at time t, qt ∈ {s1, …, sN}; aij = P(qt+1 = sj | qt = si). Each of these probability tables is identical: the transition model does not change over time.
Observation Model

[Figure: each hidden state qt emits an observation Ot, giving the observed sequence O0 O1 O2 O3 O4.]

Notation: bi(Ot) = P(Ot | qt = si), the probability of observing Ot from state si.
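To make the notation concrete, here is a minimal sketch of an HMM parameterization in numpy (the values are hypothetical, not from the talk):

```python
import numpy as np

N, M = 3, 4          # N hidden states, M discrete observation symbols

pi = np.array([1/3, 1/3, 1/3])        # initial state distribution P(q0 = si)
A  = np.array([[0.8, 0.1, 0.1],       # A[i, j] = aij = P(q_{t+1} = sj | q_t = si)
               [0.2, 0.6, 0.2],
               [0.1, 0.1, 0.8]])
B  = np.array([[0.7, 0.1, 0.1, 0.1],  # B[i, k] = bi(k) = P(O_t = k | q_t = si)
               [0.1, 0.7, 0.1, 0.1],
               [0.1, 0.1, 0.4, 0.4]])

# Every row is a probability distribution
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```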
Some Famous HMM Tasks

• Question 1: State Estimation
  What is P(qT = si | O1 O2 … OT)?
• Question 2: Most Probable Path
  Given O1 O2 … OT, what is the most probable path that I took?
  ("Woke up at 8.35, got on bus at 9.46, sat in lecture 10.05–11.22 …")
• Question 3: Learning HMMs
  Given O1 O2 … OT, what is the maximum-likelihood HMM that could have produced this string of observations?

[Figure: a three-state HMM over the activities Eat, Bus, Walk, with self-transitions aAA, aBB, aCC, cross-transitions aAB, aBA, aBC, aCB, and emissions bA(Ot-1), bB(Ot), bC(Ot+1) for observations Ot-1, Ot, Ot+1.]
Basic Operations in HMMs

For an observation sequence O = O1 … OT, the three basic HMM operations are:

• Evaluation: computing P(O | λ) — forward algorithm — O(TN2)
• Inference: computing Q* = argmaxQ P(Q, O | λ) — Viterbi algorithm — O(TN2)
• Learning: computing λ* = argmaxλ P(O | λ) — Baum-Welch (EM) — O(TN2) per iteration

This talk: a simple approach to reducing the complexity in N

(T = # timesteps, N = # states)
Reducing the Quadratic N Penalty

Why does it matter?
• Quadratic-in-N algorithms make HMM computations impractical when N is large
• There are several promising applications for efficient large-state-space HMM algorithms:
  • biological sequence analysis
  • speech recognition
  • real-time HMM systems, such as activity monitoring
Idea One: Sparse Transition Matrix

• Only K << N non-zero next-state probabilities
• Evaluation cost drops to O(TNK) — see the sketch below
• But can get very badly confused by "impossible transitions"
• Cannot learn the sparse structure (once chosen, it cannot change)
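A minimal sketch of why sparsity helps, assuming each row of the transition matrix is stored as a short list of its K non-zero entries (the representation is ours, for illustration):

```python
import numpy as np

def sparse_forward_step(alpha_prev, sparse_rows, b_t):
    """One forward step with a sparse transition matrix.

    alpha_prev  : (N,) forward variables at time t-1
    sparse_rows : sparse_rows[i] is a list of (j, a_ij) pairs, K << N per row
    b_t         : (N,) observation likelihoods b_j(O_t)
    Cost is O(NK) per timestep instead of O(N^2).
    """
    N = len(alpha_prev)
    alpha = np.zeros(N)
    for i, row in enumerate(sparse_rows):
        for j, a_ij in row:                 # only K entries per row
            alpha[j] += alpha_prev[i] * a_ij
    return alpha * b_t
```

Note that any transition not in a row's list has probability exactly zero, which is what makes "impossible transitions" so damaging here.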
Dense-Mostly-Constant (DMC) Transitions

• K non-constant probabilities per row
• DMC HMMs comprise a richer and more expressive class of models than sparse HMMs

[Figure: a DMC transition matrix with K = 2.]

The transition model for state si now comprises:
• NCi = { j : si → sj is a non-constant transition probability }
• ci = the constant transition probability from si to all states not in NCi
• aij = the non-constant transition probability for si → sj, j ∈ NCi

Example: NC3 = {2, 5}, c3 = 0.05, a32 = 0.25, a35 = 0.6
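A minimal sketch of how a DMC transition row might be stored (the class name and layout are ours, not the paper's): O(K) memory per row, with any entry recoverable in O(1):

```python
class DMCRow:
    """Transition model for one state si under the DMC assumption."""
    def __init__(self, c_i, nonconstant):
        self.c = c_i                 # constant probability for all j not in NC_i
        self.a = dict(nonconstant)   # NC_i as {j: a_ij}, |NC_i| = K

    def prob(self, j):
        """a_ij if j is non-constant, otherwise the shared constant c_i."""
        return self.a.get(j, self.c)

# The example from the slide: NC_3 = {2, 5}, c_3 = 0.05, a_32 = 0.25, a_35 = 0.6
row3 = DMCRow(0.05, {2: 0.25, 5: 0.6})
assert row3.prob(5) == 0.6 and row3.prob(4) == 0.05
```

Unlike a sparse row, every transition keeps non-zero probability, so the model can never be "surprised" into an impossible state.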
HMM Filtering

P(qt = si | O1, O2 … Ot) = αt(i) / Σj αt(j)

where the forward variables

αt(i) = P(O1 … Ot, qt = si | λ)

are computed by the recursion

α1(i) = πi bi(O1)
αt(j) = bj(Ot) Σi αt-1(i) aij

• Cost: O(TN2)
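For reference, a minimal sketch of this standard O(TN2) filtering recursion with a dense transition matrix (A, B as in the earlier parameter sketch):

```python
import numpy as np

def filter_hmm(pi, A, B, obs):
    """Return P(q_t = si | O_1..O_t) for every t; O(T N^2) with dense A."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()                       # normalize for stability
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]   # the O(N^2) step
        alpha[t] /= alpha[t].sum()                   # filtered posterior at t
    return alpha
```

The matrix-vector product in the loop is exactly the quadratic-in-N penalty the DMC trick removes.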
Fast Evaluation in DMC HMMs

Under the DMC assumption, the forward recursion splits into a constant part and a correction over the non-constant entries:

αt(j) = bj(Ot) [ Σi αt-1(i) ci  +  Σ{i : j ∈ NCi} αt-1(i) (aij − ci) ]

The first sum is O(N) but common to all j per timestep t; the second is O(K) for each αt(j).

• This yields O(TNK) complexity for the evaluation problem.
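Continuing the DMCRow sketch from above, a hedged implementation of this recursion (our code, following the split just described):

```python
import numpy as np

def dmc_forward_step(alpha_prev, rows, b_t):
    """One forward step in O(N + NK) instead of O(N^2).

    rows[i] carries rows[i].c (constant prob) and rows[i].a ({j: a_ij}).
    """
    N = len(alpha_prev)
    # O(N): constant-transition mass, shared by every destination j
    shared = sum(alpha_prev[i] * rows[i].c for i in range(N))
    alpha = np.full(N, shared)
    # O(NK) total: correct only the non-constant entries
    for i in range(N):
        for j, a_ij in rows[i].a.items():
            alpha[j] += alpha_prev[i] * (a_ij - rows[i].c)
    return alpha * b_t
```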
Fast Inference in DMC HMMs

The Viterbi algorithm uses dynamic programming to calculate the globally optimal state sequence Qg = argmaxQ P(Q, O | λ). Define δt(i) as

δt(i) = max{q1 … qt-1} P(q1 … qt-1, qt = si, O1 … Ot | λ)

The δ variables can be computed in O(TN2) time, with the O(N) inductive step:

δt(j) = bj(Ot) maxi [ δt-1(i) aij ]

Under the DMC assumption, this step can be carried out in O(K) time: the max splits into the non-constant predecessors { i : j ∈ NCi } and the best constant predecessor, found from the δt-1(i) ci values computed once per timestep.
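One way to realize this split, as a minimal sketch (our bookkeeping, not necessarily the paper's exact scheme): rank the δt-1(i)·ci scores once per timestep, then for each destination j combine the best admissible constant predecessor with the explicit non-constant candidates.

```python
import numpy as np

def dmc_viterbi_step(delta_prev, rows, b_t):
    """One Viterbi step under the DMC assumption (illustrative sketch)."""
    N = len(delta_prev)
    # Scores for constant predecessors, ranked once per timestep
    const_scores = np.array([delta_prev[i] * rows[i].c for i in range(N)])
    order = np.argsort(const_scores)[::-1]
    # Non-constant candidates, grouped by destination j
    by_dest = {j: [] for j in range(N)}
    for i in range(N):
        for j, a_ij in rows[i].a.items():
            by_dest[j].append((delta_prev[i] * a_ij, i))
    delta = np.zeros(N)
    back = np.zeros(N, dtype=int)
    for j in range(N):
        # Best constant predecessor: scan from the top, skipping rows where
        # j is non-constant (their constant score does not apply to j)
        best = next(((const_scores[i], i) for i in order if j not in rows[i].a),
                    (-np.inf, -1))
        for cand in by_dest[j]:
            best = max(best, cand)
        delta[j] = best[0] * b_t[j]
        back[j] = best[1]
    return delta, back
```

On average only K rows per column are non-constant, so each scan from the top of the ranking is expected to stop almost immediately.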
Learning a DMC HMM

• Idea One:
  • Ask the user to tell us the DMC structure
  • Learn the parameters using EM
• Simple!
• But in general, we don't know the DMC structure
Learning a DMC HMM

• Idea Two: Use EM to learn the DMC structure too
  1. Guess DMC structure
  2. Find expected transition counts and observation parameters, given current model and observations
  3. Find maximum-likelihood DMC model given the counts (sketched below)
  4. Go to 2
• In fact, just start with an all-constant transition model
• The DMC structure can (and does) change from iteration to iteration!
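A sketch of step 3, under one plausible reading of "find maximum-likelihood DMC model given counts" (our assumption, not verbatim from the slides): each row keeps its K largest expected-count transitions as non-constant and spreads the remaining mass uniformly. Uses the DMCRow sketch from earlier.

```python
import numpy as np

def learn_dmc_structure(counts, K):
    """M-step for the transition model from expected transition counts.

    counts : (N, N) expected transition counts from the E-step
    Picks each row's K largest entries as NC_i (an assumption of this
    sketch) and shares the leftover probability mass equally.
    """
    N = counts.shape[0]
    rows = []
    for i in range(N):
        row = counts[i] / counts[i].sum()            # ML row estimate
        nc = np.argsort(row)[-K:]                    # K largest -> NC_i
        c_i = (1.0 - row[nc].sum()) / (N - K)        # constant prob for the rest
        rows.append(DMCRow(c_i, {int(j): float(row[j]) for j in nc}))
    return rows
```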
Learning a DMC HMM

• Step 2 in detail: find expected transition counts and observation parameters, given the current model and observations
We want a new estimate of the transition probabilities:

âij = Σ{t=1..T-1} ξt(i,j) / Σ{t=1..T-1} γt(i)

where

ξt(i,j) = P(qt = si, qt+1 = sj | O, λ) = αt(i) aij bj(Ot+1) βt+1(j) / P(O | λ)
γt(i) = P(qt = si | O, λ) = αt(i) βt(i) / P(O | λ)

[Figure: the forward variables α and backward variables β arranged as T×N matrices, each annotated "Can get this in O(TN) time"; a later build replaces β with a derived T×N matrix r.]
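A sketch of the re-estimation itself, given the α and β matrices (each T×N) and the current model; this is the standard dense Baum-Welch update written out for reference, not the paper's DMC-specific variant:

```python
import numpy as np

def reestimate_transitions(alpha, beta, A, B, obs):
    """Expected-count re-estimate of a_ij from forward/backward matrices.

    xi_t(i,j) ∝ alpha[t,i] * A[i,j] * B[j, obs[t+1]] * beta[t+1,j]
    a_ij_new  = sum_t xi_t(i,j) / sum_t gamma_t(i)
    Dense O(T N^2) version; the DMC machinery above is what makes the
    passes producing alpha and beta cheap.
    """
    T, N = alpha.shape
    num = np.zeros((N, N))
    den = np.zeros(N)
    for t in range(T - 1):
        xi = alpha[t][:, None] * A * B[:, obs[t+1]][None, :] * beta[t+1][None, :]
        xi /= xi.sum()                      # normalize: joint -> posterior
        num += xi
        den += xi.sum(axis=1)               # gamma_t(i) = sum_j xi_t(i,j)
    return num / den[:, None]
```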