200 likes | 349 Views
Hierarchical Markov Network. ACKNOWLEDGMENT: Daniel Rubin, Dima Vainbrand, Ronny Ronen, Ohad Falik, Zev Rivlin, Mike Deisher, Shai Fine, Shie Mannor. ICRI-CI Retreat 2013 May 9, 2013 Boris Ginzburg. Summary.
E N D
Hierarchical Markov Network ACKNOWLEDGMENT: Daniel Rubin, Dima Vainbrand, Ronny Ronen, Ohad Falik, Zev Rivlin, Mike Deisher, Shai Fine, Shie Mannor ICRI-CI Retreat 2013 May 9, 2013 Boris Ginzburg ICRI - Computational Intelligence
Summary • Hierarchical Hidden Markov Model (H-HMM):A known statistical model for complex temporal pattern recognition (Fine, Singer, Tishby-1998) • Hierarchical Markov Network (HMN): • Compact and computationally efficient extension of H-HMM based on merging of identical sub-models • A new efficient Viterbi algorithm: sharing of computations for sub-model by its “parents” ICRI - Computational Intelligence
background Hidden Markov Model • Hidden Markov Model (HMM) is among the leading tools used for temporal pattern recognition. • Used in: • Speech recognition • Handwriting • Language processing • Gesture recognition • Bioinformatics • Machine translation • Speech synthesis ICRI - Computational Intelligence
background Hidden Markov Model The state of model is hidden from observer. 20% 30% 3 3 2 5 1 80% 4 1 30% 2 4 40% 2.7 3.4 1.2 4.9 ICRI - Computational Intelligence HMM is stochastic FSM described by Markov model: α(i,j)≔ Prob( q(t+1)=j | q(t)=i) Initial probability: π(i)≔Prob (q(1) = i); State [i] can emit symbol [o] with β(i,o):= Prob ( o | q(t) = i);
background HMM: Viterbi Algorithm Problem:Given the observation sequence: O=( o(1),…o(T) ), what is the most probable state sequence: Q=(q(1),…q(T))? Solution: Forward-Backward (Baum-Welch) algorithm. For each state [x] and time t: Given that system was in [x]at time t, let’s define δ(x,t) - the likelihood of the most probable state sequence,whichcouldgenerate observation ( o(1),…,o(t) ): δ(x,t)≔ max P(q(1),…,q(t-1) | q(t)=x, o(1),…o(t)). We willuselog-likelihood:S(x, t):= -log (δ(x,t)). The S(x,t) iscalledtoken at state [x] at moment t ICRI - Computational Intelligence
background HMM: Viterbi Algorithm Initialization (t=1): S(x,1) = p(x)+ b(x,o(1)); ψ(x,1) = 0; Induction (t t+1): S(y,t+1) = min[ S(x,t) + a(x,y) + b(y,o(t+1)) ]; ψ(y,t+1) = argmin[ S(y,t+1) ]; Termination (t=T): Smin = min[ S(x,T) ]; q(T) = argmin[ S(x,T ) ]; Backward procedure, path recovery (T1): q(t) = ψ(q(t+1),t+1); Forward-backward algorithm (Baum-Welch) is based on principle of dynamic programming (Viterbi) ICRI - Computational Intelligence
background Hierarchy of HMMs Example. Speech recognition - multi-layer hierarchy of HMMs ICRI - Computational Intelligence
background Hierarchical HMM • H-HMM replaces the complex hierarchy of simple HMMs, with one unified model [Fine, Singer, Tishby-1998]. • H-HMM - hierarchical FSM with two types of states: • “Complex” state - state which is itself HMM • “Production” state - simple state on lowest level of hierarchy, which produces observed symbol • Efficient Viterbi algorithm on HHMM with complexity O(T*N2 ), where • T – sequence duration • N – number of states in H-HMM • [K. Murphy & Paskin-2001] and [Wakabayashi & Miura - 2012). ICRI - Computational Intelligence
problem Scalability Problem PROBLEM: structural & computational redundancy, both for “Hierarchy of HMMs” and for H-HMM • Example: dictionary ={speech, beach, peach}. • Use 3-state HMM for phoneme model • 10 instances of HMMs • only 5 different HMM templates ICRI - Computational Intelligence
solution Hidden Markov Network HMN is based on “call-return” semantics: parent node calls sub-HMM, which computes the score of subsequence and returns result to parent node. Hierarchical Markov Network: Compact representation of H-HHM, where each sub-model is embedded once and serves multiple “parents” ICRI - Computational Intelligence
solution HMN: Viterbi algorithm The key observation: Viterbi computations inside identical HMMs, are almost the same. Consider e.g. H-HMM for “beach” and “peach”: ICRI - Computational Intelligence
solution HMN: Viterbi algorithm Token S(.) in sub-HMM “i“ in the word ‘beach”, is based on score of previous phoneme “b”: S(beach.i.x1,t) = [S(beach.b1, t-1) + a(beach.b1, beach.i0)]+ [a(beach.i0, beach.i.x1) + b(beach.i.x1,o(t)) ]; For “peach” S(.) will be based on the score from “p”: S(peach.i.x1,t) = [S(peach.p1, t-1) + a(peach.p1, peach.i0)] + [a(peach.i0, peach.i.x1) + b(peach.i.x1,o(t)) ]; Two last terms in both expressions are equal: a(beach.i0, beach.i.x1) + b(beach.i.x1,o(t)) = a(peach.i0, peach.i.x1) + b(peach.i.x1,o(t)) We can do the computation once, and use it for both words. ICRI - Computational Intelligence
solution HMN: Viterbi algorithm One sub-HMM can serve multiple parents: computes the score of sub-sequence, and returns it to parents ICRI - Computational Intelligence
solution HMN: call-return • Child HMM: • serves multiple calls from multiple nodes. Child maintains list of received calls. • all calls, received at the same moment, are merged and computed together; child keeps list of “return address” • multiple tokens can be generated by one call (all marked by time when call was started) • when token reaches ‘end” state, the score is sent to parent • Parent node: • Maintains list of open calls and prefix scores • Add prefix-score to the score received from child ICRI - Computational Intelligence
solution HMN: call-return ICRI - Computational Intelligence
solution HMN: Temporal hierarchy Inspired by Tali Tishby’ talk yesterday How to support multiple temporal scales on different level of hierarchy? Possible directions: • Exponential increase time scale by each level of hierarchy ∆d = 2* ∆d+1 • One call cover a number of time overlapping sub-sequences, child selects sequence with best score S(x, td) = min (S(x,td+1),S(x,t d+1+1) ICRI - Computational Intelligence
solution HMN: Temporal hierarchy ICRI - Computational Intelligence
solution HMN: performance • HMN has potential performance benefits over HMM/H-HMM if cost of HMM > cost of “call-return”, • Cost of HMM is ~ number of arcs. • Cost of call / return is fixed, depends only on number of return tokens / call; does not depend on size of HMM. Back-of-the envelope: • Cost of Viterbi on 5-state HMM ~ 10 MACs; • Cost of one return token–1 MAC; Additional HMN cost - increased complexity: • book-keeping in HMM, complex parent node structure… ICRI - Computational Intelligence
Next steps Next Steps • A promising idea, need to establish its efficiency beyond plain “back-of-the-envelop”; prototype to check performance claims • Extend HMN theory to support multiple temporal scales on different levels of hierarchy • Connection between Convolutional Neural Network and HMN ICRI - Computational Intelligence
BACKUP ICRI - Computational Intelligence