
Pattern Recognition and Machine Learning-Chapter 13: Sequential Data



Presentation Transcript


  1. Pattern Recognition and Machine Learning-Chapter 13: Sequential Data Affiliation: Kyoto University Name: Kevin Chien, Dr. Oba Shigeyuki, Dr. Ishii Shin Date: Dec. 9, 2011

  2. Idea: Origin of Markov Models

  3. Why Markov Models • IID data is not always a realistic assumption. Sequential models capture how future data (prediction) depends on some recent data, using DAGs in which inference is done by the sum-product algorithm. • State space (Markov) model: latent variables • Discrete latent variables: Hidden Markov Model • Gaussian latent variables: Linear Dynamical Systems • Order of a Markov chain: how far back the data dependence reaches • 1st order: the current observation depends only on the single previous observation

  4. State Space Model • The latent variables Zn form a Markov chain, and each Zn generates its own observation Xn. • As the order of a Markov chain grows the number of parameters grows; the state space model keeps this under control. • Zn-1 and Zn+1 are now independent given Zn (d-separated)
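In PRML's notation, the joint distribution of this state space model factorizes as

$$p(x_1,\dots,x_N,\, z_1,\dots,z_N) \;=\; p(z_1)\left[\prod_{n=2}^{N} p(z_n \mid z_{n-1})\right]\prod_{n=1}^{N} p(x_n \mid z_n),$$

which is exactly the structure the d-separation statement above refers to.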

  5. Terminologies: For understanding Markov Models

  6. Terminologies • Markov property: in a stochastic process, the probability of a transition depends only on the present state and not on the manner in which that state was reached. • A transition diagram shows the same variable moving between its different states

  7. Terminologies (cont.) • f = Θ(g): f is bounded above and below by g asymptotically [Big_O_notation, Wikipedia, Dec. 2011] • (Review) Zn+1 and Zn-1 are d-separated given Zn: once Zn is observed, every path between Zn-1 and Zn+1 through Zn is blocked, so they are conditionally independent
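Written out, the Θ bound the slide alludes to (per the cited Wikipedia article) is

$$f(n) = \Theta(g(n)) \;\Longleftrightarrow\; \exists\, c_1, c_2 > 0,\ \exists\, n_0 \ \text{such that}\ c_1\, g(n) \le f(n) \le c_2\, g(n) \ \text{for all } n \ge n_0.$$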

  8. Markov Models: Formula and motivation

  9. Hidden Markov Models (HMM) • Zn is a discrete multinomial variable • Transition probability matrix A • Each row sums to 1 • The probability of staying in the present state is non-zero in general • Counting: the K×K entries minus the K row-sum constraints leave K(K-1) independent parameters (the same count as the number of off-diagonal entries)
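In symbols (using state indices rather than Bishop's 1-of-K coding), the transition matrix is

$$A_{jk} \equiv p(z_n = k \mid z_{n-1} = j), \qquad 0 \le A_{jk} \le 1, \qquad \sum_{k} A_{jk} = 1,$$

so each of the K rows is a distribution over the next state.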

  10. Hidden Markov Models (cont.) • Emission (observation) probabilities p(xn|zn), with parameters governing the distribution • Homogeneous model: all of the conditional distributions over latent variables share the same parameter A • Sampling data is simply ancestral sampling: follow the latent transitions and, at each step, record an observation drawn from the emission probability (see the sketch below)
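A minimal sketch of that ancestral-sampling idea, assuming a discrete-emission HMM with an initial distribution pi, transition matrix A, and emission matrix B over symbol indices (these names and the NumPy encoding are illustrative conventions, not the slides' notation):

```python
import numpy as np

def sample_hmm(pi, A, B, n, seed=0):
    """Ancestral sampling: walk the latent chain and emit one symbol per step.

    pi : (m,)   initial distribution p(z_1)
    A  : (m, m) transition matrix, A[i, j] = p(z_k = j | z_{k-1} = i)
    B  : (m, V) emission matrix,   B[i, v] = p(x_k = v | z_k = i)
    """
    rng = np.random.default_rng(seed)
    m, V = B.shape
    z = np.zeros(n, dtype=int)
    x = np.zeros(n, dtype=int)
    z[0] = rng.choice(m, p=pi)
    x[0] = rng.choice(V, p=B[z[0]])
    for k in range(1, n):
        z[k] = rng.choice(m, p=A[z[k - 1]])  # homogeneous model: same A at every step
        x[k] = rng.choice(V, p=B[z[k]])      # emit an observation from the current state
    return z, x
```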

  11. HMM, Expectation-Maximization for maximum likelihood • Likelihood function: obtained by marginalizing over the latent variables • Start with initial model parameters θ_old • Evaluate the posterior over the latent variables given θ_old (E step) • Defining Q(θ, θ_old) as the expected complete-data log likelihood • Maximizing Q with respect to θ (M step) results in updated parameters
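The two quantities this EM procedure is built on (from PRML §13.2.1, maximum likelihood for the HMM) are

$$p(\mathbf{X}\mid\theta) = \sum_{\mathbf{Z}} p(\mathbf{X},\mathbf{Z}\mid\theta), \qquad Q(\theta,\theta^{\text{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\theta^{\text{old}})\, \ln p(\mathbf{X},\mathbf{Z}\mid\theta),$$

where θ = {π, A, φ} collects the initial-state, transition, and emission parameters.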

  12. HMM: forward-backward algorithm • Two-stage message passing on the HMM's tree-structured graph, used to find the marginals p(node) efficiently • Here the marginals of interest are p(zk|x) • Assume p(xk|zk), p(zk|zk-1), p(z1) are known • Notation: x = (x1,..,xn), xi:j = (xi, xi+1,..,xj) • Goal: compute p(zk|x) • Forward part: compute p(zk, x1:k) for every k=1,..,n • Backward part: compute p(xk+1:n|zk) for every k=1,..,n

  13. HMM: forward-backward algorithm (cont.) • p(zk|x) ∝ p(zk,x) = p(xk+1:n|zk,x1:k) p(zk,x1:k) • where xk+1:n and x1:k are d-separated given zk • so p(zk|x) ∝ p(zk,x) = p(xk+1:n|zk) p(zk,x1:k) • With these quantities we can • run the EM (Baum-Welch) algorithm to estimate parameter values • sample from the posterior over z given x, and find the most likely z with the Viterbi algorithm

  14. HMM forward-backward algorithm: Forward part • Compute p(zk,x1:k) • p(zk,x1:k) = ∑(over all values of zk-1) p(zk,zk-1,x1:k) = ∑(over all values of zk-1) p(xk|zk,zk-1,x1:k-1) p(zk|zk-1,x1:k-1) p(zk-1,x1:k-1) • This looks like a recursive function: label p(zk,x1:k) as αk(zk), and use • xk and (zk-1, x1:k-1) are d-separated given zk • zk and x1:k-1 are d-separated given zk-1 • So αk(zk) = ∑(over all values of zk-1) p(xk|zk) p(zk|zk-1) αk-1(zk-1) for k=2,..,n, where p(xk|zk) is the emission probability, p(zk|zk-1) the transition probability, and αk-1(zk-1) the recursive part

  15. HMM forward-backward algorithm: Forward part (cont.) • α1(z1) = p(z1,x1) = p(z1) p(x1|z1) • If each z has m states then the computational complexity is • Θ(m) for each value of zk, for one k • Θ(m^2) for each k • Θ(nm^2) in total
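As a concrete illustration of the α recursion, here is a minimal NumPy sketch of the forward pass for a discrete-emission HMM, reusing the pi/A/B conventions assumed in the sampling sketch above:

```python
import numpy as np

def forward(pi, A, B, x):
    """Return alpha[k, i] = p(z_k = i, x_1:k) for observed symbol indices x."""
    n, m = len(x), len(pi)
    alpha = np.zeros((n, m))
    alpha[0] = pi * B[:, x[0]]              # alpha_1(z_1) = p(z_1) p(x_1 | z_1)
    for k in range(1, n):
        # alpha_k(z_k) = p(x_k | z_k) * sum_{z_{k-1}} p(z_k | z_{k-1}) alpha_{k-1}(z_{k-1})
        alpha[k] = B[:, x[k]] * (alpha[k - 1] @ A)
    return alpha
```

Each step multiplies an m-vector by an m×m matrix, matching the Θ(nm^2) total cost on this slide. In practice the α values shrink toward zero for long sequences, so a scaled or log-space variant is usually preferred; that detail is omitted here.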

  16. HMM forward-backward algorithm: Backward part • Compute p(xk+1:n|zk) for all zk and all k=1,..,n-1 • p(xk+1:n|zk) = ∑(over all values of zk+1) p(xk+1:n,zk+1|zk) = ∑(over all values of zk+1) p(xk+2:n|zk+1,zk,xk+1) p(xk+1|zk+1,zk) p(zk+1|zk) • Again this looks like a recursive function: label p(xk+1:n|zk) as βk(zk), and use • xk+2:n and (zk, xk+1) are d-separated given zk+1 • xk+1 and zk are d-separated given zk+1 • So βk(zk) = ∑(over all values of zk+1) βk+1(zk+1) p(xk+1|zk+1) p(zk+1|zk) for k=1,..,n-1, where βk+1(zk+1) is the recursive part, p(xk+1|zk+1) the emission probability, and p(zk+1|zk) the transition probability

  17. HMM forward-backward algorithm: Backward part (cont.) • βn(zn) = 1 for all zn • If each z has m states then the computational complexity is the same as for the forward part • Θ(nm^2) in total
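Continuing the same sketch (same assumed pi/A/B conventions), a backward pass plus the combination p(zk|x) ∝ αk(zk) βk(zk) from slide 13 might look like this:

```python
import numpy as np

def backward(A, B, x):
    """Return beta[k, i] = p(x_{k+1:n} | z_k = i), with beta_n = 1."""
    n, m = len(x), A.shape[0]
    beta = np.ones((n, m))                  # beta_n(z_n) = 1 for all z_n
    for k in range(n - 2, -1, -1):
        # beta_k(z_k) = sum_{z_{k+1}} beta_{k+1}(z_{k+1}) p(x_{k+1} | z_{k+1}) p(z_{k+1} | z_k)
        beta[k] = A @ (B[:, x[k + 1]] * beta[k + 1])
    return beta

def smoothed_posterior(pi, A, B, x):
    """Return gamma[k, i] = p(z_k = i | x_{1:n}) by normalizing alpha * beta."""
    alpha, beta = forward(pi, A, B, x), backward(A, B, x)
    gamma = alpha * beta                    # proportional to p(z_k, x_{1:n})
    return gamma / gamma.sum(axis=1, keepdims=True)
```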

  18. HMM: Viterbi algorithm • Max-sum algorithm for HMMs, used to find the most probable sequence of hidden states for a given observation sequence x1:n • Example: transforming handwriting images into text • Assume p(xk|zk), p(zk|zk-1), p(z1) are known • Goal: compute z* = argmax_z p(z|x), with x = x1:n, z = z1:n • Lemma: if f(a) ≥ 0 ∀a and g(a,b) ≥ 0 ∀a,b, then max_{a,b} f(a)g(a,b) = max_a [f(a) max_b g(a,b)] • max_z p(z|x) ∝ max_z p(z,x)

  19. HMM: Viterbi algorithm (cont.) • μk(zk) = max_{z1:k-1} p(z1:k,x1:k) = max_{z1:k-1} p(xk|zk) p(zk|zk-1) [the f(a) part] × p(z1:k-1,x1:k-1) [the g(a,b) part] • This becomes a recursive function if we can move a max in front of p(z1:k-1,x1:k-1); apply the lemma with a = zk-1, b = z1:k-2: • = max_{zk-1} [p(xk|zk) p(zk|zk-1) max_{z1:k-2} p(z1:k-1,x1:k-1)] = max_{zk-1} [p(xk|zk) p(zk|zk-1) μk-1(zk-1)] for k=2,…,n

  20. HMM: Viterbi algorithm (finish up) • μk(zk) = max_{zk-1} p(xk|zk) p(zk|zk-1) μk-1(zk-1) for k=2,…,n, with μ1(z1) = p(x1,z1) = p(z1) p(x1|z1) • The same recursion gives max_{zn} μn(zn) = max_z p(x,z) • This yields the maximum value; to recover the maximizing sequence, compute the recursion bottom-up while remembering, for each μk(zk), which zk-1 achieved the maximum, then backtrack (μk(zk) looks at all paths through μk-1(zk-1)); see the sketch below
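A matching sketch of the μ recursion with backtracking, in log space to avoid numerical underflow (a common implementation choice, not something the slides prescribe), again using the assumed pi/A/B conventions:

```python
import numpy as np

def viterbi(pi, A, B, x):
    """Return the most probable state sequence argmax_z p(z | x)."""
    n, m = len(x), len(pi)
    log_mu = np.zeros((n, m))
    back = np.zeros((n, m), dtype=int)                # remembered argmax over z_{k-1}
    log_mu[0] = np.log(pi) + np.log(B[:, x[0]])       # mu_1(z_1) = p(z_1) p(x_1 | z_1)
    for k in range(1, n):
        # mu_k(z_k) = max_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) mu_{k-1}(z_{k-1})
        scores = log_mu[k - 1][:, None] + np.log(A)   # scores[i, j]: arrive at j from i
        back[k] = scores.argmax(axis=0)
        log_mu[k] = np.log(B[:, x[k]]) + scores.max(axis=0)
    # Backtrack the remembered maximizers to recover the most probable path.
    z = np.zeros(n, dtype=int)
    z[-1] = log_mu[-1].argmax()
    for k in range(n - 1, 0, -1):
        z[k - 1] = back[k, z[k]]
    return z
```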

  21. Additional Information • Excerpts of equations and diagrams from Bishop, C. M., Pattern Recognition and Machine Learning, pp. 605-646 • Excerpts of equations from the mathematicalmonk lectures on YouTube (ML 14.6 and 14.7), various titles, July 2011
