
Ch 13. Sequential Data (1/2) Pattern Recognition and Machine Learning, C. M. Bishop, 2006.






Presentation Transcript


1. Ch 13. Sequential Data (1/2), Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by Kim Jin-young, Biointelligence Laboratory, Seoul National University, http://bi.snu.ac.kr/

2. Contents
• 13.1 Markov Models
• 13.2 Hidden Markov Models
  • 13.2.1 Maximum likelihood for the HMM
  • 13.2.2 The forward-backward algorithm
  • 13.2.3 The sum-product algorithm for the HMM
  • 13.2.4 Scaling factors
  • 13.2.5 The Viterbi algorithm
  • 13.2.6 Extensions of the HMM
(C) 2007, SNU Biointelligence Lab, http://bi.snu.ac.kr/

3. Sequential Data
• Successive observations are dependent, so the i.i.d. assumption does not hold
  • Examples: weather data, DNA, characters in a sentence
• Sequential distributions: stationary vs. nonstationary
• Markov models: no latent variables
• State space models
  • Hidden Markov model (discrete latent variables)
  • Linear dynamical systems

4. Markov Models
• Markov chain
• State space model (free of a Markov assumption of any fixed order, at the cost of a reasonable number of extra parameters)
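The joint distribution of a first-order Markov chain factorizes as p(z1) Πn=2..N p(zn | zn-1). A minimal sketch of this factorization (the transition matrix, initial distribution, and function name are invented for illustration, not taken from the slides):

```python
import numpy as np

# Toy two-state chain (assumed values for illustration only).
A = np.array([[0.7, 0.3],   # P(next state | current state = 0)
              [0.4, 0.6]])  # P(next state | current state = 1)
pi = np.array([0.5, 0.5])   # initial state distribution

def chain_prob(states):
    """Joint probability p(z_1) * prod_n p(z_n | z_{n-1}) of a state sequence."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

print(chain_prob([0, 0, 1]))  # 0.5 * 0.7 * 0.3 = 0.105
```

A higher-order chain would condition on more previous states, at the price of exponentially more parameters, which is what motivates the state space models mentioned above.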

5. Hidden Markov Model (overview)
• Introduces discrete latent variables (based on prior knowledge)
• Examples: coin toss, urn and ball
• Conditional random field (CRF)
  • An MRF globally conditioned on the observation sequence X
  • Relaxes the independence assumptions made by the HMM

6. Hidden Markov Model (example)
• Lattice (trellis) representation
• Left-to-right HMM, e.g. for handwriting recognition

7. Hidden Markov Model
• Given the observations X, latent variables Z, and model parameters θ = {π, A, φ}, the joint distribution for the HMM factorizes as p(X, Z | θ) = p(z1 | π) [ Πn=2..N p(zn | zn-1, A) ] Πn=1..N p(xn | zn, φ)
• Its factors are the initial latent node distribution p(z1 | π), the conditional distribution among latent variables p(zn | zn-1, A), and the emission probability p(xn | zn, φ)
• Notation: K is the number of states and N the total number of time steps; z(n-1),j z(n),k = 1 indicates a transition from state j at time n-1 to state k at time n
(C) 2007, SNU Biointelligence Lab, http://bi.snu.ac.kr/
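The factorization above can be sketched directly in code: one initial-state factor, one transition factor per step, and one emission factor per observation. The toy parameter values and the function name are assumptions for illustration:

```python
import numpy as np

# Assumed toy HMM parameters (2 states, 2 observation symbols).
pi = np.array([0.6, 0.4])            # initial distribution p(z1 | pi)
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])           # transitions A[j, k] = p(z_n=k | z_{n-1}=j)
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])           # emissions B[k, x] = p(x_n=x | z_n=k)

def joint_prob(states, obs):
    """p(X, Z | theta) = p(z1|pi) * prod p(z_n|z_{n-1},A) * prod p(x_n|z_n,B)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for n in range(1, len(states)):
        p *= A[states[n - 1], states[n]] * B[states[n], obs[n]]
    return p

print(joint_prob([0, 1], [0, 1]))  # 0.6 * 0.9 * 0.3 * 0.7 = 0.1134
```

Summing this joint probability over all K^N latent paths gives the likelihood p(X | θ), which is exactly what the forward-backward recursions on a later slide compute efficiently.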

8. EM Revisited (slide by Ho-sik Seok)
• General EM maximizes the log likelihood function
• Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z, governed by parameters Θ:
  1. Choose an initial setting for the parameters Θold
  2. E step: evaluate p(Z | X, Θold)
  3. M step: evaluate Θnew = argmaxΘ Q(Θ, Θold), where Q(Θ, Θold) = ΣZ p(Z | X, Θold) ln p(X, Z | Θ)
  4. If the convergence criterion is not satisfied, set Θold ← Θnew and return to the E step

9. Estimation of HMM Parameters (using maximum likelihood)
• The likelihood function p(X | θ) = ΣZ p(X, Z | θ) is obtained by marginalizing the joint distribution over the latent variables Z
• Direct maximization is intractable, so the EM algorithm is used
  • E step: evaluate the posterior over the latent variables (marginalization over Z)
  • M step: maximize Q(θ, θold) with respect to θ to obtain the updated parameters
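Given the E-step responsibilities γ(zn) = p(zn | X) and ξ(zn-1, zn) = p(zn-1, zn | X) (computed by the forward-backward algorithm of the next slide), the M-step updates are closed-form normalized counts. A sketch, where the function name and the toy responsibility values are assumptions for illustration:

```python
import numpy as np

def m_step(gamma, xi, obs, num_symbols):
    """M-step re-estimation from E-step responsibilities:
    gamma[n, k] = p(z_n = k | X), xi[n, j, k] = p(z_n = j, z_{n+1} = k | X)."""
    pi_new = gamma[0] / gamma[0].sum()
    # Expected transition counts j -> k, normalized over k.
    counts = xi.sum(axis=0)
    A_new = counts / counts.sum(axis=1, keepdims=True)
    # Expected emission counts: accumulate gamma[n] into column obs[n].
    K = gamma.shape[1]
    B_new = np.zeros((K, num_symbols))
    for n, x in enumerate(obs):
        B_new[:, x] += gamma[n]
    B_new /= B_new.sum(axis=1, keepdims=True)
    return pi_new, A_new, B_new

# Toy responsibilities for N = 3, K = 2 (invented, but mutually consistent).
gamma = np.array([[0.5, 0.5], [0.2, 0.8], [0.6, 0.4]])
xi = np.array([[[0.1, 0.4], [0.1, 0.4]],
               [[0.2, 0.0], [0.4, 0.4]]])
obs = [0, 1, 0]
pi_new, A_new, B_new = m_step(gamma, xi, obs, 2)
```

Each update is just "expected count divided by expected total", which is why the result is guaranteed to be a valid set of probability distributions.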

10. Forward-backward Algorithm (probability of observation)
• Probability for a single latent variable: γ(zn) = p(zn | X) = α(zn) β(zn) / p(X)
• α and β are defined recursively: α(zn) = p(xn | zn) Σzn-1 α(zn-1) p(zn | zn-1), and β(zn) = Σzn+1 β(zn+1) p(xn+1 | zn+1) p(zn+1 | zn)
• Used for evaluating the probability of the observation sequence: p(X) = Σzn α(zn) β(zn), for any choice of n
(C) 2007, SNU Biointelligence Lab, http://bi.snu.ac.kr/
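The α and β recursions can be sketched as below. This is the unscaled version, so it underflows for long sequences; a practical implementation would use the scaling factors of Section 13.2.4. The toy parameters are invented for illustration:

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Unscaled alpha/beta recursions; returns alpha, beta, and p(X)."""
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    # Forward pass: alpha(z_n) = p(x_n|z_n) * sum_{z_{n-1}} alpha(z_{n-1}) p(z_n|z_{n-1})
    alpha[0] = pi * B[:, obs[0]]
    for n in range(1, N):
        alpha[n] = B[:, obs[n]] * (alpha[n - 1] @ A)
    # Backward pass: beta(z_n) = sum_{z_{n+1}} beta(z_{n+1}) p(x_{n+1}|z_{n+1}) p(z_{n+1}|z_n)
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1])
    return alpha, beta, alpha[-1].sum()

# Assumed toy parameters (same 2-state layout as the earlier examples).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = [0, 1, 0]
alpha, beta, lik = forward_backward(obs, pi, A, B)
```

A useful sanity check is that Σzn α(zn) β(zn) gives the same value of p(X) at every time step n.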

11. Sum-product Algorithm (probability of observation)
• Factor graph representation of the HMM
• Running sum-product message passing on the factor graph gives the same result as the forward-backward algorithm

12. The Viterbi Algorithm (most likely state sequence)
• Obtained from the max-sum algorithm applied to the HMM
• Computes the joint probability of the most probable path of latent states, then backtracks to recover the path itself
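The max-sum recursion can be sketched in log space as follows: at each step keep, for every state, the score of the best path ending there plus a backpointer, then backtrack from the best final state. The toy parameters are invented for illustration:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable latent path via max-sum (Viterbi) in log space."""
    N, K = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])   # best log-score per state
    back = np.zeros((N, K), dtype=int)          # backpointers
    for n in range(1, N):
        scores = logd[:, None] + np.log(A)      # scores[j, k]: come from j, go to k
        back[n] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[n]])
    # Backtrack from the best final state.
    path = [int(logd.argmax())]
    for n in range(N - 1, 0, -1):
        path.append(int(back[n][path[-1]]))
    return path[::-1], float(logd.max())

# Assumed toy parameters (2 states, 2 symbols).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
path, logp = viterbi([0, 0, 1], pi, A, B)
print(path)  # [0, 0, 1]
```

Working in log space replaces products with sums, which avoids the same underflow problem that the scaling factors address for forward-backward.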

13. References
• HMM: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"
• CRF introduction: http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf
