Ch 13. Sequential Data (1/2)
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by Kim Jin-young
Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/
Contents
• 13.1 Markov Models
• 13.2 Hidden Markov Models
  • 13.2.1 Maximum likelihood for the HMM
  • 13.2.2 The forward-backward algorithm
  • 13.2.3 The sum-product algorithm for the HMM
  • 13.2.4 Scaling factors
  • 13.2.5 The Viterbi algorithm
  • 13.2.6 Extensions of the HMM
Sequential Data
• Data dependencies exist along a sequence
  • Weather data, DNA, characters in a sentence
  • The i.i.d. assumption does not hold
• Sequential distributions
  • Stationary vs. nonstationary
• Markov models
  • No latent variables
• State space models
  • Hidden Markov model (discrete latent variables)
  • Linear dynamical systems (continuous latent variables)
Markov Models
• Markov chain
• State space model (not limited to a Markov assumption of any finite order, at the cost of a reasonable number of extra parameters)
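For reference, the factorizations these bullets point to (Bishop's Eqs. 13.2 and 13.6) can be written out; the state space model introduces a latent chain z_n so that the observations themselves need not satisfy a Markov assumption of any finite order:

```latex
% First-order Markov chain over the observations
p(x_1, \dots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})

% State space model: Markov chain over latent variables z_n,
% each observation conditioned on its latent state
p(x_1, \dots, x_N, z_1, \dots, z_N)
  = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right]
    \prod_{n=1}^{N} p(x_n \mid z_n)
```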
Hidden Markov Model (overview)
• Overview
  • Introduces discrete latent variables (based on prior knowledge)
• Examples
  • Coin toss
  • Urn and ball
• Conditional Random Field
  • An MRF globally conditioned on the observation sequence X
  • A CRF relaxes the independence assumptions made by an HMM
Hidden Markov Model (example)
• Lattice representation
• Left-to-right HMM
[Figure: handwriting recognition example]
Hidden Markov Model
• Given the observations X, latent variables Z, and model parameters θ,
• the joint probability distribution for the HMM factorizes into:
  • the initial latent node p(z_1 | π)
  • the conditional distribution among latent variables p(z_n | z_{n-1}, A)
  • the emission probability p(x_n | z_n, φ)
• Notation: K is the number of states and N the total number of time steps; z_{n-1,j} = z_{n,k} = 1 denotes a transition from state j at time n-1 to state k at time n
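The slide's equation did not survive extraction; in Bishop's notation (Eqs. 13.9–13.10) the joint distribution and its discrete-state factors are:

```latex
p(X, Z \mid \theta)
  = p(z_1 \mid \pi)
    \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right]
    \prod_{m=1}^{N} p(x_m \mid z_m, \phi)

% with the initial and transition factors written over 1-of-K variables
p(z_1 \mid \pi) = \prod_{k=1}^{K} \pi_k^{z_{1k}}, \qquad
p(z_n \mid z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{\,z_{n-1,j}\, z_{nk}}
```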
EM Revisited (slide by Ho-sik Seok)
• General EM: maximizing the log likelihood function
  • Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z, governed by parameters Θ
  1. Choose an initial setting for the parameters Θold
  2. E step: evaluate p(Z | X, Θold)
  3. M step: evaluate Θnew given by Θnew = argmaxΘ Q(Θ, Θold), where Q(Θ, Θold) = ΣZ p(Z | X, Θold) ln p(X, Z | Θ)
  4. If the convergence criterion is not satisfied, set Θold ← Θnew and return to the E step
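A minimal Python sketch of this generic loop. The e_step and m_step callables are hypothetical model-specific hooks (not from the slides), and the convergence test is one simple choice among many:

```python
import numpy as np

def em(X, theta, e_step, m_step, tol=1e-6, max_iter=100):
    """Generic EM loop. `e_step(X, theta)` returns p(Z | X, theta);
    `m_step(X, posterior)` returns argmax_theta Q(theta, theta_old).
    `theta` is a tuple of NumPy arrays (e.g., (pi, A, phi) for an HMM)."""
    for _ in range(max_iter):
        posterior = e_step(X, theta)        # E step: evaluate p(Z | X, theta_old)
        theta_new = m_step(X, posterior)    # M step: maximize Q(theta, theta_old)
        # Convergence criterion: stop once the parameters barely change
        if all(np.max(np.abs(a - b)) < tol for a, b in zip(theta, theta_new)):
            return theta_new
        theta = theta_new
    return theta
```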
Estimation of HMM Parameters (using maximum likelihood)
• The likelihood function: obtained by marginalizing the joint distribution over the latent variables Z
• Using the EM algorithm
  • E step: evaluate the posterior over the latent variables, γ(z_n) and ξ(z_{n-1}, z_n)
  • M step: re-estimate the parameters π, A, and φ
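The quantities the two steps compute (Bishop's Eqs. 13.13 and 13.17–13.18) are, in the book's notation:

```latex
% E step: posterior marginals over the latent variables
\gamma(z_n) = p(z_n \mid X, \theta^{\text{old}}), \qquad
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X, \theta^{\text{old}})

% M step: closed-form re-estimates of the initial and transition probabilities
\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^{K} \gamma(z_{1j})}, \qquad
A_{jk} = \frac{\sum_{n=2}^{N} \xi(z_{n-1,j},\, z_{nk})}
              {\sum_{l=1}^{K} \sum_{n=2}^{N} \xi(z_{n-1,j},\, z_{nl})}
```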
Forward-Backward Algorithm (probability of observation)
• Probability for a single latent variable: γ(z_n) = α(z_n) β(z_n) / p(X)
• Defining α and β recursively
• Also used for evaluating the probability of the observation sequence, p(X) = Σ_{z_N} α(z_N)
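A runnable sketch of the two recursions for a discrete HMM. The pi/A/B array layout is my convention, not the slides':

```python
import numpy as np

def forward_backward(pi, A, B):
    """Forward-backward recursions for a discrete HMM.

    pi : (K,)   initial state distribution
    A  : (K, K) transition matrix, A[j, k] = p(z_n = k | z_{n-1} = j)
    B  : (N, K) emission likelihoods, B[n, k] = p(x_n | z_n = k)
    Returns alpha, beta, and the observation likelihood p(X).
    """
    N, K = B.shape
    alpha = np.zeros((N, K))
    beta = np.ones((N, K))                    # beta(z_N) = 1
    alpha[0] = pi * B[0]                      # alpha(z_1) = p(z_1) p(x_1 | z_1)
    for n in range(1, N):                     # alpha(z_n) = p(x_n|z_n) sum_j alpha(z_{n-1}) A
        alpha[n] = B[n] * (alpha[n - 1] @ A)
    for n in range(N - 2, -1, -1):            # beta(z_n) = sum_k A p(x_{n+1}|z_{n+1}) beta(z_{n+1})
        beta[n] = A @ (B[n + 1] * beta[n + 1])
    likelihood = alpha[-1].sum()              # p(X) = sum over z_N of alpha(z_N)
    return alpha, beta, likelihood
```

In practice each α(z_n) is renormalized as it is computed (the scaling factors of Section 13.2.4), since the raw products underflow for long sequences.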
Sum-Product Algorithm (probability of observation)
• Factor graph representation of the HMM
• Yields the same result as the forward-backward algorithm
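The factors in Bishop's simplified factor graph (Eqs. 13.45–13.46) absorb the emission terms, and the resulting messages coincide with the α and β recursions above:

```latex
h(z_1) = p(z_1)\, p(x_1 \mid z_1), \qquad
f_n(z_{n-1}, z_n) = p(z_n \mid z_{n-1})\, p(x_n \mid z_n)
```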
The Viterbi Algorithm (most likely state sequence)
• Derived from the max-sum algorithm
• Gives the joint distribution along the most probable path
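A log-space sketch of the max-sum recursion, reusing the pi/A/B conventions from the forward-backward code above:

```python
import numpy as np

def viterbi(pi, A, B):
    """Most probable state sequence for a discrete HMM.
    Returns the path z_1..z_N as a list of state indices."""
    N, K = B.shape
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    omega = np.zeros((N, K))            # omega[n, k]: best log prob of a path ending in k at n
    back = np.zeros((N, K), dtype=int)  # back[n, k]: predecessor state on that best path
    omega[0] = log_pi + log_B[0]
    for n in range(1, N):
        scores = omega[n - 1][:, None] + log_A   # scores[j, k]: path via j into k
        back[n] = scores.argmax(axis=0)
        omega[n] = log_B[n] + scores.max(axis=0)
    path = [int(omega[-1].argmax())]
    for n in range(N - 1, 0, -1):                # backtrack the most probable path
        path.append(int(back[n, path[-1]]))
    return path[::-1]
```

Working in log space serves the same purpose as the scaling factors in forward-backward: products of many small probabilities become sums, avoiding underflow.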
References
• HMM: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 77(2), 1989.
• CRF introduction: http://www.inference.phy.cam.ac.uk/hmw26/papers/crf_intro.pdf