CS 59000 Statistical Machine Learning, Lecture 25. Yuan (Alan) Qi, Purdue CS, Nov. 25, 2008
Outline Review of hidden Markov models: the forward-backward algorithm, EM for learning HMM parameters, the Viterbi algorithm, Kalman filtering and smoothing. Sampling methods: rejection sampling, importance sampling, the Metropolis-Hastings algorithm, Gibbs sampling.
Hidden Markov Models Many applications, e.g., speech recognition, natural language processing, handwriting recognition, bio-sequence analysis
From Mixture Models to HMMs By giving a mixture model dynamics, we obtain the HMM: we model the dependence between two consecutive latent variables by a transition probability $p(z_n \mid z_{n-1}, A)$, where $A_{jk} \equiv p(z_{nk} = 1 \mid z_{n-1,j} = 1)$.
HMMs Prior on initial latent variable: $p(z_1 \mid \pi) = \prod_k \pi_k^{z_{1k}}$. Emission probabilities: $p(x_n \mid z_n, \phi)$. Joint distribution: $p(X, Z \mid \theta) = p(z_1 \mid \pi) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right] \prod_{m=1}^{N} p(x_m \mid z_m, \phi)$, where $\theta = \{\pi, A, \phi\}$.
Samples from HMM (a) Contours of constant probability density for the emission distributions corresponding to each of the three states of the latent variable. (b) A sample of 50 points drawn from the hidden Markov model, with lines connecting the successive observations.
Inference: Forward-backward Algorithm Goal: compute marginals for latent variables. Forward-backward algorithm: exact inference as a special case of the sum-product algorithm on the HMM. Factor graph representation (grouping the emission density and transition probability of each step into a single factor):
Forward-backward Algorithm as Message Passing Method (1) Forward messages: $\alpha(z_n) \equiv p(x_1, \ldots, x_n, z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1})\, p(z_n \mid z_{n-1})$, initialized with $\alpha(z_1) = p(z_1)\, p(x_1 \mid z_1)$.
Forward-backward Algorithm as Message Passing Method (2) Backward messages (Q: how do we compute them?): $\beta(z_n) \equiv p(x_{n+1}, \ldots, x_N \mid z_n) = \sum_{z_{n+1}} \beta(z_{n+1})\, p(x_{n+1} \mid z_{n+1})\, p(z_{n+1} \mid z_n)$, initialized with $\beta(z_N) = 1$; note these messages actually involve the future observations $x_{n+1}, \ldots, x_N$. Similarly, we can compute the marginals (Q: why?): $\gamma(z_n) = p(z_n \mid X) = \alpha(z_n)\beta(z_n)/p(X)$ and $\xi(z_{n-1}, z_n) = \alpha(z_{n-1})\, p(x_n \mid z_n)\, p(z_n \mid z_{n-1})\, \beta(z_n)/p(X)$, with $p(X) = \sum_{z_n} \alpha(z_n)\beta(z_n)$.
Rescaling to Avoid Numerical Underflow When a sequence is long, the forward message becomes too small to be represented within the dynamic range of the computer. We redefine the forward message as $\hat{\alpha}(z_n) \equiv p(z_n \mid x_1, \ldots, x_n) = \alpha(z_n)/p(x_1, \ldots, x_n)$, updated with scaling factors $c_n \equiv p(x_n \mid x_1, \ldots, x_{n-1})$. Similarly, we redefine the backward message as $\hat{\beta}(z_n) \equiv \beta(z_n)/p(x_{n+1}, \ldots, x_N \mid x_1, \ldots, x_n)$. Then we can compute $\gamma(z_n) = \hat{\alpha}(z_n)\hat{\beta}(z_n)$ and $p(X) = \prod_n c_n$. See the detailed derivation in the textbook, and the sketch below.
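A minimal sketch of the scaled recursions for a discrete-state HMM (illustrative, not the lecture's code), assuming NumPy arrays for $\pi$, $A$, and the per-observation emission likelihoods:

```python
import numpy as np

def forward_backward(pi, A, emission_probs):
    """Scaled forward-backward pass for a discrete-state HMM.

    pi: (K,) initial state distribution
    A:  (K, K) transition matrix, A[j, k] = p(z_n = k | z_{n-1} = j)
    emission_probs: (N, K) likelihoods p(x_n | z_n = k) per observation
    Returns the smoothed marginals gamma (N, K) and scaling factors c (N,).
    """
    N, K = emission_probs.shape
    alpha_hat = np.zeros((N, K))   # scaled forward messages p(z_n | x_1..x_n)
    beta_hat = np.ones((N, K))     # scaled backward messages
    c = np.zeros(N)                # c_n = p(x_n | x_1..x_{n-1})

    # Forward pass with rescaling at every step.
    alpha = pi * emission_probs[0]
    c[0] = alpha.sum()
    alpha_hat[0] = alpha / c[0]
    for n in range(1, N):
        alpha = emission_probs[n] * (alpha_hat[n - 1] @ A)
        c[n] = alpha.sum()
        alpha_hat[n] = alpha / c[n]

    # Backward pass reusing the same scaling factors.
    for n in range(N - 2, -1, -1):
        beta_hat[n] = (A @ (emission_probs[n + 1] * beta_hat[n + 1])) / c[n + 1]

    gamma = alpha_hat * beta_hat   # posterior marginals p(z_n | X)
    return gamma, c
```

The per-step normalizers come out for free, so the log likelihood is $\ln p(X) = \sum_n \ln c_n$ without underflow.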
Viterbi Algorithm Viterbi algorithm: finding the most probable sequence of states. Special case of the max-sum algorithm on the HMM. What if we want to find the most probable individual states instead?
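A log-space sketch of Viterbi under the same array conventions as above (illustrative):

```python
import numpy as np

def viterbi(pi, A, emission_probs):
    """Most probable state sequence for a discrete-state HMM (log-space)."""
    N, K = emission_probs.shape
    log_A = np.log(A)
    omega = np.log(pi) + np.log(emission_probs[0])  # best log-prob ending in each state
    backptr = np.zeros((N, K), dtype=int)

    for n in range(1, N):
        scores = omega[:, None] + log_A             # scores[j, k]: come from j, go to k
        backptr[n] = scores.argmax(axis=0)
        omega = scores.max(axis=0) + np.log(emission_probs[n])

    # Backtrack from the best final state.
    path = np.zeros(N, dtype=int)
    path[-1] = omega.argmax()
    for n in range(N - 2, -1, -1):
        path[n] = backptr[n + 1, path[n + 1]]
    return path
```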
Maximum Likelihood Estimation for HMM Goal: maximize the likelihood $p(X \mid \theta) = \sum_Z p(X, Z \mid \theta)$, which has no closed-form maximizer. From EM for mixtures of Gaussians to EM for HMMs.
EM for HMM E step: compute $\gamma(z_n) = p(z_n \mid X, \theta^{\text{old}})$ and $\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X, \theta^{\text{old}})$ with the forward-backward/sum-product algorithm. M step: maximize the expected complete-data log likelihood $Q(\theta, \theta^{\text{old}})$ over $\theta = \{\pi, A, \phi\}$, e.g., $\pi_k = \gamma(z_{1k})/\sum_j \gamma(z_{1j})$ and $A_{jk} = \sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nk}) / \sum_{l} \sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nl})$.
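A short sketch of the $\pi$ and $A$ updates given the E-step arrays (illustrative; the emission update depends on the chosen emission model):

```python
import numpy as np

def m_step(gamma, xi):
    """M step of EM for a discrete-state HMM given E-step quantities.

    gamma: (N, K) posterior marginals p(z_n = k | X)
    xi:    (N-1, K, K) pairwise posteriors p(z_{n-1} = j, z_n = k | X)
    Returns updated initial distribution pi and transition matrix A.
    """
    pi = gamma[0] / gamma[0].sum()
    A = xi.sum(axis=0)                     # expected transition counts
    A /= A.sum(axis=1, keepdims=True)      # normalize each row
    return pi, A
```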
Linear Dynamical Systems $p(z_n \mid z_{n-1}) = \mathcal{N}(z_n \mid A z_{n-1}, \Gamma)$, $p(x_n \mid z_n) = \mathcal{N}(x_n \mid C z_n, \Sigma)$, $p(z_1) = \mathcal{N}(z_1 \mid \mu_0, V_0)$. Equivalently, we have $z_n = A z_{n-1} + w_n$, $x_n = C z_n + v_n$, $z_1 = \mu_0 + u$, where the noise terms are Gaussian: $w \sim \mathcal{N}(0, \Gamma)$, $v \sim \mathcal{N}(0, \Sigma)$, $u \sim \mathcal{N}(0, V_0)$.
Kalman Filtering and Smoothing Inference on linear Gaussian systems. Kalman filtering: sequentially update the scaled forward message $\hat{\alpha}(z_n) = \mathcal{N}(z_n \mid \mu_n, V_n)$ via a predict step, $P_{n-1} = A V_{n-1} A^{\mathsf T} + \Gamma$, and an update step with Kalman gain $K_n = P_{n-1} C^{\mathsf T} (C P_{n-1} C^{\mathsf T} + \Sigma)^{-1}$: $\mu_n = A\mu_{n-1} + K_n(x_n - C A \mu_{n-1})$, $V_n = (I - K_n C) P_{n-1}$. Kalman smoothing: sequentially update state beliefs based on the scaled forward and backward messages.
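An illustrative filtering-only sketch (smoothing would add a backward pass), with matrix names matching the equations above:

```python
import numpy as np

def kalman_filter(xs, A, C, Gamma, Sigma, mu0, V0):
    """Kalman filtering: sequentially compute p(z_n | x_1..x_n) = N(mu_n, V_n).

    xs: (N, D_x) observations; A, Gamma: latent dynamics and its noise;
    C, Sigma: emission matrix and its noise; mu0, V0: initial state prior.
    """
    means, covs = [], []
    mu, V = mu0, V0
    for n, x in enumerate(xs):
        if n == 0:
            mu_pred, P = mu0, V0                 # the prior is the first prediction
        else:
            mu_pred = A @ mu                     # predict step
            P = A @ V @ A.T + Gamma
        S = C @ P @ C.T + Sigma                  # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)           # Kalman gain
        mu = mu_pred + K @ (x - C @ mu_pred)     # update step
        V = P - K @ C @ P
        means.append(mu)
        covs.append(V)
    return means, covs
```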
Sampling Methods Goal: compute the expectation $\mathbb{E}[f] = \int f(z)\, p(z)\, dz$. Challenge: we often cannot evaluate this expectation analytically. Sampling methods: draw random samples $z^{(l)} \sim p(z)$, $l = 1, \ldots, L$, such that $\mathbb{E}[f] \approx \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})$.
Rejection Sampling (1) Goal: draw samples from a complicated distribution $p(z) = \tilde{p}(z)/Z_p$, where $\tilde{p}(z)$ is easy to evaluate but the normalizer $Z_p$ is unknown. Choose a proposal distribution $q(z)$ and a constant $k$ such that $k\,q(z) \geq \tilde{p}(z)$ everywhere; draw $z_0 \sim q(z)$ and $u_0 \sim \mathrm{Uniform}[0, k\,q(z_0)]$, and accept $z_0$ if $u_0 \leq \tilde{p}(z_0)$.
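A generic sketch (illustrative; `p_tilde`, `q_sample`, and `q_pdf` are hypothetical user-supplied callables):

```python
import numpy as np

def rejection_sample(p_tilde, q_sample, q_pdf, k, n_samples, rng=None):
    """Rejection sampling from p(z) ∝ p_tilde(z) using envelope k * q(z).

    Requires k * q_pdf(z) >= p_tilde(z) for all z.
    """
    rng = rng or np.random.default_rng()
    samples = []
    while len(samples) < n_samples:
        z0 = q_sample(rng)                       # propose from q
        u0 = rng.uniform(0.0, k * q_pdf(z0))     # uniform point under the envelope
        if u0 <= p_tilde(z0):                    # accept if it falls under p_tilde
            samples.append(z0)
    return np.array(samples)
```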
Limitations of Rejection Sampling When the proposal distribution is very different from the unnormalized target distribution, many samples will be wasted. Finding a suitable envelope is difficult for high-dimensional problems.
Importance Sampling (1) Draw samples from a proposal $q(z)$ and approximate $\mathbb{E}[f] \approx \frac{1}{L} \sum_{l=1}^{L} w_l\, f(z^{(l)})$ with importance weights $w_l = p(z^{(l)})/q(z^{(l)})$. Discussion: what would be an ideal proposal distribution $q$?
Importance Sampling (2) When the normalizer of $p(z) = \tilde{p}(z)/Z_p$ is unknown, we have $\mathbb{E}[f] \approx \sum_{l=1}^{L} w_l\, f(z^{(l)})$, where $w_l = \frac{r_l}{\sum_m r_m}$ and $r_l = \frac{\tilde{p}(z^{(l)})}{q(z^{(l)})}$.
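A sketch of the self-normalized estimator (illustrative; the callables are hypothetical and assumed vectorized over samples):

```python
import numpy as np

def importance_expectation(f, p_tilde, q_sample, q_pdf, n_samples, rng=None):
    """Self-normalized importance sampling estimate of E_p[f],
    where p is known only up to a constant: p(z) ∝ p_tilde(z)."""
    rng = rng or np.random.default_rng()
    zs = np.array([q_sample(rng) for _ in range(n_samples)])
    r = p_tilde(zs) / q_pdf(zs)          # unnormalized importance ratios r_l
    w = r / r.sum()                      # self-normalized weights w_l
    return np.sum(w * f(zs))
```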
Sampling and EM The M step in EM maximizes $Q(\theta, \theta^{\text{old}}) = \int p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta)\, dZ$. What if we cannot even evaluate this integral? One idea: use a sampling method, $Q(\theta, \theta^{\text{old}}) \approx \frac{1}{L} \sum_{l=1}^{L} \ln p(X, Z^{(l)} \mid \theta)$ with $Z^{(l)} \sim p(Z \mid X, \theta^{\text{old}})$. Known as the Monte Carlo EM algorithm.
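A generic sketch of the Monte Carlo Q-function (illustrative; `log_joint` and `sample_posterior` stand in for model-specific routines):

```python
import numpy as np

def monte_carlo_q(log_joint, sample_posterior, theta_old, L, rng=None):
    """Monte Carlo approximation of the EM Q-function:
    Q(theta, theta_old) ≈ (1/L) * sum_l log p(X, Z_l | theta),
    with Z_l drawn from p(Z | X, theta_old)."""
    rng = rng or np.random.default_rng()
    Z_samples = [sample_posterior(theta_old, rng) for _ in range(L)]
    return lambda theta: np.mean([log_joint(Z, theta) for Z in Z_samples])
```

The returned function of $\theta$ is then maximized in the M step.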
Imputation Posterior (IP) Algorithm: adapting EM to Bayesian estimation. I step: impute the latent variables by sampling them from their posterior given the current parameter sample; P step: sample the parameters from their posterior given the imputed latent variables.
Markov Chain Monte Carlo Goal: use Markov chains to draw samples from a given distribution. Idea: set up a Markov chain that converges to the target distribution and draw samples from the chain.
Invariant Distribution and Detailed Balance First-order Markov chain: $p(z^{(m+1)} \mid z^{(1)}, \ldots, z^{(m)}) = p(z^{(m+1)} \mid z^{(m)})$. Transition probabilities: $T(z^{(m)}, z^{(m+1)}) \equiv p(z^{(m+1)} \mid z^{(m)})$. Invariant distribution: a distribution $p^{\star}(z)$ is said to be invariant, or stationary, with respect to a Markov chain if each step in the chain leaves that distribution unchanged: $p^{\star}(z) = \sum_{z'} T(z', z)\, p^{\star}(z')$. Detailed balance, $p^{\star}(z)\, T(z, z') = p^{\star}(z')\, T(z', z)$, is a sufficient condition for invariance.
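The one-line argument (standard, not spelled out on the slide) that detailed balance implies invariance:

```latex
\sum_{z'} p^{\star}(z')\, T(z', z)
  = \sum_{z'} p^{\star}(z)\, T(z, z')   % by detailed balance
  = p^{\star}(z) \sum_{z'} T(z, z')
  = p^{\star}(z)                         % transition probabilities sum to 1
```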
Ergodicity and Equilibrium Distribution For $m \to \infty$, the distribution $p(z^{(m)})$ converges to the required invariant distribution, i.e., the target distribution, irrespective of the choice of initial distribution (which may be different from the target distribution). This property is called ergodicity, and the invariant distribution is then called the equilibrium distribution. Under weak restrictions on the invariant and transition distributions, a homogeneous Markov chain will be ergodic.
Metropolis-Hastings Algorithm Iterative algorithm: at step $\tau$, with current state $z^{(\tau)}$, we draw a sample $z^{\ast}$ from a proposal distribution $q(z \mid z^{(\tau)})$ and accept it with probability $A(z^{\ast}, z^{(\tau)}) = \min\!\left(1, \frac{\tilde{p}(z^{\ast})\, q(z^{(\tau)} \mid z^{\ast})}{\tilde{p}(z^{(\tau)})\, q(z^{\ast} \mid z^{(\tau)})}\right)$, so that the detailed balance requirement is satisfied; if the candidate is rejected, the chain stays at $z^{(\tau)}$.
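An illustrative random-walk sketch; with a symmetric Gaussian proposal the $q$ ratio cancels, so this is the Metropolis special case mentioned on the next slide:

```python
import numpy as np

def metropolis_hastings(p_tilde, n_samples, z0, step=1.0, rng=None):
    """Random-walk sampler with a symmetric Gaussian proposal,
    targeting p(z) ∝ p_tilde(z)."""
    rng = rng or np.random.default_rng()
    z = np.atleast_1d(np.asarray(z0, dtype=float))
    samples = []
    for _ in range(n_samples):
        z_star = z + step * rng.standard_normal(z.shape)   # propose
        accept_prob = min(1.0, p_tilde(z_star) / p_tilde(z))
        if rng.uniform() < accept_prob:
            z = z_star                                     # accept; else stay put
        samples.append(z.copy())
    return np.array(samples)
```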
Choice of Proposal Distributions Local Gaussian distribution centered on the current state (if it is a symmetric Gaussian, the M-H algorithm reduces to the Metropolis algorithm): if the variance is too small, we get a high acceptance rate but slow random-walk exploration; if the variance is too large, a low acceptance rate. Global Gaussian distribution: find a Laplace approximation (to the posterior) and sample from it. Mixture of Gaussians: fit a mixture of Gaussians and then sample from it.
Gibbs Sampler (2) Special case of the Metropolis-Hastings sampler. Given the proposal distribution $q_k(z^{\ast} \mid z) = p(z_k^{\ast} \mid z_{\setminus k})$ with $z^{\ast}_{\setminus k} = z_{\setminus k}$, the acceptance rate is $A(z^{\ast}, z) = \frac{p(z^{\ast})\, q_k(z \mid z^{\ast})}{p(z)\, q_k(z^{\ast} \mid z)} = \frac{p(z_k^{\ast} \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k \mid z_{\setminus k})}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^{\ast} \mid z_{\setminus k})} = 1$. So every sample from the proposal distribution in a Gibbs sampler will be accepted.
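A toy sketch for a unit-variance bivariate Gaussian with correlation $\rho$ (illustrative, not from the lecture); each full conditional is a 1-D Gaussian, so every draw is accepted:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples, rng=None):
    """Gibbs sampling from a bivariate Gaussian with unit variances
    and correlation rho, alternating exact conditional draws."""
    rng = rng or np.random.default_rng()
    z1, z2 = 0.0, 0.0
    samples = np.zeros((n_samples, 2))
    cond_std = np.sqrt(1.0 - rho ** 2)
    for i in range(n_samples):
        z1 = rng.normal(rho * z2, cond_std)   # z1 | z2 ~ N(rho*z2, 1 - rho^2)
        z2 = rng.normal(rho * z1, cond_std)   # z2 | z1 ~ N(rho*z1, 1 - rho^2)
        samples[i] = (z1, z2)
    return samples
```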