11 - Markov Chains Jim Vallandingham
Outline • Irreducible Markov Chains • Outline of Proof of Convergence to Stationary Distribution • Convergence Example • Reversible Markov Chain • Monte Carlo Methods • Hastings-Metropolis Algorithm • Gibbs Sampling • Simulated Annealing • Absorbing Markov Chains
Stationary Distribution • As n → ∞, P^n approaches a limiting matrix in which each row is the stationary distribution φ′
Stationary Dist. Example • Long-term averages: • 24% of time spent in state E1 • 39% of time spent in state E2 • 21% of time spent in state E3 • 17% of time spent in state E4
Stationary Distribution • Any finite, aperiodic, irreducible Markov chain will converge to a stationary distribution • Regardless of the starting distribution (a numeric illustration follows below) • Outline of the proof requires linear algebra • Appendix B.19
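A minimal sketch of this convergence, assuming an arbitrary 4-state transition matrix (the slides' example with long-run fractions 24/39/21/17 is not reproduced here): two different starting distributions end up at the same stationary distribution.

```python
# Sketch: a finite, aperiodic, irreducible chain converges to the same
# stationary distribution from any start. The matrix is an arbitrary
# illustration, not the one from the slides.
import numpy as np

P = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.3, 0.4, 0.2],
    [0.3, 0.2, 0.2, 0.3],
])

P100 = np.linalg.matrix_power(P, 100)          # 100-step transition probabilities

start_a = np.array([1.0, 0.0, 0.0, 0.0])       # start surely in state E1
start_b = np.array([0.25, 0.25, 0.25, 0.25])   # start uniformly at random

print(start_a @ P100)   # both prints agree: the stationary distribution,
print(start_b @ P100)   # independent of the starting distribution
```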
L.A. : Eigenvalues • Let P be an s × s matrix • P has s eigenvalues λ_1, …, λ_s • Found as the s solutions of det(P − λI) = 0 • Assume all eigenvalues of P are distinct
L.A. : left & right eigenvectors • Corresponding to each eigenvalue • Is a right eigenvector - • And a left eigenvector - • For which: • Assume they are normalized:
L.A. : Spectral Expansion • Can express P in terms of its eigenvectors and eigenvalues: P = λ_1 r_1 l_1′ + λ_2 r_2 l_2′ + … + λ_s r_s l_s′ • Called a spectral expansion of P
L.A. : Spectral Expansion • If λ_i is an eigenvalue of P with corresponding left and right eigenvectors l_i′ & r_i • Then λ_i^n is an eigenvalue of P^n with the same left and right eigenvectors l_i′ & r_i
L.A. : Spectral Expansion • Implies the spectral expansion of P^n can be written as: P^n = λ_1^n r_1 l_1′ + λ_2^n r_2 l_2′ + … + λ_s^n r_s l_s′
Outline of Proof • Going back to the proof… • P is the transition matrix of a finite, aperiodic, irreducible Markov chain • P has one eigenvalue, λ_1, equal to 1 • All other eigenvalues have absolute value < 1
Outline of Proof • Choosing left and right eigenvectors of λ_1 = 1 • Requirements: • Right eigenvector r_1 = 1, the column vector of all 1's (P1 = 1, since each row of P sums to 1) • Left eigenvector l_1′ a probability vector (entries sum to 1) • These also satisfy the normalization: l_1′ r_1 = l_1′ 1 = 1
Outline of Proof • Also: l_1′ P = l_1′ • Can be shown that there is a unique solution of this equation that also satisfies l_1′ 1 = 1 • This is the same equation satisfied by the stationary distribution, φ′ P = φ′, so that l_1′ = φ′
Outline of Proof • P^n gives the n-step transition probabilities • The spectral expansion of P^n is: P^n = r_1 l_1′ + Σ_{i≥2} λ_i^n r_i l_i′ • Only one eigenvalue equals 1; the rest have absolute value < 1, so λ_i^n → 0 • So as n increases, P^n approaches r_1 l_1′ = 1 φ′, a matrix whose every row is φ′
Convergence Example • [The slides' numerical transition matrix, eigenvalues, and eigenvectors are not reproduced here] • The example matrix has eigenvalue λ_1 = 1, and all of its other eigenvalues are less than 1 in absolute value • Its left & right eigenvectors are chosen to satisfy the normalization l_i′ r_i = 1, with l_1′ equal to the stationary distribution • In the spectral expansion P^n = r_1 l_1′ + Σ_{i≥2} λ_i^n r_i l_i′, the terms for i ≥ 2 go to 0 as n grows, leaving r_1 l_1′, each row of which is the stationary distribution (see the sketch below)
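To make the convergence example concrete, here is a small sketch verifying the spectral expansion P^n = Σ λ_i^n r_i l_i′ and the limit P^n → 1 φ′. The 2×2 matrix is a stand-in assumption, since the slides' example matrix was not preserved.

```python
# Sketch: spectral expansion of P^n and its convergence to a matrix
# whose rows are the stationary distribution. Stand-in matrix only.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # eigenvalues 1.0 and 0.7

eigvals, R = np.linalg.eig(P)        # columns of R are right eigenvectors
L = np.linalg.inv(R)                 # rows of L are left eigenvectors,
                                     # normalized so that l_i' r_i = 1

n = 50
# Spectral expansion of P^n: the |lambda_i| < 1 term vanishes as n grows.
Pn = sum(eigvals[i] ** n * np.outer(R[:, i], L[i, :]) for i in range(2))

print(Pn)                                # both rows approx. (2/3, 1/3)
print(np.linalg.matrix_power(P, n))      # agrees with direct computation
```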
Reversible Markov Chains • Typically we move forward in 'time' in a Markov chain: 1 → 2 → 3 → … → t • What about moving backward in this chain? t → t−1 → t−2 → … → 1
Reversible Markov Chains • [Figure: Species A and Species B descending from a common ancestor; following the chain from Species A runs back in time to the ancestor, then forward in time to Species B]
Reversible Markov Chains • Have a finite, irreducible, aperiodic Markov chain • with stationary distribution φ′ • During t transitions, the chain will move through states: X_1, X_2, …, X_t • Reverse chain • Define Y_i = X_{t+1−i} • Then the reverse chain will move through states: Y_1 = X_t, Y_2 = X_{t−1}, …, Y_t = X_1
Reversible Markov Chains • Want to show that the structure determining the reverse chain's sequence is also a Markov chain • Its typical element q_{ij} is found from the typical element p_{ij} of P, using: q_{ij} = φ_j p_{ji} / φ_i
Reversible Markov Chains • Shown by using Bayes' rule to invert the conditional probability • Intuitively: • The future is independent of the past, given the present • The past is independent of the future, given the present
Reversible Markov Chains • The stationary distribution of the reverse chain is still φ′ • Follows from the stationary distribution property: Σ_i φ_i q_{ij} = Σ_i φ_j p_{ji} = φ_j Σ_i p_{ji} = φ_j
Reversible Markov Chains • A Markov chain is said to be reversible if q_{ij} = p_{ij} for all i, j • This holds only if φ_i p_{ij} = φ_j p_{ji} for all i, j (the detailed balance condition)
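A sketch of checking the detailed balance condition numerically. The tridiagonal ("birth-death" style) matrix below is an illustrative assumption, chosen because such chains are always reversible.

```python
# Sketch: test reversibility via detailed balance,
# phi_i * p_ij == phi_j * p_ji for all i, j.
import numpy as np

def is_reversible(P, phi, tol=1e-12):
    """True if the chain with transition matrix P and stationary
    distribution phi satisfies detailed balance."""
    F = phi[:, None] * P          # F[i, j] = phi_i * p_ij
    return np.allclose(F, F.T, atol=tol)

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Recover phi as the left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
phi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
phi /= phi.sum()                  # phi = (0.25, 0.5, 0.25)

print(is_reversible(P, phi))      # True for this tridiagonal chain
```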
Markov Chain Monte Carlo • Class of algorithms for sampling from probability distributions • Involves constructing a Markov chain • whose stationary distribution is the desired (target) distribution • The state of the chain after a large number of steps is used as a sample from the desired distribution • We discuss 2 algorithms • Gibbs Sampling • Simulated Annealing
Basic Problem • Find a transition matrix P such that • its stationary distribution is the target distribution • We know the Markov chain will converge to its stationary distribution, regardless of the initial distribution • How can we find such a P?
Basic Idea • Construct a transition matrix Q • the "candidate-generating matrix" • Modify it to have the correct stationary distribution • Modification involves inserting factors a_{ij} • So that p_{ij} = q_{ij} a_{ij} for i ≠ j • There are various ways of picking the a's
Hastings-Metropolis • Goal: construct an aperiodic, irreducible Markov chain • having a prescribed stationary distribution • Produces a correlated sequence of draws from a target density that may be difficult to sample by classical independent-draw methods
Hastings-Metropolis Process: • Choose a set of constants a_{ij} • Such that 0 ≤ a_{ij} ≤ 1 • And a_{ij} = min(1, (φ_j q_{ji}) / (φ_i q_{ij})) • Define, for j ≠ i: p_{ij} = q_{ij} a_{ij} (accept state change with probability a_{ij}; reject it otherwise) • And p_{ii} = 1 − Σ_{j≠i} p_{ij} (chain doesn't change value)
Hastings-Metropolis Example • Target φ′ = (.4 .6) • [The slides' candidate matrix Q, the resulting P, and the powers P^2 and P^50 are not reproduced here; the rows of P^n converge to φ′ = (.4 .6)]
Algorithmic Description • Start with state E_1, then iterate: • Propose E′ from q(E_t, E′) • Calculate the ratio a = [φ(E′) q(E′, E_t)] / [φ(E_t) q(E_t, E′)] • If a ≥ 1, • Accept: E_{t+1} = E′ • Else • Accept with probability a • If rejected, E_{t+1} = E_t
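Putting this algorithmic description into code: a minimal Hastings-Metropolis sketch for the two-state target φ = (.4, .6) from the earlier example. The symmetric candidate matrix Q is an assumption (the slides' Q was not preserved); with a symmetric Q the ratio reduces to φ(E′)/φ(E_t), the Metropolis case.

```python
# Sketch: Hastings-Metropolis for a two-state target phi = (0.4, 0.6).
import random

phi = [0.4, 0.6]
Q = [[0.5, 0.5],
     [0.5, 0.5]]                    # candidate-generating matrix (assumed)

def mh_step(i):
    j = random.choices([0, 1], weights=Q[i])[0]    # propose E' from q(E_t, .)
    a = (phi[j] * Q[j][i]) / (phi[i] * Q[i][j])    # acceptance ratio
    return j if random.random() < a else i         # accept w.p. min(1, a)

state, visits = 0, [0, 0]
for _ in range(100_000):
    state = mh_step(state)
    visits[state] += 1

print([v / sum(visits) for v in visits])   # approx (0.4, 0.6)
```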
Gibbs Sampling Definitions • Let Y = (Y_1, Y_2, …, Y_d) be the random vector of interest • Let φ(y) be the distribution of Y • Assume each component Y_k takes only finitely many values • We define a Markov chain whose states are the possible values of Y
Gibbs Sampling Process • Enumerate the possible vectors in some order • 1, 2, …, s • Let vector j correspond to the jth state in the chain • p_{ij}: • 0 if vectors i & j differ in more than 1 component • If they differ in at most 1 component, say the first, p_{ij} is proportional to the conditional probability that the first component takes its new value y_1*, given the values of the other components
Gibbs Sampling • Assume a joint distribution p(X, Y) • Looking to sample k values of X • Begin with an initial value y_0 • Sample x_i using p(X | Y = y_{i−1}) • Once x_i is found, use it to find y_i • by sampling from p(Y | X = x_i) • Repeat k times
Gibbs Sampling • Allows us to deal with univariate conditional distributions • Instead of complex joint distributions • The chain has stationary distribution equal to the target joint distribution (see the sketch below)
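A minimal sketch of the two-variable Gibbs loop just described. The target, a standard bivariate normal with correlation ρ and conditionals N(ρ·y, 1 − ρ²), is an illustrative assumption, since the slides do not fix a particular joint distribution.

```python
# Sketch: two-variable Gibbs sampling, alternating draws from the
# univariate conditionals of an (assumed) bivariate normal target.
import random

rho = 0.8
x, y = 0.0, 0.0
samples = []

for _ in range(10_000):
    # p(X | Y = y) for a standard bivariate normal: N(rho*y, 1 - rho^2)
    x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
    # p(Y | X = x): N(rho*x, 1 - rho^2)
    y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
    samples.append((x, y))

# The empirical correlation of the draws should be close to rho.
n = len(samples)
mx = sum(s[0] for s in samples) / n
my = sum(s[1] for s in samples) / n
cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
vx = sum((s[0] - mx) ** 2 for s in samples) / n
vy = sum((s[1] - my) ** 2 for s in samples) / n
print(cov / (vx * vy) ** 0.5)
```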
Why is this Hastings-Metropolis? • If we define the candidate probabilities q_{ij} to be the Gibbs conditional probabilities above • Can see that for Gibbs: φ_i q_{ij} = φ_j q_{ji} • So the acceptance ratio a is always 1, and every proposed move is accepted
Simulated Annealing • Goal: find the (approximate) minimum of some positive function f • The function is defined on an extremely large number of states, s • And we want to find those states where the function is minimized • The value of the function for state E_j is f(E_j)
Simulated Annealing Process • Construct a neighborhood of each state • The set of states "close" to that state • The variable in the Markov chain can move to a neighbor in one step • Moves outside the neighborhood are not allowed
Simulated Annealing • Requirements of neighborhoods: • If E_m is in the neighborhood of E_j, then E_j is in the neighborhood of E_m • The number of states in a neighborhood (N) is the same for every state • Neighborhoods are linked so that the chain can eventually make it from any E_j to any E_m • If in state E_j, the next move must be within the neighborhood of E_j
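A minimal simulated-annealing sketch satisfying these neighborhood requirements: states 0..99 arranged in a ring, each with the two neighbors {j−1, j+1} (symmetric, fixed size N = 2, and linked). The function f and the cooling schedule are illustrative assumptions.

```python
# Sketch: simulated annealing on a toy ring of states.
import math
import random

S = 100

def f(j):
    # Arbitrary positive function with local minima (illustrative only).
    return abs(j - 63) + 10 + 10 * math.cos(j / 3)

def neighbors(j):
    # Symmetric, fixed-size neighborhoods linking every pair of states.
    return [(j - 1) % S, (j + 1) % S]

state = random.randrange(S)
best = state
T = 10.0                                   # initial "temperature"
for step in range(20_000):
    cand = random.choice(neighbors(state)) # moves outside the neighborhood
                                           # are not allowed
    delta = f(cand) - f(state)
    # Accept downhill moves always; uphill moves w.p. exp(-delta / T).
    if delta <= 0 or random.random() < math.exp(-delta / T):
        state = cand
    if f(state) < f(best):
        best = state
    T = max(1e-3, T * 0.9995)              # cool the temperature gradually

print(best, f(best))                       # approximate minimizer and value
```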