
CS b553: Algorithms for Optimization and Learning

komala

Presentation Transcript


  1. CS b553: Algorithms for Optimization and Learning Monte Carlo Methods for Probabilistic Inference

  2. Agenda • Monte Carlo methods • O(1/sqrt(N)) standard deviation • For Bayesian inference • Likelihood weighting • Gibbs sampling

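  3. Monte Carlo Integration • Estimate large integrals/sums: • $I = \int f(x)\,p(x)\,dx$ • $I = \sum_x f(x)\,p(x)$ • Using N i.i.d. samples $x^{(1)},\dots,x^{(N)}$ from $p(x)$: • $I \approx \frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})$ • Examples: • $\int_{[a,b]} f(x)\,dx \approx \frac{b-a}{N}\sum_{i=1}^{N} f(x^{(i)})$ (with $x^{(i)}$ uniform on $[a,b]$) • $E[X] = \int x\,p(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} x^{(i)}$ • Volume of a set in $\mathbb{R}^n$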
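The interval example on this slide can be sketched in a few lines. This is a minimal illustration, not code from the lecture: the function name `mc_integrate`, the integrand `sin`, and the sample count are assumptions chosen only because the exact answer (2) is easy to check.

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] from n uniform samples."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    # (b - a) / N times the sum of f at the sampled points
    return (b - a) * total / n

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = mc_integrate(math.sin, 0.0, math.pi, 100_000, rng)
print(estimate)  # the exact integral of sin over [0, pi] is 2
```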

  7. Mean & Variance of estimate • Let $I_N$ be the random variable denoting the estimate of the integral with N samples • What is the bias (mean error) $E[I - I_N]$? • $E[I - I_N] = I - E[I_N]$ (linearity of expectation) $= E[f(x)] - \frac{1}{N}\sum_{i=1}^{N} E[f(x^{(i)})]$ (definition of $I$ and $I_N$) $= \frac{1}{N}\sum_{i=1}^{N}\left(E[f(x)] - E[f(x^{(i)})]\right) = \frac{1}{N}\sum_{i=1}^{N} 0$ ($x$ and $x^{(i)}$ are both distributed according to $p(x)$) $= 0$

  11. Mean & Variance of estimate • Let $I_N$ be the random variable denoting the estimate of the integral with N samples • What is the bias (mean error) $E[I - I_N]$? • Unbiased estimator • What is the variance $\mathrm{Var}[I_N]$? • $\mathrm{Var}[I_N] = \mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})\right]$ (definition) $= \frac{1}{N^2}\mathrm{Var}\left[\sum_{i=1}^{N} f(x^{(i)})\right]$ (scaling of variance) $= \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}[f(x^{(i)})]$ (variance of a sum of independent variables)

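  12. Mean & Variance of estimate • Let $I_N$ be the random variable denoting the estimate of the integral with N samples • What is the bias (mean error) $E[I - I_N]$? • Unbiased estimator • What is the variance $\mathrm{Var}[I_N]$? • $\mathrm{Var}[I_N] = \mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} f(x^{(i)})\right]$ (definition) $= \frac{1}{N^2}\mathrm{Var}\left[\sum_{i=1}^{N} f(x^{(i)})\right]$ (scaling of variance) $= \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}[f(x^{(i)})] = \frac{1}{N}\mathrm{Var}[f(x)]$ (i.i.d. sample)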

  13. Mean & Variance of estimate • Let $I_N$ be the random variable denoting the estimate of the integral with N samples • What is the bias (mean error) $E[I - I_N]$? • Unbiased estimator • What is the variance $\mathrm{Var}[I_N]$? • $\frac{1}{N}\mathrm{Var}[f(x)]$ • Standard deviation: O(1/sqrt(N))
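The O(1/sqrt(N)) rate can be checked empirically. The sketch below is an illustration under assumptions of my own choosing: f(x) = x with x uniform on [0, 1] (so Var[f(x)] = 1/12), and hypothetical helper names. Quadrupling N should roughly halve the standard deviation of the estimate.

```python
import random
import statistics

def mc_mean_estimate(n, rng):
    """One Monte Carlo estimate of E[X] for X ~ Uniform(0, 1) from n samples."""
    return sum(rng.random() for _ in range(n)) / n

def std_of_estimate(n, trials, rng):
    """Empirical standard deviation of the estimator over repeated runs."""
    estimates = [mc_mean_estimate(n, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)

rng = random.Random(0)
std_100 = std_of_estimate(100, 2000, rng)
std_400 = std_of_estimate(400, 2000, rng)
ratio = std_100 / std_400
# Theory: std = sqrt(Var[f(x)] / N) = sqrt(1 / (12 N)), so the ratio should be near 2.
print(std_100, std_400, ratio)
```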

  15. Approximate Inference Through Sampling • Unconditional simulation: • To estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed • Conditional simulation: • To estimate the probability P(H) that a coin picked out of bucket B flips heads: • Repeat for i=1,…,N: • Pick a coin C out of a random bucket $b^{(i)}$ chosen with probability P(B) • $h^{(i)}$ = flip C according to probability $P(H \mid b^{(i)})$ • Sample $(h^{(i)}, b^{(i)})$ comes from distribution P(H,B) • Result approximates P(H,B)
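The conditional simulation loop above can be sketched as follows. The two buckets and all probabilities are made-up numbers for illustration (the slides give none); under them P(H) = 0.3·0.9 + 0.7·0.2 = 0.41.

```python
import random
from collections import Counter

# Hypothetical model: two buckets with different coin biases.
P_BUCKET = {"b0": 0.3, "b1": 0.7}               # P(B)
P_HEADS_GIVEN_BUCKET = {"b0": 0.9, "b1": 0.2}   # P(H = heads | B)

def sample_joint(rng):
    """Draw a bucket from P(B), then flip a coin according to P(H | bucket)."""
    b = "b0" if rng.random() < P_BUCKET["b0"] else "b1"
    h = rng.random() < P_HEADS_GIVEN_BUCKET[b]
    return h, b

rng = random.Random(1)
n = 100_000
counts = Counter(sample_joint(rng) for _ in range(n))
joint = {pair: c / n for pair, c in counts.items()}   # approximates P(H, B)
p_heads = sum(p for (h, _), p in joint.items() if h)  # marginalize out B
print(joint, p_heads)
```

Summing the empirical joint over buckets recovers the marginal P(H), which is the quantity the slide sets out to estimate.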

  16. Monte Carlo Inference In Bayes Nets • BN over variables X • Repeat for i=1,…,N • In top-down order, generate $x^{(i)}$ as follows: • Sample $x_j^{(i)} \sim P(X_j \mid pa_{X_j}^{(i)})$ • (RHS is obtained by plugging the parent values already in the sample into the CPT for $X_j$) • The sample $x^{(1)},\dots,x^{(N)}$ approximates the distribution over X
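A minimal sketch of this top-down sampling on the burglary network used in the following slides. The CPT numbers are an assumption: the slides do not list them, so the commonly used textbook values are filled in here, and the function names are hypothetical.

```python
import random

# Assumed CPTs (common textbook values; not given on the slides).
P_B = 0.001                                                       # P(B=1)
P_E = 0.002                                                       # P(E=1)
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}                                          # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}                                          # P(M=1 | A)

def bernoulli(p, rng):
    return 1 if rng.random() < p else 0

def forward_sample(rng):
    """Sample every variable in top-down order, parents before children."""
    b = bernoulli(P_B, rng)
    e = bernoulli(P_E, rng)
    a = bernoulli(P_A[(b, e)], rng)   # parent values select the CPT row
    j = bernoulli(P_J[a], rng)
    m = bernoulli(P_M[a], rng)
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

rng = random.Random(0)
samples = [forward_sample(rng) for _ in range(200_000)]
p_john_calls = sum(s["J"] for s in samples) / len(samples)
print(p_john_calls)  # the exact marginal under these CPTs is about 0.052
```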

  17. Approximate Inference: Monte-Carlo Simulation • [Figure: Bayes net over Burglary, Earthquake, Alarm, JohnCalls, MaryCalls] • Sample from the joint distribution • Sample: (B=0, E=0, A=0, J=1, M=0)

  18. Approximate Inference: Monte-Carlo Simulation • As more samples are generated, the distribution of the samples approaches the joint distribution • Samples: (B=0, E=0, A=0, J=1, M=0); (B=0, E=0, A=0, J=0, M=0); (B=0, E=0, A=0, J=0, M=0); (B=1, E=0, A=1, J=1, M=0)

  19. Basic method for Handling Evidence • Inference: given evidence E=e (e.g., J=1), approximate $P(X \setminus E \mid E=e)$ • Remove the samples that conflict with the evidence (here, those with J=0) • Samples: (B=0, E=0, A=0, J=1, M=0) kept; (B=0, E=0, A=0, J=0, M=0) removed; (B=0, E=0, A=0, J=0, M=0) removed; (B=1, E=0, A=1, J=1, M=0) kept • Distribution of remaining samples approximates the conditional distribution
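The "remove conflicting samples" method can be sketched as below, with evidence J=1 as in the slide's example. The CPT values are the same assumed textbook numbers (not stated on the slides), and the helper names are hypothetical. Note how few samples survive, which motivates the next slide.

```python
import random

# Assumed CPTs (common textbook values; not given on the slides).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}                                          # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}                                          # P(M=1 | A)

def bernoulli(p, rng):
    return 1 if rng.random() < p else 0

def forward_sample(rng):
    b = bernoulli(P_B, rng)
    e = bernoulli(P_E, rng)
    a = bernoulli(P_A[(b, e)], rng)
    return {"B": b, "E": e, "A": a, "J": bernoulli(P_J[a], rng), "M": bernoulli(P_M[a], rng)}

rng = random.Random(0)
n = 200_000
# Keep only the samples that agree with the evidence J=1.
kept = [s for s in (forward_sample(rng) for _ in range(n)) if s["J"] == 1]
acceptance_rate = len(kept) / n
p_burglary_given_john = sum(s["B"] for s in kept) / len(kept)
print(acceptance_rate, p_burglary_given_john)  # roughly 0.05 and 0.016 under these CPTs
```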

  20. Rare Event Problem: • What if some events are really rare (e.g., burglary & earthquake)? • # of samples must be huge to get a reasonable estimate • Solution: likelihood weighting • Enforce that each sample agrees with evidence • While generating a sample, keep track of the ratio (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)
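Written as a formula, the ratio tracked per sample is an importance weight. In the notation below, $P$ is the true distribution and $Q$ is the distribution actually used to generate samples; $Q$ is a symbol introduced here for illustration and does not appear on the slides. The weighted average shown is the usual self-normalized estimate.

```latex
w(x) = \frac{P(x)}{Q(x)}, \qquad
E_P[f(X)] \approx
\frac{\sum_{i=1}^{N} w(x^{(i)})\, f(x^{(i)})}{\sum_{i=1}^{N} w(x^{(i)})},
\qquad x^{(i)} \sim Q
```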

  21. Likelihood weighting • [Figure: Bayes net over Burglary, Earthquake, Alarm, JohnCalls, MaryCalls] • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=1

  22. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.008 • B=0 E=1

  23. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.0023 • B=0 E=1 A=1 • A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs

  24. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.0016 • B=0 E=1 A=1 M=1 J=1

  25. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=3.988 • B=0 E=0

  26. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.004 • B=0 E=0 A=1

  27. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.0028 • B=0 E=0 A=1 M=1 J=1

  28. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.00375 • B=1 E=0 A=1

  29. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=0.0026 • B=1 E=0 A=1 M=1 J=1

  30. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • w=5e-7 • B=1 E=1 A=1 M=1 J=1

  31. Likelihood weighting • Suppose evidence Alarm & MaryCalls • Sample B,E with P=0.5 • Weighted samples: (B=0, E=1, A=1, M=1, J=1) w=0.0016; (B=0, E=0, A=1, M=1, J=1) w=0.0028; (B=1, E=0, A=1, M=1, J=1) w=0.0026; (B=1, E=1, A=1, M=1, J=1) w≈0 • N=4 gives P(B|A,M) ≈ 0.371 • Exact inference gives P(B|A,M) = 0.375
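The walkthrough on the preceding slides can be sketched in code. As in the slides, B and E are drawn with probability 0.5 each, A=1 and M=1 are enforced, and J is sampled from its CPT. The CPT numbers are assumed textbook values (the slides do not list them, although most of the weights shown are consistent with them), and the function names are hypothetical.

```python
import random

# Assumed CPTs (common textbook values; not given on the slides).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}                                          # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}                                          # P(M=1 | A)
PROPOSAL = 0.5   # B and E are sampled with P=0.5, as on the slides

def prob(p_one, value):
    """Probability that a binary variable with P(X=1)=p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one

def weighted_sample(rng):
    w = 1.0
    b = 1 if rng.random() < PROPOSAL else 0
    e = 1 if rng.random() < PROPOSAL else 0
    w *= prob(P_B, b) / 0.5      # true probability / proposal probability
    w *= prob(P_E, e) / 0.5
    a = 1                        # evidence: enforced, weight gets its likelihood
    w *= P_A[(b, e)]
    m = 1                        # evidence: enforced
    w *= P_M[a]
    j = 1 if rng.random() < P_J[a] else 0   # sampled from its CPT, weight unchanged
    return {"B": b, "E": e, "A": a, "M": m, "J": j}, w

def estimate_burglary(weighted):
    """Weighted fraction of samples with B=1."""
    total = sum(w for _, w in weighted)
    return sum(w for s, w in weighted if s["B"] == 1) / total

# The four weighted samples shown on the slide.
slide_samples = [({"B": 0}, 0.0016), ({"B": 0}, 0.0028), ({"B": 1}, 0.0026), ({"B": 1}, 5e-7)]
slide_estimate = estimate_burglary(slide_samples)

rng = random.Random(0)
estimate = estimate_burglary([weighted_sample(rng) for _ in range(20_000)])
print(slide_estimate, estimate)
```

Applied to the slide's four samples, the weighted fraction is 0.0026 / (0.0016 + 0.0028 + 0.0026) ≈ 0.371, matching the N=4 figure quoted above.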

  32. Another Rare-Event Problem • B=b given as evidence • Each $b_i$ is improbable under all but one setting of $A_i$ (say, $A_i=1$) • Chance of sampling all 1’s is very low => most likelihood weights will be too low • Problem: evidence is not being used to sample the A’s effectively (i.e., near $P(A_i \mid b)$) • [Figure: network over A1, A2, …, A10 and B1, B2, …, B10]

  33. Gibbs Sampling • Idea: reduce the computational burden of sampling from a multidimensional distribution $P(x) = P(x_1,\dots,x_n)$ by doing repeated draws of individual attributes • Cycle through j=1,…,n • Sample $x_j \sim P(x_j \mid x_1,\dots,x_{j-1},x_{j+1},\dots,x_n)$ • Over the long run, the random walk taken by x approaches the true distribution P(x)
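A minimal sketch of the cycle described above, on a toy joint distribution over two binary variables. The joint table, burn-in length, and function names are illustrative assumptions, not taken from the slides; each conditional is obtained by normalizing the joint over the one coordinate being resampled.

```python
import random

# Hypothetical joint P(x1, x2) over two binary variables (sums to 1).
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample_coordinate(x, j, rng):
    """Draw x_j from P(x_j | all other coordinates) by normalizing the joint."""
    scores = []
    for v in (0, 1):
        y = list(x)
        y[j] = v
        scores.append(JOINT[tuple(y)])
    p_one = scores[1] / (scores[0] + scores[1])
    return 1 if rng.random() < p_one else 0

def gibbs(n_sweeps, burn_in, rng):
    x = [0, 0]                      # arbitrary initialization
    kept = []
    for t in range(n_sweeps):
        for j in range(len(x)):     # cycle through the coordinates
            x[j] = sample_coordinate(x, j, rng)
        if t >= burn_in:            # discard early sweeps
            kept.append(tuple(x))
    return kept

rng = random.Random(0)
samples = gibbs(20_000, 1_000, rng)
p_x1 = sum(s[0] for s in samples) / len(samples)            # exact: 0.5
p_equal = sum(s[0] == s[1] for s in samples) / len(samples)  # exact: 0.8
print(p_x1, p_equal)
```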

  34. Gibbs Sampling in BNs • Each Gibbs sampling step: 1) pick a variable $X_i$, 2) sample $x_i \sim P(X_i \mid X \setminus X_i)$ • Look at values of the “Markov blanket” of $X_i$: • Parents $Pa_{X_i}$ • Children $Y_1,\dots,Y_k$ • Parents of children (excluding $X_i$) $Pa_{Y_1} \setminus X_i, \dots, Pa_{Y_k} \setminus X_i$ • $X_i$ is independent of the rest of the network given its Markov blanket • Sample $x_i \sim P(X_i \mid Pa_{X_i}, Y_1, Pa_{Y_1} \setminus X_i, \dots, Y_k, Pa_{Y_k} \setminus X_i) = \frac{1}{Z} P(X_i \mid Pa_{X_i})\,P(Y_1 \mid Pa_{Y_1}) \cdots P(Y_k \mid Pa_{Y_k})$ • Product of $X_i$’s factor and the factors of its children

  35. Handling evidence • Simply set each evidence variable to its appropriate value; don’t sample it • Resulting walk approximates the distribution $P(X \setminus E \mid E=e)$ • Uses evidence more efficiently than likelihood weighting
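A sketch of Gibbs sampling with clamped evidence on the burglary network, using evidence A=1 and M=1 as in the likelihood weighting example. Each unobserved variable is resampled from the normalized product of its own factor and its children's factors. The CPT values are assumed textbook numbers (not on the slides) and the function names are hypothetical.

```python
import random

# Assumed CPTs (common textbook values; not given on the slides).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}                                          # P(J=1 | A)

def prob(p_one, value):
    """Probability that a binary variable with P(X=1)=p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one

def draw(score0, score1, rng):
    """Normalize two unnormalized scores (the 1/Z step) and draw 0 or 1."""
    return 1 if rng.random() < score1 / (score0 + score1) else 0

def gibbs_burglary(n_sweeps, burn_in, rng):
    a = 1   # evidence Alarm=1: clamped, never resampled
    # Evidence MaryCalls=1 is also clamped; its only parent A is fixed, so its
    # factor P(M | A) is a constant and drops out of every step below.
    b, e, j = 0, 0, 0
    count_b = 0
    for t in range(n_sweeps):
        # B: own factor P(B) times child factor P(A | B, E)
        b = draw(prob(P_B, 0) * prob(P_A[(0, e)], a),
                 prob(P_B, 1) * prob(P_A[(1, e)], a), rng)
        # E: own factor P(E) times child factor P(A | B, E)
        e = draw(prob(P_E, 0) * prob(P_A[(b, 0)], a),
                 prob(P_E, 1) * prob(P_A[(b, 1)], a), rng)
        # J: no children, so its conditional is just P(J | A)
        j = 1 if rng.random() < P_J[a] else 0
        if t >= burn_in:
            count_b += b
    return count_b / (n_sweeps - burn_in)

rng = random.Random(0)
p_burglary = gibbs_burglary(50_000, 1_000, rng)
print(p_burglary)  # exact P(B=1 | A=1, M=1) under these CPTs is about 0.374
```

Unlike the likelihood weighting run, every retained state here carries equal weight, because the evidence shapes the conditionals that B and E are drawn from.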

  36. Gibbs sampling issues • Demonstrating correctness & convergence requires examining the Markov chain random walk (more later) • Need to take many steps before the effects of poor initialization wear off (mixing time) • Difficult to tell a priori how many steps are needed • Numerous variants • Known as Markov Chain Monte Carlo techniques

  37. Next time • Continuous and hybrid distributions
