  1. BAYESIAN INFERENCE Sampling techniques Andreas Steingötter

  2. Motivation & Background Exact inference is intractable, so we have to resort to some form of approximation

  3. Motivation & Background • Variational Bayes • a deterministic approximation, not exact even in principle Alternative approximation: • Perform inference by numerical sampling, also known as Monte Carlo techniques.

  4. Motivation & Background The posterior distribution is required (primarily) for the purpose of evaluating expectations E[f] = \int f(z) p(z) dz. • f(z) are predictions made by the model with parameters z • p(z) is the parameter prior and p(D|z) is the likelihood • evaluate the marginal likelihood (evidence) for a model, p(D) = \int p(D|z) p(z) dz

  5. Motivation & Background Classical Monte Carlo approximation: E[f] ≈ \hat f = (1/L) \sum_{l=1}^{L} f(z^{(l)}), where the z^{(l)} are random (not necessarily independent) draws from p(z). The estimate converges to the right answer in the limit of a large number of samples, L -> ∞. (A sketch in R follows.)
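A minimal sketch in R of the plain Monte Carlo estimator. The target p(z) = N(0, 1) and the function f(z) = z^2 are illustrative choices, not from the slides:

    # Monte Carlo estimate of E[f] under p(z) = N(0, 1), with f(z) = z^2.
    set.seed(1)
    L <- 10000
    z <- rnorm(L)              # L independent draws from p(z)
    f_hat <- mean(z^2)         # (1/L) * sum of f(z^(l)); true value is 1
    f_hat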

  6. Motivation & Background Problems: • How to obtain independent samples from p(z)? • The expectation may be dominated by regions of small probability -> large sample sizes will be required to achieve sufficient accuracy. • Note that the accuracy of the estimate does not depend on the dimensionality of z: if the z^{(l)} are independent draws from p(z), then low numbers of samples suffice to estimate the expectation.

  7. How to do sampling? • Basic sampling algorithms • Restricted mainly to 1- / 2-dimensional problems • Markov chain Monte Carlo • A very general and powerful framework

  8. Basic sampling Special cases • Model specified by a directed graph, p(z) = \prod_i p(z_i | pa_i) • Ancestral sampling: • Easy sampling of the joint distribution: sample the nodes in order, each conditioned on the already-sampled values of its parents (see the sketch after this slide). • Logic sampling: • Compare the sampled value for z_i with the observed value at node i. If they do NOT agree, discard the sample drawn so far and start again with the first node.
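A minimal sketch in R of ancestral sampling for a toy directed graph A -> B with binary nodes; the conditional probability tables are made up for illustration:

    # Ancestral sampling for the toy graph A -> B (binary nodes).
    set.seed(1)
    p_A <- 0.3                          # P(A = 1)
    p_B_given_A <- c(0.2, 0.9)          # P(B = 1 | A = 0), P(B = 1 | A = 1)
    sample_joint <- function() {
      a <- rbinom(1, 1, p_A)                 # sample the parent first
      b <- rbinom(1, 1, p_B_given_A[a + 1])  # then the child, given the parent
      c(A = a, B = b)
    }
    samples <- t(replicate(5000, sample_joint()))
    colMeans(samples)                   # empirical marginals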

  9. Random sampling • Computers can generate only pseudorandom numbers. Typical defects: • Correlation of successive values • Lack of uniformity of the distribution • Poor dimensional distribution of the output sequence • The distances between the points where certain values occur are distributed differently from those in a truly random sequence

  10. Random sampling from the uniform distribution • Assumption: • a good pseudo-random generator for uniformly distributed data is implemented • Alternative: http://www.random.org • "true" random numbers, with the randomness coming from atmospheric noise

  11. Random sampling from a standard non-uniform distribution Goal: sample from a non-uniform distribution p(y) which is a standard distribution, i.e. given in analytical form. Suppose: we have uniformly distributed random numbers u from (0,1). Solution: transform the random numbers over (0,1) using the function which is the inverse of the indefinite integral of the desired distribution.

  12. Random sampling from a standard non-uniform distribution • Step 1: Calculate the cumulative distribution function h(y) = \int_{-\infty}^{y} p(\hat y) d\hat y • Step 2: Transform the uniform samples by y = h^{-1}(u). (A sketch follows.)
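A minimal sketch in R of this inverse-transform recipe; the exponential distribution with rate lambda is an illustrative choice, for which h^{-1}(u) = -ln(1 - u) / lambda:

    # Inverse-transform sampling: exponential distribution from uniform draws.
    set.seed(1)
    lambda <- 2
    u <- runif(10000)                   # u ~ Uniform(0, 1)
    y <- -log(1 - u) / lambda           # y = h^{-1}(u) for p(y) = lambda * exp(-lambda * y)
    c(mean(y), 1 / lambda)              # sample mean vs. theoretical mean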

  13. Rejection sampling Suppose: • direct sampling from p(z) is difficult, but • p(z) = \tilde p(z) / Z_p can be evaluated for any given value of z up to some normalization constant • Z_p is unknown; \tilde p(z) can be evaluated Approach: • Define a simple proposal distribution q(z) such that k q(z) \ge \tilde p(z) for all z.

  14. Rejection sampling • Simple visual example • The constant k should be as small as possible. • The fraction of rejected points depends on the ratio of the area under the unnormalized distribution \tilde p(z) to the area under the curve k q(z).

  15. Rejection sampling • Rejection sampler • Generate two random numbers • a number z_0 from the proposal distribution q(z) • a number u_0 from the uniform distribution over [0, k q(z_0)] • If u_0 > \tilde p(z_0), reject! • The remaining pairs (z_0, u_0) are uniformly distributed under the curve \tilde p(z), so the accepted z_0 are distributed according to p(z). (A sketch follows.)
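A minimal sketch in R of this rejection sampler. The target \tilde p(z) = z(1 - z) on (0, 1) (an unnormalized Beta(2, 2)), the uniform proposal, and k = 0.25 are illustrative choices:

    # Rejection sampling: target p(z) proportional to z * (1 - z) on (0, 1),
    # uniform proposal q(z) = 1, and k = 0.25 so that k * q(z) >= p_tilde(z).
    set.seed(1)
    p_tilde <- function(z) z * (1 - z)
    k <- 0.25
    n <- 10000
    z0 <- runif(n)                       # draws from the proposal q(z)
    u0 <- runif(n, min = 0, max = k)     # uniform over [0, k * q(z0)]
    accepted <- z0[u0 <= p_tilde(z0)]    # keep points under the curve p_tilde
    length(accepted) / n                 # acceptance rate = (1/6) / 0.25 = 2/3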

  16. Adaptive rejection sampling Suppose: it is difficult to determine a suitable analytic form for the proposal distribution. Approach: construct the envelope function "on the fly" based on observed values of the distribution • if p(z) is log concave (ln p(z) has non-increasing derivatives), use the derivatives to construct the envelope

  17. Adaptive rejection sampling • Step 1: at an initial set of grid points z_1, ..., z_M, evaluate the function ln p(z) and its gradient, and calculate the tangents at each z_i. • Step 2: sample from the envelope distribution; if the sample is accepted, use it to calculate p(z), otherwise use it to refine the grid. • The envelope distribution is a piecewise exponential distribution q(z) = k_i \lambda_i \exp(-\lambda_i (z - z_{i-1})) for z_{i-1} < z \le z_i, with slope \lambda_i and offset k_i.

  18. Adaptive rejection sampling Problem of rejection sampling: • Find a proposal distribution which is close to the required distribution, to minimize the rejection rate. • Rejection sampling is therefore restricted mainly to univariate distributions (curse of dimensionality). • However: it remains useful as a potential subroutine within more general strategies.

  19. Importance sampling • A framework for approximating expectations E[f] directly with respect to p(z) • Does NOT provide samples from p(z) Suppose (again): • direct sampling from p(z) is difficult, but • \tilde p(z) can be evaluated for any given value of z up to some normalization constant

  20. Importance sampling • As for rejection sampling, apply a proposal distribution q(z) from which it is easy to draw samples: E[f] = \int f(z) p(z) dz = \int f(z) (p(z)/q(z)) q(z) dz ≈ (1/L) \sum_l r_l f(z^{(l)}), with importance weights r_l = p(z^{(l)}) / q(z^{(l)})

  21. Importance sampling • Expectation formula for unnormalized distributions with importance weights: E[f] ≈ \sum_l w_l f(z^{(l)}), where w_l = \tilde r_l / \sum_m \tilde r_m and \tilde r_l = \tilde p(z^{(l)}) / q(z^{(l)}). Key points: • The importance weights correct the bias introduced by sampling from the proposal distribution. • Performance depends on how well q(z) approximates p(z) (similar to rejection sampling). • Choose sample points in the input space where f(z) p(z) is large (or at least where p(z) is large). • If p(z) > 0 in some region, then q(z) > 0 is necessary in that same region. (A sketch follows.)
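A minimal sketch in R of self-normalized importance sampling. The unnormalized target \tilde p(z) = exp(-z^2/2) (a standard normal with the normalizer treated as unknown) and the proposal q = N(0, 2^2) are illustrative choices:

    # Self-normalized importance sampling: estimate E[z^2] under the target.
    set.seed(1)
    L <- 10000
    z <- rnorm(L, mean = 0, sd = 2)            # draws from the proposal q(z)
    r_tilde <- exp(-z^2 / 2) / dnorm(z, 0, 2)  # unnormalized weights p_tilde / q
    w <- r_tilde / sum(r_tilde)                # normalized importance weights
    sum(w * z^2)                               # estimate of E[z^2]; true value is 1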

  22. Importance sampling Attention: • Consider the case where none of the samples falls in the regions where f(z) p(z) is large. • In that case, the apparent variances of w_l and w_l f(z^{(l)}) may be small even though the estimate of the expectation may be severely wrong. • Hence a major drawback of the importance sampling method is the potential to produce results that are arbitrarily in error, with no diagnostic indication. • q(z) should NOT be small where p(z) may be significant!

  23. Markov Chain Monte Carlo (MCMC) sampling • MCMC is a general framework for sampling from a large class of distributions, and it scales well with the dimensionality of the sample space. Goal: generate samples from the distribution p(z). Idea: build a machine which uses the current sample z^{(\tau)} to decide which next sample z^{(\tau+1)} to produce, in such a way that the overall distribution of the samples will be p(z).

  24. Markov Chain Monte Carlo (MCMC) sampling Approach: • Generate a candidate sample z* from a proposal distribution q(z | z^{(\tau)}) that depends on the current state z^{(\tau)} and is sufficiently simple to draw samples from directly. • The current sample z^{(\tau)} is known (i.e. maintain a record of the current state). • The samples z^{(1)}, z^{(2)}, ... form a Markov chain. • Accept or reject the candidate sample according to some appropriate criterion.

  25. MCMC - Metropolis algorithm Suppose: • \tilde p(z) can be evaluated for any given value of z, up to some normalization constant Algorithm: • Step 1: Choose a symmetric proposal distribution, q(z_A | z_B) = q(z_B | z_A). • Step 2: The candidate sample z* is accepted with probability A(z*, z^{(\tau)}) = min(1, \tilde p(z*) / \tilde p(z^{(\tau)})).

  26. MCMC - Metropolis algorithm Algorithm (cont.): • Step 2.1: Choose a random number u with uniform distribution in (0,1). • Step 2.2: Acceptance test: accept z* if A(z*, z^{(\tau)}) > u. • Step 3: If accepted, set z^{(\tau+1)} = z*; otherwise set z^{(\tau+1)} = z^{(\tau)}.

  27. Metropolis algorithm Notes: • Rejection of a point leads to the previous sample being included again (different from rejection sampling). • If q(z_A | z_B) > 0 for all values z_A, z_B, then the distribution of z^{(\tau)} tends to p(z) as \tau -> ∞. • z^{(1)}, z^{(2)}, ... are not independent samples from p(z): there is serial correlation. Instead, retain only every Mth sample.

  28. Examples: Metropolis algorithm Implementation in R: • Target: an elliptical (correlated Gaussian) distribution. • If the candidate passes the acceptance test, update the state; otherwise keep the old state. (A sketch of such an implementation follows.)
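A minimal sketch in R of this Metropolis sampler; the target correlation (0.9) and step size are illustrative choices, since the original code did not survive in the transcript:

    # Metropolis sampler for an elliptical (correlated Gaussian) target.
    set.seed(1)
    log_p_tilde <- function(z) {       # unnormalized log density, corr = 0.9
      -0.5 * (z[1]^2 - 2 * 0.9 * z[1] * z[2] + z[2]^2) / (1 - 0.9^2)
    }
    n <- 15000; step <- 0.5
    samples <- matrix(NA, n, 2)
    z <- c(-2, 2)                      # initialization in [-2, 2]
    for (t in 1:n) {
      z_star <- z + rnorm(2, sd = step)          # symmetric random-walk proposal
      if (log(runif(1)) < log_p_tilde(z_star) - log_p_tilde(z)) {
        z <- z_star                              # update state
      }                                          # else: keep old state
      samples[t, ] <- z
    }
    cor(samples[, 1], samples[, 2])    # should approach 0.9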

  29. Examples: Metropolis algorithm Implementation in R: [Figure: sampled points for initialization in [-2, 2], step size 0.3, shown for n = 1,500 and n = 15,000 samples.]

  30. Examples: Metropolis algorithm Implementation in R: [Figure: sampled points for initialization in [-2, 2], step size 0.5, shown for n = 1,500 and n = 15,000 samples.]

  31. Examples: Metropolis algorithm Implementation in R: [Figure: sampled points for initialization in [-2, 2], step size 1, shown for n = 1,500 and n = 15,000 samples.]

  32. Validation of MCMC Properties of Markov chains: • Transition probabilities: T_m(z^{(m)}, z^{(m+1)}) = p(z^{(m+1)} | z^{(m)}) for the chain z^{(1)} -> z^{(2)} -> ... -> z^{(m)} -> z^{(m+1)}. • Homogeneous: if T_m is the same for all m. • Invariant (stationary): a distribution p*(z) is invariant if p*(z) = \sum_{z'} T(z', z) p*(z').

  33. Validation of MCMC Properties of Markov chains (cont.): • A sufficient condition for p*(z) to be invariant is that the transition probabilities satisfy detailed balance: p*(z) T(z, z') = p*(z') T(z', z). • A Markov chain that satisfies detailed balance is called reversible. (A numerical check follows.)
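A minimal numerical check in R of the claim that detailed balance implies invariance; the three-state target p* and the Metropolis-type move probabilities are illustrative choices:

    # Build a discrete-state transition matrix satisfying detailed balance
    # w.r.t. a target p_star, then verify that p_star is invariant.
    p_star <- c(0.2, 0.3, 0.5)
    T_mat <- matrix(0, 3, 3)
    for (i in 1:3) for (j in 1:3) {
      if (i != j) T_mat[i, j] <- 0.2 * min(1, p_star[j] / p_star[i])
    }
    diag(T_mat) <- 1 - rowSums(T_mat)               # remaining mass: stay put
    all.equal(as.vector(p_star %*% T_mat), p_star)  # p* T = p*  ->  TRUE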

  34. Validation of MCMC Goal: an invariant Markov chain that converges to the desired distribution. • An ergodic Markov chain has only one equilibrium distribution: the invariant distribution is reached for any choice of initial distribution p(z^{(0)}). This property is called ergodicity.

  35. Properties and validation of MCMC Approach: construct appropriate transition probabilities • from a set of base transitions B_1, ..., B_K • Mixture form: T(z', z) = \sum_k \alpha_k B_k(z', z), with mixing coefficients \alpha_k \ge 0, \sum_k \alpha_k = 1 • Successive application: T(z', z) = \sum_{z_1} ... \sum_{z_{K-1}} B_1(z', z_1) ... B_K(z_{K-1}, z)

  36. Metropolis-Hastings algorithm • A generalization of the Metropolis algorithm • No symmetric proposal distribution q(z | z^{(\tau)}) is required • The candidate z* is accepted with probability A_k(z*, z^{(\tau)}) = min(1, \tilde p(z*) q_k(z^{(\tau)} | z*) / (\tilde p(z^{(\tau)}) q_k(z* | z^{(\tau)}))) • The choice of proposal distribution is critical • If the proposal is symmetric, this reduces to the standard Metropolis algorithm. (A sketch follows.)
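A minimal sketch in R of Metropolis-Hastings with a genuinely asymmetric proposal, so the Hastings correction matters. The target p(z) proportional to exp(-z) on z > 0 and the multiplicative log-normal proposal are illustrative choices:

    # Metropolis-Hastings with an asymmetric (log-normal) proposal.
    set.seed(1)
    p_tilde <- function(z) exp(-z)               # unnormalized target, z > 0
    s <- 0.8; n <- 20000
    z <- 1; samples <- numeric(n)
    for (t in 1:n) {
      z_star <- rlnorm(1, meanlog = log(z), sdlog = s)   # asymmetric proposal
      A <- (p_tilde(z_star) * dlnorm(z, log(z_star), s)) /
           (p_tilde(z)      * dlnorm(z_star, log(z), s)) # Hastings correction
      if (runif(1) < A) z <- z_star
      samples[t] <- z
    }
    mean(samples)                                 # should approach E[z] = 1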

  37. Metropolis-Hastings algorithm • A common choice is a Gaussian proposal centered on the current state • Small variance -> high acceptance rate, but a slow walk through the state space and dependent samples • Large variance -> high rejection rate

  38. Gibbs sampling • A special case of the Metropolis-Hastings algorithm • the random value is always accepted: the acceptance probability equals 1 Suppose: p(z) = p(z_1, z_2, z_3) • Step 1: choose initial samples z_1^{(0)}, z_2^{(0)}, z_3^{(0)} • Step 2 (repeated): • sample z_1^{(\tau+1)} ~ p(z_1 | z_2^{(\tau)}, z_3^{(\tau)}) • sample z_2^{(\tau+1)} ~ p(z_2 | z_1^{(\tau+1)}, z_3^{(\tau)}) • sample z_3^{(\tau+1)} ~ p(z_3 | z_1^{(\tau+1)}, z_2^{(\tau+1)}) • repeat by cycling through the variables, or • randomly choose the variable to be updated. (A sketch follows.)
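A minimal sketch in R of a Gibbs sampler; the bivariate Gaussian target with correlation rho is an illustrative choice, picked because both full conditionals are univariate Gaussians:

    # Gibbs sampler for a bivariate Gaussian with correlation rho.
    set.seed(1)
    rho <- 0.9; n <- 10000
    z1 <- 0; z2 <- 0
    samples <- matrix(NA, n, 2)
    for (t in 1:n) {
      z1 <- rnorm(1, mean = rho * z2, sd = sqrt(1 - rho^2))  # z1 | z2
      z2 <- rnorm(1, mean = rho * z1, sd = sqrt(1 - rho^2))  # z2 | z1
      samples[t, ] <- c(z1, z2)
    }
    cor(samples[, 1], samples[, 2])   # should approach rho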

  39. Gibbs sampling Why p(z) is invariant (unchanged): • The marginal distribution p(z_{\i}) of the remaining variables is invariant, because z_{\i} is fixed at each step. • The univariate conditional distribution p(z_i | z_{\i}) is sampled from exactly, so it is invariant by definition. • Hence the joint distribution p(z) is invariant.

  40. Gibbs sampling • Sufficient condition for ergodicity: • None of the conditional distributions may be anywhere zero, i.e. any point in the space can be reached from any other point in a finite number of steps.

  41. Gibbs sampling To obtain m independent samples: • Run the MCMC sampler through a "burn-in" period to remove dependence on the initial values. • Then, sample at set time points (e.g. every Mth sample); see the snippet below. • The Gibbs sequence converges to a stationary (equilibrium) distribution that is independent of the starting values, • and by construction this stationary distribution is the target distribution we are trying to simulate.
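A minimal sketch in R of burn-in removal and thinning, applied to the `samples` matrix from the Gibbs sketch above; the burn-in length and thinning interval are illustrative choices:

    # Burn-in and thinning of a chain of MCMC draws.
    burn_in <- 1000; M <- 10
    draws <- samples[, 1]                          # e.g. first coordinate of the Gibbs output
    kept <- draws[seq(burn_in + 1, length(draws), by = M)]
    length(kept)                                   # roughly (n - burn_in) / M draws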

  42. Gibbs sampling • Practicability depends on the feasibility of drawing samples from the conditional distributions p(z_i | z_{\i}). • Directed graphs (with suitably chosen conditional distributions) will lead to conditional distributions for Gibbs sampling that are log concave. • Adaptive rejection sampling methods can then be used to sample from them.
