BAYESIAN INFERENCE Sampling techniques Andreas Steingötter
Motivation & Background Exact inference is intractable, so we have to resort to some form of approximation
Motivation & Background
• Variational Bayes
• deterministic approximation, not exact in principle
Alternative approximation:
• Perform inference by numerical sampling, also known as Monte Carlo techniques.
Motivation & Background
The posterior distribution is required (primarily) for the purpose of evaluating expectations
$$\mathbb{E}[f] = \int f(\mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}$$
• $f(\mathbf{z})$ are predictions made by the model with parameters $\mathbf{z}$
• $p(\mathbf{z})$ is the parameter prior and $p(\mathcal{D} \mid \mathbf{z})$ is the likelihood
- evaluate the marginal likelihood (evidence) $p(\mathcal{D}) = \int p(\mathcal{D} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}$ for a model
Motivation & Background
Classical Monte Carlo approximation:
$$\mathbb{E}[f] \simeq \hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(\mathbf{z}^{(l)})$$
where $\mathbf{z}^{(l)}$, $l = 1, \ldots, L$, are random (not necessarily independent) draws from $p(\mathbf{z})$, which converges to the right answer in the limit of a large number of samples, $L \to \infty$.
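A minimal sketch of this estimator in R; the target $p(z) = \mathcal{N}(0,1)$ and $f(z) = z^2$ are illustrative choices, not from the slides:

```r
# Monte Carlo estimate of E[f] = integral f(z) p(z) dz.
# Here p(z) is a standard normal and f(z) = z^2, so the true value is 1.
set.seed(1)
L <- 10000
z <- rnorm(L)        # independent draws z^(l) from p(z)
f_hat <- mean(z^2)   # (1/L) * sum over f(z^(l))
f_hat                # approaches E[f] = 1 as L grows
```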
Motivation & Background
Problems:
• How to obtain independent samples from $p(\mathbf{z})$?
• The expectation may be dominated by regions of small probability -> large sample sizes will be required to achieve sufficient accuracy
• Monte Carlo ignores the values of $f(\mathbf{z})$ when forming the estimate
Note: if the $\mathbf{z}^{(l)}$ are independent draws from $p(\mathbf{z})$, then low numbers of samples suffice to estimate the expectation
How to do sampling?
• Basic sampling algorithms
• restricted mainly to 1-/2-dimensional problems
• Markov chain Monte Carlo
• very general and powerful framework
Basic sampling
Special cases:
• Model given by a directed graph
• Ancestral sampling:
• easy sampling of the joint distribution $p(\mathbf{z}) = \prod_{i=1}^{M} p(z_i \mid \mathrm{pa}_i)$: visit the nodes once in topological order and sample each conditional in turn (see the sketch below)
• Logic sampling:
• compare the sampled value for $z_i$ with the observed value at node $i$; if they do NOT agree, discard all previous samples and start again with the first node
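A minimal R sketch of ancestral sampling for a hypothetical three-node graph z1 -> z2 -> z3; the conditional distributions are invented for illustration:

```r
# Ancestral sampling: visit nodes in topological order,
# sampling each conditional given its already-sampled parents.
set.seed(1)
ancestral_sample <- function() {
  z1 <- rbinom(1, 1, 0.3)                       # p(z1)
  z2 <- rbinom(1, 1, ifelse(z1 == 1, 0.8, 0.2)) # p(z2 | z1)
  z3 <- rnorm(1, mean = 2 * z2)                 # p(z3 | z2)
  c(z1 = z1, z2 = z2, z3 = z3)
}
ancestral_sample()  # one draw from the joint p(z1) p(z2|z1) p(z3|z2)
```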
Random sampling
• Computers can generate only pseudorandom numbers; typical defects:
• correlation of successive values
• lack of uniformity of the distribution
• poor dimensional distribution of the output sequence
• distances between points where certain values occur are distributed differently from those in a truly random sequence
Random sampling from the Uniform Distribution
• Assumption: a good pseudo-random generator for uniformly distributed data is implemented
• Alternative: http://www.random.org
• "true" random numbers, with randomness coming from atmospheric noise
Random sampling from a standard non-uniform distribution
Goal: sample from a non-uniform distribution which is a standard distribution, i.e. given in analytical form
Suppose: we have uniformly distributed random numbers from (0,1)
Solution: transform the random numbers over (0,1) using a function which is the inverse of the indefinite integral of the desired distribution
Random sampling from a standard non-uniform distribution
• Step 1: calculate the cumulative distribution function $h(y) = \int_{-\infty}^{y} p(\hat{y})\, \mathrm{d}\hat{y}$
• Step 2: transform the uniform samples $z$ by $y = h^{-1}(z)$
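A minimal R sketch, using the exponential distribution $p(y) = \lambda e^{-\lambda y}$ as the standard distribution because its CDF inverts in closed form:

```r
# Inverse-transform sampling for an exponential distribution:
# the CDF h(y) = 1 - exp(-lambda * y) inverts to y = -log(1 - z) / lambda.
set.seed(1)
lambda <- 2
z <- runif(10000)          # uniform samples on (0, 1)
y <- -log(1 - z) / lambda  # transformed by the inverse CDF
mean(y)                    # should be close to 1 / lambda = 0.5
```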
Rejection sampling
Suppose:
• direct sampling from $p(z)$ is difficult, but
• $p(z)$ can be evaluated for any given value of $z$ up to some normalization constant: $p(z) = \tilde{p}(z) / Z_p$
• $Z_p$ is unknown, $\tilde{p}(z)$ can be evaluated
Approach:
• Define a simple proposal distribution $q(z)$ and a constant $k$ such that $k\,q(z) \geq \tilde{p}(z)$ for all $z$.
Rejection sampling
• Simple visual example
• The constant k should be as small as possible.
• The fraction of rejected points depends on the ratio of the area under the unnormalized distribution $\tilde{p}(z)$ to the area under the curve $k\,q(z)$.
Rejection sampling
• Rejection sampler (see the sketch below):
• generate two random numbers
• a number $z_0$ from the proposal distribution $q(z)$
• a number $u_0$ from the uniform distribution over $[0, k\,q(z_0)]$
• if $u_0 > \tilde{p}(z_0)$: reject!
• the remaining pairs $(z_0, u_0)$ have uniform distribution under the curve $\tilde{p}(z)$
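A minimal R sketch of this sampler; the unnormalized Beta(2,5) target and the uniform proposal are assumptions for illustration:

```r
# Rejection sampling. Target (up to normalization) p_tilde(z) = z * (1 - z)^4
# on (0, 1); proposal q(z) = Uniform(0, 1); k chosen so k * q(z) >= p_tilde(z).
set.seed(1)
p_tilde <- function(z) z * (1 - z)^4
k <- 0.082                   # just above max(p_tilde) ~ 0.0819 at z = 0.2
n <- 10000
z0 <- runif(n)               # candidates from the proposal q(z)
u0 <- runif(n, 0, k)         # uniform over (0, k * q(z0))
z  <- z0[u0 <= p_tilde(z0)]  # keep the pairs falling under p_tilde
length(z) / n                # acceptance rate = area ratio ~ (1/30) / k
```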
Adaptive rejection sampling
Suppose: it is difficult to determine a suitable analytic form for the proposal distribution
Approach: construct the envelope function "on the fly" based on observed values of the distribution
• if $p(z)$ is log concave (its log has non-increasing derivatives), use the derivatives to construct the envelope
Adaptive rejection sampling
• Step 1: at an initial set of grid points $z_1, \ldots, z_M$, evaluate the function $\ln p(z)$ and its gradient, and calculate the tangents at the $z_i$.
• Step 2: sample from the envelope distribution; if accepted, use the sample to evaluate the expectation, otherwise refine the grid.
• The envelope distribution is a piecewise exponential distribution
$$q(z) = k_i \lambda_i \exp\{-\lambda_i (z - z_{i-1})\}$$
with slope $\lambda_i$ and offset $k_i$ on each segment. A sketch of this construction follows below.
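The following R sketch builds the tangent envelope for a log-concave target (a standard normal, truncated to a bounded interval for simplicity) and samples from the resulting piecewise exponential. Unlike true adaptive rejection sampling, the grid here is fixed rather than refined on the fly; all names and values are illustrative:

```r
h  <- function(z) -z^2 / 2  # log of unnormalized target (log-concave)
dh <- function(z) -z        # its derivative
g  <- c(-2, -0.5, 0.5, 2)   # grid points z_i (tangent locations)
lo <- -5; hi <- 5           # truncate support to keep segments finite

# Tangents t_i(z) = h(z_i) + h'(z_i)(z - z_i); consecutive tangents
# intersect at the breakpoints separating the envelope segments.
hz <- h(g); s <- dh(g)
brk <- (hz[-1] - hz[-length(g)] + s[-length(g)] * g[-length(g)] -
        s[-1] * g[-1]) / (s[-length(g)] - s[-1])
a <- c(lo, brk); b <- c(brk, hi)  # tangent i is the envelope on [a_i, b_i]
ta <- hz + s * (a - g)            # envelope log value at each left endpoint

# Segment masses of the piecewise exponential envelope exp(t_i(z))
m <- exp(ta) * (exp(s * (b - a)) - 1) / s

ars_one <- function() {
  repeat {
    i <- sample(seq_along(g), 1, prob = m)  # pick a segment by its mass
    u <- runif(1)                           # inverse CDF within the segment:
    z <- a[i] + log1p(u * m[i] * s[i] * exp(-ta[i])) / s[i]
    uz <- ta[i] + s[i] * (z - a[i])         # envelope log value at z
    if (runif(1) <= exp(h(z) - uz)) return(z)  # accept under the envelope
  }
}
set.seed(1)
zs <- replicate(5000, ars_one())
c(mean(zs), sd(zs))  # roughly 0 and 1 for the (truncated) standard normal
```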
Adaptive rejection sampling
Problem of rejection sampling:
• Find a proposal distribution which is close to the required distribution, to minimize the rejection rate.
• Therefore restricted mainly to univariate distributions (curse of dimensionality)
• However: a potential subroutine within more general methods
Importance sampling
• Framework for approximating expectations directly with respect to $p(\mathbf{z})$
• Does NOT provide samples from $p(\mathbf{z})$
Suppose (again):
• direct sampling from $p(\mathbf{z})$ is difficult, but
• $p(\mathbf{z})$ can be evaluated for any given value of $\mathbf{z}$ up to some normalization constant
Importance sampling
• As for rejection sampling, apply a proposal distribution $q(\mathbf{z})$ from which it is easy to draw samples:
$$\mathbb{E}[f] = \int f(\mathbf{z})\, \frac{p(\mathbf{z})}{q(\mathbf{z})}\, q(\mathbf{z})\, \mathrm{d}\mathbf{z} \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(\mathbf{z}^{(l)})}{q(\mathbf{z}^{(l)})}\, f(\mathbf{z}^{(l)})$$
Importance sampling
• Expectation formula for unnormalized distributions with importance weights $w_l$:
$$\mathbb{E}[f] \simeq \sum_{l=1}^{L} w_l\, f(\mathbf{z}^{(l)}), \qquad w_l = \frac{\tilde{p}(\mathbf{z}^{(l)}) / q(\mathbf{z}^{(l)})}{\sum_m \tilde{p}(\mathbf{z}^{(m)}) / q(\mathbf{z}^{(m)})}$$
Key points:
• Importance weights correct the bias introduced by sampling from the proposal distribution
• Dependence on how well $q(\mathbf{z})$ approximates $p(\mathbf{z})$ (similar to rejection sampling)
• Choose sample points in input space where $f(\mathbf{z})\, p(\mathbf{z})$ is large (or at least where $p(\mathbf{z})$ is large)
• If $p(\mathbf{z}) > 0$ in some region, then $q(\mathbf{z}) > 0$ is necessary there
Importance sampling
Attention:
• Consider the case where none of the samples falls in the regions where $f(\mathbf{z})\, p(\mathbf{z})$ is large.
• In that case, the apparent variances of the $w_l$ and $w_l f(\mathbf{z}^{(l)})$ may be small even though the estimate of the expectation may be severely wrong.
• Hence a major drawback of the importance sampling method is the potential to produce results that are arbitrarily in error, with no diagnostic indication.
• $q(\mathbf{z})$ should NOT be small where $p(\mathbf{z})$ may be significant!!!
A minimal sketch follows below.
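A minimal R sketch of self-normalized importance sampling; the unnormalized Gaussian target, the heavier-tailed Cauchy proposal, and $f(z) = z^2$ are illustrative choices:

```r
# Importance sampling with self-normalized weights:
# target p_tilde(z) = exp(-z^2 / 2) (unnormalized standard normal),
# proposal q(z) = Cauchy (heavier tails, so q is not small where p matters).
set.seed(1)
L <- 100000
z <- rcauchy(L)                  # samples from the proposal q(z)
w <- exp(-z^2 / 2) / dcauchy(z)  # unnormalized weights p_tilde / q
w <- w / sum(w)                  # normalized importance weights w_l
sum(w * z^2)                     # estimate of E[z^2] = 1
```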
Markov Chain Monte Carlo (MCMC) sampling
• MCMC is a general framework for sampling from a large class of distributions; it scales well with the dimensionality of the sample space
Goal: generate samples from a distribution $p(\mathbf{z})$
Idea: build a machine which uses the current sample to decide which next sample to produce, in such a way that the overall distribution of the samples will be $p(\mathbf{z})$.
Markov Chain Monte Carlo (MCMC) sampling
Approach:
• Generate a candidate sample $\mathbf{z}^{*}$ from a proposal distribution $q(\mathbf{z} \mid \mathbf{z}^{(\tau)})$ that depends on the current state $\mathbf{z}^{(\tau)}$ and is sufficiently simple to draw samples from directly.
• The current sample $\mathbf{z}^{(\tau)}$ is known (i.e. maintain a record of the current state)
• The samples $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots$ form a Markov chain
• Accept or reject the candidate sample according to some appropriate criterion
MCMC - Metropolis algorithm
Suppose:
• $p(\mathbf{z})$ can be evaluated for any given value of $\mathbf{z}$ up to some normalization constant: $p(\mathbf{z}) = \tilde{p}(\mathbf{z}) / Z_p$
Algorithm:
• Step 1: choose a symmetric proposal distribution, $q(\mathbf{z}_A \mid \mathbf{z}_B) = q(\mathbf{z}_B \mid \mathbf{z}_A)$
• Step 2: the candidate sample $\mathbf{z}^{*}$ is accepted with probability
$$A(\mathbf{z}^{*}, \mathbf{z}^{(\tau)}) = \min\left(1, \frac{\tilde{p}(\mathbf{z}^{*})}{\tilde{p}(\mathbf{z}^{(\tau)})}\right)$$
MCMC - Metropolis algorithm
Algorithm (cont.):
• Step 2.1: choose a random number $u$ with uniform distribution in (0,1)
• Step 2.2: acceptance test: accept if $A(\mathbf{z}^{*}, \mathbf{z}^{(\tau)}) > u$
• Step 3: set $\mathbf{z}^{(\tau+1)} = \mathbf{z}^{*}$ if accepted, otherwise $\mathbf{z}^{(\tau+1)} = \mathbf{z}^{(\tau)}$
Metropolis algorithm
Notes:
• rejection of a point leads to the inclusion of the previous sample again (different from rejection sampling)
• if $q(\mathbf{z}_A \mid \mathbf{z}_B) > 0$ for any values $\mathbf{z}_A, \mathbf{z}_B$, then the distribution of $\mathbf{z}^{(\tau)}$ tends to $p(\mathbf{z})$ for $\tau \to \infty$
• $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots$ are not independent samples from $p(\mathbf{z})$ - serial correlation. Instead retain only every Mth sample.
Examples: Metropolis algorithm
Implementation in R:
• Elliptical distribution (correlated bivariate Gaussian)
• if the candidate is accepted: update the state; otherwise: keep the old state
A hedged reconstruction of the sampler is sketched below.
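The correlation of the elliptical target and the code details are assumptions; the initialization range and step sizes are taken from the slides:

```r
# Metropolis sampler: elliptical (correlated bivariate Gaussian) target,
# isotropic Gaussian random-walk proposal (symmetric).
set.seed(1)
log_p <- function(z) {  # log of the unnormalized target
  rho <- 0.95           # strong correlation -> elliptical contours (assumed)
  -(z[1]^2 - 2 * rho * z[1] * z[2] + z[2]^2) / (2 * (1 - rho^2))
}
metropolis <- function(n, z0, stepsize) {
  z <- matrix(NA, n, 2); z[1, ] <- z0
  for (t in 2:n) {
    cand <- z[t - 1, ] + rnorm(2, sd = stepsize)  # symmetric proposal
    if (runif(1) < exp(log_p(cand) - log_p(z[t - 1, ])))
      z[t, ] <- cand           # accepted: update state
    else
      z[t, ] <- z[t - 1, ]     # rejected: keep old state
  }
  z
}
samples <- metropolis(15000, runif(2, -2, 2), stepsize = 0.3)
plot(samples, pch = ".", asp = 1)  # compare with the figures on the slides
```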
Examples: Metropolis algorithm
Implementation in R, initialization in [-2,2]:
[Figures: sampled points for stepsize = 0.3, stepsize = 0.5 and stepsize = 1, each shown after n=1500 and n=15000 iterations]
Validation of MCMC
Properties of Markov chains:
• First-order Markov chain: $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots, \mathbf{z}^{(m)}, \mathbf{z}^{(m+1)}, \ldots$
• Transition probabilities: $T_m(\mathbf{z}^{(m)}, \mathbf{z}^{(m+1)}) = p(\mathbf{z}^{(m+1)} \mid \mathbf{z}^{(m)})$
• homogeneous: if $T_m$ is the same for all $m$
• invariant (stationary): a distribution $p^{\star}(\mathbf{z})$ with $p^{\star}(\mathbf{z}) = \sum_{\mathbf{z}'} T(\mathbf{z}', \mathbf{z})\, p^{\star}(\mathbf{z}')$
Validation of MCMC
Properties of Markov chains (cont.):
• A sufficient condition for $p^{\star}(\mathbf{z})$ to be invariant is that the transition probabilities satisfy detailed balance:
$$p^{\star}(\mathbf{z})\, T(\mathbf{z}, \mathbf{z}') = p^{\star}(\mathbf{z}')\, T(\mathbf{z}', \mathbf{z})$$
• A Markov chain satisfying detailed balance is called reversible.
A small numerical check follows below.
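A check in R for a hand-picked 3-state chain whose entries were chosen to satisfy detailed balance (the numbers are illustrative):

```r
# Verify detailed balance and invariance numerically for a discrete chain.
p_star <- c(0.2, 0.3, 0.5)            # candidate stationary distribution
T_mat  <- matrix(c(0.70, 0.15, 0.15,  # T[i, j] = p(z' = j | z = i)
                   0.10, 0.70, 0.20,
                   0.06, 0.12, 0.82), nrow = 3, byrow = TRUE)
flows <- p_star * T_mat               # element [i, j] = p*(i) T(i, j)
all.equal(flows, t(flows))            # detailed balance: flows are symmetric
all.equal(drop(p_star %*% T_mat), p_star)  # hence p* is invariant: p* T = p*
```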
Validation of MCMC
Goal: an invariant Markov chain that converges to the desired distribution
• ergodicity: $p(\mathbf{z}^{(m)}) \to p^{\star}(\mathbf{z})$ as $m \to \infty$, for any initial distribution $p(\mathbf{z}^{(0)})$
• An ergodic Markov chain has only one equilibrium distribution: the invariant distribution!!!
Properties and validation of MCMC
Approach: construct appropriate transition probabilities:
• from a set of base transitions $B_1, \ldots, B_K$
• mixture form: $T(\mathbf{z}', \mathbf{z}) = \sum_{k=1}^{K} \alpha_k B_k(\mathbf{z}', \mathbf{z})$, with mixing coefficients $\alpha_k \geq 0$, $\sum_k \alpha_k = 1$
• or successive application: $T(\mathbf{z}', \mathbf{z}) = \sum_{\mathbf{z}_1} \cdots \sum_{\mathbf{z}_{K-1}} B_1(\mathbf{z}', \mathbf{z}_1) \cdots B_K(\mathbf{z}_{K-1}, \mathbf{z})$
Metropolis-Hastings algorithm
• Generalization of the Metropolis algorithm
• No symmetric proposal distribution $q_k(\mathbf{z} \mid \mathbf{z}^{(\tau)})$ required
• Choice of the proposal distribution is critical
• Acceptance probability:
$$A_k(\mathbf{z}^{*}, \mathbf{z}^{(\tau)}) = \min\left(1, \frac{\tilde{p}(\mathbf{z}^{*})\, q_k(\mathbf{z}^{(\tau)} \mid \mathbf{z}^{*})}{\tilde{p}(\mathbf{z}^{(\tau)})\, q_k(\mathbf{z}^{*} \mid \mathbf{z}^{(\tau)})}\right)$$
• If the proposal is symmetric, this reduces to the Metropolis algorithm
Metropolis-Hastings algorithm
• Common choice: Gaussian centered on the current state
• small variance -> high acceptance rate, but a slow walk and dependent samples
• large variance -> high rejection rate
A sketch with an asymmetric proposal follows below.
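A minimal R sketch of Metropolis-Hastings with a genuinely asymmetric proposal (a multiplicative log-normal random walk on an unnormalized Gamma(3,1) target; both choices are illustrative), showing where the Hastings correction enters:

```r
# Metropolis-Hastings: target p_tilde(z) = z^2 * exp(-z) on z > 0.
set.seed(1)
log_p <- function(z) 2 * log(z) - z
n <- 20000; z <- numeric(n); z[1] <- 1; s <- 0.5
for (t in 2:n) {
  cand <- z[t - 1] * exp(rnorm(1, sd = s))  # q(z* | z) is NOT symmetric
  # Hastings ratio includes q(z | z*) / q(z* | z), here equal to cand / z
  logA <- log_p(cand) - log_p(z[t - 1]) + log(cand) - log(z[t - 1])
  z[t] <- if (log(runif(1)) < logA) cand else z[t - 1]
}
mean(z)  # roughly 3, the Gamma(3, 1) mean
```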
Gibbs sampling
• Special case of the Metropolis-Hastings algorithm
• the random value is always accepted: acceptance probability $A = 1$
Suppose: $p(\mathbf{z}) = p(z_1, \ldots, z_M)$
• Step 1: initialize $\{z_i : i = 1, \ldots, M\}$
• Step 2 (repeated): for $\tau = 1, \ldots, T$:
• sample $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, \ldots, z_M^{(\tau)})$
• sample $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$, and so on up to $z_M^{(\tau+1)}$
• repeat by cycling through the variables, or
• randomly choose the variable to be updated at each step
A minimal sketch follows below.
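A minimal R sketch for a bivariate Gaussian with correlation $\rho$, where both conditionals are available in closed form (the target and $\rho = 0.8$ are illustrative):

```r
# Gibbs sampling for a bivariate Gaussian with correlation rho:
# z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2 | z1.
set.seed(1)
rho <- 0.8; n <- 10000
z <- matrix(0, n, 2)
for (t in 2:n) {
  z[t, 1] <- rnorm(1, rho * z[t - 1, 2], sqrt(1 - rho^2))  # sample p(z1 | z2)
  z[t, 2] <- rnorm(1, rho * z[t, 1],     sqrt(1 - rho^2))  # sample p(z2 | z1)
}
keep <- z[seq(1001, n, by = 10), ]  # drop burn-in, retain every 10th sample
cor(keep)[1, 2]                     # close to rho = 0.8
```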
Gibbs sampling
Why $p(\mathbf{z})$ is invariant (unchanged) under each Gibbs step:
• the marginal $p(\mathbf{z}_{\setminus i})$ is invariant, because $\mathbf{z}_{\setminus i}$ is fixed at each step
• the univariate conditional distribution $p(z_i \mid \mathbf{z}_{\setminus i})$ is sampled correctly (by definition)
• hence the joint distribution $p(\mathbf{z}) = p(z_i \mid \mathbf{z}_{\setminus i})\, p(\mathbf{z}_{\setminus i})$ is invariant
Gibbs sampling
• Sufficient condition for ergodicity:
• none of the conditional distributions is anywhere zero, i.e. any point in $\mathbf{z}$-space can be reached from any other point in a finite number of steps
Gibbs sampling
Obtain m independent samples:
• sample the MCMC during a «burn-in» period to remove dependence on the initial values
• then, sample at set time points (e.g. every Mth sample)
• The Gibbs sequence converges to a stationary (equilibrium) distribution that is independent of the starting values,
• and by construction this stationary distribution is the target distribution we are trying to simulate.
Gibbs sampling
• Practicability depends on the feasibility of drawing samples from the conditional distributions $p(z_i \mid \mathbf{z}_{\setminus i})$.
• Directed graphs will lead to conditional distributions for Gibbs sampling that are log concave.
• -> adaptive rejection sampling methods can be used as a subroutine