BAYESIAN INFERENCE Sampling techniques Andreas Steingötter
Motivation & Background
For many models of practical interest, exact inference is intractable, so we have to resort to some form of approximation.
Motivation & Background
• Variational Bayes: a deterministic approximation, not exact in principle
Alternative approximation:
• Perform inference by numerical sampling, also known as Monte Carlo techniques.
Motivation & Background
The posterior distribution $p(\mathbf{z})$ is required (primarily) for the purpose of evaluating expectations
$$\mathbb{E}[f] = \int f(\mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}$$
• $f(\mathbf{z})$ are predictions made by the model with parameters $\mathbf{z}$
• $p(\mathbf{z})$ is the parameter prior and $p(\mathcal{D} \mid \mathbf{z})$ is the likelihood; integrating their product evaluates the marginal likelihood (evidence) $p(\mathcal{D}) = \int p(\mathcal{D} \mid \mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z}$ for a model
Motivation & Background
Classical Monte Carlo approximation:
$$\mathbb{E}[f] \approx \hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(\mathbf{z}^{(l)})$$
where the $\mathbf{z}^{(l)}$ are random (not necessarily independent) draws from $p(\mathbf{z})$. This converges to the right answer in the limit of a large number of samples, $L \to \infty$.
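As a worked illustration (not from the slides), a minimal R sketch of this estimator for a case where direct sampling is available: $f(z) = z^2$ under a standard normal target, whose true expectation is 1.

```r
# Minimal sketch: classical Monte Carlo estimate of E[f(z)] with f(z) = z^2
# and p(z) the standard normal (true value: 1).
set.seed(1)
L <- 10000                 # number of samples
z <- rnorm(L)              # independent draws from p(z)
f_hat <- mean(z^2)         # Monte Carlo estimate (1/L) * sum_l f(z^(l))
f_hat                      # close to the true expectation 1
```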
Motivation & Background
Problems:
• How to obtain independent samples from $p(\mathbf{z})$?
• The expectation may be dominated by regions of small probability -> large sample sizes will be required to achieve sufficient accuracy.
• Plain Monte Carlo ignores the values of $f(\mathbf{z})$ when drawing the samples, so samples need not fall where $f(\mathbf{z})\, p(\mathbf{z})$ is large.
• Note, however: if the $\mathbf{z}^{(l)}$ are independent draws from $p(\mathbf{z})$, then low numbers of samples suffice to estimate the expectation.
How to do sampling?
• Basic sampling algorithms
- Restricted mainly to 1- / 2-dimensional problems
• Markov chain Monte Carlo
- Very general and powerful framework
Basic sampling
Special cases:
• Model given by a directed graph
• Ancestral sampling: visit each node in order and draw a sample from its conditional distribution given its (already sampled) parents
• Easy sampling of the joint distribution: $p(\mathbf{z}) = \prod_{i=1}^{M} p(z_i \mid \mathrm{pa}_i)$
• Logic sampling: compare the sampled value for $z_i$ with the observed value at node $i$. If they do NOT agree, discard all previous samples and start again with the first node (see the R sketch below).
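A minimal R sketch of ancestral sampling, assuming a hypothetical two-node binary graph z1 -> z2; the variables and probabilities are illustrative, not from the slides.

```r
# Ancestral sampling on a hypothetical directed graph z1 -> z2:
# p(z1, z2) = p(z1) * p(z2 | z1); parents are sampled before children.
set.seed(2)
ancestral_sample <- function() {
  z1 <- rbinom(1, 1, 0.3)            # root node: p(z1 = 1) = 0.3
  p2 <- if (z1 == 1) 0.9 else 0.2    # conditional p(z2 = 1 | z1)
  z2 <- rbinom(1, 1, p2)
  c(z1 = z1, z2 = z2)                # one joint sample from p(z1, z2)
}
ancestral_sample()
```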
Random sampling
• Computers can generate only pseudorandom numbers. Typical defects of poor generators:
• Correlation of successive values
• Lack of uniformity of the distribution
• Poor dimensional distribution of the output sequence
• Distances between points where certain values occur are distributed differently from those in a truly random sequence
Random sampling from the Uniform Distribution
• Assumption: a good pseudo-random generator for uniformly distributed data is implemented
• Alternative: http://www.random.org
- "true" random numbers, with randomness coming from atmospheric noise
Random sampling from a standard non-uniform distribution
Goal: sample from a non-uniform distribution $p(y)$ which is a standard distribution, i.e. given in analytical form.
Suppose: we have uniformly distributed random numbers $z$ from $(0, 1)$.
Solution: transform the random numbers over $(0, 1)$ using a function which is the inverse of the indefinite integral of the desired distribution.
Random sampling from a standard non-uniform distribution
• Step 1: calculate the cumulative distribution function $h(y) = \int_{-\infty}^{y} p(\hat{y})\, \mathrm{d}\hat{y}$
• Step 2: transform the uniform samples by $y = h^{-1}(z)$
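A minimal R sketch of this transformation method, assuming an exponential target with rate lambda (an illustrative choice, not from the slides): here $h(y) = 1 - e^{-\lambda y}$, so $h^{-1}(z) = -\ln(1 - z)/\lambda$.

```r
# Inverse-CDF sampling sketch: exponential distribution with rate lambda.
set.seed(3)
lambda <- 2
z <- runif(10000)               # uniform random numbers on (0, 1)
y <- -log(1 - z) / lambda       # transformed samples y = h^{-1}(z)
c(mean(y), 1 / lambda)          # sample mean vs. theoretical mean
```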
Rejection sampling
Suppose:
• direct sampling from $p(z)$ is difficult, but
• $p(z)$ can be evaluated for any given value of $z$ up to some normalization constant: $p(z) = \tilde{p}(z) / Z_p$
• $Z_p$ is unknown; $\tilde{p}(z)$ can be evaluated
Approach:
• Define a simple proposal distribution $q(z)$ and a constant $k$ such that $k\, q(z) \geq \tilde{p}(z)$ for all $z$.
Rejection sampling
• Simple visual example (figure omitted): the envelope $k\, q(z)$ lies everywhere above $\tilde{p}(z)$.
• The constant $k$ should be as small as possible.
• The fraction of rejected points depends on the ratio of the area under the unnormalized distribution $\tilde{p}(z)$ to the area under the curve $k\, q(z)$.
Rejection sampling
Rejection sampler (see the R sketch below):
• Generate two random numbers:
- a number $z_0$ from the proposal distribution $q(z)$
- a number $u_0$ from the uniform distribution over $[0, k\, q(z_0)]$
• If $u_0 > \tilde{p}(z_0)$, reject!
• The remaining pairs have uniform distribution under the curve $\tilde{p}(z)$.
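A minimal R sketch of this sampler, assuming an illustrative unnormalized target $\tilde{p}(z) = z^2(1-z)^3$ on $(0,1)$ (a Beta(3, 4) up to a constant) and a uniform proposal; these choices are assumptions, not from the slides.

```r
# Rejection sampling sketch: unnormalized target p_tilde on (0, 1),
# uniform proposal q(z) = 1, envelope constant k = max p_tilde(z).
set.seed(4)
p_tilde <- function(z) z^2 * (1 - z)^3
k <- optimize(p_tilde, c(0, 1), maximum = TRUE)$objective  # k*q(z) >= p_tilde(z)
n <- 10000
z0 <- runif(n)                       # draws from the proposal q
u0 <- runif(n, 0, k)                 # uniform over (0, k * q(z0))
accepted <- z0[u0 <= p_tilde(z0)]    # keep the pairs under the curve
c(mean(accepted), 3 / 7)             # compare with the Beta(3, 4) mean
```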
Adaptive rejection sampling
Suppose: it is difficult to determine a suitable analytic form for the proposal distribution.
Approach: construct the envelope function "on the fly" based on observed values of the distribution.
• If $\ln p(z)$ is concave (has non-increasing derivatives), use its derivatives to construct the envelope.
Adaptive rejection sampling
• Step 1: at an initial set of grid points $z_1, \ldots, z_M$, evaluate the function $\ln p(z)$ and its gradient, and calculate the tangents at each $z_i$.
• Step 2: sample from the envelope distribution; if accepted, use the sample to calculate the next step, otherwise refine the grid.
• The envelope distribution is a piecewise exponential distribution,
$$q(z) = k_i \lambda_i \exp\{-\lambda_i (z - z_{i-1})\}, \qquad z_{i-1} < z \leq z_i,$$
with slope $\lambda_i$ and offset $k_i$ determined by the tangent lines.
Adaptive rejection sampling
Problem of rejection sampling:
• Find a proposal distribution which is close to the required distribution, to minimize the rejection rate.
• Therefore restricted mainly to univariate distributions (curse of dimensionality).
• However: a potential subroutine within more general methods.
Importance sampling
• Framework for approximating expectations directly with respect to $p(\mathbf{z})$
• Does NOT provide samples from $p(\mathbf{z})$
Suppose (again):
• direct sampling from $p(\mathbf{z})$ is difficult, but
• $p(\mathbf{z})$ can be evaluated for any given value of $\mathbf{z}$ up to some normalization constant: $p(\mathbf{z}) = \tilde{p}(\mathbf{z}) / Z_p$
Importance sampling
• As for rejection sampling, apply a proposal distribution $q(\mathbf{z})$ from which it is easy to draw samples:
$$\mathbb{E}[f] = \int f(\mathbf{z})\, p(\mathbf{z})\, \mathrm{d}\mathbf{z} = \int f(\mathbf{z}) \frac{p(\mathbf{z})}{q(\mathbf{z})}\, q(\mathbf{z})\, \mathrm{d}\mathbf{z} \approx \frac{1}{L} \sum_{l=1}^{L} \frac{p(\mathbf{z}^{(l)})}{q(\mathbf{z}^{(l)})}\, f(\mathbf{z}^{(l)})$$
with importance weights $r_l = p(\mathbf{z}^{(l)}) / q(\mathbf{z}^{(l)})$.
Importance sampling
• Expectation formula for unnormalized distributions, with normalized importance weights $w_l$:
$$\mathbb{E}[f] \approx \sum_{l=1}^{L} w_l\, f(\mathbf{z}^{(l)}), \qquad w_l = \frac{\tilde{r}_l}{\sum_m \tilde{r}_m}, \qquad \tilde{r}_l = \frac{\tilde{p}(\mathbf{z}^{(l)})}{q(\mathbf{z}^{(l)})}$$
Key points (see the R sketch below):
• Importance weights correct the bias introduced by sampling from the proposal distribution.
• Performance depends on how well $q(\mathbf{z})$ approximates $p(\mathbf{z})$ (similar to rejection sampling).
• Choose sample points in input space where $f(\mathbf{z})\, p(\mathbf{z})$ is large (or at least where $p(\mathbf{z})$ is large).
• If $p(\mathbf{z}) > 0$ in some region, then $q(\mathbf{z}) > 0$ in that same region is necessary.
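A minimal R sketch of self-normalized importance sampling, assuming an unnormalized standard-normal target and a wide Gaussian proposal; these are illustrative choices, not from the slides.

```r
# Importance sampling sketch: estimate E[f(z)] with f(z) = z^2 under the
# unnormalized target p_tilde(z) = exp(-z^2 / 2) (standard normal up to a
# constant), using a wide Gaussian proposal q = N(0, 3^2).
set.seed(5)
L <- 10000
z <- rnorm(L, mean = 0, sd = 3)          # draws from the proposal q
r <- exp(-z^2 / 2) / dnorm(z, 0, 3)      # unnormalized weights p_tilde / q
w <- r / sum(r)                          # normalized importance weights w_l
sum(w * z^2)                             # estimate of E[z^2] (true value: 1)
```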
Importance sampling
Attention:
• Consider the case where none of the samples falls in the regions where $f(\mathbf{z})\, p(\mathbf{z})$ is large.
• In that case, the apparent variances of $r_l$ and $r_l f(\mathbf{z}^{(l)})$ may be small even though the estimate of the expectation is severely wrong.
• Hence a major drawback of the importance sampling method is the potential to produce results that are arbitrarily in error, with no diagnostic indication.
• $q(\mathbf{z})$ should NOT be small or zero where $p(\mathbf{z})$ may be significant!
Markov Chain Monte Carlo (MCMC) sampling
• MCMC is a general framework for sampling from a large class of distributions; it scales well with the dimensionality of the sample space.
Goal: generate samples from the distribution $p(\mathbf{z})$.
Idea: build a machine which uses the current sample to decide which next sample to produce, in such a way that the overall distribution of the samples will be $p(\mathbf{z})$.
Markov Chain Monte Carlo (MCMC) sampling
Approach:
• Generate a candidate sample $\mathbf{z}^\ast$ from a proposal distribution $q(\mathbf{z} \mid \mathbf{z}^{(\tau)})$ that depends on the current state $\mathbf{z}^{(\tau)}$ and is sufficiently simple to draw samples from directly.
• The current sample $\mathbf{z}^{(\tau)}$ is known (i.e. we maintain a record of the current state).
• The samples $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots$ form a Markov chain.
• Accept or reject the candidate sample according to an appropriate criterion.
MCMC - Metropolis algorithm
Suppose:
• $p(\mathbf{z})$ can be evaluated for any given value of $\mathbf{z}$ up to some normalization constant: $p(\mathbf{z}) = \tilde{p}(\mathbf{z}) / Z_p$
Algorithm:
• Step 1: choose a symmetric proposal distribution, $q(\mathbf{z}_A \mid \mathbf{z}_B) = q(\mathbf{z}_B \mid \mathbf{z}_A)$.
• Step 2: the candidate sample $\mathbf{z}^\ast$ is accepted with probability
$$A(\mathbf{z}^\ast, \mathbf{z}^{(\tau)}) = \min\left(1, \frac{\tilde{p}(\mathbf{z}^\ast)}{\tilde{p}(\mathbf{z}^{(\tau)})}\right)$$
MCMC - Metropolis algorithm
Algorithm (cont.):
• Step 2.1: choose a random number $u$ with uniform distribution in $(0, 1)$.
• Step 2.2: acceptance test: accept $\mathbf{z}^\ast$ if $A(\mathbf{z}^\ast, \mathbf{z}^{(\tau)}) > u$.
• Step 3: set
$$\mathbf{z}^{(\tau+1)} = \begin{cases} \mathbf{z}^\ast & \text{if accepted} \\ \mathbf{z}^{(\tau)} & \text{otherwise} \end{cases}$$
Metropolis algorithm
Notes:
• Rejection of a point leads to the previous sample being included again (different from rejection sampling, where rejected points are simply discarded).
• If $q(\mathbf{z}_A \mid \mathbf{z}_B) > 0$ for any values $\mathbf{z}_A, \mathbf{z}_B$, then the distribution of $\mathbf{z}^{(\tau)}$ tends to $p(\mathbf{z})$ as $\tau \to \infty$.
• $\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots$ are not independent samples from $p(\mathbf{z})$ - there is serial correlation. Instead, retain only every Mth sample.
Examples: Metropolis algorithm
Implementation in R (sketch below):
• Target: an elliptical (correlated bivariate Gaussian) distribution
• On acceptance: update the state; on rejection: keep the old state
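A minimal R reconstruction of the kind of sampler the slide describes: a random-walk Metropolis sampler targeting a correlated bivariate Gaussian. The initialization range follows the next slides; the target correlation and proposal form are assumptions.

```r
# Random-walk Metropolis sketch for an elliptical (correlated) bivariate
# Gaussian target; the proposal is an isotropic Gaussian step.
set.seed(6)
rho <- 0.95                                  # assumed correlation of the target
log_p_tilde <- function(z) {                 # unnormalized log target density
  -(z[1]^2 - 2 * rho * z[1] * z[2] + z[2]^2) / (2 * (1 - rho^2))
}
metropolis <- function(n, step) {
  z <- runif(2, -2, 2)                       # initialization in [-2, 2]
  out <- matrix(NA_real_, n, 2)
  for (i in 1:n) {
    z_star <- z + rnorm(2, 0, step)          # candidate from symmetric proposal
    if (log(runif(1)) < log_p_tilde(z_star) - log_p_tilde(z)) {
      z <- z_star                            # accept: update state
    }                                        # reject: keep old state
    out[i, ] <- z
  }
  out
}
samples <- metropolis(15000, step = 0.5)
cor(samples[, 1], samples[, 2])              # approaches rho for long runs
```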
Examples: Metropolis algorithm
Implementation in R, initialization in [-2, 2]; scatter plots shown for n = 1,500 and n = 15,000 samples (figures omitted):
• step size = 0.3
• step size = 0.5
• step size = 1
Validation of MCMC
Properties of Markov chains:
• First-order Markov chain: $p(\mathbf{z}^{(m+1)} \mid \mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(m)}) = p(\mathbf{z}^{(m+1)} \mid \mathbf{z}^{(m)})$
• Transition probabilities: $T_m(\mathbf{z}^{(m)}, \mathbf{z}^{(m+1)}) \equiv p(\mathbf{z}^{(m+1)} \mid \mathbf{z}^{(m)})$
• Homogeneous: $T_m$ is the same for all $m$.
• Invariant (stationary): a distribution $p^\star(\mathbf{z})$ with $p^\star(\mathbf{z}) = \sum_{\mathbf{z}'} T(\mathbf{z}', \mathbf{z})\, p^\star(\mathbf{z}')$.
• Sufficient condition: the transition probabilities satisfy detailed balance,
$$p^\star(\mathbf{z})\, T(\mathbf{z}, \mathbf{z}') = p^\star(\mathbf{z}')\, T(\mathbf{z}', \mathbf{z})$$
A chain that satisfies detailed balance is called reversible.
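A small worked check in R (an assumed toy example, not from the slides): a two-state chain whose transition matrix satisfies detailed balance with respect to $p^\star$, which is therefore invariant.

```r
# Toy check: detailed balance implies invariance for a 2-state chain.
T_mat <- matrix(c(0.9, 0.1,        # T_mat[i, j] = p(next = j | current = i)
                  0.2, 0.8), 2, 2, byrow = TRUE)
p_star <- c(2 / 3, 1 / 3)          # candidate stationary distribution
# Detailed balance: p*(1) T(1,2) == p*(2) T(2,1)
isTRUE(all.equal(p_star[1] * T_mat[1, 2], p_star[2] * T_mat[2, 1]))
p_star %*% T_mat                   # invariance: one step returns p_star again
```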
Validation of MCMC
Goal: an invariant Markov chain that converges to the desired distribution.
• Ergodicity: $p(\mathbf{z}^{(m)}) \to p^\star(\mathbf{z})$ as $m \to \infty$, for any initial distribution $p(\mathbf{z}^{(0)})$.
• An ergodic Markov chain has only one equilibrium distribution: the invariant distribution.
Properties and validation of MCMC
Approach: construct appropriate transition probabilities from a set of base transitions $B_1, \ldots, B_K$:
• Mixture form, with mixing coefficients $\alpha_k \geq 0$, $\sum_k \alpha_k = 1$:
$$T(\mathbf{z}', \mathbf{z}) = \sum_{k=1}^{K} \alpha_k B_k(\mathbf{z}', \mathbf{z})$$
• Successive application:
$$T(\mathbf{z}', \mathbf{z}) = \sum_{\mathbf{z}_1} \cdots \sum_{\mathbf{z}_{K-1}} B_1(\mathbf{z}', \mathbf{z}_1) \cdots B_K(\mathbf{z}_{K-1}, \mathbf{z})$$
Metropolis-Hastings algorithm
• Generalization of the Metropolis algorithm
• No symmetric proposal distribution $q_k(\mathbf{z} \mid \mathbf{z}^{(\tau)})$ required
• Choice of proposal distribution is critical
Acceptance probability:
$$A_k(\mathbf{z}^\ast, \mathbf{z}^{(\tau)}) = \min\left(1, \frac{\tilde{p}(\mathbf{z}^\ast)\, q_k(\mathbf{z}^{(\tau)} \mid \mathbf{z}^\ast)}{\tilde{p}(\mathbf{z}^{(\tau)})\, q_k(\mathbf{z}^\ast \mid \mathbf{z}^{(\tau)})}\right)$$
If the proposal is symmetric, this reduces to the Metropolis algorithm.
Metropolis-Hastings algorithm
• Common choice: a Gaussian centered on the current state
• Small variance -> high acceptance rate, but a slow walk through state space and dependent samples
• Large variance -> high rejection rate
Gibbs sampling
• Special case of the Metropolis-Hastings algorithm
• The random value is always accepted: the acceptance probability equals 1
Suppose: $p(\mathbf{z}) = p(z_1, z_2, z_3)$
• Step 1: choose initial samples $z_1^{(1)}, z_2^{(1)}, z_3^{(1)}$
• Step 2 (repeated):
- sample $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)})$
- sample $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)})$
- sample $z_3^{(\tau+1)} \sim p(z_3 \mid z_1^{(\tau+1)}, z_2^{(\tau+1)})$
• Repeat by cycling through the variables, or randomly choose the variable to be updated (see the R sketch below).
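A minimal R sketch of Gibbs sampling, assuming a bivariate Gaussian target with correlation rho, for which both full conditionals are univariate Gaussians; the target is an illustrative choice, not from the slides.

```r
# Gibbs sampling sketch for a standard bivariate Gaussian with correlation
# rho; each full conditional is z_k | z_other ~ N(rho * z_other, 1 - rho^2).
set.seed(7)
rho <- 0.8
n <- 10000
z <- matrix(NA_real_, n, 2)
z1 <- 0; z2 <- 0                              # initial values
for (i in 1:n) {
  z1 <- rnorm(1, rho * z2, sqrt(1 - rho^2))   # sample z1 | z2
  z2 <- rnorm(1, rho * z1, sqrt(1 - rho^2))   # sample z2 | z1 (updated)
  z[i, ] <- c(z1, z2)
}
cor(z[, 1], z[, 2])                           # close to rho
```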
Gibbs sampling
Why $p(\mathbf{z})$ is invariant (unchanged) under each step:
• The univariate conditional distribution $p(z_k \mid \mathbf{z}_{\setminus k})$ is sampled exactly, so it is invariant by definition.
• The joint distribution is invariant, because the marginal $p(\mathbf{z}_{\setminus k})$ is fixed at each step.
Gibbs sampling
• Sufficient condition for ergodicity: none of the conditional distributions is anywhere zero, i.e. any point in $\mathbf{z}$-space can be reached from any other point in a finite number of steps.
Gibbs sampling
Obtaining m independent samples:
• Sample the MCMC during a "burn-in" period to remove dependence on the initial values.
• Then, sample at set time points (e.g. every Mth sample).
• The Gibbs sequence converges to a stationary (equilibrium) distribution that is independent of the starting values.
• By construction, this stationary distribution is the target distribution we are trying to simulate.
Gibbs sampling
• Practicability depends on the feasibility of drawing samples from the conditional distributions $p(z_k \mid \mathbf{z}_{\setminus k})$.
• Directed graphs will often lead to conditional distributions for Gibbs sampling that are log concave.
• Adaptive rejection sampling methods can then be used to sample from them.