Tutorial on Bayesian Techniques for Inference • Asensio Ramos • Instituto de Astrofísica de Canarias
Outline • General introduction • The Bayesian approach to inference • Examples • Conclusions
The Big Picture Deductive Inference vs. Statistical Inference
The Big Picture The available information is always incomplete, so our knowledge of nature is necessarily probabilistic. Cox & Jaynes demonstrated that a probability calculus fulfilling a few consistency rules can be used to do statistical inference.
Probabilistic inference H1, H2, H3, …, Hn are hypotheses that we want to test. The Bayesian way is to estimate p(Hi|…) and choose among them by comparing their probabilities. But what are the p(Hi|…)?
What is probability? (Frequentist) In the frequentist approach, probability describes "randomness": if we carry out the experiment many times, what is the distribution of outcomes? p(x) is the histogram of the random variable x.
What is probability? (Bayesian) In the Bayesian approach, probability describes "uncertainty". Everything can be a random variable, as we will see later. p(x) gives how probability is distributed among the possible choices of x.
Bayes theorem It is trivially derived from the product rule: p(Hi|D,I) = p(D|Hi,I) p(Hi|I) / p(D|I) • Hi is a proposition asserting the truth of a hypothesis • I is a proposition representing the prior information • D is a proposition representing the data
Bayes theorem – Example • Model M1 predicts a star at d=100 ly • Model M2 predicts a star at d=200 ly • The uncertainty in the measurement is Gaussian with σ=40 ly • The measured distance is d=120 ly The Gaussian likelihood of each model is turned into a posterior with Bayes theorem.
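A minimal numerical sketch of this example, assuming equal prior probabilities for the two models (the distances, the measurement and the σ are those quoted above):

```python
import numpy as np
from scipy.stats import norm

# Measured distance and Gaussian measurement uncertainty (from the slide)
d_obs, sigma = 120.0, 40.0

# Each model predicts a fixed distance, so its likelihood is just the Gaussian
# density of the measurement evaluated at the predicted distance.
models = {"M1": 100.0, "M2": 200.0}
likelihood = {m: norm.pdf(d_obs, loc=pred, scale=sigma) for m, pred in models.items()}

# Assuming equal priors p(M1|I) = p(M2|I) = 0.5, the posterior is the
# normalized likelihood.
total = sum(likelihood.values())
posterior = {m: L / total for m, L in likelihood.items()}
print(posterior)   # M1 is favoured: 120 ly is only 0.5 sigma from its prediction
```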
Bayes theorem – Another example A medical test has a 2.3% false-positive rate and a 1.4% false-negative rate (98.6% reliability).
Bayes theorem – Another example You take the test and it comes out positive. What is the probability that you have the disease if the incidence is 1:10000? H: you have the disease. H̄: you don't have the disease. D1: your test is positive.
Bayes theorem – Another example 10-4 0.986 0.986 10-4 0.9999 0.023
What is usually known as inversion One proposes a model to explain the observations. All inversion methods work by adjusting the parameters of the model so as to minimize a merit function that compares the observations with the synthesis from the model. The least-squares (maximum-likelihood) solution is then taken as the solution to the inversion problem.
Defects of standard inversion codes • The solution is given as a single set of model parameters (maximum likelihood) • Not necessarily the optimal solution • Sensitive to noise • Error bars or confidence regions are scarce • Gaussian errors are assumed • Errors are not easy to propagate • Ambiguities, degeneracies and correlations are not detected • Assumptions are not explicit • Models cannot be compared
Inversion as a probabilistic inference problem Use Bayes theorem to propagate information from the data to our final state of knowledge: p(θ|D,I) = p(D|θ,I) p(θ|I) / p(D|I), i.e. Posterior = Likelihood × Prior / Evidence.
Priors Contain the information about the model parameters that we know before seeing the data. Assuming statistical independence for all parameters, the total prior can be calculated as the product of the individual priors, p(θ|I) = Π p(θi|I). Typical priors are the top-hat function (flat prior), p(θi|I) = 1/(θmax − θmin) for θmin ≤ θi ≤ θmax and zero outside, and the Gaussian prior (we know some values are more probable than others).
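A small sketch of how such priors might be coded; the function names and the independence assumption are illustrative, not taken from any particular inversion code:

```python
import numpy as np

def flat_prior(theta, theta_min, theta_max):
    # Top-hat prior: constant inside [theta_min, theta_max], zero outside.
    inside = (theta >= theta_min) & (theta <= theta_max)
    return np.where(inside, 1.0 / (theta_max - theta_min), 0.0)

def gaussian_prior(theta, mu, sigma):
    # Gaussian prior: some parameter values are a priori more probable than others.
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Statistical independence: the total prior is the product of the individual priors.
def total_prior(theta_values, prior_funcs):
    return np.prod([f(t) for f, t in zip(prior_funcs, theta_values)])
```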
Likelihood Assuming normal (Gaussian) noise, the likelihood can be calculated as p(D|θ,I) ∝ exp(−χ²/2), where the χ² function is defined as usual, χ² = Σ [di − si(θ)]²/σi². In this case, the χ² function is specific to the case of Stokes profiles.
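As an illustration, a generic Gaussian log-likelihood of this form (not the Stokes-specific one used in the talk) might look like:

```python
import numpy as np

def chi2(data, model, sigma):
    # Usual chi-squared between observed points and the synthesis from the model.
    return np.sum(((data - model) / sigma) ** 2)

def log_likelihood(data, model, sigma):
    # Gaussian (normal) noise: ln L = -chi2/2 up to an additive constant.
    return -0.5 * chi2(data, model, sigma)
```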
Advantages of the Bayesian approach • "Best fit" values of the parameters are, e.g., the mode or median of the posterior • Uncertainties are credible regions of the posterior • Correlations between variables of the model are captured • Generalized error propagation (not only Gaussian, and including correlations) • Integration over nuisance parameters (marginalization), as in the sketch below
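A toy illustration of marginalization on a grid, using an arbitrary correlated Gaussian posterior (all values are illustrative):

```python
import numpy as np

# Parameter of interest theta and nuisance parameter nu on a common grid.
theta = np.linspace(-5, 5, 200)
nu = np.linspace(-5, 5, 200)
T, N = np.meshgrid(theta, nu, indexing="ij")

# Toy correlated Gaussian posterior (correlation 0.6), purely for illustration.
rho = 0.6
log_post = -0.5 * (T**2 - 2 * rho * T * N + N**2) / (1 - rho**2)
post = np.exp(log_post - log_post.max())
post /= np.trapz(np.trapz(post, nu, axis=1), theta)

# Marginalization: integrate the joint posterior over the nuisance parameter.
post_theta = np.trapz(post, nu, axis=1)   # p(theta | D, I)
```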
Beautiful posterior distributions: marginal posteriors for the field strength, field inclination, field azimuth and filling factor.
Not so beautiful posterior distributions – degeneracies: posterior for the field inclination.
Inversion with local stray-light – be careful In the standard case, σi² is the variance of the numerator of each term in the χ². But what happens if we propose a model like Orozco Suárez et al. (2007), with a stray-light contamination obtained from a local average of the surrounding pixels, i.e. taken from the observations themselves?
The variance becomes dependent on the stray-light contamination It is usual to carry out inversions with a stray-light contamination obtained from a local average of the surrounding pixels.
Spatial correlations: use global stray-light It is usual to carry out inversions with a stray-light contamination obtained from a local average of the surrounding pixels. As the number of averaged pixels M grows, the induced correlations tend to zero.
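A toy Monte Carlo sketch of why a locally averaged stray light changes the noise model; the stray-light fraction α, the number of averaged pixels M and the noise level are illustrative values, not those of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, alpha, M, n_trials = 1e-3, 0.3, 8, 200_000

# Noise of two pixels of interest and of the M surrounding pixels that (for
# simplicity) both of them use to build the local stray-light estimate.
pixel_a = rng.normal(0.0, sigma, n_trials)
pixel_b = rng.normal(0.0, sigma, n_trials)
neighbours = rng.normal(0.0, sigma, (n_trials, M))
local_straylight = neighbours.mean(axis=1)

res_a = pixel_a - alpha * local_straylight
res_b = pixel_b - alpha * local_straylight

# The effective noise depends on alpha: compare with sigma*sqrt(1 + alpha^2/M).
print(res_a.std(), sigma * np.sqrt(1 + alpha**2 / M))
# ...and pixels sharing the local average are no longer independent.
print(np.corrcoef(res_a, res_b)[0, 1])   # ~ (alpha^2/M)/(1 + alpha^2/M), small but nonzero
```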
Recommendation Use global stray-light contamination to avoid problems
Model comparison Choose, among the selected models, the one that is preferred by the data. The posterior for model Mi is p(Mi|D,I) ∝ p(D|Mi,I) p(Mi|I); the model likelihood p(D|Mi,I) is just the evidence.
Model comparison – a worked example H0: a simple Gaussian. H1: two Gaussians of equal width but unknown amplitude ratio.
Model comparison – a worked example Model H1 is 9.2 times more probable.
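A sketch of how such a Bayes factor can be computed by direct integration of the evidence on a grid. The synthetic data, the priors and the resulting number are purely illustrative and are not meant to reproduce the factor 9.2 quoted above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 100)
sigma_noise = 0.1

# Synthetic "observed" spectrum: two Gaussian lines of equal width plus noise.
data = (np.exp(-0.5 * x**2) + 0.1 * np.exp(-0.5 * (x - 2.0)**2)
        + rng.normal(0.0, sigma_noise, x.size))

def log_like(model):
    return -0.5 * np.sum(((data - model) / sigma_noise) ** 2)

# H0: a single Gaussian with unknown amplitude A (flat prior on [0, 2]).
A = np.linspace(0.0, 2.0, 400)
like_H0 = np.array([np.exp(log_like(a * np.exp(-0.5 * x**2))) for a in A])
evidence_H0 = np.trapz(like_H0 * (1.0 / 2.0), A)          # prior density 1/2

# H1: two Gaussians of equal width, amplitudes A and r*A, with r flat on [0, 1].
r = np.linspace(0.0, 1.0, 200)
like_H1 = np.array([[np.exp(log_like(a * np.exp(-0.5 * x**2)
                                     + a * ri * np.exp(-0.5 * (x - 2.0)**2)))
                     for ri in r] for a in A])
evidence_H1 = np.trapz(np.trapz(like_H1 * (1.0 / 2.0) * 1.0, r, axis=1), A)

print("Bayes factor p(D|H1)/p(D|H0):", evidence_H1 / evidence_H0)
```

The second model pays an automatic Occam penalty through the integration over its extra parameter r, so it is only preferred when the data really support the second component.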
Model comparison – an example • Model 1: 1 magnetic component • Model 2: 1 magnetic + 1 non-magnetic component • Model 3: 2 magnetic components • Model 4: 2 magnetic components with (v2=0, a2=0)
Model comparison – an example • Model 1: 1 magnetic component (9 free parameters) • Model 2: 1 magnetic + 1 non-magnetic component (17 free parameters) • Model 3: 2 magnetic components (20 free parameters) • Model 4: 2 magnetic components with (v2=0, a2=0) (18 free parameters) Model 2 is preferred by the data: the "best fit with the smallest number of parameters".
Model averaging. One step further Models {Mi, i=1..N} have a common subset of parameters ψ of interest, but each model depends on a different set of parameters θi or has different priors over these parameters. The posterior for ψ including all models is p(ψ|D,I) = Σ p(Mi|D,I) p(ψ|D,Mi,I): what all the models have to say about the parameters ψ, each of them giving a "weighted vote".
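A minimal sketch of model averaging on a grid; the per-model posteriors and the model probabilities below are placeholders, not results from the talk:

```python
import numpy as np

# Common parameter psi evaluated on a grid.
psi = np.linspace(0.0, 10.0, 500)

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# p(psi | D, M_i, I) for each model and p(M_i | D, I), illustrative values.
posterior_psi_given_model = [gauss(psi, 4.0, 0.5), gauss(psi, 5.0, 1.0)]
model_posterior = [0.7, 0.3]

# Each model casts a "weighted vote": the model-averaged posterior is the
# mixture of the per-model posteriors weighted by p(M_i | D, I).
post_psi = sum(w * p for w, p in zip(model_posterior, posterior_psi_given_model))
```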
Hierarchical models In the Bayesian approach, everything can be considered a random variable. [Diagram: the priors (whose parameters can themselves have priors) and the model feed the likelihood of the data; the nuisance parameters are marginalized out before performing the inference.]
Bayesian weak-field Bayes theorem applied under the weak-field approximation. Advantage: everything is close to analytic.
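A sketch of the kind of analytic result one obtains: under the weak-field approximation with Gaussian noise and a flat prior, the posterior for the longitudinal field is Gaussian. The line parameters (λ0, g_eff), the numerical constant and the function signature are standard weak-field choices used only for illustration, not the exact formulation of the talk:

```python
import numpy as np

def weak_field_posterior(wavelength, stokes_i, stokes_v, sigma_noise,
                         lambda0=6302.5, g_eff=2.5):
    # Weak-field approximation: V(lambda) = C * B_par * dI/dlambda, with
    # C = -4.67e-13 * lambda0**2 * g_eff (lambda0 in Angstrom, B_par in Gauss).
    C = -4.67e-13 * lambda0**2 * g_eff
    didl = np.gradient(stokes_i, wavelength)   # numerical derivative dI/dlambda
    w = C * didl                               # linear model: V = w * B_par + noise

    # With Gaussian noise and a flat prior, the posterior for B_par is Gaussian
    # with this mean and standard deviation (ordinary linear least squares).
    b_mean = np.sum(w * stokes_v) / np.sum(w**2)
    b_sigma = sigma_noise / np.sqrt(np.sum(w**2))
    return b_mean, b_sigma
```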
Bayesian Weak-field – Hierarchical priors Priors depend on some hyperparameters over which we can again set priors and marginalize them
Bayesian Weak-field - Data IMaX data
Bayesian Weak-field - Posteriors Joint posteriors
Bayesian Weak-field - Posteriors Marginal posteriors
Hierarchical priors – Distribution of longitudinal B We want to infer the distribution of the longitudinal field B from many observed pixels, taking into account the uncertainties. Parameterize the distribution in terms of a vector a: the mean and variance if it is Gaussian, or the heights of the bins in the general case.
Hierarchical priors – Distribution of longitudinal B We generate N synthetic profiles with noise, with the longitudinal field sampled from a Gaussian distribution with standard deviation 25 Mx cm⁻².
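A toy version of this experiment for the Gaussian parameterization: each pixel's true field is marginalized analytically (Gaussian field distribution, Gaussian noise, flat prior on the width). The noise level and number of pixels are illustrative; only the 25 Mx cm⁻² is taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma_B_true, sigma_n = 5000, 25.0, 10.0              # Mx cm^-2

B_true = rng.normal(0.0, sigma_B_true, N)                # fields drawn from the distribution
B_obs = B_true + rng.normal(0.0, sigma_n, N)             # noisy per-pixel estimates

# Marginalizing each pixel's true field, every observation is distributed as
# N(0, sigma_B^2 + sigma_n^2), so the posterior for the hyperparameter sigma_B
# (assuming a flat prior on it) is, up to a constant:
sigma_B_grid = np.linspace(1.0, 60.0, 600)
var_tot = sigma_B_grid[:, None]**2 + sigma_n**2
log_post = -0.5 * np.sum(B_obs[None, :]**2 / var_tot + np.log(var_tot), axis=1)
print("MAP sigma_B =", sigma_B_grid[np.argmax(log_post)])   # close to 25
```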