Tutorial on Bayesian Techniques for Inference • Asensio Ramos • Instituto de Astrofísica de Canarias
Outline • General introduction • The Bayesian approach to inference • Examples • Conclusions
The Big Picture Deductive Inference vs. Statistical Inference
The Big Picture The available information is always incomplete, so our knowledge of nature is necessarily probabilistic. Cox & Jaynes demonstrated that a probability calculus fulfilling a few consistency rules can be used to do statistical inference.
Probabilistic inference H1, H2, H3, …, Hn are hypotheses that we want to test. The Bayesian way is to estimate p(Hi|…) and choose among them by comparing their probabilities. But what are the p(Hi|…)?
What is probability? (Frequentist) In the frequentist approach, probability describes "randomness": if we carry out the experiment many times, what is the distribution of outcomes? p(x) is the histogram of the random variable x.
What is probability? (Bayesian) In the Bayesian approach, probability describes "uncertainty". Everything can be a random variable, as we will see later. p(x) gives how probability is distributed among the possible choices of x.
Bayes theorem It is trivially derived from the product rule: p(Hi|D,I) = p(D|Hi,I) p(Hi|I) / p(D|I) • Hi is a proposition asserting the truth of a hypothesis • I is a proposition representing the prior information • D is a proposition representing the data
Bayes theorem – Example • Model M1 predicts a star at d=100 ly • Model M2 predicts a star at d=200 ly • The uncertainty in the measurement is Gaussian with σ=40 ly • The measured distance is d=120 ly The Gaussian likelihood of each model is turned into a posterior with Bayes theorem.
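A minimal numerical sketch of this example, assuming equal prior probabilities for the two models (the distances, the measurement and the σ are those quoted above):

```python
import numpy as np
from scipy.stats import norm

# Measured distance and Gaussian measurement uncertainty (from the slide)
d_obs, sigma = 120.0, 40.0

# Each model predicts a fixed distance, so its likelihood is just the Gaussian
# density of the measurement evaluated at the predicted distance.
models = {"M1": 100.0, "M2": 200.0}
likelihood = {m: norm.pdf(d_obs, loc=pred, scale=sigma) for m, pred in models.items()}

# Assuming equal priors p(M1|I) = p(M2|I) = 0.5, the posterior is the
# normalized likelihood.
total = sum(likelihood.values())
posterior = {m: L / total for m, L in likelihood.items()}
print(posterior)   # M1 is favoured: 120 ly is only 0.5 sigma from its prediction
```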
Bayes theorem – Another example A medical test has a 2.3% false-positive rate and a 1.4% false-negative rate (98.6% reliability).
Bayes theorem – Another example You take the test and it comes out positive. What is the probability that you have the disease if the incidence is 1:10000? H: you have the disease. H̄: you don't have the disease. D1: your test is positive.
Bayes theorem – Another example 10-4 0.986 0.986 10-4 0.9999 0.023
What is usually known as inversion One proposes a model to explain the observations. All inversion methods work by adjusting the parameters of the model so as to minimize a merit function that compares the observations with the synthesis from the model. The least-squares (maximum-likelihood) solution is then taken as the solution to the inversion problem.
Defects of standard inversion codes • The solution is given as a single set of model parameters (maximum likelihood) • Not necessarily the optimal solution • Sensitive to noise • Error bars or confidence regions are scarce • Gaussian errors are assumed • Errors are not easy to propagate • Ambiguities, degeneracies and correlations are not detected • Assumptions are not explicit • Models cannot be compared
Inversion as a probabilistic inference problem Use Bayes theorem to propagate information from the data to our final state of knowledge: p(θ|D,I) = p(D|θ,I) p(θ|I) / p(D|I), i.e. Posterior = Likelihood × Prior / Evidence.
Priors Contain the information about the model parameters that we know before seeing the data. Assuming statistical independence for all parameters, the total prior can be calculated as the product of the individual priors, p(θ|I) = Π p(θi|I). Typical priors are the top-hat function (flat prior), p(θi|I) = 1/(θmax − θmin) for θmin ≤ θi ≤ θmax and zero outside, and the Gaussian prior (we know some values are more probable than others).
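A small sketch of how such priors might be coded; the function names and the independence assumption are illustrative, not taken from any particular inversion code:

```python
import numpy as np

def flat_prior(theta, theta_min, theta_max):
    # Top-hat prior: constant inside [theta_min, theta_max], zero outside.
    inside = (theta >= theta_min) & (theta <= theta_max)
    return np.where(inside, 1.0 / (theta_max - theta_min), 0.0)

def gaussian_prior(theta, mu, sigma):
    # Gaussian prior: some parameter values are a priori more probable than others.
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Statistical independence: the total prior is the product of the individual priors.
def total_prior(theta_values, prior_funcs):
    return np.prod([f(t) for f, t in zip(prior_funcs, theta_values)])
```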
Likelihood Assuming normal (Gaussian) noise, the likelihood can be calculated as p(D|θ,I) ∝ exp(−χ²/2), where the χ² function is defined as usual, χ² = Σ [di − si(θ)]²/σi². In this case, the χ² function is specific to the case of Stokes profiles.
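As an illustration, a generic Gaussian log-likelihood of this form (not the Stokes-specific one used in the talk) might look like:

```python
import numpy as np

def chi2(data, model, sigma):
    # Usual chi-squared between observed points and the synthesis from the model.
    return np.sum(((data - model) / sigma) ** 2)

def log_likelihood(data, model, sigma):
    # Gaussian (normal) noise: ln L = -chi2/2 up to an additive constant.
    return -0.5 * chi2(data, model, sigma)
```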
Advantages of the Bayesian approach • "Best fit" values of the parameters are, e.g., the mode or median of the posterior • Uncertainties are credible regions of the posterior • Correlations between variables of the model are captured • Generalized error propagation (not only Gaussian, and including correlations) • Integration over nuisance parameters (marginalization), as in the sketch below
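A toy illustration of marginalization on a grid, using an arbitrary correlated Gaussian posterior (all values are illustrative):

```python
import numpy as np

# Parameter of interest theta and nuisance parameter nu on a common grid.
theta = np.linspace(-5, 5, 200)
nu = np.linspace(-5, 5, 200)
T, N = np.meshgrid(theta, nu, indexing="ij")

# Toy correlated Gaussian posterior (correlation 0.6), purely for illustration.
rho = 0.6
log_post = -0.5 * (T**2 - 2 * rho * T * N + N**2) / (1 - rho**2)
post = np.exp(log_post - log_post.max())
post /= np.trapz(np.trapz(post, nu, axis=1), theta)

# Marginalization: integrate the joint posterior over the nuisance parameter.
post_theta = np.trapz(post, nu, axis=1)   # p(theta | D, I)
```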
Beautiful posterior distributions: marginal posteriors for the field strength, field inclination, field azimuth and filling factor.
Not so beautiful posterior distributions – degeneracies: posterior for the field inclination.
Inversion with local stray-light – be careful In the standard case, σi² is the variance of the numerator of each term in the χ². But what happens if we propose a model like Orozco Suárez et al. (2007), with a stray-light contamination obtained from a local average of the surrounding pixels, i.e. taken from the observations themselves?
The variance becomes dependent on the stray-light contamination It is usual to carry out inversions with a stray-light contamination obtained from a local average of the surrounding pixels.
Spatial correlations: use global stray-light It is usual to carry out inversions with a stray-light contamination obtained from a local average of the surrounding pixels. As the number of averaged pixels M grows, the induced correlations tend to zero.
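A toy Monte Carlo sketch of why a locally averaged stray light changes the noise model; the stray-light fraction α, the number of averaged pixels M and the noise level are illustrative values, not those of the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, alpha, M, n_trials = 1e-3, 0.3, 8, 200_000

# Noise of two pixels of interest and of the M surrounding pixels that (for
# simplicity) both of them use to build the local stray-light estimate.
pixel_a = rng.normal(0.0, sigma, n_trials)
pixel_b = rng.normal(0.0, sigma, n_trials)
neighbours = rng.normal(0.0, sigma, (n_trials, M))
local_straylight = neighbours.mean(axis=1)

res_a = pixel_a - alpha * local_straylight
res_b = pixel_b - alpha * local_straylight

# The effective noise depends on alpha: compare with sigma*sqrt(1 + alpha^2/M).
print(res_a.std(), sigma * np.sqrt(1 + alpha**2 / M))
# ...and pixels sharing the local average are no longer independent.
print(np.corrcoef(res_a, res_b)[0, 1])   # ~ (alpha^2/M)/(1 + alpha^2/M), small but nonzero
```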
Recommendation Use global stray-light contamination to avoid problems
Model comparison Choose, among the selected models, the one that is preferred by the data. The posterior for model Mi is p(Mi|D,I) ∝ p(D|Mi,I) p(Mi|I); the model likelihood p(D|Mi,I) is just the evidence.
Model comparison – a worked example H0: a simple Gaussian. H1: two Gaussians of equal width but unknown amplitude ratio.
Model comparison – a worked example Model H1 is 9.2 times more probable.
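A sketch of how such a Bayes factor can be computed by direct integration of the evidence on a grid. The synthetic data, the priors and the resulting number are purely illustrative and are not meant to reproduce the factor 9.2 quoted above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 100)
sigma_noise = 0.1

# Synthetic "observed" spectrum: two Gaussian lines of equal width plus noise.
data = (np.exp(-0.5 * x**2) + 0.1 * np.exp(-0.5 * (x - 2.0)**2)
        + rng.normal(0.0, sigma_noise, x.size))

def log_like(model):
    return -0.5 * np.sum(((data - model) / sigma_noise) ** 2)

# H0: a single Gaussian with unknown amplitude A (flat prior on [0, 2]).
A = np.linspace(0.0, 2.0, 400)
like_H0 = np.array([np.exp(log_like(a * np.exp(-0.5 * x**2))) for a in A])
evidence_H0 = np.trapz(like_H0 * (1.0 / 2.0), A)          # prior density 1/2

# H1: two Gaussians of equal width, amplitudes A and r*A, with r flat on [0, 1].
r = np.linspace(0.0, 1.0, 200)
like_H1 = np.array([[np.exp(log_like(a * np.exp(-0.5 * x**2)
                                     + a * ri * np.exp(-0.5 * (x - 2.0)**2)))
                     for ri in r] for a in A])
evidence_H1 = np.trapz(np.trapz(like_H1 * (1.0 / 2.0) * 1.0, r, axis=1), A)

print("Bayes factor p(D|H1)/p(D|H0):", evidence_H1 / evidence_H0)
```

The second model pays an automatic Occam penalty through the integration over its extra parameter r, so it is only preferred when the data really support the second component.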
Model comparison – an example • Model 1: 1 magnetic component • Model 2: 1 magnetic + 1 non-magnetic component • Model 3: 2 magnetic components • Model 4: 2 magnetic components with (v2=0, a2=0)
Model comparison – an example • Model 1: 1 magnetic component (9 free parameters) • Model 2: 1 magnetic + 1 non-magnetic component (17 free parameters) • Model 3: 2 magnetic components (20 free parameters) • Model 4: 2 magnetic components with (v2=0, a2=0) (18 free parameters) Model 2 is preferred by the data: the "best fit with the smallest number of parameters".
Model averaging. One step further Models {Mi, i=1..N} have a common subset of parameters ψ of interest, but each model depends on a different set of parameters θi or has different priors over these parameters. The posterior for ψ including all models is p(ψ|D,I) = Σ p(Mi|D,I) p(ψ|D,Mi,I): what all the models have to say about the parameters ψ, each of them giving a "weighted vote".
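A minimal sketch of model averaging on a grid; the per-model posteriors and the model probabilities below are placeholders, not results from the talk:

```python
import numpy as np

# Common parameter psi evaluated on a grid.
psi = np.linspace(0.0, 10.0, 500)

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# p(psi | D, M_i, I) for each model and p(M_i | D, I), illustrative values.
posterior_psi_given_model = [gauss(psi, 4.0, 0.5), gauss(psi, 5.0, 1.0)]
model_posterior = [0.7, 0.3]

# Each model casts a "weighted vote": the model-averaged posterior is the
# mixture of the per-model posteriors weighted by p(M_i | D, I).
post_psi = sum(w * p for w, p in zip(model_posterior, posterior_psi_given_model))
```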
Hierarchical models In the Bayesian approach, everything can be considered a random variable. [Diagram: the priors (whose parameters can themselves have priors) and the model feed the likelihood of the data; the nuisance parameters are marginalized out before performing the inference.]
Bayesian weak-field Bayes theorem applied under the weak-field approximation. Advantage: everything is close to analytic.
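A sketch of the kind of analytic result one obtains: under the weak-field approximation with Gaussian noise and a flat prior, the posterior for the longitudinal field is Gaussian. The line parameters (λ0, g_eff), the numerical constant and the function signature are standard weak-field choices used only for illustration, not the exact formulation of the talk:

```python
import numpy as np

def weak_field_posterior(wavelength, stokes_i, stokes_v, sigma_noise,
                         lambda0=6302.5, g_eff=2.5):
    # Weak-field approximation: V(lambda) = C * B_par * dI/dlambda, with
    # C = -4.67e-13 * lambda0**2 * g_eff (lambda0 in Angstrom, B_par in Gauss).
    C = -4.67e-13 * lambda0**2 * g_eff
    didl = np.gradient(stokes_i, wavelength)   # numerical derivative dI/dlambda
    w = C * didl                               # linear model: V = w * B_par + noise

    # With Gaussian noise and a flat prior, the posterior for B_par is Gaussian
    # with this mean and standard deviation (ordinary linear least squares).
    b_mean = np.sum(w * stokes_v) / np.sum(w**2)
    b_sigma = sigma_noise / np.sqrt(np.sum(w**2))
    return b_mean, b_sigma
```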
Bayesian Weak-field – Hierarchical priors Priors depend on some hyperparameters over which we can again set priors and marginalize them
Bayesian Weak-field - Data IMaX data
Bayesian Weak-field - Posteriors Joint posteriors
Bayesian Weak-field - Posteriors Marginal posteriors
Hierarchical priors – Distribution of longitudinal B We want to infer the distribution of the longitudinal field B from many observed pixels, taking into account the uncertainties. Parameterize the distribution in terms of a vector a: the mean and variance if it is Gaussian, or the heights of the bins in the general case.
Hierarchical priors – Distribution of longitudinal B We generate N synthetic profiles with noise, with the longitudinal field sampled from a Gaussian distribution with standard deviation 25 Mx cm⁻².
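A toy version of this experiment for the Gaussian parameterization: each pixel's true field is marginalized analytically (Gaussian field distribution, Gaussian noise, flat prior on the width). The noise level and number of pixels are illustrative; only the 25 Mx cm⁻² is taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma_B_true, sigma_n = 5000, 25.0, 10.0              # Mx cm^-2

B_true = rng.normal(0.0, sigma_B_true, N)                # fields drawn from the distribution
B_obs = B_true + rng.normal(0.0, sigma_n, N)             # noisy per-pixel estimates

# Marginalizing each pixel's true field, every observation is distributed as
# N(0, sigma_B^2 + sigma_n^2), so the posterior for the hyperparameter sigma_B
# (assuming a flat prior on it) is, up to a constant:
sigma_B_grid = np.linspace(1.0, 60.0, 600)
var_tot = sigma_B_grid[:, None]**2 + sigma_n**2
log_post = -0.5 * np.sum(B_obs[None, :]**2 / var_tot + np.log(var_tot), axis=1)
print("MAP sigma_B =", sigma_B_grid[np.argmax(log_post)])   # close to 25
```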