
Tutorial on Bayesian Techniques for Inference


Presentation Transcript


  1. Tutorial on Bayesian Techniques for Inference • Asensio Ramos • Instituto de Astrofísica de Canarias

  2. Outline • General introduction • The Bayesian approach to inference • Examples • Conclusions

  3. The Big Picture • Deductive inference • Statistical inference

  4. The Big Picture • Available information is always incomplete • Our knowledge of nature is therefore necessarily probabilistic • Cox & Jaynes demonstrated that a probability calculus obeying a few consistency rules (Cox's axioms) can be used to do statistical inference

  5. Probabilistic inference H1, H2, H3, …, Hn are hypotheses that we want to test. The Bayesian way is to estimate p(Hi|…) and select among the hypotheses by comparing their probabilities. But… what are the p(Hi|…)?

  6. What is probability? (Frequentist) In the frequentist approach, probability describes “randomness”: if we carry out the experiment many times, what is the distribution of outcomes? p(x) is the limiting histogram of the random variable x.
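
A minimal sketch (not from the slides) of the frequentist picture: repeat the "experiment" many times and the normalised histogram of the outcomes approaches p(x). The experiment here is assumed, for illustration, to draw a standard Gaussian random variable.

```python
import numpy as np

# Repeat the "experiment" many times; the normalised histogram of the
# outcomes approaches the probability density p(x).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

counts, edges = np.histogram(x, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
true_pdf = np.exp(-0.5 * centers**2) / np.sqrt(2.0 * np.pi)

print(np.max(np.abs(counts - true_pdf)))   # shrinks as the number of trials grows
```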

  7. What is probability? (Bayesian) In the Bayesian approach, probability describes “uncertainty”: we observe one particular value, and p(x) gives how probability is distributed among the possible choices of x. Everything can be a random variable, as we will see later.

  8. Bayes theorem It is trivially derived from the product rule: p(Hi|D,I) = p(Hi|I) p(D|Hi,I) / p(D|I) • Hi → proposition asserting the truth of a hypothesis • I → proposition representing prior information • D → proposition representing the data

  9. Bayes theorem – Example • Model M1 predicts a star at d = 100 ly • Model M2 predicts a star at d = 200 ly • Uncertainty in the measurement is Gaussian with σ = 40 ly • Measured distance is d = 120 ly Likelihood → Posteriors
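
A short sketch of this example in Python; the only added assumption is that both models receive equal prior probability.

```python
import numpy as np

# Star-distance example: M1 predicts d = 100 ly, M2 predicts d = 200 ly,
# the measured distance is 120 ly with Gaussian uncertainty sigma = 40 ly,
# and (assumption) both models get equal priors.
d_obs, sigma = 120.0, 40.0
prediction = {"M1": 100.0, "M2": 200.0}
prior = {"M1": 0.5, "M2": 0.5}

def likelihood(d_pred):
    return np.exp(-0.5 * ((d_obs - d_pred) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

unnormalised = {m: prior[m] * likelihood(d) for m, d in prediction.items()}
evidence = sum(unnormalised.values())                    # p(D|I)
posterior = {m: v / evidence for m, v in unnormalised.items()}
print(posterior)                                         # M1 comes out clearly favoured
```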

  10. Bayes theorem – Another example A diagnostic test gives 2.3% false positives and 1.4% false negatives (98.6% reliability).

  11. Bayes theorem – Another example You take the test and it comes out positive. What is the probability that you have the disease if the incidence is 1:10000? • H → you have the disease • H̄ → you don’t have the disease • D1 → your test is positive

  12. Bayes theorem – Another example p(H|D1) = p(D1|H) p(H) / [ p(D1|H) p(H) + p(D1|H̄) p(H̄) ] = (0.986 × 10⁻⁴) / (0.986 × 10⁻⁴ + 0.023 × 0.9999) ≈ 0.004
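
The same calculation as a runnable sketch:

```python
# Disease-test example: incidence 1:10000, false-positive rate 2.3%,
# false-negative rate 1.4% (so p(D1|H) = 0.986).
p_H = 1e-4                 # prior: you have the disease
p_notH = 1.0 - p_H         # prior: you don't
p_pos_given_H = 0.986      # true-positive rate
p_pos_given_notH = 0.023   # false-positive rate

p_pos = p_pos_given_H * p_H + p_pos_given_notH * p_notH   # evidence p(D1)
p_H_given_pos = p_pos_given_H * p_H / p_pos
print(p_H_given_pos)       # ~0.004: a positive test barely changes the odds
```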

  13. What is usually known as inversion One proposes a model to explain the observations. All inversion methods work by adjusting the parameters of the model with the aim of minimizing a merit function that compares the observations with the synthesis from the model. The least-squares solution (maximum likelihood under Gaussian noise) is then taken as the solution to the inversion problem.
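
A minimal sketch of this "inversion" workflow on a toy model (the exponential model, data and noise level are illustrative, not from the slides):

```python
import numpy as np
from scipy.optimize import least_squares

# Toy inversion: adjust the parameters of y = a * exp(-b * x) so that the
# synthesis matches noisy synthetic observations in the least-squares sense.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 50)
y_obs = 2.0 * np.exp(-0.7 * x) + rng.normal(0.0, 0.05, x.size)

def residuals(theta):
    a, b = theta
    return y_obs - a * np.exp(-b * x)

fit = least_squares(residuals, x0=[1.0, 1.0])
print(fit.x)   # maximum-likelihood (least-squares) estimate of (a, b)
```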

  14. Defects of standard inversion codes • Solution is given as a single set of model parameters (max. likelihood) • Not necessarily the optimal solution • Sensitive to noise • Error bars or confidence regions are scarce • Gaussian errors are assumed • Not easy to propagate errors • Ambiguities, degeneracies and correlations are not detected • Assumptions are not explicit • Cannot compare models

  15. Inversion as a probabilistic inference problem Use Bayes theorem to propagate information from the data to our final state of knowledge: Posterior = Likelihood × Prior / Evidence, i.e., p(θ|D,I) = p(D|θ,I) p(θ|I) / p(D|I)

  16. Priors Priors contain the information about the model parameters that we have before seeing the data. Assuming statistical independence of all parameters, the total prior factorises as p(θ|I) = Πi p(θi|I). Typical priors: • Top-hat function (flat prior) between θmin and θmax • Gaussian prior (we know some values are more probable than others)
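
A sketch of the two typical priors and of the factorised total prior (the parameter names and ranges below are illustrative):

```python
import numpy as np

# Top-hat (flat) prior between theta_min and theta_max, and a Gaussian prior
# centred on a preferred value.
def flat_prior(theta, theta_min=0.0, theta_max=1.0):
    inside = (theta >= theta_min) & (theta <= theta_max)
    return np.where(inside, 1.0 / (theta_max - theta_min), 0.0)

def gaussian_prior(theta, mu=0.5, sigma=0.1):
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Statistically independent parameters: the total prior is the product.
def total_prior(thetas, priors):
    return np.prod([p(t) for p, t in zip(priors, thetas)])

print(total_prior([0.3, 0.45], [flat_prior, gaussian_prior]))
```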

  17. Likelihood Assuming normal (Gaussian) noise, the likelihood can be calculated as p(D|θ,I) ∝ exp(−χ²/2), where the χ² function is defined as usual, χ² = Σi [di − mi(θ)]² / σi². In this case, the χ² function is specific to the case of Stokes profiles.
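
A generic sketch of this Gaussian log-likelihood (the data, model values and noise levels below are placeholders, not Stokes profiles):

```python
import numpy as np

# Gaussian log-likelihood built from the usual chi-squared function.
def chi2(obs, model, sigma):
    return np.sum(((obs - model) / sigma) ** 2)

def log_likelihood(obs, model, sigma):
    # Up to an additive constant: ln L = -chi2 / 2
    return -0.5 * chi2(obs, model, sigma)

obs = np.array([1.02, 0.95, 1.10])
model = np.array([1.00, 1.00, 1.00])
sigma = np.array([0.05, 0.05, 0.05])
print(log_likelihood(obs, model, sigma))
```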

  18. Visual example of Bayesian inference

  19. Advantages of Bayesian approach • “Best fit” values of the parameters are, e.g., the mode or median of the posterior • Uncertainties are credible regions of the posterior • Correlations between the variables of the model are captured • Generalized error propagation (not only Gaussian, and including correlations) • Integration over nuisance parameters (marginalization)
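
A sketch of how these summaries are read off posterior samples (e.g., from an MCMC run); the two-parameter Gaussian posterior below is made up for illustration:

```python
import numpy as np

# Read summaries off posterior samples: median as "best fit", a 68% credible
# region, the parameter correlation, and marginalisation over a nuisance
# parameter (simply ignore its column of samples).
rng = np.random.default_rng(2)
samples = rng.multivariate_normal(mean=[1.0, 0.0],
                                  cov=[[0.04, 0.01], [0.01, 0.09]],
                                  size=20_000)   # columns: (theta, nuisance)

theta = samples[:, 0]                            # marginal posterior of theta
best = np.median(theta)
lo, hi = np.percentile(theta, [16, 84])          # 68% credible region
corr = np.corrcoef(samples.T)[0, 1]              # captured correlation
print(best, (lo, hi), corr)
```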

  20. Bayesian inference – an example Hinode

  21. Beautiful posterior distributions (posterior panels: field strength, field inclination, field azimuth, filling factor)

  22. Not so beautiful posterior distributions – degeneracies (posterior panel: field inclination)

  23. Inversion with local stray-light – be careful But… what happens if we propose a model like Orozco Suárez et al. (2007), with a stray-light contamination obtained from a local average of the surrounding pixels, i.e., built from the observations themselves? Here σi is the variance of the numerator.

  24. Variance becomes dependent on stray-light contamination It is usual to carry out inversions with a stray-light contamination obtained from a local average on the surrounding pixels

  25. Spatial correlations: use global stray-light It is usual to carry out inversions with a stray-light contamination obtained from a local average of the surrounding pixels. If M → ∞ (a large number of pixels enters the average), the correlations tend to zero.

  26. Spatial correlations

  27. Lesson: use global stray-light contamination

  28. Recommendation Use global stray-light contamination to avoid problems

  29. But… the most general inversion method is…

  30. Model comparison Choose, among the selected models, the one that is preferred by the data. Posterior for model Mi: p(Mi|D,I) ∝ p(Mi|I) p(D|Mi,I). The model likelihood p(D|Mi,I) is just the evidence of the parameter inference.

  31. Model comparison (compare evidences)

  32. Model comparison – a worked example H0: a simple Gaussian H1: two Gaussians of equal width but unknown amplitude ratio

  33. Model comparison – a worked example H0: a simple Gaussian H1: two Gaussians of equal width but unknown amplitude ratio

  34. Model comparison – a worked example

  35. Model comparison – a worked example Model H1 is 9.2 times more probable
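
A hedged sketch of the mechanics of such a comparison. The data, line shapes and priors of the slides' worked example are not reproduced; the point is only that, for each model, the likelihood is averaged over its parameter prior to get the evidence, and the ratio of evidences gives the Bayes factor.

```python
import numpy as np

# Synthetic data: a main Gaussian plus a weaker second one, with noise.
rng = np.random.default_rng(3)
x = np.linspace(-5.0, 5.0, 60)
sigma_n = 0.1
data = np.exp(-0.5 * x**2) + 0.4 * np.exp(-0.5 * (x - 2.0)**2) \
       + rng.normal(0.0, sigma_n, x.size)

def log_like(model):
    return -0.5 * np.sum(((data - model) / sigma_n) ** 2)

# H0: one unit-width Gaussian with free amplitude A, flat prior on [0, 2]
A = np.linspace(0.0, 2.0, 400)
logL0 = np.array([log_like(a * np.exp(-0.5 * x**2)) for a in A])
# H1: two unit-width Gaussians with free amplitude ratio r, flat prior on [0, 1]
r = np.linspace(0.0, 1.0, 400)
logL1 = np.array([log_like(np.exp(-0.5 * x**2)
                           + ri * np.exp(-0.5 * (x - 2.0)**2)) for ri in r])

# Evidence = prior-weighted average of the likelihood (grid approximation),
# computed in log space for numerical stability.
logZ0 = logL0.max() + np.log(np.mean(np.exp(logL0 - logL0.max())))
logZ1 = logL1.max() + np.log(np.mean(np.exp(logL1 - logL1.max())))
print(np.exp(logZ1 - logZ0))   # Bayes factor in favour of H1
```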

  36. Model comparison – an example • Model 1: 1 magnetic component • Model 2: 1 magnetic + 1 non-magnetic component • Model 3: 2 magnetic components • Model 4: 2 magnetic components with (v2 = 0, a2 = 0)

  37. Model comparison – an example • Model 1: 1 magnetic component (9 free parameters) • Model 2: 1 magnetic + 1 non-magnetic component (17 free parameters) • Model 3: 2 magnetic components (20 free parameters) • Model 4: 2 magnetic components with (v2 = 0, a2 = 0) (18 free parameters) Model 2 is preferred by the data: the “best fit with the smallest number of parameters”.

  38. Model averaging – one step further Models {Mi, i=1..N} share a common subset of parameters ψ of interest, but each model depends on a different set of extra parameters θ, or has different priors over them. The posterior for ψ including all models, p(ψ|D,I) = Σi p(Mi|D,I) p(ψ|D,Mi,I), is what all the models together have to say about ψ: each model casts a “weighted vote”.
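
A minimal sketch of that weighted vote (the per-model posteriors and weights below are made-up Gaussians, only to show the mechanics):

```python
import numpy as np

# Bayesian model averaging for a shared parameter psi: each model contributes
# its own posterior p(psi|D,Mi), weighted by its model posterior p(Mi|D,I).
psi = np.linspace(-3.0, 3.0, 1201)
dpsi = psi[1] - psi[0]

def gaussian(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

per_model = [gaussian(psi, 0.2, 0.3), gaussian(psi, 0.6, 0.5)]   # p(psi|D,Mi)
weights = np.array([0.7, 0.3])                                   # p(Mi|D,I)

averaged = sum(w * p for w, p in zip(weights, per_model))        # p(psi|D,I)
print(np.sum(averaged) * dpsi)   # still normalised to ~1
```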

  39. Model averaging – an example

  40. Hierarchical models In the Bayesian approach, everything can be considered a random variable (diagram nodes: prior, prior parameters, model, likelihood, data; nuisance parameters are removed by marginalization, yielding the inference)

  41. Hierarchical models In the Bayesian approach, everything can be considered a random variable (same diagram, now with additional priors placed on the prior parameters themselves)

  42. Bayesian Weak-field Bayes theorem Advantage: everything is close to analytic
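
To illustrate the "close to analytic" claim, here is a hedged sketch of the standard conjugate result for a linear model with Gaussian noise and a Gaussian prior. It is a stand-in, not the slides' actual weak-field expressions for Stokes V, but the structure is the same: the signal is linear in the longitudinal field, so the posterior is again Gaussian with closed-form mean and variance.

```python
import numpy as np

# Linear model y = alpha * x with Gaussian noise and a Gaussian prior on
# alpha: the posterior for alpha is Gaussian, in closed form.
rng = np.random.default_rng(4)
x = np.linspace(-1.0, 1.0, 100)       # e.g. dI/dlambda in the weak-field analogy
sigma_n = 0.05                        # noise standard deviation
alpha_true = 0.3
y = alpha_true * x + rng.normal(0.0, sigma_n, x.size)

mu0, sigma0 = 0.0, 1.0                # Gaussian prior on alpha
prec_post = 1.0 / sigma0**2 + np.sum(x**2) / sigma_n**2
mu_post = (mu0 / sigma0**2 + np.sum(x * y) / sigma_n**2) / prec_post
sigma_post = 1.0 / np.sqrt(prec_post)
print(mu_post, sigma_post)            # analytic Gaussian posterior
```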

  43. Bayesian Weak-field – Hierarchical priors Priors depend on some hyperparameters, over which we can again set priors and then marginalize

  44. Bayesian Weak-field - Data IMaX data

  45. Bayesian Weak-field - Posteriors Joint posteriors

  46. Bayesian Weak-field - Posteriors Marginal posteriors

  47. Hierarchical priors - Distribution of longitudinal B

  48. Hierarchical priors – Distribution of longitudinal B We want to infer the distribution of longitudinal B from many observed pixels, taking into account the uncertainties. Parameterize the distribution in terms of a vector a: • mean + variance if Gaussian • height of the bins if general

  49. Hierarchical priors – Distribution of longitudinal B

  50. Hierarchical priors – Distribution of longitudinal B We generate N synthetic profiles with noise, with the longitudinal field sampled from a Gaussian distribution with standard deviation 25 Mx cm⁻²
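
A simplified, hedged stand-in for this hierarchical test: instead of synthesizing full Stokes profiles, each pixel's longitudinal field is "observed" with Gaussian measurement noise of an assumed level σn, and the population width σB is recovered on a grid by marginalising each pixel's true field analytically.

```python
import numpy as np

# Draw longitudinal fields B from a Gaussian of standard deviation 25 Mx/cm^2,
# observe each with Gaussian noise, and infer the hyperparameter sigma_B.
# Marginalising the true field per pixel gives B_obs ~ N(0, sigma_B^2 + sigma_n^2).
rng = np.random.default_rng(5)
N, sigma_B_true, sigma_n = 2000, 25.0, 10.0       # sigma_n is an assumed noise level
B_true = rng.normal(0.0, sigma_B_true, N)
B_obs = B_true + rng.normal(0.0, sigma_n, N)

sigma_B = np.linspace(1.0, 60.0, 600)             # grid over the hyperparameter
var_tot = sigma_B[:, None]**2 + sigma_n**2
log_post = -0.5 * np.sum(B_obs[None, :]**2 / var_tot + np.log(var_tot), axis=1)
log_post -= log_post.max()                        # flat hyperprior assumed

print(sigma_B[np.argmax(log_post)])               # close to 25 for large N
```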
