Estimation, Variation and Uncertainty Simon French simon.french@warwick.ac.uk
Aims of Session • gain a greater understanding of the estimation of parameters and variables. • gain an appreciation of point estimation. • gain an appreciation of how to assess the uncertainty and confidence levels in estimates.
Cynefin and statistics [Diagram relating the Cynefin framework to statistical practice] • Unique events → exploratory analyses • Repeatable events → estimation and confirmatory analysis
Frequentist Statistics Key point: Probability represents a long run frequency of occurrence
Frequentist Statistics • Scientific Method is based upon repeatability of experiments • Parameters in a (scientific) model or theory are fixed • Cannot talk of the probability of an objective quantity or parameter value • Data come from repeatable experiments • Can talk of the probability of a data value
Measurement and Variation of Objective Quantities • Ideally we simply perform an experiment and measure the quantities that interest us • But variation and experimental error mean that we cannot simply do this • So we need to make multiple measurements, learn about the variation and estimate the quantity of interest
Estimation • Try to find a function of the data (an estimator) that is tightly distributed about the quantity of interest, θ. [Figure: the distribution of a single data point about θ compared with the much tighter distribution of the data mean about θ]
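A minimal simulation sketch of this idea, assuming normally distributed measurement error; the true value θ, the noise level and the sample size below are illustrative choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 10.0          # true quantity of interest (illustrative)
sigma = 2.0           # measurement error standard deviation (illustrative)
n = 25                # measurements per experiment
n_experiments = 10_000

# Each row is one experiment of n noisy measurements of theta.
data = rng.normal(loc=theta, scale=sigma, size=(n_experiments, n))

single_measurements = data[:, 0]      # spread of a single data point about theta
sample_means = data.mean(axis=1)      # spread of the estimator (the sample mean)

print(f"std of a single measurement: {single_measurements.std():.3f}")   # ~ sigma
print(f"std of the sample mean:      {sample_means.std():.3f}")          # ~ sigma / sqrt(n)
```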
Confidence intervals • Intervals defined from the data. • 95% confidence intervals: calculate the interval for each of 100 data sets; about 95 will contain θ.
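A hedged sketch of the coverage statement above: repeatedly simulate data sets, construct a standard 95% t-interval for the mean from each, and count how many contain the true θ. The normal data model and the parameter values are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
theta, sigma, n = 10.0, 2.0, 25      # illustrative true value, noise and sample size
n_datasets = 100

covered = 0
for _ in range(n_datasets):
    x = rng.normal(theta, sigma, size=n)
    half_width = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    lo, hi = x.mean() - half_width, x.mean() + half_width
    covered += (lo <= theta <= hi)

print(f"{covered} of {n_datasets} intervals contain theta")   # typically about 95
```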
Uncertainty • But there is more uncertainty in what we do than just variation and experimental error • We do our calculations in a statistical model. • But the model is not the real world • So there is modelling error – which covers a multitude of sins!
Uncertainty • So a 95% confidence interval may represent a much greater uncertainty! • Studies have shown that the uncertainty bounds given by scientists (and others!) are often overconfident by a factor of 10.
Estimation of model parameters • Sometimes the quantities that we wish to estimate do not exist! • Parameters may exist only within a model, e.g. • Transfer coefficients • Release height in atmospheric dispersion • Risk aversion
Why do we want estimates? • [Remember our exhortations that you should be clear on your research objectives or questions.] • To measure ‘something out there’ • To find the parameter to use for some purpose in a model • Evaluation of systems • Prediction of some effect • May use estimates of parameters and their uncertainty to predict how a complex system may evolve, e.g. through Monte Carlo methods.
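As a sketch of the Monte Carlo idea in the last bullet: draw parameter values from distributions expressing their uncertainty, run each draw through the model, and summarise the spread of predictions. The exponential-decay "model" and the distributions below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def model(initial, decay_rate, t=5.0):
    """Toy model: exponential decay of some quantity over time t."""
    return initial * np.exp(-decay_rate * t)

n_draws = 100_000
# Parameter uncertainty expressed as probability distributions (illustrative choices).
initial = rng.normal(loc=100.0, scale=10.0, size=n_draws)
decay_rate = rng.lognormal(mean=np.log(0.2), sigma=0.3, size=n_draws)

predictions = model(initial, decay_rate)
print(f"predicted effect: median {np.median(predictions):.1f}, "
      f"90% interval [{np.percentile(predictions, 5):.1f}, "
      f"{np.percentile(predictions, 95):.1f}]")
```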
Independence • Many estimation methods assume that each error is probabilistically independent of the other errors… and often they are far from independent. • 1700 2 ‘independent’ samples • IPCC work on climate change • Dependence in data changes (increases!) the uncertainty in the estimates
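A small sketch of why dependence matters: positively correlated observations carry less information than independent ones, so the sample mean varies far more than the usual σ/√n calculation suggests. The AR(1) correlation structure and the numbers below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, sigma = 100, 0.8, 1.0        # illustrative sample size and correlation
n_sims = 20_000

def ar1_sample(n, rho, sigma, rng):
    """Generate n AR(1) observations with mean 0 and stationary variance sigma**2."""
    x = np.empty(n)
    x[0] = rng.normal(0, sigma)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal(0, sigma * np.sqrt(1 - rho**2))
    return x

means_indep = [rng.normal(0, sigma, n).mean() for _ in range(n_sims)]
means_dep = [ar1_sample(n, rho, sigma, rng).mean() for _ in range(n_sims)]

print(f"std of mean, independent data: {np.std(means_indep):.3f}")  # ~ sigma / sqrt(n)
print(f"std of mean, correlated data:  {np.std(means_dep):.3f}")    # noticeably larger
```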
Rev. Thomas Bayes • 1701?-1761 • Main work published posthumously: T. Bayes (1763) An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. 53, 370-418 • Bayes Theorem – inverse probability
Bayes theorem Posterior probability ∝ likelihood × prior probability p(θ | x) ∝ p(x | θ) × p(θ)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) There is a constant, but ‘easy’ to find as probability adds (integrates) to one
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) Prior: probability distribution of parameters, p(θ)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) Likelihood of data given parameters, p(x | θ)
Bayes theorem p(θ | x) ∝ p(x | θ) × p(θ) Posterior: probability distribution of parameters given data, p(θ | x)
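A minimal numerical sketch of the theorem: discretise θ on a grid, multiply the prior by the likelihood pointwise, and renormalise so the posterior sums to one. The binomial likelihood and flat prior are illustrative choices, anticipating the coin example later in these slides:

```python
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)               # grid of parameter values
prior = np.ones_like(theta)                   # flat prior, p(theta) (illustrative)
likelihood = stats.binom.pmf(9, 12, theta)    # p(x | theta): 9 heads in 12 tosses

unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()   # normalise on the grid (the 'easy' constant)

print(f"posterior mean: {(theta * posterior).sum():.3f}")
```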
On the treatment of negative intensity measurements Simon French simon.french@warwick.ac.uk
Crystallography data • Roughly, x-rays shone at a crystal diffract into many rays radiating out in a fixed pattern from the crystal. • The intensities of these diffracted rays are related to the moduli of the coefficients in the Fourier expansion of the electron density of the molecule. • So getting hold of the intensities gives structural information
Intensity measurement • Measure X-ray intensity in a diffracted ray and subtract the background ‘near to it’: Measured intensity, I = ray strength - background • But in protein crystallography most intensities are small relative to background, so some are ‘measured’ as negative • And theory says they are non-negative … • Approaches in the early 1970s simply set negative measurements to zero … and got biased data sets
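A hedged simulation of the bias in the last bullet: when the true intensity is small relative to the noise, replacing negative measurements by zero pushes the average upward. The normal measurement error and the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
true_intensity = 1.0        # small true intensity (illustrative)
noise_sd = 5.0              # background/counting noise dominates (illustrative)

measured = rng.normal(true_intensity, noise_sd, size=100_000)
truncated = np.clip(measured, 0.0, None)      # the early-1970s fix: negatives -> 0

print(f"mean of raw measurements:      {measured.mean():.3f}")   # ~ true_intensity
print(f"mean after truncation at zero: {truncated.mean():.3f}")  # biased upward
```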
A Bayesian approach • Good reason to think the likelihood for intensity measurements is near normal • Difference of Poisson (‘counting statistics’) • Further ‘corrections’ • Theory gives the prior: “Wilson’s statistics” (A. J. C. Wilson, 1949) • Estimate with the posterior mean [Figure: normal likelihood combined with the Wilson’s-statistics prior]
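A sketch of the logic under stated assumptions: a normal likelihood centred on the measured intensity, combined with an exponential prior on the true, non-negative intensity (the acentric form of Wilson's statistics), with the posterior mean computed numerically on a grid. The measured value, its standard deviation and the Wilson scale Σ are illustrative, and this is a numerical sketch rather than the closed-form treatment of the paper cited on the next slide:

```python
import numpy as np
from scipy import stats

I_obs, sd = -2.0, 3.0      # measured (negative!) intensity and its uncertainty (illustrative)
Sigma = 10.0               # Wilson scale parameter for the acentric prior (illustrative)

J = np.linspace(0, 100, 20001)                 # grid over the true intensity J >= 0
prior = np.exp(-J / Sigma) / Sigma             # acentric Wilson prior: exponential
likelihood = stats.norm.pdf(I_obs, loc=J, scale=sd)

posterior = prior * likelihood
posterior /= posterior.sum()                   # normalise on the grid

print(f"posterior mean intensity: {(J * posterior).sum():.3f}")   # a non-negative estimate
```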
Simon French and Keith Wilson (1978) On the treatment of negative intensity measurements. Acta Crystallographica A34, 517-525
Bayes theorem • Toss a biased coin 12 times; obtain 9 heads [Figure: prior and posterior distributions for the probability of heads]
Bayesian Estimation • Toss a biased coin 12 times; obtain 9 heads • Take the mean, median or mode of the posterior as a point estimate [Figure: prior and posterior distributions]
Bayesian confidence interval • Toss a biased coin 12 times; obtain 9 heads • Report the highest 95% density interval of the posterior [Figure: prior and posterior distributions with the 95% highest-density region marked]
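A sketch of the coin example with a conjugate Beta prior; the flat Beta(1, 1) prior is an assumption (the slides do not specify the pictured prior), and the highest-density interval is found by a simple grid scan:

```python
import numpy as np
from scipy import stats

a, b = 1, 1                  # flat Beta(1, 1) prior (assumed; the slides' prior is not specified)
heads, tosses = 9, 12
post = stats.beta(a + heads, b + tosses - heads)      # conjugate posterior: Beta(10, 4)

print(f"posterior mean:   {post.mean():.3f}")
print(f"posterior median: {post.median():.3f}")
print(f"posterior mode:   {(a + heads - 1) / (a + b + tosses - 2):.3f}")

# Highest 95% density interval: the shortest interval containing 95% posterior probability.
lowers = np.linspace(0.0, post.ppf(0.05), 501)
uppers = post.ppf(post.cdf(lowers) + 0.95)
best = int(np.argmin(uppers - lowers))
print(f"95% highest-density interval: [{lowers[best]:.3f}, {uppers[best]:.3f}]")
```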
But why do any of these? Just report the posterior. It encodes all that is known about θ