Emulation, Elicitation and Calibration UQ12 Minitutorial Presented by: Tony O’Hagan, Peter Challenor, Ian Vernon UQ12 minitutorial - session 1
Outline of the minitutorial Three sessions of about 2 hours each • Session 1: Monday, 2pm – 4pm, State C • Overview of UQ; total UQ; introduction to emulation; elicitation • Session 2: Tuesday, 2pm – 4pm, State C • Building and using an emulator; sensitivity analysis • Session 3: Wednesday, 2pm – 4pm, State C • Calibration and history matching; galaxy formation case study Intended to introduce the applied maths/engineering UQ community to UQ methods developed in the statistics community
Session 1 Introduction and elicitation
Outline • Introduction • UQ and Total UQ • Managing uncertainty • A brief case study • Emulators • Elicitation • Elicitation principles • Elicitation practice
UQ and Total UQ
What is UQ? • Uncertainty quantification • A term that seems to have been devised by engineers • Faced with uncertainty in some particular kinds of analyses • Characterising how uncertainty about inputs to a complex computer model induces uncertainty about outputs • Large body of work in engineering and applied maths • Uncertainty quantification • What statisticians do! • And have always done • In every field of application, for all kinds of analyses • In particular, statisticians have developed methods for propagating and quantifying output uncertainty • And lots more relating to the use of complex simulation models
Simulators • In almost all fields of science, technology, industry and policy making, people use mechanistic models • For understanding, prediction, control • Huge variety • A model simulates a real-world, usually complex, phenomenon as a set of mathematical equations • Models are usually implemented as computer programs • We will refer to a computer implementation of a model as a simulator
Why worry about uncertainty? • Simulators are increasingly being used for decision-making • Taking very seriously the implied claim that the simulator represents and predicts reality • How accurate are model predictions? • There is growing concern about uncertainty in model outputs • Particularly where simulator predictions are used to inform scientific debate or environmental policy • Are their predictions robust enough for high stakes decision-making?
For instance … • Models for climate change produce different predictions for the extent of global warming or other consequences • Which ones should we believe? • What error bounds should we put around these? • Are simulator differences consistent with the error bounds? • Until we can answer such questions convincingly, why should anyone have faith in the science?
The simulator as a function • In order to talk about the uncertainty in model predictions we need some simple notation • Using computer language, a simulator takes a number of inputs and produces a number of outputs • We can represent any output y as a function y = f(x) of a vector x of inputs
Where is the uncertainty? • How might the simulator output y = f(x) differ from the true real-world value z that the simulator is supposed to predict? • Error in inputs x • Initial values • Forcing inputs • Model parameters • Error in model structure or solution • Wrong, inaccurate or incomplete science • Bugs, solution errors
Quantifying uncertainty • The ideal is to provide a probability distribution p(z) for the true real-world value • The centre of the distribution is a best estimate • Its spread shows how much uncertainty about z is induced by the uncertainties on the previous slide • How do we get this? • Input uncertainty: characterise p(x), propagate through to p(y) • Structural uncertainty: characterise p(z – y)
More uncertainties • It is important to recognise two more uncertainties that arise when working with simulators • The act of propagating input uncertainty is imprecise • Approximations are made • Introducing additional code uncertainty • A key task in managing uncertainty is to use observations of the real world to tune or calibrate the model • We need to acknowledge uncertainty due to measurement error
Code uncertainty – Monte Carlo • The simplest way to propagate uncertainty is Monte Carlo • Take a large random sample of realisations from p(x) • Run the simulator at each sampled x to get a sample of outputs • This is a random sample from p(y) • E.g. sample mean estimates E(Y) • Even with a very large sample, MC computations are not exact • Sample is an approximation of the population • Standard error of sample mean is population s.d. over root n • This is code uncertainty • MC has a built-in statistical quantification of code uncertainty
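The Monte Carlo recipe above can be sketched in a few lines. This is a minimal illustration, not part of the slides: the simulator, the input distribution p(x) and all numbers are invented stand-ins for a real, expensive code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(x):
    """Hypothetical cheap stand-in for an expensive simulator y = f(x)."""
    return np.sin(3 * x) + x ** 2

n = 10_000
x = rng.normal(0.5, 0.2, size=n)       # sample from an assumed input distribution p(x)
y = simulator(x)                       # one simulator run per sample: a sample from p(y)

mean = y.mean()                        # sample mean estimates E(Y)
stderr = y.std(ddof=1) / np.sqrt(n)    # code uncertainty: population s.d. over root n
print(f"E(Y) = {mean:.4f} +/- {2 * stderr:.4f}")
```

The standard error is the built-in quantification of code uncertainty: it shrinks like 1/√n, which is exactly why MC needs thousands of runs.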
Code uncertainty – alternatives to MC • MC is impractical for simulators that require significant resources, so other methods have been developed • Polynomial chaos methods • PC expansions are always truncated • The truncation error is where the main code uncertainty lies • Also in solving Galerkin equations • Surrogate models (e.g. emulators) • Approximations to the true f(.) • Code uncertainty lies in the approximation error
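To make the truncation error concrete, here is a toy polynomial chaos calculation, not taken from the slides: expand f(x) = exp(x) with x ~ N(0,1) in probabilists' Hermite polynomials, where the exact answers are known (mean e^0.5, variance e² − e). The truncated variance always falls short of the truth, and that shortfall is the code uncertainty the slide refers to.

```python
import numpy as np
from math import factorial, exp

# Gauss-Hermite quadrature rule for the weight exp(-x^2/2),
# normalised so the weights define an N(0,1) expectation.
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
weights = weights / weights.sum()

def coeff(n):
    """PC coefficient c_n = E[f(X) He_n(X)] / n! for f(x) = exp(x)."""
    He_n = np.polynomial.hermite_e.hermeval(nodes, [0] * n + [1])
    return float(np.sum(weights * np.exp(nodes) * He_n)) / factorial(n)

true_var = exp(2) - exp(1)              # exact Var(exp(X)) for X ~ N(0,1)
for N in (1, 3, 5):                     # truncation orders
    c = [coeff(n) for n in range(N + 1)]
    var_N = sum(c[n] ** 2 * factorial(n) for n in range(1, N + 1))
    print(f"order {N}: variance {var_N:.4f} of true {true_var:.4f}")
```

The mean is captured exactly by c_0, but each truncation order underestimates the variance; the gap closes as N grows but never quite vanishes.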
How to quantify uncertainty • To quantify uncertainty in the true real world value that the simulator is trying to predict we need the following steps • Quantify uncertainty in inputs, p(x) • Propagate to uncertainty in output, p(y) • Quantify and account for code uncertainty • Quantify and account for model discrepancy uncertainty • Engineering/applied maths UQ apparently only deals with the second step • Ironically, this is the one step that doesn’t actually involve quantifying uncertainty!
Total UQ • Here are my key demands • UQ for any quantity of interest must quantify all components of uncertainty • All UQ must be in the form of explicit, quantified probability distributions • All quantifications of uncertainty should be credible representations of what is, and is not, known • None of this is easy but we should at least try • I call these aspirations the Total UQ Manifesto
Managing uncertainty
UQ is not enough • The presence of uncertainty creates several important tasks • Engineering/applied maths UQ addresses only one of these • Managing uncertainty • Uncertainty analysis – how much uncertainty do we have? • This is the basic UQ task • Sensitivity analysis – which sources of uncertainty drive overall uncertainty, and how? • Understanding the system, prioritising research • Calibration – how can we reduce uncertainty? • Use of observations • Tuning, data assimilation, history matching, inverse problems • Experimental design
Decision-making under uncertainty – can we cope with uncertainty? • Robust engineering design • Optimisation under uncertainty
MUCM • Managing Uncertainty in Complex Models • Large 4-year UK research grant • June 2006 to September 2010 • 7 postdoctoral research associates, 4 project PhD students • Objective to develop BACCO methods into a basic technology, usable and widely applicable • MUCM2: New directions for MUCM • Smaller 2-year grant to September 2012 • Scoping and developing research proposals
Primary MUCM deliverables • Methodology and papers moving the technology forward • Papers both in statistics and application area journals • The MUCM toolkit • Documentation of the methods and how to use them • With emphasis on what is found to work reliably across a range of modelling areas • Web-based • Case studies • Three substantial case studies • Showcasing methods and best practice • Linked to toolkit • Events • Workshops – conceptual and hands-on • Short courses • Conferences – UCM 2010 and UCM 2012 (July 2-4)
Focus on the toolkit • The toolkit is a ‘recipe book’ • The good sort that encourages you to experiment • There are recipes (procedures) but also lots of explanation of concepts and discussion of choices • It is not a software package • Software packages are great if they are in your favourite language • But it probably wouldn’t be! • Packages are dangerous without basic understanding • The purpose of the toolkit is to build that understanding • And it enables you to easily develop your own code
Resources • Introduction to emulators • O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290-1300. • The MUCM website • http://mucm.ac.uk • The MUCM toolkit • http://mucm.ac.uk/toolkit • The UCM 2012 conference • http://mucm.ac.uk/UCM2012.html
This minitutorial • This minitutorial covers the key elements of Total UQ and uncertainty management • Emulators • Surrogate models that include quantification of code uncertainty • Brief outline in this session then details in session 2 • Elicitation • Tools for rigorous quantification of fundamental uncertainties • Introduction to this big field in this session • Management tools • Sensitivity analysis in session 2 • Calibration and history matching in session 3
A brief case study Complex emulation and expert elicitation were essential components of this exercise
Example: UK carbon flux in 2000 • Vegetation model predicts carbon exchange from each of 700 pixels over England & Wales in 2000 • Principal output is Net Biosphere Production • Accounting for uncertainty in inputs • Soil properties • Properties of different types of vegetation • Land usage • Also code uncertainty • But not structural uncertainty • Aggregated to England & Wales total • Allowing for correlations • Estimate 7.46 Mt C (± 0.54 Mt C)
Maps • Mean NBP • Standard deviation
England & Wales aggregate
Emulators
So far, so good, but • In principle, Total UQ is straightforward • In practice, there are many technical difficulties • Formulating uncertainty on inputs • Elicitation of expert judgements • Propagating input uncertainty • Modelling structural error • Anything involving observational data! • The last two are intricately linked • And computation
The problem of big models • Tasks like uncertainty propagation and calibration require us to run the simulator many times • Uncertainty propagation • Implicitly, we need to run f(x) at all possible x • Monte Carlo works by taking a sample of x from p(x) • Typically needs thousands of simulator runs • Calibration • Traditionally done by searching x space for good fits to the data • Both become impractical if the simulator takes more than a few seconds to run • 10,000 runs at 1 minute each takes a week of computer time • We need a more efficient technique
More efficient methods • This is what UQ theory is mostly about • Engineering/Applied Maths UQ • Polynomial chaos expansions of random variables • Approximate by truncating • Thereby build an expansion of outputs • Compute by Monte Carlo etc. using this surrogate representation • Statistics UQ • Gaussian process emulation of the simulator • A different kind of surrogate • Propagate input uncertainty through surrogate • By Monte Carlo or analytically
Gaussian process representation • More efficient approach • First work in early 1980s (DACE) • Represent the code as an unknown function • f(.) becomes a random process • We generally represent it as a Gaussian process (GP) • Or its second-order moment version • Training runs • Run simulator for sample of x values • Condition GP on observed data • Typically requires many fewer runs than Monte Carlo • And x values don’t need to be chosen randomly
Emulation • Analysis is completed by prior distributions for, and posterior estimation of, hyperparameters • The posterior distribution is known as an emulator of the computer simulator • Posterior mean estimates what the simulator would produce for any untried x (prediction) • With uncertainty about that prediction given by posterior variance • Correctly reproduces training data • Gets its UQ right! • An essential requirement of credible quantification
2 code runs • Consider one input and one output • Emulator estimate interpolates data • Emulator uncertainty grows between data points
3 code runs • Adding another point changes the estimate and reduces uncertainty
5 code runs • And so on
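The behaviour in those figures can be reproduced with very little code. The sketch below is illustrative only: a squared-exponential covariance with zero prior mean and fixed, invented hyperparameters, conditioned on five runs of a toy "simulator".

```python
import numpy as np

def cov(a, b, var=1.0, length=0.3):
    """Squared-exponential covariance between two sets of 1-d inputs."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def toy_simulator(x):
    return np.sin(2 * np.pi * x)       # stands in for an expensive code

x_train = np.array([0.1, 0.4, 0.6, 0.8, 0.95])   # 5 code runs (training design)
y_train = toy_simulator(x_train)

x_star = np.linspace(0.0, 1.0, 101)               # prediction grid
K = cov(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter for stability
K_s = cov(x_star, x_train)

# Condition the GP on the training runs (noise-free observations)
post_mean = K_s @ np.linalg.solve(K, y_train)
v = np.linalg.solve(K, K_s.T)
post_var = np.diag(cov(x_star, x_star)) - np.sum(K_s * v.T, axis=1)
```

The posterior mean passes through the five training runs, and the posterior variance is numerically zero there and grows between them: exactly the picture in the slides, with code uncertainty quantified for free.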
Then what? • Given enough training data points we can in principle emulate any simulator output accurately • So that posterior variance is small “everywhere” • Typically, this can be done with orders of magnitude fewer model runs than traditional methods • At least in relatively low-dimensional problems • Use the emulator to make inference about other things of interest • E.g. uncertainty analysis, sensitivity analysis, calibration • The key feature that distinguishes an emulator from other kinds of surrogate • Code uncertainty is quantified naturally • And credibly
Elicitation principles
Where do probabilities come from? • Consider the probability distribution for a model input • Like the hydraulic conductivity K in a geophysical model • Suppose we ask an expert, Mary • Mary gives a probability distribution for K • We might be particularly interested in one probability in that distribution • Like the probability that K exceeds 10^-3 (cm/sec) • Mary’s distribution says Pr(K > 10^-3) = 0.3
How can K have probabilities? • Almost everyone learning probability is taught the frequency interpretation • The probability of something is the long run relative frequency with which it occurs in a very long sequence of repetitions • How can we have repetitions of K? • It’s a one-off, and will only ever have one value • It’s that unique value we’re interested in • Mary’s distribution can’t be a probability distribution in that sense • So what do her probabilities actually mean? • And does she know?
Mary’s probabilities • Mary’s probability 0.3 that K > 10^-3 is a judgement • She thinks it’s more likely to be below 10^-3 than above • So in principle she would bet even money on it • In fact she would bet $2 to win $1 (because 0.7 > 2/3) • Her expectation of around 10^-3.5 is a kind of best estimate • Not a long run average over many repetitions • Her probabilities are an expression of her beliefs • They are personal judgements • You or I would have different probabilities • We want her judgements because she’s the expert! • We need a new definition of probability
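The betting arithmetic behind that slide is simple enough to check directly. This tiny helper is illustrative, not from the slides: a bet staking s to win w has positive expected gain exactly when your probability exceeds s/(s + w).

```python
def favourable(p, stake, win):
    """True when a bet staking `stake` to win `win` has positive
    expected gain under probability p: p * win > (1 - p) * stake,
    i.e. p > stake / (stake + win)."""
    return p > stake / (stake + win)

# Mary judges P(K below the threshold) = 0.7, so she would stake $2
# to win $1 on that outcome, because 0.7 > 2/3.
print(favourable(0.7, stake=2, win=1))   # True
```

Reading probabilities as betting rates like this is what makes subjective probabilities operational: they cash out as decisions.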
Subjective probability • The probability of a proposition E is a measure of a person’s degree of belief in the truth of E • If they are certain that E is true then Pr(E) = 1 • If they are certain it is false then Pr(E) = 0 • Otherwise Pr(E) lies between these two extremes
Subjective includes frequency • The frequency and subjective definitions of probability are compatible • If the results of a very long sequence of repetitions are available, they agree • Frequency probability equates to the long run frequency • All observers who accept the sequence as comprising repetitions will assign that frequency as their (personal or subjective) probability for the next result in the sequence • Subjective probability extends frequency probability • But also seamlessly covers propositions that are not repeatable • It’s also more controversial
It doesn’t include prejudice etc! • The word “subjective” has derogatory overtones • Subjectivity should not admit prejudice, bias, superstition, wishful thinking, sloppy thinking, manipulation ... • Subjective probabilities are judgements but they should be careful, honest, informed judgements • As “objective” as possible without ducking the issue • Using best practice • Formal elicitation methods • Bayesian analysis • Probability judgements go along with all the other judgements that a scientist necessarily makes • And should be argued for in the same careful, honest and informed way
But people are poor probability judges • Our brains evolved to make quick decisions • Heuristics are short-cut reasoning techniques • Allow us to make good judgements quickly in familiar situations • Judgement of probability is not something that we evolved to do well • The old heuristics now produce biases • Anchoring and adjustment • Availability • Representativeness • The range-frequency compromise • Overconfidence
Anchoring and adjustment • When asked to make two related judgements, the second is affected by the first • The second is judged relative to the first • By adjustment away from the first judgement • The first is called the anchor • Adjustment is typically inadequate • Second response too close to the first (anchor) • Anchoring can be strong even when obviously not really relevant to the second question
Availability • The probability of an event is judged more likely if we can quickly bring to mind instances of it • Things that are more memorable are deemed more probable • High profile train accidents in the UK lead people to imagine rail travel is more risky than it really is • My judgement of the risk of dying from a particular disease will be increased if I know (of) people who have the disease or have died from it
Representativeness • An event is considered more probable if the components of its description fit together • Even when the juxtaposition of many components is actually improbable • “Linda is 31, single, outspoken and very bright. She studied philosophy at university and was deeply concerned with issues of discrimination and social justice. Is Linda … • “A bank teller? • “A bank teller and active in the feminist movement?” • The second is often judged more probable than the first • We are a story-telling species • This is also called the conjunction fallacy