UQ: from end to end Tony O’Hagan
Outline • Session 1: Quantifying input uncertainty • Information • Modelling • Elicitation • Coffee break: Propagation! • Session 2: Model discrepancy and calibration • All models are wrong • Impact of model discrepancy • Modelling model discrepancy UQ Summerschool 2014
UQ: from end to end Session 1: Quantifying input uncertainty
Context • You have a model • To simulate or predict some real-world process • I’ll call it a simulator • For a given use of the simulator you are unsure of the true or correct values of inputs • This uncertainty is a major component of UQ • Propagating it through the simulator is a fundamental step in UQ • We need to express that uncertainty in the form of probability distributions • But how? • I feel that this is a neglected area in UQ • Distributions assumed, often with no discussion of where they came from UQ Summerschool 2014
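As a concrete picture of what "propagating" input uncertainty means, here is a minimal Monte Carlo sketch in R. The simulator function and both input distributions are hypothetical stand-ins chosen purely for illustration, not any real application.

```r
# Minimal Monte Carlo propagation sketch (illustrative only).
# 'simulator' is a stand-in for a real model; the input distributions
# below are assumptions chosen purely for illustration.
simulator <- function(x1, x2) exp(0.5 * x1) + x2^2

set.seed(1)
n  <- 10000
x1 <- rnorm(n, mean = 1.0, sd = 0.2)   # input 1: analyst's distribution
x2 <- runif(n, min = 0.5, max = 1.5)   # input 2: analyst's distribution

y <- simulator(x1, x2)                 # propagate by running the simulator

# Summarise the induced output uncertainty
mean(y); sd(y); quantile(y, c(0.025, 0.5, 0.975))
```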
Focus of this session • Probability distributions for inputs • Representing the analyst’s knowledge/uncertainty • What they mean • Interpretation of probability • Where they come from • Analysis of data and/or judgement • Elicitation • Principles • Single input • Multiple inputs • Multiple experts UQ Summerschool 2014
The analyst • The distributions should represent the best knowledge of the model user about the inputs • I will refer to the model user as the analyst • They are the analyst’s responsibility • The analyst is the one who is interested in the simulator output • For a particular application • And some or all of the inputs refer specifically to that application • The analyst must own the input distributions • They should represent best knowledge • Obviously! • Anything else is unscientific • Less input uncertainty means (generally) less output uncertainty UQ Summerschool 2014
What probability? • Before we go further, we need to understand how a probability distribution represents someone’s knowledge • The question goes right to the heart of what probability means • Example: • We are interested in X = the proportion of people infected with HIV who will develop AIDS within 10 years • when treated with a new drug • X will be an input to a clinical trial simulator • To assist the pharmaceutical company in designing the drug’s development plan • Analyst Mary expresses a probability distribution for X UQ Summerschool 2014
Mary’s distribution • The stated distribution is shown on the right • It specifies how probable any particular values of X are • E.g. It says there is a probability of almost 0.7 that X is below 0.4 • And the expected value of X is 0.35 • It even gives a nontrivial probability to X being less than 0.2 • Which would represent a major reduction in HIV progression UQ Summerschool 2014
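For concreteness, a hypothetical Beta distribution with mean 0.35 shows how a stated distribution of this kind encodes such probability statements. This is an illustrative stand-in, not Mary's actual distribution.

```r
# A hypothetical Beta distribution with mean 0.35, used only to show how
# a stated distribution encodes statements like P(X < 0.4) and E[X].
a <- 3.5; b <- 6.5               # mean = a / (a + b) = 0.35
a / (a + b)                      # expected value of X
pbeta(0.4, a, b)                 # probability that X is below 0.4
pbeta(0.2, a, b)                 # probability that X is below 0.2
curve(dbeta(x, a, b), from = 0, to = 1, xlab = "X", ylab = "density")
```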
How can X have probabilities? • Almost everyone learning probability is taught the frequency interpretation • The probability of something is the long run relative frequency with which it occurs in a very long sequence of repetitions • How can we have repetitions of X? • It’s a one-off: it will only ever have one value • It’s that unique value we’re interested in • Simulator inputs are almost always like this – they’re one-off! • Mary’s distribution can’t be a probability distribution in that sense • So what do her probabilities actually mean? • And does she know? UQ Summerschool 2014
Mary’s probabilities • Mary’s probability 0.7 that X < 0.4 is a judgement • She thinks it’s more likely to be below 0.4 than above • So in principle she would bet even money on it • In fact she would bet £2 to win £1 (because 0.7 > 2/3) • Her expectation of 0.35 is a kind of best estimate • Not a long run average over many repetitions • Her probabilities are an expression of her beliefs • They are personal judgements • You or I would have different probabilities • We want her judgements because she’s the expert! • We need a new definition of probability UQ Summerschool 2014
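As a small illustration of the betting reading, a probability p corresponds to a fair stake-to-win ratio of p/(1 − p); the snippet below just evaluates this for Mary's 0.7.

```r
# Fair stake-to-win ratio implied by a probability p: stake p/(1 - p) to win 1.
p <- 0.7
p / (1 - p)   # betting 2 to win 1 is favourable whenever p > 2/3
```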
Subjective probability • The probability of a proposition E is a measure of a person’s degree of belief in the truth of E • If they are certain that E is true then P(E) = 1 • If they are certain it is false then P(E) = 0 • Otherwise P(E) lies between these two extremes • Exercise 1 – How many Muslims in Britain? • Refer to the two questions on your sheet • The first asks for a probability • Make your own personal judgement • If you don’t already have a good feel for the probability scale, you may find it useful to think about betting odds • The second asks for another probability UQ Summerschool 2014
Subjective includes frequency • The frequency and subjective definitions of probability are compatible • If the results of a very long sequence of repetitions are available, they agree • Frequency probability equates to the long run frequency • All observers who accept the sequence as comprising repetitions will give that frequency as their (personal/subjective) probability • for the next (or any future) result in the sequence • Subjective probability extends frequency probability • But also seamlessly covers propositions that are not repeatable • It’s also more controversial UQ Summerschool 2014
It doesn’t include prejudice etc! • The word “subjective” has derogatory overtones • Subjectivity should not admit prejudice, bias, superstition, wishful thinking, sloppy thinking, manipulation ... • Subjective probabilities are judgements but they should be careful, honest, informed judgements • As “objective” as possible without ducking the issue • Using best practice • Formal elicitation methods • Bayesian analysis • Probability judgements go along with all the other judgements that a scientist necessarily makes • And should be argued for in the same careful, honest and informed way UQ Summerschool 2014
What about data? • I’ve presented the analyst’s probability distributions as a matter of pure subjective judgement – what about data? • Many possible scenarios: • X is a parameter for which there is a published value • Analyst has one or more direct experimental evaluations for X • Analyst has data relating more or less directly to X • Analyst has some hard data but also personal expertise about X • Analyst relies on personal expertise about X • Analyst seeks input from an expert on X • … UQ Summerschool 2014
The case of a published value • The published value may come with a completely characterised probability distribution for X representing uncertainty in the value • The analyst simply accepts this distribution as her own judgement • Or it may not • The analyst needs to consider the uncertainty in X around the published value P • X = P + E, where E is the error • Analyst formulates her own probability distribution for E • The published value P may simply come with a standard deviation • The analyst accepts this as one judgement about E UQ Summerschool 2014
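A minimal sketch of the X = P + E construction when only a standard deviation is reported. Treating E as Normal with that standard deviation is the analyst's own judgement, and the numbers are hypothetical.

```r
# Sketch: published value P with a reported standard deviation s.
# Taking E ~ Normal(0, s) is the analyst's own judgement,
# not something the published value dictates.
P <- 2.7          # hypothetical published value
s <- 0.15         # hypothetical reported standard deviation

n <- 10000
E <- rnorm(n, mean = 0, sd = s)    # analyst's distribution for the error
X <- P + E                         # implied distribution for the input X

quantile(X, c(0.025, 0.5, 0.975))  # e.g. a central 95% interval for X
```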
Using data – principles • The appropriate framework for using data is Bayesian statistics • Because it delivers a probability distribution for X • Classical frequentist statistics can’t do that • Even a confidence interval is not a probability statement about X • The data are related to X through a likelihood function • Derived from a statistical model • This is combined with whatever additional knowledge the analyst may have • In the form of a prior distribution • Combination is performed by Bayes’ theorem • The result is the analyst’s posterior distribution for X UQ Summerschool 2014
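A minimal sketch of such an analysis, using a conjugate Beta-Binomial update for a proportion like the X of the earlier example. The prior and the data counts here are hypothetical.

```r
# Minimal Bayesian update sketch for a proportion X (conjugate Beta-Binomial).
# Prior and data are hypothetical numbers chosen for illustration.
a0 <- 3.5; b0 <- 6.5     # analyst's prior: Beta(a0, b0)
k  <- 14;  n  <- 50      # hypothetical data: 14 progressions out of 50 patients

a1 <- a0 + k             # Bayes' theorem with a Binomial likelihood
b1 <- b0 + (n - k)       # gives a Beta(a1, b1) posterior

a1 / (a1 + b1)                   # posterior mean for X
qbeta(c(0.025, 0.975), a1, b1)   # 95% posterior credible interval for X
```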
Using data – practicalities • If data are highly informative about X, prior information may not matter • Use a conventional non-informative prior distribution • Otherwise the analyst formulates her own prior distribution for X • Bayesian analysis can be complex • Analyst is likely to need the services of a Bayesian statistician • The likelihood/model is also a matter of judgement! • Although I will not delve into this today UQ Summerschool 2014
Summary • We have identified several situations where distributions need to be formulated by personal judgement • No good data – analyst formulates distribution for X • Published data does not have complete characterisation of uncertainty – analyst formulates distribution for E • Data supplemented by additional expertise – analyst formulates prior distribution for X • Analyst may seek judgements of one or more experts • Rather than relying on her own • Particularly when the stakes are high • We have identified just one situation where personal judgements are not needed • Published data with completely characterised uncertainty UQ Summerschool 2014
Elicitation • The process of • representing the knowledge • of one or more persons (experts) • concerning an uncertain quantity • as a probability distribution for that quantity • Typically conducted as a dialogue between • the experts – who have substantive knowledge about the quantity (or quantities) of interest – and • a facilitator – who has expertise in the process of elicitation • ideally face to face • but may also be done by video-conference, teleconference or online UQ Summerschool 2014
Some history • The idea of formally representing uncertainty using subjective probability judgements began to be taken seriously in the 1960s • For instance, for judgement of extreme risks • Psychologists became interested • How do people make probability judgements? • What mental processes are used, and what does this tell us about the brain’s processing generally? • They found many ways that we make bad judgements • The heuristics and biases movement • And continued to look mostly at how we get it wrong • Since this told them a lot about our mental processes UQ Summerschool 2014
Meanwhile ... • Statisticians increasingly made use of subjective probabilities • Growth of Bayesian statistics • Some formal elicitation but mostly unstructured judgements • Little awareness of the work in psychology • Reinforced recently by UQ with uncertain simulator inputs • Our interests are more complex • Not really interested in single probabilities • Whole probability distributions • Multivariate distributions • We want to know how to get it right • Psychology provides almost no help with these challenges UQ Summerschool 2014
Heuristics and biases • Our brains evolved to make quick decisions • Heuristics are short-cut reasoning techniques • Allow us to make good judgements quickly in familiar situations • Judgement of probability is not something that we evolved to do well • The old heuristics now produce biases • Anchoring and adjustment • Availability • Overconfidence • And many others UQ Summerschool 2014
Anchoring and adjustment • Exercise 1 was designed to exhibit this heuristic • The probabilities should on average be different in the two groups • When asked to make two related judgements, the second is affected by the first • The second is judged relative to the first • By adjustment away from the first judgement • The first is called the anchor • Adjustment is typically inadequate • Second response too close to the first (anchor) • Anchoring can be strong even when obviously not really relevant to the second question • Just putting any numbers into the discussion creates anchors UQ Summerschool 2014
Availability • The probability of an event is judged more likely if we can quickly bring to mind instances of it • Things that are more memorable are deemed more probable • High profile train accidents in the UK lead people to imagine rail travel is more risky than it really is • My judgement of the risk of dying from a particular disease will be increased if I know (of) people who have the disease or have died from it • Important for analyst to review all the evidence UQ Summerschool 2014
Overconfidence • It is generally said that experts are overconfident • When asked to give 95% intervals, say, far fewer than 95% contain the true value • Several possible explanations • Wish to demonstrate expertise • Anchoring to a central estimate • Difficulty of judging extreme events • Not thinking ‘outside the box’ • Expertise often consists of specialist heuristics • Situations we elicit judgements on are not typical • Probably over-stated as a general phenomenon • Experts can be under-confident if afraid of consequences • A matter of personality and feeling of security • Evidence of over-confidence is not from real experts making judgements on serious questions UQ Summerschool 2014
The keys to good elicitation • First, pay attention to the literature on psychology of elicitation • How you ask a question influences the answer • Second, ask about the right things • Things that experts are likely to assess most accurately • Third, prepare thoroughly • Provide help and training for experts • These are built into the SHELF system • Sheffield Elicitation Framework UQ Summerschool 2014
The SHELF system • SHELF is a package of documents and simple software to aid elicitation • General advice on conducting the elicitation • Templates for recording the elicitation • Suitable for several different basic methods • Annotated versions of the templates with detailed guidance • Some R functions for fitting distributions and providing feedback • SHELF is freely available and comments and suggestions for additions are welcomed • Developed by Tony O’Hagan and Jeremy Oakley • R functions by Jeremy • http://tonyohagan.co.uk/shelf UQ Summerschool 2014
A SHELF template • Word document • Facilitator follows a carefully constructed sequence of questions • Final step invites experts to give their own feedback • The tertile method • One of several supported in SHELF UQ Summerschool 2014
Annotated template • For facilitator’s guidance • Advice on each field of the template • Ordinary text says what is required in each field • Text in brackets gives advice on how to work with experts • Text in italics says why we are doing it this way • Based on findings in psychology UQ Summerschool 2014
Let’s see how it works • SHELF templates provide a carefully structured sequence of steps • Informed by psychology and practical experience • I’ll work through these, using the following illustrative example • An Expert is asked for her judgements about the distance D between the airports of Paris Charles de Gaulle and Chicago O’Hare • in miles • She has experience of flying distances but has not flown this route before • She knows that from LHR to JFK is about 3500 miles UQ Summerschool 2014
Credible range L to U • Expert is asked for lower and upper credible bounds • Expert would be very surprised if X was found to be below the lower credible bound or above the upper credible bound • It’s not impossible to be outside the credible range, just highly unlikely • Practical interpretation might be a probability of 1% that X is below L and 1% that it’s above U • Example • Expert sets lower bound L = 3500 • CDG to ORD surely more than LHR to JFK • Upper bound U = 5000 • Additional flying distance for CDG to ORD surely less than 1500 UQ Summerschool 2014
The median M • The value of x for which the expert judges X to be equally likely to be above or below x • Probability 0.5 (or 50%) below • and 0.5 above • Like a toss of a coin • Or chopping the range into two equally probable parts • If the expert were asked to choose either to bet on X < x or on X > x, he/she should have no preference • It’s a specific kind of ‘estimate’ of X • Need to think, not just go for mid-point of the credible range • Example • Expert chooses median M = 4000 UQ Summerschool 2014
The quartiles Q1 and Q3 • The lower quartile Q1 is the p = 25% quantile • The expert judges X < x to have probability 0.25 • Like tossing two successive Heads with a coin • Equivalently, x divides the range below the median into two equi-probable parts • ‘Less than Q1’ & ‘between Q1 and M’ • Q1 should generally be closer to M than to L • Similarly, upper quartile Q3 is p = 75% • Q1, M and Q3 divide the range into four equi-probable parts • Example • Expert chooses Q1 = 3850, Q3 = 4300 UQ Summerschool 2014
Then fit a distribution • Any convenient distribution • As long as it fits the elicited summaries adequately • SHELF has software for fitting a range of standard distributions • At this point, the choice should not matter • The idea is that we have elicited enough • Any reasonable distribution choice will be similar to any other • Elicitation can never be exact • The elicited summaries are only approximate anyway • If the choice does matter • i.e. different fitted distributions give different answers to the problem for which we are doing the elicitation • We can try to remove the sensitivity by eliciting more summaries • Or involving more experts UQ Summerschool 2014
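A base-R sketch of the fitting idea, using the elicited summaries from the flight-distance example. It mimics what SHELF's fitting functions do but is not the SHELF code, and the Normal family used here is only one candidate.

```r
# Sketch: fit a Normal distribution to the elicited median and quartiles
# from the flight-distance example, then feed back tail quantiles for the
# expert to check against her credible range.  Mimics the idea behind
# SHELF's fitting functions but is not the SHELF code itself.
M  <- 4000; Q1 <- 3850; Q3 <- 4300   # elicited summaries
L  <- 3500; U  <- 5000               # elicited credible range

mu    <- M
sigma <- (Q3 - Q1) / (2 * qnorm(0.75))   # match the inter-quartile range

# Feedback: where does the fitted distribution put its 1% and 99% points?
qnorm(c(0.01, 0.99), mean = mu, sd = sigma)
# If these fall well outside (or well inside) [L, U], discuss with the expert
# or try another family.
```

Note that these elicited quartiles are asymmetric about the median (150 below, 300 above), so a skewed family may match the judgements better; that is exactly the sort of sensitivity worth checking with feedback.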
Exercise 2 • So let’s do it! • We’re going to elicit your beliefs about one of the following (you can choose!) • Number of gold medals to be won by China in 2016 Olympics • Length of the Yangtze River • Population of Beijing in 2011 • Proportion of the total world land area covered by China UQ Summerschool 2014
Do we need a facilitator? • Yes, if the simulator output is sufficiently important • A skilled facilitator is essential to get the most accurate and reliable representation of the expert’s knowledge • At least for the most influential inputs • Otherwise, no • The analyst can simply quantify her own judgements • But it’s still very useful to follow the SHELF process • In effect, the analyst interrogates herself • Playing the role of facilitator as well as that of expert UQ Summerschool 2014
Multiple inputs • Hitherto we’ve basically considered just one input X • In practice, simulators almost always have multiple inputs • Then we need to think about dependence • Two or more uncertain quantities are independent if: • When you learn something about one of them it doesn’t change your beliefs about the others • It’s a personal judgement, like everything else in elicitation! • They may be independent for one expert but not for another • Independence is nice • Independent inputs can just be elicited separately UQ Summerschool 2014
Exercise 3 • Which of the following sets of quantities would you consider independent? • The average weight B of black US males aged 40 and the average weight W of white US males aged 40 • My height H and my age A • The time T taken by the Japanese bullet train to travel from Tokyo to Kyoto and the distance D travelled • The atomic numbers of Calcium (Ca), Silver (Ag) and Arsenic (As) UQ Summerschool 2014
Eliciting dependence • If quantities are not independent we must elicit the nature and magnitude of dependence between them • Remembering that probabilities are the best summaries to elicit • Joint probabilities • Probability that X takes some x-values and Y takes some y-values • Conditional probabilities • Probability that Y takes some y-values if X takes some x-values • Much harder to think about than probabilities for a single quantity • Perhaps the simplest is the quadrant probability • Probability both X and Y are above their individual medians UQ Summerschool 2014
Bivariate quadrant probability • First elicit the two medians • Now elicit the quadrant probability • It can’t be negative • Or more than 0.5 • Value indicates direction and strength of dependence • 0.25 if X and Y are independent • Greater if positively correlated • 0.5 if, whenever one is above its median, the other must be too • Less than 0.25 if negatively correlated • Zero if they can’t both be above their medians UQ Summerschool 2014
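One way to turn an elicited quadrant probability into a usable dependence parameter is to assume a Gaussian copula (the copula choice is an assumption of mine, not part of the elicitation), for which the probability of both quantities exceeding their medians is 1/4 + arcsin(ρ)/(2π).

```r
# Sketch: convert an elicited quadrant probability q into a correlation,
# under an assumed Gaussian copula (the copula choice is an assumption,
# not something the elicitation dictates).
# For a bivariate Normal, P(both above their medians) = 1/4 + asin(rho)/(2*pi),
# which inverts to:
quadrant_to_rho <- function(q) sin(2 * pi * (q - 0.25))

quadrant_to_rho(0.25)   # independence          -> rho = 0
quadrant_to_rho(0.40)   # positive dependence
quadrant_to_rho(0.50)   # perfect positive dependence -> rho = 1
```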
Higher dimensions • This is already hard • Just for two uncertain quantities • In order to elicit dependence in any depth we will need to elicit several more joint or conditional probabilities • More than two variables – more complex still! • Even with just three quantities... • Three pairwise bivariate distributions • With constraints • The three-way joint distribution is not implied by those, either • We can’t even visualise or draw it! • There is no clear understanding among elicitation practitioners on how to elicit dependence UQ Summerschool 2014
Avoiding the problem • It would be so much easier if the quantities we chose to elicit were independent • i.e. no dependence or correlation between them • Then eliciting a distribution for each quantity would be enough • We wouldn’t need to elicit multivariate summaries • The trick is to ask about the right quantities • Redefine inputs so they become independent • This is called elaboration • Or structuring UQ Summerschool 2014
Example – two treatment effects • A clinical trial will compare a new treatment with an existing treatment • Existing treatment effect A is relatively well known • Expert has low uncertainty • But added uncertainty due to the effects of the sample population • New treatment effect B is more uncertain • Evidence mainly from small-scale Phase III trial • A and B will not be independent • Mainly because of the trial population effect • If A is at the high end of the expert’s distribution, she would expect B also to be relatively high • Can we break this dependence with elaboration? UQ Summerschool 2014
Relative effect • In the two treatments example, note that in clinical trials attention often focuses on the relative effect R = B/A • When effect is bad, like deaths, this is called relative risk • Expert may judge R to be independent of A • Particularly if random trial effect is assumed multiplicative • If additive we might instead consider A independent of D = B – A • But this is unusual • So elicit separate distributions for R and A • The joint distribution of (A, B) is now implicit • Can be derived if needed • But often the motivating task can be rephrased in terms of (A, R) UQ Summerschool 2014
Trial effect • Instead of simple structuring with the relative risk R, we can explicitly recognise the cause of the correlation • Let T be the trial effect due to difference between the trial patients and the wider population • Let E and N be efficacies of existing and new treatments in the wider population • Then A = E x T and B = N x T • Expert may be comfortable with independence of T, E and N • With E well known, T fairly well known and N more uncertain • We now have to elicit distributions for three quantities instead of two • But can possibly assume them independent UQ Summerschool 2014
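A Monte Carlo sketch of how this elaboration delivers the implied joint distribution of A and B from independent judgements about E, T and N, including the correlation the structuring was designed to capture. All three distributions below are hypothetical placeholders, not elicited values.

```r
# Sketch: derive the implied joint distribution of A = E*T and B = N*T
# from independent elicited distributions for E, T and N.
# The distributions below are hypothetical placeholders.
set.seed(1)
n <- 10000
E <- rlnorm(n, meanlog = log(0.8), sdlog = 0.05)  # existing efficacy: well known
T <- rlnorm(n, meanlog = 0,        sdlog = 0.10)  # trial effect: fairly well known
N <- rlnorm(n, meanlog = log(0.7), sdlog = 0.30)  # new efficacy: more uncertain

A <- E * T
B <- N * T

cor(A, B)                               # dependence induced by the shared trial effect
quantile(B / A, c(0.025, 0.5, 0.975))   # implied relative effect R = B / A
```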
General principles • Independence or dependence are in the head of the expert • Two quantities are dependent if learning about one of them would change his/her beliefs about the other • Explore possible structures with the expert(s) • Find out how they think about these quantities • Expertise often involves learning how to break a problem down into independent components • SHELF does not yet handle multivariate elicitation • But it does include an explicit structuring step • Which we can now see is potentially very important! • Templates for some special cases expected in the next release UQ Summerschool 2014
Multiple experts • The case of multiple experts is important • When elicitation is used to provide expert input to a decision problem with substantial consequences, we generally want to use the skill of as many experts as possible • But they will all have different opinions • Different distributions • How do we aggregate them? • In order to get a single elicited distribution UQ Summerschool 2014
Aggregating expert judgements • Two approaches • Aggregate the distributions • Elicit a distribution from each expert separately • Combine them using a suitable formula • For instance, simply average them • Called ‘mathematical aggregation’ or ‘pooling’ • Aggregate the experts • Get the experts together and elicit a single distribution • Called ‘behavioural aggregation’ • Neither is without problems UQ Summerschool 2014
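A minimal sketch of the "simply average them" option, i.e. an equally weighted linear opinion pool of three hypothetical expert distributions.

```r
# Sketch: linear opinion pool ("simply average them") for three experts,
# each with a hypothetical Normal distribution for the same quantity X.
mus    <- c(4000, 4200, 3900)
sigmas <- c( 300,  250,  400)
w      <- rep(1 / 3, 3)                      # equal weights

pooled_density <- function(x)
  sapply(x, function(xi) sum(w * dnorm(xi, mean = mus, sd = sigmas)))

curve(pooled_density(x), from = 3000, to = 5200,
      xlab = "X", ylab = "pooled density")
# The pooled distribution is a mixture: typically wider (and possibly
# multimodal) rather than a single compromise Normal.
```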
Multiple experts in SHELF • SHELF uses behavioural aggregation • However, distributions are first elicited from experts separately • After sharing of key information • Allows facilitator to see the range of belief before aggregation • Then experts discuss their differences • With a view to assigning an aggregate distribution • To represent what an impartial, intelligent observer might reasonably believe after seeing the experts’ judgements and hearing their discussions • Facilitator can judge whether degree of compromise is appropriate to the intervening discussion UQ Summerschool 2014
Challenges in behavioural aggregation • More psychological hazards • Group dynamic – dominant/reticent experts • Tendency to end up more confident • Block votes • Requires careful management • What to do if they can’t agree? • End up with two or more composite distributions • Need to apply mathematical pooling to these • But this is rare in practice UQ Summerschool 2014