
UQ: from end to end

UQ: from end to end. Tony O’Hagan. Outline . Session 1: Quantifying input uncertainty Information Modelling Elicitation Coffee break: Propagation! Session 2: Model discrepancy and calibration All models are wrong Impact of model discrepancy Modelling model discrepancy.


Presentation Transcript


  1. UQ: from end to end Tony O’Hagan

  2. Outline • Session 1: Quantifying input uncertainty • Information • Modelling • Elicitation • Coffee break: Propagation! • Session 2: Model discrepancy and calibration • All models are wrong • Impact of model discrepancy • Modelling model discrepancy UQ Summerschool 2014

  3. UQ: from end to end Session 1: Quantifying input uncertainty

  4. Context • You have a model • To simulate or predict some real-world process • I’ll call it a simulator • For a given use of the simulator you are unsure of the true or correct values of inputs • This uncertainty is a major component of UQ • Propagating it through the simulator is a fundamental step in UQ • We need to express that uncertainty in the form of probability distributions • But how? • I feel that this is a neglected area in UQ • Distributions assumed, often with no discussion of where they came from UQ Summerschool 2014

  5. Focus of this session • Probability distributions for inputs • Representing the analyst’s knowledge/uncertainty • What they mean • Interpretation of probability • Where they come from • Analysis of data and/or judgement • Elicitation • Principles • Single input • Multiple inputs • Multiple experts UQ Summerschool 2014

  6. The analyst • The distributions should represent the best knowledge of the model user about the inputs • I will refer to the model user as the analyst • They are the analyst’s responsibility • The analyst is the one who is interested in the simulator output • For a particular application • And some or all of the inputs refer specifically to that application • The analyst must own the input distributions • They should represent best knowledge • Obviously! • Anything else is unscientific • Less input uncertainty means (generally) less output uncertainty UQ Summerschool 2014

  7. What probability? • Before we go further, we need to understand how a probability distribution represents someone’s knowledge • The question goes right to the heart of what probability means • Example: • We are interested in X = the proportion of people infected with HIV who will develop AIDS within 10 years • when treated with a new drug • X will be an input to a clinical trial simulator • To assist the pharmaceutical company in designing the drug’s development plan • Analyst Mary expresses a probability distribution for X UQ Summerschool 2014

  8. Mary’s distribution • The stated distribution is shown on the right • It specifies how probable any particular values of X are • E.g. It says there is a probability of almost 0.7 that X is below 0.4 • And the expected value of X is 0.35 • It even gives a nontrivial probability to X being less than 0.2 • Which would represent a major reduction in HIV progression UQ Summerschool 2014

  9. How can X have probabilities? • Almost everyone learning probability is taught the frequency interpretation • The probability of something is the long run relative frequency with which it occurs in a very long sequence of repetitions • How can we have repetitions of X? • It’s a one-off: it will only ever have one value • It’s that unique value we’re interested in • Simulator inputs are almost always like this – they’re one-off! • Mary’s distribution can’t be a probability distribution in that sense • So what do her probabilities actually mean? • And does she know? UQ Summerschool 2014

  10. Mary’s probabilities • Mary’s probability 0.7 that X < 0.4 is a judgement • She thinks it’s more likely to be below 0.4 than above • So in principle she would bet even money on it • In fact she would bet £2 to win £1 (because 0.7 > 2/3) • Her expectation of 0.35 is a kind of best estimate • Not a long run average over many repetitions • Her probabilities are an expression of her beliefs • They are personal judgements • You or I would have different probabilities • We want her judgements because she’s the expert! • We need a new definition of probability UQ Summerschool 2014
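
The betting argument on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and it assumes Mary is willing to bet on expected gain alone (no risk aversion). A bet that risks a stake to win a prize is favourable when her probability exceeds stake / (stake + prize), which is 2/3 for a bet of £2 to win £1.

```python
def expected_gain(p, stake, win):
    """Expected profit of a bet that pays `win` with probability p
    and loses `stake` with probability 1 - p."""
    return p * win - (1.0 - p) * stake


def break_even_probability(stake, win):
    """Probability at which the bet has exactly zero expected gain."""
    return stake / (stake + win)


# Mary's judgement from the slide: P(X < 0.4) = 0.7
p = 0.7
print("even-money bet:", expected_gain(p, stake=1, win=1))
print("bet 2 to win 1:", expected_gain(p, stake=2, win=1))
print("break-even probability for 2-to-1:", break_even_probability(2, 1))
```

Both bets have positive expected gain at p = 0.7, which is why she would in principle accept either.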

  11. Subjective probability • The probability of a proposition E is a measure of a person’s degree of belief in the truth of E • If they are certain that E is true then P(E) = 1 • If they are certain it is false then P(E) = 0 • Otherwise P(E) lies between these two extremes • Exercise 1 – How many Muslims in Britain? • Refer to the two questions on your sheet • The first asks for a probability • Make your own personal judgement • If you don’t already have a good feel for the probability scale, you may find it useful to think about betting odds • The second asks for another probability UQ Summerschool 2014

  12. Subjective includes frequency • The frequency and subjective definitions of probability are compatible • If the results of a very long sequence of repetitions are available, they agree • Frequency probability equates to the long run frequency • All observers who accept the sequence as comprising repetitions will give that frequency as their (personal/subjective) probability • for the next (or any future) result in the sequence • Subjective probability extends frequency probability • But also seamlessly covers propositions that are not repeatable • It’s also more controversial UQ Summerschool 2014

  13. It doesn’t include prejudice etc! • The word “subjective” has derogatory overtones • Subjectivity should not admit prejudice, bias, superstition, wishful thinking, sloppy thinking, manipulation ... • Subjective probabilities are judgements but they should be careful, honest, informed judgements • As “objective” as possible without ducking the issue • Using best practice • Formal elicitation methods • Bayesian analysis • Probability judgements go along with all the other judgements that a scientist necessarily makes • And should be argued for in the same careful, honest and informed way UQ Summerschool 2014

  14. What about data? • I’ve presented the analyst’s probability distributions as a matter of pure subjective judgement – what about data? • Many possible scenarios: • X is a parameter for which there is a published value • Analyst has one or more direct experimental evaluations for X • Analyst has data relating more or less directly to X • Analyst has some hard data but also personal expertise about X • Analyst relies on personal expertise about X • Analyst seeks input from an expert on X • … UQ Summerschool 2014

  15. The case of a published value • The published value may come with a completely characterised probability distribution for X representing uncertainty in the value • The analyst simply accepts this distribution as her own judgement • Or it may not • The analyst needs to consider the uncertainty in X around the published value P • X = P + E, where E is the error • Analyst formulates her own probability distribution for E • The published value P may simply come with a standard deviation • The analyst accepts this as one judgement about E UQ Summerschool 2014
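
A minimal sketch of the X = P + E construction, assuming (purely for illustration) that the analyst judges the error E to be normally distributed with mean zero and that the reported standard deviation is accepted as the standard deviation of E. The published value 10.0 and standard deviation 0.5 are made-up numbers, and `distribution_for_x` is a hypothetical helper.

```python
from statistics import NormalDist


def distribution_for_x(published_value, error_sd, error_mean=0.0):
    """Distribution of X = P + E when E ~ Normal(error_mean, error_sd).

    Adding the constant P to a normal error just shifts its mean.
    """
    return NormalDist(mu=published_value + error_mean, sigma=error_sd)


# Made-up published value and standard deviation, for illustration only
x_dist = distribution_for_x(published_value=10.0, error_sd=0.5)
lower = x_dist.inv_cdf(0.025)
upper = x_dist.inv_cdf(0.975)
print(f"95% probability interval for X: ({lower:.2f}, {upper:.2f})")
```

If the analyst believes the published value is biased, a non-zero `error_mean` expresses that judgement; a heavier-tailed distribution for E would need a different family.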

  16. Using data – principles • The appropriate framework for using data is Bayesian statistics • Because it delivers a probability distribution for X • Classical frequentist statistics can’t do that • Even a confidence interval is not a probability statement about X • The data are related to X through a likelihood function • Derived from a statistical model • This is combined with whatever additional knowledge the analyst may have • In the form of a prior distribution • Combination is performed by Bayes’ theorem • The result is the analyst’s posterior distribution for X UQ Summerschool 2014
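
As a hedged illustration of prior, likelihood and posterior, consider a proportion X such as the one in Mary's example. If the data are k events out of n patients (binomial likelihood) and the prior is assumed to be a Beta(a, b) distribution, Bayes' theorem gives a Beta(a + k, b + n - k) posterior. The prior parameters and the data below are invented for the sketch and are not from the talk.

```python
def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Posterior Beta parameters after observing `successes` out of `trials`
    with a binomial likelihood and a Beta(prior_a, prior_b) prior."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior with mean 0.35, and hypothetical data: 12 events in 40 patients
prior = (3.5, 6.5)
posterior = beta_binomial_update(*prior, successes=12, trials=40)

print("prior mean:    ", beta_mean(*prior))
print("posterior Beta parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The posterior mean lies between the prior mean and the observed proportion 12/40, and moves closer to the data as n grows, which is the sense in which highly informative data make the prior matter less.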

  17. Using data – practicalities • If data are highly informative about X, prior information may not matter • Use a conventional non-informative prior distribution • Otherwise the analyst formulates her own prior distribution for X • Bayesian analysis can be complex • Analyst is likely to need the services of a Bayesian statistician • The likelihood/model is also a matter of judgement! • Although I will not delve into this today UQ Summerschool 2014

  18. Summary • We have identified several situations where distributions need to be formulated by personal judgement • No good data – analyst formulates distribution for X • Published data does not have complete characterisation of uncertainty – analyst formulates distribution for E • Data supplemented by additional expertise – analyst formulates prior distribution for X • Analyst may seek judgements of one or more experts • Rather than relying on her own • Particularly when the stakes are high • We have identified just one situation where personal judgements are not needed • Published data with completely characterised uncertainty UQ Summerschool 2014

  19. Elicitation • The process of • representing the knowledge • of one or more persons (experts) • concerning an uncertain quantity • as a probability distribution for that quantity • Typically conducted as a dialogue between • the experts – who have substantive knowledge about the quantity (or quantities) of interest – and • a facilitator – who has expertise in the process of elicitation • ideally face to face • but may also be done by video-conference, teleconference or online UQ Summerschool 2014

  20. Some history • The idea of formally representing uncertainty using subjective probability judgements began to be taken seriously in the 1960s • For instance, for judgement of extreme risks • Psychologists became interested • How do people make probability judgements? • What mental processes are used, and what does this tell us about the brain’s processing generally? • They found many ways that we make bad judgements • The heuristics and biases movement • And continued to look mostly at how we get it wrong • Since this told them a lot about our mental processes UQ Summerschool 2014

  21. Meanwhile ... • Statisticians increasingly made use of subjective probabilities • Growth of Bayesian statistics • Some formal elicitation but mostly unstructured judgements • Little awareness of the work in psychology • Reinforced recently by UQ with uncertain simulator inputs • Our interests are more complex • Not really interested in single probabilities • Whole probability distributions • Multivariate distributions • We want to know how to get it right • Psychology provides almost no help with these challenges UQ Summerschool 2014

  22. Heuristics and biases • Our brains evolved to make quick decisions • Heuristics are short-cut reasoning techniques • Allow us to make good judgements quickly in familiar situations • Judgement of probability is not something that we evolved to do well • The old heuristics now produce biases • Anchoring and adjustment • Availability • Overconfidence • And many others UQ Summerschool 2014

  23. Anchoring and adjustment • Exercise 1 was designed to exhibit this heuristic • The probabilities should on average be different in the two groups • When asked to make two related judgements, the second is affected by the first • The second is judged relative to the first • By adjustment away from the first judgement • The first is called the anchor • Adjustment is typically inadequate • Second response too close to the first (anchor) • Anchoring can be strong even when obviously not really relevant to the second question • Just putting any numbers into the discussion creates anchors

  24. Availability • The probability of an event is judged more likely if we can quickly bring to mind instances of it • Things that are more memorable are deemed more probable • High profile train accidents in the UK lead people to imagine rail travel is more risky than it really is • My judgement of the risk of dying from a particular disease will be increased if I know (of) people who have the disease or have died from it • Important for analyst to review all the evidence UQ Summerschool 2014

  25. Overconfidence • It is generally said that experts are overconfident • When asked to give 95% intervals, say, far fewer than 95% contain the true value • Several possible explanations • Wish to demonstrate expertise • Anchoring to a central estimate • Difficulty of judging extreme events • Not thinking ‘outside the box’ • Expertise often consists of specialist heuristics • Situations we elicit judgements on are not typical • Probably over-stated as a general phenomenon • Experts can be under-confident if afraid of consequences • A matter of personality and feeling of security • Evidence of over-confidence is not from real experts making judgements on serious questions UQ Summerschool 2014
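
The claim that "far fewer than 95% contain the true value" is a statement about coverage, which is easy to compute once the true values are known. The sketch below uses four invented intervals and invented true values only to show the calculation; `interval_coverage` is a hypothetical helper, and a real calibration study would need many more questions.

```python
def interval_coverage(intervals, true_values):
    """Fraction of (lower, upper) intervals that contain the matching true value."""
    hits = sum(
        1 for (lower, upper), truth in zip(intervals, true_values)
        if lower <= truth <= upper
    )
    return hits / len(true_values)


# Invented example: four stated 95% intervals and the values later observed
stated_intervals = [(10, 20), (0, 5), (100, 150), (3, 4)]
observed_values = [15, 7, 120, 3.5]

coverage = interval_coverage(stated_intervals, observed_values)
print(f"coverage of nominal 95% intervals: {coverage:.0%}")
```

Coverage well below the nominal level suggests overconfidence; coverage above it suggests under-confidence.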

  26. The keys to good elicitation • First, pay attention to the literature on psychology of elicitation • How you ask a question influences the answer • Second, ask about the right things • Things that experts are likely to assess most accurately • Third, prepare thoroughly • Provide help and training for experts • These are built into the SHELF system • Sheffield Elicitation Framework UQ Summerschool 2014

  27. The SHELF system • SHELF is a package of documents and simple software to aid elicitation • General advice on conducting the elicitation • Templates for recording the elicitation • Suitable for several different basic methods • Annotated versions of the templates with detailed guidance • Some R functions for fitting distributions and providing feedback • SHELF is freely available and comments and suggestions for additions are welcomed • Developed by Tony O’Hagan and Jeremy Oakley • R functions by Jeremy • http://tonyohagan.co.uk/shelf UQ Summerschool 2014

  28. A SHELF template • Word document • Facilitator follows a carefully constructed sequence of questions • Final step invites experts to give their own feedback • The tertile method • One of several supported in SHELF

  29. Annotated template • For facilitator’s guidance • Advice on each field of the template • Ordinary text says what is required in each field • Text in brackets gives advice on how to work with experts • Text in italics says why we are doing it this way • Based on findings in psychology

  30. Let’s see how it works • SHELF templates provide a carefully structured sequence of steps • Informed by psychology and practical experience • I’ll work through these, using the following illustrative example • An expert is asked for her judgements about the distance D between the airports of Paris Charles de Gaulle (CDG) and Chicago O’Hare (ORD) • in miles • She has experience of flying distances but has not flown this route before • She knows that from LHR to JFK is about 3500 miles

  31. Credible range L to U • Expert is asked for lower and upper credible bounds • Expert would be very surprised if X was found to be below the lower credible bound or above the upper credible bound • It’s not impossible to be outside the credible range, just highly unlikely • Practical interpretation might be a probability of 1% that X is below L and 1% that it’s above U • Example • Expert sets lower bound L = 3500 • CDG to ORD surely more than LHR to JFK • Upper bound U = 5000 • Additional flying distance for CDG to ORD surely less than 1500 UQ Summerschool 2014

  32. The median M • The value of x for which the expert judges X to be equally likely to be above or below x • Probability 0.5 (or 50%) below • and 0.5 above • Like a toss of a coin • Or chopping the range into two equally probable parts • If the expert were asked to choose either to bet on X < x or on X > x, he/she should have no preference • It’s a specific kind of ‘estimate’ of X • Need to think, not just go for mid-point of the credible range • Example • Expert chooses median M = 4000 [Figure: illustrative density on a 0 to 1 scale, with L = 0, U = 1, M = 0.36]

  33. The quartiles Q1 and Q3 • The lower quartile Q1 is the p = 25% quantile • The expert judges X < Q1 to have probability 0.25 • Like tossing two successive Heads with a coin • Equivalently, Q1 divides the range below the median into two equi-probable parts • ‘Less than Q1’ & ‘between Q1 and M’ • Q1 should generally be closer to M than to L • Similarly, upper quartile Q3 is p = 75% • Q1, M and Q3 divide the range into four equi-probable parts • Example • Expert chooses Q1 = 3850, Q3 = 4300 [Figure: illustrative density on a 0 to 1 scale, with L = 0, U = 1, M = 0.36, Q1 = 0.25, Q3 = 0.49]

  34. Then fit a distribution • Any convenient distribution • As long as it fits the elicited summaries adequately • SHELF has software for fitting a range of standard distributions • At this point, the choice should not matter • The idea is that we have elicited enough • Any reasonable distribution choice will be similar to any other • Elicitation can never be exact • The elicited summaries are only approximate anyway • If the choice does matter • i.e. different fitted distributions give different answers to the problem for which we are doing the elicitation • We can try to remove the sensitivity by eliciting more summaries • Or involving more experts UQ Summerschool 2014
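
To make the fitting step concrete, here is a minimal sketch using the judgements from the airport example (Q1 = 3850, M = 4000, Q3 = 4300, credible range 3500 to 5000). It is not the SHELF software, which is written in R. It assumes a log-normal family and fits it by least squares on the log scale, where the quantiles of a log-normal satisfy log(x_p) = mu + sigma * z_p. All function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for z-scores and the CDF


def fit_lognormal(quantiles):
    """Least-squares fit of (mu, sigma) for a log-normal distribution.

    `quantiles` maps a probability p to the elicited value x with P(X < x) = p.
    A straight line is fitted to the points (z_p, log x_p).
    """
    z = [STD_NORMAL.inv_cdf(p) for p in quantiles]
    y = [log(x) for x in quantiles.values()]
    z_bar = sum(z) / len(z)
    y_bar = sum(y) / len(y)
    slope_num = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    slope_den = sum((zi - z_bar) ** 2 for zi in z)
    sigma = slope_num / slope_den
    mu = y_bar - sigma * z_bar
    return mu, sigma


def lognormal_quantile(p, mu, sigma):
    """Value x such that P(X < x) = p under the fitted log-normal."""
    return exp(mu + sigma * STD_NORMAL.inv_cdf(p))


def lognormal_cdf(x, mu, sigma):
    """P(X < x) under the fitted log-normal."""
    return STD_NORMAL.cdf((log(x) - mu) / sigma)


# Elicited summaries from the example on the preceding slides
elicited = {0.25: 3850, 0.50: 4000, 0.75: 4300}
mu, sigma = fit_lognormal(elicited)

# Feedback to the expert: fitted versus elicited quantiles, and tail probabilities
for p, x in elicited.items():
    print(f"p = {p:.2f}: elicited {x}, fitted {lognormal_quantile(p, mu, sigma):.0f}")
print(f"P(D < 3500) = {lognormal_cdf(3500, mu, sigma):.3f}")
print(f"P(D > 5000) = {1 - lognormal_cdf(5000, mu, sigma):.3f}")
```

The fit cannot reproduce three quantiles exactly with two parameters, so the printed feedback is what the expert would review. If the tail probabilities outside the credible range look wrong to her, that is a cue to try another family or elicit more summaries.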

  35. Exercise 2 • So let’s do it! • We’re going to elicit your beliefs about one of the following (you can choose!) • Number of gold medals to be won by China in 2016 Olympics • Length of the Yangtze River • Population of Beijing in 2011 • Proportion of the total world land area covered by China UQ Summerschool 2014

  36. Do we need a facilitator? • Yes, if the simulator output is sufficiently important • A skilled facilitator is essential to get the most accurate and reliable representation of the expert’s knowledge • At least for the most influential inputs • Otherwise, no • The analyst can simply quantify her own judgements • But it’s still very useful to follow the SHELF process • In effect, the analyst interrogates herself • Playing the role of facilitator as well as that of expert UQ Summerschool 2014

  37. Multiple inputs • Hitherto we’ve basically considered just one input X • In practice, simulators almost always have multiple inputs • Then we need to think about dependence • Two or more uncertain quantities are independent if: • When you learn something about one of them it doesn’t change your beliefs about the others • It’s a personal judgement, like everything else in elicitation! • They may be independent for one expert but not for another • Independence is nice • Independent inputs can just be elicited separately UQ Summerschool 2014

  38. Exercise 3 • Which of the following sets of quantities would you consider independent? • The average weight B of black US males aged 40 and the average weight W of white US males aged 40 • My height H and my age A • The time T taken by the Japanese bullet train to travel from Tokyo to Kyoto and the distance D travelled • The atomic numbers of Calcium (Ca), Silver (Ag) and Arsenic (As) UQ Summerschool 2014

  39. Eliciting dependence • If quantities are not independent we must elicit the nature and magnitude of dependence between them • Remembering that probabilities are the best summaries to elicit • Joint probabilities • Probability that X takes some x-values and Y takes some y-values • Conditional probabilities • Probability that Y takes some y-values if X takes some x-values • Much harder to think about than probabilities for a single quantity • Perhaps the simplest is the quadrant probability • Probability both X and Y are above their individual medians UQ Summerschool 2014

  40. Bivariate quadrant probability • First elicit medians • Now elicit quadrant probability • It can’t be negative • Or more than 0.5 • Value indicates direction and strength of dependence • 0.25 if X and Y are independent • Greater if positively correlated • 0.5 if, whenever one is above its median, the other must be too • Less than 0.25 if negatively correlated • Zero if they can’t both be above their medians [Figure: (X, Y) plane split at the two medians, with marginal probabilities 0.5 and 0.5 and the upper-right quadrant marked “?”]
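
One way to turn an elicited quadrant probability into a dependence parameter is to assume a bivariate normal (Gaussian copula) structure, which is an assumption added here and not something the slide prescribes. Under that assumption the quadrant probability q and the correlation rho are linked by the standard result q = 1/4 + arcsin(rho) / (2 pi). The function names below are hypothetical.

```python
from math import asin, pi, sin


def correlation_from_quadrant_probability(q):
    """Correlation implied by P(X > median_X, Y > median_Y) = q,
    assuming a bivariate normal dependence structure."""
    if not 0.0 <= q <= 0.5:
        raise ValueError("a quadrant probability must lie between 0 and 0.5")
    return sin(2.0 * pi * (q - 0.25))


def quadrant_probability_from_correlation(rho):
    """Inverse relationship: quadrant probability for a given correlation."""
    return 0.25 + asin(rho) / (2.0 * pi)


for q in (0.0, 0.15, 0.25, 0.35, 0.5):
    print(f"q = {q:.2f} -> rho = {correlation_from_quadrant_probability(q):+.3f}")
```

The end points match the bullets above: q = 0.25 corresponds to no correlation, q = 0.5 to perfect positive dependence and q = 0 to perfect negative dependence.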

  41. Higher dimensions • This is already hard • Just for two uncertain quantities • In order to elicit dependence in any depth we will need to elicit several more joint or conditional probabilities • More than two variables – more complex still! • Even with just three quantities... • Three pairwise bivariate distributions • With constraints • The three-way joint distribution is not implied by those, either • We can’t even visualise or draw it! • There is no clear understanding among elicitation practitioners on how to elicit dependence UQ Summerschool 2014

  42. Avoiding the problem • It would be so much easier if the quantities we chose to elicit were independent • i.e. no dependence or correlation between them • Then eliciting a distribution for each quantity would be enough • We wouldn’t need to elicit multivariate summaries • The trick is to ask about the right quantities • Redefine inputs so they become independent • This is called elaboration • Or structuring

  43. Example – two treatment effects • A clinical trial will compare a new treatment with an existing treatment • Existing treatment effect A is relatively well known • Expert has low uncertainty • But added uncertainty due to the effects of the sample population • New treatment effect B is more uncertain • Evidence mainly from small-scale Phase III trial • A and B will not be independent • Mainly because of the trial population effect • If A is at the high end of the expert’s distribution, she would expect B also to be relatively high • Can we break this dependence with elaboration? UQ Summerschool 2014

  44. Relative effect • In the two treatments example, note that in clinical trials attention often focuses on the relative effect R = B/A • When effect is bad, like deaths, this is called relative risk • Expert may judge R to be independent of A • Particularly if random trial effect is assumed multiplicative • If additive we might instead consider A independent of D = B – A • But this is unusual • So elicit separate distributions for R and A • The joint distribution of (A, B) is now implicit • Can be derived if needed • But often the motivating task can be rephrased in terms of (A, R) UQ Summerschool 2014
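
The point that the joint distribution of (A, B) is implicit can be seen by simulation. In the sketch below, A and R are sampled independently and B is derived as B = A × R. The log-normal shapes and their parameters are invented for illustration, not elicited values from the talk.

```python
import random
from math import log, sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n_samples = 20000

# Hypothetical independent distributions for A (existing effect) and R (relative effect)
a = [rng.lognormvariate(log(0.30), 0.10) for _ in range(n_samples)]
r = [rng.lognormvariate(log(0.80), 0.25) for _ in range(n_samples)]

# The new treatment effect is derived, never elicited directly
b = [a_i * r_i for a_i, r_i in zip(a, r)]

print(f"corr(A, R) = {pearson(a, r):+.3f}  (independent by construction)")
print(f"corr(A, B) = {pearson(a, b):+.3f}  (dependence induced by B = A * R)")
```

Two univariate elicitations are enough here: the positive dependence between A and B comes out of the structure without anyone having to state a correlation.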

  45. Trial effect • Instead of simple structuring with the relative risk R, we can explicitly recognise the cause of the correlation • Let T be the trial effect due to difference between the trial patients and the wider population • Let E and N be efficacies of existing and new treatments in the wider population • Then A = E x T and B = N x T • Expert may be comfortable with independence of T, E and N • With E well known, T fairly well known and N more uncertain • We now have to elicit distributions for three quantities instead of two • But can possibly assume them independent UQ Summerschool 2014
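
A short derivation shows how a shared trial effect produces the correlation. The symbols σ_E², σ_N² and σ_T² are notation introduced here for the variances of log E, log N and log T. It assumes E, N and T are judged independent, as on the slide, and works on the log scale because the effects are multiplicative.

```latex
\begin{align*}
\log A &= \log E + \log T, \qquad \log B = \log N + \log T, \\
\operatorname{Cov}(\log A, \log B) &= \operatorname{Var}(\log T) = \sigma_T^2, \\
\operatorname{Corr}(\log A, \log B) &=
  \frac{\sigma_T^2}{\sqrt{(\sigma_E^2 + \sigma_T^2)(\sigma_N^2 + \sigma_T^2)}} .
\end{align*}
```

So the correlation between A and B is driven entirely by uncertainty about T: it vanishes when T is known exactly and grows as the trial effect becomes the dominant source of uncertainty.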

  46. General principles • Independence or dependence are in the head of the expert • Two quantities are dependent if learning about one of them would change his/her beliefs about the other • Explore possible structures with the expert(s) • Find out how they think about these quantities • Expertise often involves learning how to break a problem down into independent components • SHELF does not yet handle multivariate elicitation • But it does include an explicit structuring step • Which we can now see is potentially very important! • Templates for some special cases expected in the next release UQ Summerschool 2014

  47. Multiple experts • The case of multiple experts is important • When elicitation is used to provide expert input to a decision problem with substantial consequences, we generally want to use the skill of as many experts as possible • But they will all have different opinions • Different distributions • How do we aggregate them? • In order to get a single elicited distribution UQ Summerschool 2014

  48. Aggregating expert judgements • Two approaches • Aggregate the distributions • Elicit a distribution from each expert separately • Combine them using a suitable formula • For instance, simply average them • Called ‘mathematical aggregation’ or ‘pooling’ • Aggregate the experts • Get the experts together and elicit a single distribution • Called ‘behavioural aggregation’ • Neither is without problems UQ Summerschool 2014
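
The "simply average them" option is often called a linear opinion pool: the pooled distribution function is a weighted average of the experts' distribution functions. The sketch below assumes equal weights by default and uses two invented normal distributions as stand-ins for separately elicited expert distributions; `linear_pool_cdf` is a hypothetical helper.

```python
from statistics import NormalDist


def linear_pool_cdf(experts, weights=None):
    """Return the CDF of a linear opinion pool of the experts' distributions.

    `experts` is a list of objects with a .cdf(x) method; `weights` must sum to 1
    and defaults to equal weights.
    """
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)
    if len(weights) != len(experts) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per expert, summing to 1")

    def pooled_cdf(x):
        return sum(w * expert.cdf(x) for w, expert in zip(weights, experts))

    return pooled_cdf


# Invented distributions for two experts, for illustration only
experts = [NormalDist(mu=4000, sigma=300), NormalDist(mu=4400, sigma=200)]
pooled = linear_pool_cdf(experts)

for x in (3800, 4200, 4600):
    print(f"pooled P(X < {x}) = {pooled(x):.3f}")
```

The pooled distribution is a mixture, so when the experts disagree it is wider than either expert's own distribution and may be bimodal. Whether that represents anyone's beliefs is one of the problems with mathematical aggregation.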

  49. Multiple experts in SHELF • SHELF uses behavioural aggregation • However, distributions are first elicited from experts separately • After sharing of key information • Allows facilitator to see the range of belief before aggregation • Then experts discuss their differences • With a view to assigning an aggregate distribution • To represent what an impartial, intelligent observer might reasonably believe after seeing the experts’ judgements and hearing their discussions • Facilitator can judge whether degree of compromise is appropriate to the intervening discussion UQ Summerschool 2014

  50. Challenges in behavioural aggregation • More psychological hazards • Group dynamic – dominant/reticent experts • Tendency to end up more confident • Block votes • Requires careful management • What to do if they can’t agree? • End up with two or more composite distributions • Need to apply mathematical pooling to these • But this is rare in practice UQ Summerschool 2014
