Simulation and Uncertainty Tony O’Hagan University of Sheffield I-Sim Workshop, Fontainebleau

Outline
• Uncertainty
• Example – bovine tuberculosis
• Uncertainty analysis
• Elicitation
• Case study 1 – inhibiting platelet aggregation
• Propagating uncertainty
• Case study 2 – cost-effectiveness
• Conclusions

Two kinds of uncertainty
• Aleatory (randomness)
  • Number of heads in 10 tosses of a fair coin
  • Mean of a sample of 25 from a N(0,1) distribution
• Epistemic (lack of knowledge)
  • Atomic weight of ruthenium
  • Number of deaths at Agincourt
• Often, both arise together
  • Number of patients who respond to a drug in a trial
  • Mean height of a sample of 25 men in Fontainebleau

Two kinds of probability
• Frequency probability
  • Long run frequency in many repetitions
  • Appropriate only for purely aleatory uncertainty
• Subjective (or personal) probability
  • Degree of belief
  • Appropriate for both aleatory and epistemic (and mixed) uncertainties
• Consider, for instance
  • Probability that next president of USA is Republican

Uncertainty and statistics
• Data are random
  • Repeatable
• Parameters are uncertain but not random
  • Unique
• Uncertainty in data is mixed
  • But aleatory if we condition on (fix) the parameters
  • E.g. likelihood function
• Uncertainty in parameters is epistemic
  • If we condition on the data, nothing aleatory remains

Two kinds of statistics
• Frequentist
  • Based on frequency probability
  • Confidence intervals, significance tests etc
  • Inferences valid only in long run repetition
  • Does not make probability statements about parameters
• Bayesian
  • Based on personal probability
  • Inferences conditional on the actual data obtained
  • Makes probability statements about parameters

Example: bovine tuberculosis
• Consider a model for the spread of tuberculosis (TB) in cows
• In the UK, TB is primarily spread by badgers
• The model is used to assess the reduction of TB in cows if we introduce local culling (i.e. killing) of badgers

How the model might look
• Simulation model components
  • Location of badger setts, litter size and fecundity
  • Spread of badgers
  • Rates of transmission of disease
  • Success rate of culling

Uncertainty in the TB model
• Simulation
  • Replicate runs give different outcomes (aleatory)
• Parameter uncertainty
  • E.g. mean (and distribution of) litter size, dispersal range, transmission rates (epistemic)
• Structural uncertainty
  • Alternative modelling assumptions (epistemic)
• Interest in properties of simulation distribution
  • E.g. probability of reducing bovine TB incidence below threshold (with optimal culling)
  • All are functions of parameters and model structure

General structure
• Uncertain model parameters (structure) X
  • With known distribution
  • True value X_T
• Object of interest Y_T = Y(X_T)
  • Possibly optimised over control parameters
• Model output Z(X), related to Y(X)
  • E.g. Z(X) = Y(X) + error
  • Can run model for any X
• Uncertainty about Y_T due to two sources
  • We don’t know X_T (epistemic)
  • Even if we knew X_T, we can only observe Z(X_T) (aleatory)

Uncertainty analysis
• Find the distribution of Y_T
• Challenges:
  • Specifying distribution of X
  • Computing Z(X)
  • Identifying distribution of Z(X) given Y(X)
  • Propagating uncertainty in X

Parameter distributions
• Necessarily personal
  • Even if we have data
  • E.g. sample of badger litter sizes
• Expert judgement generally plays a part
  • May be formal or informal
• Formal elicitation of expert knowledge
  • A seriously non-trivial business
  • Substantial body of literature, particularly in psychology

Case study 1
• A pharmaceutical company is developing a new drug to reduce platelet aggregation for patients with acute coronary syndrome (ACS)
  • Primary comparator is clopidogrel
• Case study concerns elicitation of expert knowledge prior to reporting of Phase 2a trial
  • Required in order to do Bayesian clinical trial simulation
  • 5 elicitation sessions with several experts over a total of about 3 days
• Analysis revisited after Phase 2a and 2b trials

Simulating SVEs
[Flowchart; box labels as extracted: Generate mean IPA for each drug; Randomise to new/clopidogrel; Patient enters; Generate IPA-SVE relationship; Generate patient IPA; Generate whether patient has SVE; Patient loop]
SVE = Secondary vascular event
IPA = Inhibition of platelet aggregation
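
One plausible reading of the flowchart can be sketched in Python as follows. This is only an illustration, not the simulation used in the case study: the function name `simulate_trial`, the clopidogrel distribution, the patient-level standard deviation, the baseline risk and the logistic IPA-SVE relationship are all hypothetical placeholders. Only the Beta(11.5, 1.2) distribution for the new drug's mean IPA is taken from the talk.

```python
import math
import random


def simulate_trial(n_patients, seed=1):
    """Simulate one trial: draw parameters once, then loop over patients."""
    rng = random.Random(seed)

    # Parameter draws, made once per simulated trial (epistemic uncertainty).
    mean_ipa = {
        "new": 100.0 * rng.betavariate(11.5, 1.2),  # elicited Beta, scaled to %
        "clopidogrel": rng.uniform(40.0, 70.0),     # hypothetical placeholder
    }
    log_odds_per_ipa_point = rng.gauss(-0.02, 0.005)  # hypothetical IPA-SVE relationship
    baseline_log_odds = math.log(0.10 / 0.90)         # hypothetical 10% SVE risk at IPA = 0

    counts = {arm: {"n": 0, "sve": 0} for arm in mean_ipa}

    # Patient loop (aleatory uncertainty).
    for _ in range(n_patients):
        arm = rng.choice(["new", "clopidogrel"])        # randomise
        ipa = rng.gauss(mean_ipa[arm], 10.0)            # hypothetical patient-level sd
        ipa = min(100.0, max(0.0, ipa))                 # keep IPA within 0-100%
        log_odds = baseline_log_odds + log_odds_per_ipa_point * ipa
        p_sve = 1.0 / (1.0 + math.exp(-log_odds))
        has_sve = rng.random() < p_sve                  # does this patient have an SVE?
        counts[arm]["n"] += 1
        counts[arm]["sve"] += int(has_sve)
    return counts


if __name__ == "__main__":
    print(simulate_trial(1000))
```

Repeating the call with different seeds gives a distribution of trial outcomes that mixes both kinds of uncertainty: the parameter draws at the top and the patient-level randomness in the loop.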

Distributions elicited
• Many distributions were actually elicited
  • Mean IPA (efficacy on biomarker) for each drug and dose
  • Patient-level variation in IPA around mean
  • Relative risk of SVE conditional on individual patient IPA
  • Baseline SVE risk
  • Other things to do with side effects
• We will just look here at elicitation of the distribution of mean IPA for a high dose of the new drug
• Judgements made at the time
  • Knowledge now is of course quite different!
  • But decisions had to be made then about Phase 2b trial
    • Whether to go ahead or drop the drug
    • Size of sample, how many doses, etc

Elicitation record

Eliciting one distribution
• Mean IPA (%) for high dose
  • Range: 80 to 100
  • Median: 92
  • Probabilities: P(over 95) = 0.4, P(under 85) = 0.2
• Chosen distribution: Beta(11.5, 1.2)
  • Median 93
  • P(over 95) = 0.36, P(under 85) = 0.20, P(under 80) = 0.11
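
The fitted summaries on this slide can be checked with a short Python sketch. It assumes that the Beta(11.5, 1.2) distribution applies to mean IPA expressed as a proportion (IPA/100), which is an interpretation and not stated on the slide. The function names and the crude midpoint-rule integration are illustrative choices; a statistics library would normally be used instead.

```python
import math

A, B = 11.5, 1.2  # shape parameters of the chosen Beta distribution


def beta_pdf(x, a=A, b=B):
    """Density of Beta(a, b) at x; zero outside (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf(x, a=A, b=B, steps=20000):
    """P(X <= x) by the midpoint rule; adequate for a sanity check."""
    h = x / steps
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps))


def beta_median(a=A, b=B):
    """Median found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b, steps=5000) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("median IPA (%):", round(100 * beta_median(), 1))
    print("P(over 95):    ", round(1 - beta_cdf(0.95), 2))
    print("P(under 85):   ", round(beta_cdf(0.85), 2))
    print("P(under 80):   ", round(beta_cdf(0.80), 2))
```

Comparing these fitted values with the elicited judgements (median 92, P(over 95) = 0.4, P(under 85) = 0.2) shows how closely the chosen distribution reproduces what the experts said.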

Propagating uncertainty
• Usual approach is by Monte Carlo
  • Randomly draw parameter sets X_i, i = 1, 2, …, N from distribution of X
  • Run model for each parameter set to get outputs Y_i = Y(X_i), i = 1, 2, …, N
    • Assume for now that we can do big enough runs to ignore the difference between Z(X) and Y(X)
  • These are a sample from distribution of Y_T
  • Use sample to make inferences about this distribution
    • Generally frequentist but fundamentally epistemic
• Impractical if computing each Y_i is computationally intensive
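
The Monte Carlo recipe can be sketched in a few lines of Python. The parameter distribution reuses the Beta(11.5, 1.2) from case study 1 purely for illustration. The `model` function is a hypothetical stand-in for an expensive simulator, and the names `draw_parameters` and `propagate` are assumptions of this sketch.

```python
import random
import statistics


def draw_parameters(rng):
    """Draw one parameter set X from its distribution (here a single proportion)."""
    return rng.betavariate(11.5, 1.2)


def model(x):
    """Hypothetical stand-in for the simulator output Y(X); not from the talk."""
    return 0.10 * (1.0 - x) + 0.02


def propagate(n_draws, seed=1):
    """Draw X_i, run the model for each, and summarise the sample of Y_i."""
    rng = random.Random(seed)
    ys = sorted(model(draw_parameters(rng)) for _ in range(n_draws))
    return {
        "mean": statistics.fmean(ys),
        "sd": statistics.stdev(ys),
        "q05": ys[int(0.05 * n_draws)],  # empirical 5% quantile
        "q95": ys[int(0.95 * n_draws)],  # empirical 95% quantile
    }


if __name__ == "__main__":
    print(propagate(20000))
```

The cost is one model run per draw, which is why the approach becomes impractical when each run is expensive.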

Optimal balance of resources
• Consider the situation where each Z(X_i) is an average over n individuals
  • And Y(X_i) could be got by using very large n
  • Then total computing effort is Nn individuals
  • Simulation within simulation
• Suppose
  • The variance between individuals is v
  • The variance of Y(X) is w
  • We are interested in E(Y(X)) and w
• Then optimally n = 1 + v/w (approx)
  • Of order 36 times more efficient than large n
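
A minimal Python sketch of the nested simulation follows. The toy model (a standard normal parameter X with Y(X) = 2 + X, so that E(Y(X)) = 2 and w = 1) and all function names are hypothetical. The correction of the between-run variance by v/n is a standard one-way ANOVA argument in the spirit of the reference on optimal resource allocation, not a reproduction of the authors' method.

```python
import random
import statistics


def optimal_inner_size(v, w):
    """Approximate optimal number of individuals per parameter set, n = 1 + v/w."""
    return max(1, round(1 + v / w))


def nested_simulation(n_outer, n_inner, v, seed=1):
    """Return estimates of E(Y(X)), w and v from N outer draws of n individuals each."""
    rng = random.Random(seed)
    z_means, within_vars = [], []
    for _ in range(n_outer):
        x = rng.gauss(0.0, 1.0)  # hypothetical parameter draw X_i
        y = 2.0 + x              # hypothetical Y(X_i)
        individuals = [rng.gauss(y, v ** 0.5) for _ in range(n_inner)]
        z_means.append(statistics.fmean(individuals))        # Z(X_i)
        within_vars.append(statistics.variance(individuals))  # needs n_inner >= 2
    v_hat = statistics.fmean(within_vars)
    mean_hat = statistics.fmean(z_means)
    # Var(Z) = w + v/n, so subtract the individual-level noise to estimate w.
    w_hat = statistics.variance(z_means) - v_hat / n_inner
    return mean_hat, w_hat, v_hat


if __name__ == "__main__":
    n = optimal_inner_size(v=4.0, w=1.0)
    print("n =", n)
    print(nested_simulation(n_outer=4000, n_inner=n, v=4.0))
```

With small n, each Z(X_i) is a noisy estimate of Y(X_i), but the noise averages out across the N outer draws and its contribution to the variance is removed explicitly.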

Emulation
• When even this efficiency gain is not enough
  • Or when the conditions don’t hold
• We may be able to propagate uncertainty through emulation
• An emulator is a statistical model/approximation for the function Y(X)
  • Trained on a set of model runs Y_i = Y(X_i) or Z_i = Z(X_i)
    • But the X_i are not chosen randomly (inference is now Bayesian)
  • Runs much faster than the original simulator
  • Think neural net or response surface, but better!

Gaussian process
• The emulator represents Y(.) as a Gaussian process
  • Prior distribution embodies only a belief that Y(X) is a smooth, continuous function of X
  • Condition on training set to get posterior GP
  • Posterior mean function is a fast approximation to Y(.)
  • Posterior variance expresses additional uncertainty
• Unlike neural net or response surface, the GP emulator correctly encodes the training data

2 code runs
• Consider one input and one output
• Emulator estimate interpolates data
• Emulator uncertainty grows between data points

3 code runs
• Adding another point changes estimate and reduces uncertainty

5 code runs
• And so on
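
The behaviour described on the "code runs" slides can be reproduced with a very small Gaussian process sketch in plain Python. It assumes a zero prior mean, a squared-exponential covariance with fixed length scale and variance, and a hypothetical one-input `simulator`; a real emulator would estimate these hyperparameters and usually include a regression mean. All names here are illustrative.

```python
import math


def simulator(x):
    """Hypothetical stand-in for an expensive deterministic code run."""
    return math.sin(x)


def sq_exp(x1, x2, length=1.0, var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def gp_posterior(xs, ys, x_new, length=1.0, var=1.0, jitter=1e-10):
    """Posterior mean and variance of the emulator at x_new given the training runs."""
    k = [[sq_exp(a, b, length, var) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [sq_exp(x_new, a, length, var) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, solve(k, ys)))
    variance = var - sum(ks * bt for ks, bt in zip(k_star, solve(k, k_star)))
    return mean, max(variance, 0.0)


if __name__ == "__main__":
    xs = [1.0, 3.0, 5.0]
    ys = [simulator(x) for x in xs]
    for x_new in (2.0, 3.0, 4.0):
        print(x_new, gp_posterior(xs, ys, x_new))
```

At a training input the posterior mean reproduces the code run and the posterior variance is essentially zero. Between training inputs the variance is larger, and it shrinks when a further run is added there.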

Then what?
• Given enough training data points we can emulate any model accurately
  • So that posterior variance is small “everywhere”
  • Typically, this can be done with orders of magnitude fewer model runs than traditional methods
• Use the emulator to make inference about other things of interest
  • E.g. uncertainty analysis, calibration, optimisation
• Conceptually very straightforward in the Bayesian framework
  • But of course can be computationally hard

Case study 2
• Clinical trial simulation coupled to economic model
• Simulation within simulation
  • Outer simulation of clinical trials, producing trial outcome results
    • In the form of posterior distributions for drug efficacy
    • Incorporating parameter uncertainty
  • Inner simulation of cost-effectiveness (NICE decision)
    • For each trial outcome simulate patient outcomes with those efficacy distributions (and many other uncertain parameters)
• Like the “optimal balance of resources” slide
  • But complex clinical trial simulation replaces simply drawing from distribution of X

Emulator solution
• 5 emulators built
  • Means and variances of (population mean) incremental costs and QALYs, and their covariance
  • Together these characterised the Cost Effectiveness Acceptability Curve
    • Which was basically our Y(X)
• For any given trial design and drug development protocols, we could assess the uncertainty (due to all causes) regarding whether the final Phase 3 trial would produce good enough results for the drug to be
  • Licensed for use
  • Adopted as cost-effective by the UK National Health Service

Conclusions
• The distinction between epistemic and aleatory uncertainty is useful
  • Recognising that uncertainty about parameters of a model (and structural assumptions) is epistemic is useful
• Expert judgement is an integral part of specifying distributions
• Uncertainty analysis of a stochastic simulation model is conceptually a nested simulation
  • Optimal balance of sample sizes
  • More efficient computation using emulators

References
• On elicitation
  • O’Hagan, A. et al (2006). Uncertain Judgements: Eliciting Expert Probabilities. Wiley
  • www.shef.ac.uk/beep
• On optimal resource allocation
  • O’Hagan, A., Stevenson, M.D. and Madan, J. (2007). Monte Carlo probabilistic sensitivity analysis for patient level simulation models: Efficient estimation of mean and variance using ANOVA. Health Economics (in press)
  • Download from tonyohagan.co.uk/academic
• On emulators
  • O’Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290-1300.
  • mucm.group.shef.ac.uk