Quick & Simple Simulation in Excel with Clinical Trials Applications

Quick & SimpleSimulation in Excelwith Clinical Trials Applications Presented to the Delaware Chapter of the American Statistical Association 20 October 2011 Dennis Sweitzer, Ph.D. www.Dennis-Sweitzer.com

Background • Occasional need for simulations • Excel is convenient, but • does not explicitly support simulations • Simulation usually requires VBA programming (so why not use R or SAS instead) • Or Add-in commercial programs (eg., @Risk) • Or some academic add-ins • Does have iterative calculations, Solver • Why not simulation?

Simulate what? • Stochastic Models • Unknown parameters? Guestimate a distribution • Optimizing policy? Test each with simulations • Sensitivity Analysis • Variations in Inputs  Variations in Outputs • 2 parameters: use a table • >2 parameters: simulate & compare variation

Excel: Pros MEGO Common Language / Common Tools • Most people understand Excel • Many tools available in Excel Transparency: Modeling assumptions can be: Specified -- Graphed -- Debated What you see is what you get!

Excel Cons Slower than in SAS, S+, R, etc Lacks some statistical/probability functions • Latest versions are a little better • Still need to add some VBA code • Known bugs in statistical routines (often fixed) Tradeoffs: • Quicker modifications vs slower execution

Simple Solution: Data Tables Excel Data Tables • Creates a table of values of a function • (ie, Random Variables) • Leftmost column is used as an argument • (which is ignored in a simulation) • Data Table repeats calculations for each row • (Each row is an iteration of the simulation)

1. Create Simulation Create Random Variables using Inverse Probability Method: For Random Variable X with distribution function F(x), F(x): Â→ [0,1] If Random Uniform UÎ [0,1] X = F-1(U) (Excel: U=Rand() )

2. Align Random Variables • Calculations can be anywhere in Spreadsheet • Reference the Variables in a row • Is best to label variables in same way

3. Select Data Table • Select table region • 1st row is Rand Vars • 1st column is not used (can label iterations) • From toolbar: • Data>Data Table

4. Create Simulation Table • Column input cell = Upper left hand corner of table • Row input cell = ignore • OK  Populates the table • (may have to manually recalcule)

5. Execute Simulation Iterative development • Simulation can be changed • Add reporting variables • Recalculate to rerun • (no need to use Data Table again, unless expanding) • Hint: debug with short table, expand for final run

The End (of the key concepts)

But still more…. • Why use inverse probability distributions (instead of random variables)? • When not to use a spreadsheet for simulation? • Tools: • Macros to set up a simulation • VBA functions for common simulation distributions

Inverse Probability Function • Most systems directly generate random variables with the desired distribution • Why use Inverse Probability Functions? • Which are (probably) slower? Personal opinion • Testing & Debugging • Verification  Calculates correctly • Validation  Calculations answer Problem • Sensitivity  Input vs Output variability

As Mapping function ⟼ F-1 U Probability Distribution: F(x): Â→ [0,1] Random Uniform: UÎ (0,1] Inverse PDF: X = F-1(U) For Continuous (or monotone) F-1 Small changes in u∈U  small changes inF-1 (u)

Mapping 2 Random Uniform Var As input to Deterministic Function

Mapping Random numbers in (should) Map to outputs in

Example #1 Simple model, function of 2 RV Saving {Ui}: • Verify • Replicate • Quantify A Max value looks high. Is it a bug? If not, how often? Saved random U[0,1] For each iteration Check u∈U[0,1] That generated high value u=0.983…  random high  Rarely happens

Example #1 (Sensitivity) Sort by U1, U2 Sensitive to U1 Insensitive to U2

Spreadsheet limitations • Only simple data structures are available • Rows & columns, no lists & trees • Discrete event simulations • Complex algorithms: difficult • Eg, While or for loops • Can improvise (cumbersome, slow, buggy) • Speed: slow • Data Storage: what-you-see-is-all-you-get

Tools: Excel Simulation Template • Adds some missing random functions • Adds some set-up macros

Macro SimulateSampler To start a new simulation when you don't remember the names & parameters of common random variables used in simulation: • Run the Macro SimulationSample • Copy, delete, and edit as needed. • Make sure all random values are referenced in the first row of the data table at the bottom.

Macro SimulationSampler • Creates a simulation with each of common simulation functions

Macro SimulationSampler ……… • Sets up header row for data table • Sets up a place for statistics

Macro Simulate • Highlight the row of random variables • (1st row of simulation table) • Run macro "Simulate” • Prompts for which will ask for the number of simulation iterations, • The default number of iterations is 100 • Debug & develop (manually recalculate) • Final run with >1000 iterations • Visual Basic code is computationally intensive,

Macro Simulate

Note bene • Run Simulate right after SimulationSampler • Risk of “Ref!” error • SimTemplate,Plot,Sampler contains • The sampler • A distribution plot of all random variable • Crude, but handy for quick comparisons • Ready to edit

SimTemplate,Plot,Sampler • Crude Distribution plot of ALL variables • Uses Percentile Ranksto save space • Good for Continuous Var. • Bad for Discrete Var. • Copy • Delete unwanted variables • Make it pretty

Excel Random Variables • Rand() --Random Uniform [0,1] • NormSInv() – Inverse Standard Normal Distribution • CriticalBinomial() – Inverse Binomial Distribution • LogNormInv() - Inverse Log Normal Distribution • Caveat: parameters are mean, SD after the Log transformation

Erlang Distribution How long do you wait until you get a predetermined number of arrivals? • Interarrival times are distributed IID exponential • Erlang is Gamma with integer parameter

Beta Distribution • Can use as • Distribution of a Binomial probability • Range = [0,1] • Generic bounded hump (vs Normal as generic unbounded hump)

Example#2, Problem Client: “Here’s our plan….” • Simple spreadsheet calculation • But only the expected value, • but not variability

Example #2, Simulation • Time to 100th patient • Patients arrive IID Exponential Summary Statistics of Simulated values (below) Interpretation: under the assumptions, 90% of simulations required more than 4.4 months

Added VBA Functions Inverse Functions Needed for Simulation • Poisson, Negative Binomial Interpolation from Table • Interpolate: 1 or 2 dimensional interpolation Convenience • Beta with Mean, SD as parameters • Beta with Hi, Low, and Mode used for parameters (often used for PERT/CPM charts) • Log Normal with mean, SD as parameters

Missing Statistical Functions • InvPoisson :: Poisson Distribution • InvPascal :: Integer valued Negative Binomial • (how many failures before k successes) Negative Binomial is continuous valued distribution; discrete version is often denoted Pascal distribution

Example#3, Patients to Screen Expected Enrollment rate = 75% ± 5% ~ Beta Distribution # Screen Failures ~ Negative Binomial (Pascal) • Depends on Enrollment Rate

Beta Distribution (2) For Convenience • Beta distribution given Mean, SD • Beta distribution given Mean, SD, upper, lower bounds • Beta distribution given Mode, Upper, Lower bounds • Sometimes used for PERT/Critical Path Analysis • 3 estimates for tasks: Optimistic, Pessimistic, Most Likely • Beta distributed time for each task • Assumes SD = 1/6 of the interval [low, high]

Simulation from a Table Simulate arbitrary distribution: • Top Row: values in [0,1] • Bottom Row: Quantiles • Result: interpolated value of U from table Or a function: y=f(x) • X is found in top row, y is interpolated from bottom row

Table Simulation Uses • Polygonal distributions (like Triangular) • Survival curve (for time to event) • Est. K-M curve from data, simulate rest of trial • Arbitrary empirical distributions • Distribution from observations

Simulation from a 2-dimensional table • Here: • Rows are quartiles of a random function • Left column is value of a parameter • A family of distributions which vary with the parameter • Parameter y=75% (can be random) • Generate random numbers from the interpolated distribution.

Example #4: Interim Review • After 2 months, review randomization rates • Continue to Randomize to 100 patients • How long?

Example#4: Interim Review (Simulation) Y= # Patients at 2 mos ~ Poisson Time to Randomize (100-Y) additional pts ~ Erlang (Gamma) 80% CI:; (2.5, 3.7) months

Clinical Trials Applications • Simulations for planning • Prototyping larger simulation • Checking assumptions/validation

Why Simulate? Expected Trial Performance • Usually not of interest -- already done w/o simulation Variability of Trial Performance • Important for Risk Management: “What’s the earliest, the latest, the most, the least, etc” • 80% CIs Structural Problems • Interactions of parameters may doom the trial before it even starts! (eg, mean (max{ X, Y} ) vs max{ mean(X), mean(Y) } )

Prototyping Prototyping: • Toy simulation with hands-on teamwork • Development model • Get team buy-in on assumptions • Processing speed not important • Rapid modifications are important Ideal? • Develop a prototype in an 1 hour meeting • Check for errors later • Run large simulations later for precise estimates

Checking planning assumptions • H0 = Simulation assumptions • Observed: a value X • {xi} = corresponding values in simulation • Rank of X in {xi} ≈ p-value Stored Values: Use Function Percent Rank Descriptive Statistics: Use Frequency Count Use to: • Test assumptions, validate model, +?? • If an observed value of X is rare in the simulation, question assumptions!

Checking Assumptions (2) Example: • A trial is designed based on a non-trivial simulation. • The model predicts a completion rate of 65% with 95% C.I.= (55%, 75%) • 4 months into the trial, a 50% completion rate is observed. • How significant is this discrepancy? Resimulate: • {xi} = simulated completion rates (1/iteration) • Rank of observed 50% in {xi} ≈ p-value • “How likely is the observation, under the modeled assumptions?”

Example #5: Simulating a 30 patient trial • Each patient is a random vari • Each patient is a random variable • Survival times are interpolated • Estimated survival curves have confidence intervals • All 30 patients in an iteration use the same random conf.level • Conf. Level is updated each iteration

Example #5: Testing Assumptions Statistics on the patients are the simulation random variables • Assume the trial was carried out • 70% of patients complete Q: is this consistent with the simulations? A: Yes, but… Only 6.1% of simulations had >70% completion

Macro Management VBA Editor: Alt-F11 (or find the menu) • Some versions of Excel • Copy Module between sheets • Copy code from .xls sheet & insert into VBA editor • Open & save as new sheet

Quick & Simple Simulation in Excel with Clinical Trials Applications