Learn how to estimate parameters for distributions and validate simulation models using statistical tests and validation techniques.
Materials for Lecture 11 • Chapters 3 and 6 • Chapter 16 Section 4.0 and 5.0 • Lecture 11 Pseudo Random LHC.xls • Lecture 11 Validation Tests.xls • Next 4 slides were added because right about now most students are confused about PDF parameters and what functions to use
Parameter Estimation • Parameters for a distribution define the shape and position on the number scale • Uniform( Min, Max) • Norm( Mean, Std Dev) • Mean (Ỹ or Ῡ) and risk as Empirical( Si, P(Si)) • Shape can be skewed right or left, can be tall or squatty (kurtosis) • Parameters reflect amount of variability in the stochastic variable • Must validate random variables against their parameters • We use the parameters to simulate the distributions
Review Steps for Parameter Estimation • Step 1: Check for presence of a trend, cycle or structural pattern • If trend or structural model, work with the residuals (ẽt) • If no trend use actual data (X’s) • Step 2: Estimate parameters for several assumed distributions using the X’s or the residuals (ẽt) • Step 3: Simulate the different distributions • Step 4: Pick the best match based on • Mean, Variability -- use validation tests • Minimum and Maximum • Shape of the CDF vs. historical series • Penalty function CDFDEV() to quantify differences
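Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trend is taken to be linear in time, the data are hypothetical, and the function names `linear_trend_residuals` and `candidate_parameters` are invented here. Whether the fitted slope is statistically significant is a separate regression test that is not shown.

```python
from statistics import mean, stdev

def linear_trend_residuals(y):
    """Fit y_t = a + b*t by ordinary least squares; return (a, b, residuals)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_bar - b * t_bar
    residuals = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, residuals

def candidate_parameters(x):
    """Step 2: estimate parameters for two assumed distributions from one series."""
    return {
        "uniform": (min(x), max(x)),     # Uniform(Min, Max)
        "normal": (mean(x), stdev(x)),   # Norm(Mean, Std Dev)
    }

# Hypothetical series with an upward trend, so the residuals are used.
history = [10.2, 11.1, 11.9, 13.4, 13.8, 15.1, 16.3, 16.8]
a, b, resid = linear_trend_residuals(history)
params = candidate_parameters(resid)
print(f"slope = {b:.3f}")
print(params)
```

If no trend is present, `candidate_parameters(history)` would be applied to the actual data instead of the residuals.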
Univariate Parameter Estimation • When do you use UPES? • When there is no trend in the data • When you want to use the historical mean as your forecasted y-hat • Test an unknown random variable for its shape • Or use residuals
Univariate Parameter Estimation • Empirical distribution fits your data best because it lets the data define the shape • Prefer to use the EMP with deviations as a percent or fraction from Y-hat • If there is a trend, then account for it with deviations from trend • Else use deviations from mean • EMP allows us to model low probability events • Test with =CDFDEV(original data, sim data)
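A minimal sketch of an empirical distribution built from fractional deviations about Y-hat, written in Python. It is not Simetar's EMP implementation: the probability assigned to each sorted deviate (evenly spaced from 0 to 1) and the linear interpolation between deviates are assumptions made for illustration, and the data and function names are hypothetical.

```python
import random

def fractional_deviates(history, y_hat):
    """Sorted deviations from y_hat, expressed as fractions of y_hat."""
    return sorted((x - y_hat) / y_hat for x in history)

def simulate_empirical(sorted_devs, y_hat, u):
    """Inverse-transform draw for a uniform deviate u in [0, 1].

    Assumes the i-th sorted deviate sits at cumulative probability i/(n-1)
    and interpolates linearly between neighbours.
    """
    n = len(sorted_devs)
    pos = u * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    dev = sorted_devs[lo] + frac * (sorted_devs[hi] - sorted_devs[lo])
    return y_hat * (1 + dev)

history = [42.0, 55.0, 48.0, 61.0, 39.0, 50.0]   # hypothetical data, no trend
y_hat = sum(history) / len(history)              # so deviations are from the mean
devs = fractional_deviates(history, y_hat)

rng = random.Random(31517)
draws = [simulate_empirical(devs, y_hat, rng.random()) for _ in range(500)]
print(min(draws), max(draws))
```

Because the shape comes from the sorted deviates themselves, the simulated values stay inside the historical minimum and maximum; a different Y-hat rescales the same deviates.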
Model Validation • Do the simulated values for the random variables reproduce their parameters? • Does the model accurately forecast the system? • Do the results conform to theoretical expectations? • Do the results conform to expectations of experts? • Turing Test of simulation model results • Show the results to experts, using alternative assumptions about the input values
Four P’s for Validation • Planning – in the initial model preparation mode, the developer should plan how to validate the model • Personal – it’s the developer’s responsibility to verify every equation, coefficient, and random variable, and to check that results are theoretically correct • Peers – utilize experts in the field to review model results using a Turing Test; use sensitivity testing of the model • Prospective Clients – do the results conform to their expectations? Are the results useful to the client?
Model Verification • Check all equations for arithmetic accuracy • Use Excel’s “Trace Dependents” functions • Check linkage of variables coming into each equation • Check model in “Expected Value” and “Stochastic” mode • Ensure that the variables in each equation are theoretically correct • Make sure the model contains all of the necessary equations to calculate the KOVs
Model Validation • Use statistical tests of each random variable to ensure that it: • reproduces the historical distribution • reproduces the historical correlation matrix among random variables • Statistical Tests • Student t test • F test • Chi Square test
Statistical Tests for Validation • Test the means of the random variables against their historical values • Statistically equal at 95% level based on a t-test? • Test the variance against historical values • Statistically equal at 95% level based on an F-test? • Check the historical vs. simulated coefficient of variation • Needs to be constant over time • Check the minimum and maximum • For a Normal distribution are they reasonable? Should be: Min ≈ Mean − 3 × StdDev and Max ≈ Mean + 3 × StdDev • For an Empirical distribution compare simulated min and max to the values the model “should” simulate: Xmin = Y-hat × (1 + Minimum Fractional Deviate) and Xmax = Y-hat × (1 + Maximum Fractional Deviate) • Check the correlation matrix for the simulated variables vs. the historical correlation matrix using t-tests
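The minimum and maximum checks are plain arithmetic, shown here as a short Python sketch. The function names and the example inputs for the empirical case are hypothetical.

```python
def normal_min_max(mean, std_dev, k=3.0):
    """Approximate range a Normal should simulate: Mean -/+ k * StdDev."""
    return mean - k * std_dev, mean + k * std_dev

def empirical_min_max(y_hat, min_frac_dev, max_frac_dev):
    """Range an Empirical distribution should simulate around Y-hat."""
    return y_hat * (1 + min_frac_dev), y_hat * (1 + max_frac_dev)

print(normal_min_max(10, 3))                 # (1.0, 19.0)
print(empirical_min_max(50.0, -0.20, 0.25))  # hypothetical deviates
```

Simulated extremes far outside these bounds suggest the parameters were entered incorrectly or the wrong distribution was used.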
Validation Tests in Simetar • Verification/Validation tests in Simetar • Hypothesis tests icon • Compare Two Series Historical Data vs. Simulated Values • 1st Data Series is history • 2nd Data Series is simulated • Test means and variances for two series, i.e., are they statistically equal • Test works for a pair of variables and for comparing two multivariate distributions (matrices)
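For readers without Simetar at hand, the idea behind comparing two series can be sketched in Python. This is not Simetar's routine. It computes a Welch t statistic for the means and a variance ratio (F statistic), and it approximates the p-value for the means with a Normal distribution, which is only reasonable for large samples. Exact t and F critical values should come from tables or a statistics package. The data and the function name `compare_two_series` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev, variance

def compare_two_series(history, simulated):
    """Return test statistics for equal means and equal variances."""
    n1, n2 = len(history), len(simulated)
    m1, m2 = mean(history), mean(simulated)
    v1, v2 = variance(history), variance(simulated)
    # Welch t statistic: difference in means over its standard error.
    t_stat = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
    # Two-sided p-value from a Normal approximation (an assumption).
    p_mean = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    # Variance ratio, to be compared with an F(n1-1, n2-1) critical value.
    f_stat = v1 / v2
    return {"t": t_stat, "p_mean_approx": p_mean,
            "F": f_stat, "df": (n1 - 1, n2 - 1)}

# 1st series is history; 2nd series is simulated (hypothetical values).
history = [48.0, 52.5, 50.1, 47.3, 55.2, 49.8, 51.6, 46.9, 53.4, 50.7]
rng = random.Random(31517)
simulated = [rng.gauss(mean(history), stdev(history)) for _ in range(500)]

result = compare_two_series(history, simulated)
print(result)
```

A large approximate p-value and an F statistic near 1 are consistent with the simulated series reproducing the historical mean and variance.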
Statistical Tests for Validation • Compare Two Series Historical Data vs. Simulated Values • 1st Data Series is history • 2nd Data Series is simulated
Validation Tests in Simetar • Compare mean and standard deviation of simulated data to the user’s specified values • “Data Series” is the simulated values • Type in the mean or cell • Specify the Std Dev as a value or a cell location • The test is used when • Only mean and std dev are known, i.e., there is no history for the variable • Mean is a projected value which is different from the history
Validation Tests in Simetar • Compare mean and standard deviation of simulated data to the user’s specified values • The test is used when only the mean and std dev are known, i.e., there is no history for the variable, or when the mean is a projected value different from history • Note the Given Values are Mean = 10 and StdDev = 3
Validation Tests in Simetar • Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation • Data Series is the simulated values for all random variables in the MV distribution, a matrix of variables in SimData • The original correlation matrix used to simulate the MVE or MVN distribution • OK, if the majority of correlation coefficients are statistically the same as the historical correlation matrix
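The "majority of coefficients" rule can be illustrated with a Python sketch. Note the swapped-in technique: instead of the t-tests used by Simetar, this sketch uses the standard Fisher z-transformation to test each simulated correlation against its historical value. The matrices, the number of iterations, and the function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_reproduced(r_sim, r_hist, n_iter, alpha=0.05):
    """Fisher z test of H0: the simulated correlation equals r_hist."""
    z = (atanh(r_sim) - atanh(r_hist)) * sqrt(n_iter - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value > alpha   # True means "statistically the same"

def share_reproduced(sim_corr, hist_corr, n_iter, alpha=0.05):
    """Fraction of off-diagonal coefficients that pass the test."""
    k = len(hist_corr)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    passed = sum(
        correlation_reproduced(sim_corr[i][j], hist_corr[i][j], n_iter, alpha)
        for i, j in pairs
    )
    return passed / len(pairs)

# Hypothetical historical and simulated correlation matrices, 500 iterations.
hist_corr = [[1.0, 0.60, -0.30],
             [0.60, 1.0, 0.20],
             [-0.30, 0.20, 1.0]]
sim_corr = [[1.0, 0.58, -0.27],
            [0.58, 1.0, 0.24],
            [-0.27, 0.24, 1.0]]

share = share_reproduced(sim_corr, hist_corr, n_iter=500)
print(f"share of coefficients reproduced: {share:.2f}")
```

Only the upper triangle is tested because the matrix is symmetric and the diagonal is 1 by construction.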
Charts for Validation • Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation
Using Charts for Visual Validation • Use a CDF to compare historical series to simulated series, tests the min and max • Use a PDF to compare historical series to simulated series, tests the shape • Use a Box Plot to compare historical series to simulated series, checks the variability • Use a Probability graph to compare historical series to simulated series, P(x) vs. F(x) • Use a Fan graph to show the range of the risk and level of the mean over time, visual test of CV constant over time
How Simetar Simulates Random Numbers • A pseudo random number generator is used so we can reproduce the simulation results from day to day with the same inputs • Pseudo random number generator uses a seed to start the sampling sequence • The default seed in Simetar is 31517 • Change the seed if you like • If you do not use a pseudo random number generator then every time you simulate the model you get different answers, even if the input has not changed
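The effect of a seed is easy to demonstrate in Python. The generator here is Python's own, not Simetar's, so the values will differ from anything Simetar produces; only the reproducibility behaviour carries over. The function name `simulate` is hypothetical.

```python
import random

def simulate(seed, n=5):
    """Draw n uniform deviates from a generator started at the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

run_a = simulate(31517)   # same seed ...
run_b = simulate(31517)   # ... same sequence
run_c = simulate(12345)   # different seed, different sequence

print(run_a == run_b)
print(run_a == run_c)
```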
Latin Hyper Cube vs. Monte Carlo Simulated Numbers • Monte Carlo simulation procedure samples randomly from the full range of the possible values for a random variable • Requires large number of iterations for adequate coverage over possible range of a variable • For small number of iterations does not sample adequately
Latin Hyper Cube vs. Monte Carlo Simulated Numbers • Latin Hyper Cube systematically samples all segments of the distribution for a random variable • If 100 iterations are to be simulated, LHC samples one value randomly from each of 100 intervals of equal length on the 0 to 1 uniform standard deviate (USD) scale • Ensures all segments of the distribution are sampled, even at small numbers of iterations • With LHC, get “adequate” sampling coverage of a distribution with fewer iterations
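A minimal one-variable sketch of the two sampling procedures in Python. It follows the description above (one random draw from each of n equal-length intervals on 0 to 1), but it is not Simetar's code, and the function names are hypothetical.

```python
import random

def latin_hypercube_uniform(n, rng):
    """One random draw from each of n equal-length intervals, in shuffled order."""
    draws = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(draws)
    return draws

def monte_carlo_uniform(n, rng):
    """n independent draws from the full 0 to 1 range."""
    return [rng.random() for _ in range(n)]

def intervals_hit(draws, n):
    """Count how many of the n equal-length intervals contain a draw."""
    return len({min(int(u * n), n - 1) for u in draws})

rng = random.Random(31517)
n = 100
lhc = latin_hypercube_uniform(n, rng)
mc = monte_carlo_uniform(n, rng)

print("LHC intervals sampled:", intervals_hit(lhc, n))
print("MC intervals sampled: ", intervals_hit(mc, n))
```

With 100 iterations the Latin Hyper Cube sample covers all 100 intervals, while the Monte Carlo sample leaves some intervals empty and draws more than once from others.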
Latin Hyper Cube vs. Monte Carlo • The CDF of a Uniform distribution defined as U(0,1) is a straight line with a 45° angle out of the origin • A perfect sample would lie on the straight line • Use the following USDs • Excel’s =RAND() • Simetar’s =UNIFORM() • Simulate these two USDs • Draw a CDF with the two random variables. Which one lies on the straight line between 0 and 1? [Figure: CDF chart, F(x) from 0.0 to 1.0 plotted against X from 0.0 to 1.0]
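Instead of judging the chart by eye, the distance of each sample from the 45° line can be computed. The Python sketch below uses stand-ins for the two USD generators named above: plain random draws in place of Excel's =RAND(), and a stratified (Latin Hyper Cube) sample in place of Simetar's =UNIFORM(). The distance measure is the largest vertical gap between the empirical CDF and F(x) = x, and the function name is hypothetical.

```python
import random

def max_gap_from_diagonal(draws):
    """Largest vertical gap between the empirical CDF and the line F(x) = x."""
    xs = sorted(draws)
    n = len(xs)
    # The empirical CDF steps from i/n to (i+1)/n at the i-th sorted value.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(31517)
n = 100
mc = [rng.random() for _ in range(n)]               # Monte Carlo stand-in
lhc = [(i + rng.random()) / n for i in range(n)]    # Latin Hyper Cube stand-in

print(f"Monte Carlo gap:      {max_gap_from_diagonal(mc):.4f}")
print(f"Latin Hyper Cube gap: {max_gap_from_diagonal(lhc):.4f}")
```

The stratified sample cannot stray more than 1/n from the straight line, which is why its CDF appears to lie on the line in the chart.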
Example of Latin Hyper Cube vs. Monte Carlo Simulation of USD