Discrete Event Simulation - 10 Design of Simulation Experiments: Analysis of Output.
Discrete Event Simulation - 10 This lecture concerns methods for obtaining information about response variables: we run the simulator one or more times with some choice of parameters and collect statistics about a response variable of interest. We then use these statistics to estimate the desired parameters associated with that variable.
Discrete Event Simulation - 10 Confidence Limits for the Mean. How do you determine, from a sample, what the mean and variance of the population are? This assumes that the size of the sample is not “very large” compared with the size of the population, and that we have no other information about the population. Theorem: if u is normally distributed with 0 mean and unit variance, v² has a χ² distribution with n degrees of freedom, and u and v are independently distributed, then the variable t = u / (v/√n) has a Student’s t distribution with n degrees of freedom.
Discrete Event Simulation - 10 How do we use this? Obtain a sample {x1, x2, …, xn} of size n. If the population is known to be normally distributed, we can use the theorem on the previous slide: if s² is the sample variance, x̄ is the sample mean, and σ² is the (unknown) population variance, then u = (x̄ - μ)/(σ/√n) and v² = (n - 1)s²/σ² satisfy the hypotheses of the theorem. Thus t = (x̄ - μ)/(s/√n) possesses a t-distribution with n - 1 degrees of freedom. Notice that any explicit reference to the population variance disappears from the formula.
Discrete Event Simulation - 10 If we want to determine the confidence interval for the (unknown) population mean, at whatever confidence level we desire, all we need to do is: A) look up the appropriate t(α/2, n-1) (α is the size of the critical region, the complement of the confidence level); B) solve the inequality |x̄ - μ| / (s/√n) ≤ t(α/2, n-1) for μ: x̄ - t(α/2, n-1)·s/√n ≤ μ ≤ x̄ + t(α/2, n-1)·s/√n. We thus have a way of estimating the mean of a population from a sample.
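As an illustration (not part of the original slides), here is a minimal Python sketch of this recipe using scipy; the function name and the sample values are made up.

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(sample, alpha=0.05):
    """Two-sided CI for the population mean when the population variance is unknown."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    xbar = x.mean()
    s = x.std(ddof=1)                              # sample standard deviation
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t(alpha/2, n-1)
    half_width = t_crit * s / np.sqrt(n)
    return xbar - half_width, xbar + half_width

# Example with arbitrary response-variable observations from a few simulation runs.
print(mean_confidence_interval([5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5]))
```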
Discrete Event Simulation - 10 If we assume the population variance σ² to be known, life is even simpler - always under the assumption that the original population is normal: (x̄ - μ)/(σ/√n) is a standard normal variable, so we solve for μ: x̄ - z(α/2)·σ/√n ≤ μ ≤ x̄ + z(α/2)·σ/√n.
Discrete Event Simulation - 10 In both cases the formulae allow us to set a desired confidence interval and then determine how large a sample we need. In the known-variance case we can choose some ε > 0 and all we need to do next is choose n so that z(α/2)·σ/√n ≤ ε, i.e. n ≥ (z(α/2)·σ/ε)².
Discrete Event Simulation - 10 In the unknown-variance case one needs to keep computing sequentially: the sample variance s² has to be recomputed every time, until t(α/2, n-1)·s/√n ≤ ε. One may want to recall that, when n is large, the Student’s t distribution is well approximated by a normal distribution. It is when n is small (small-sample statistics) that the difference between the two becomes important.
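A minimal sketch of this sequential procedure, assuming a hypothetical draw_observation() callable that performs one simulation run; names and default values are illustrative only.

```python
import numpy as np
from scipy import stats

def run_until_precise(draw_observation, eps, alpha=0.05, n0=10, n_max=100000):
    """Add observations until the CI half-width t(alpha/2, n-1)*s/sqrt(n) drops below eps."""
    data = [draw_observation() for _ in range(n0)]
    while len(data) < n_max:
        n = len(data)
        s = np.std(data, ddof=1)
        half_width = stats.t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
        if half_width <= eps:
            return np.mean(data), half_width, n
        data.append(draw_observation())            # one more simulation run
    raise RuntimeError("n_max reached without achieving the requested precision")

# Toy usage: observations drawn from a stand-in response distribution.
rng = np.random.default_rng(0)
print(run_until_precise(lambda: rng.exponential(1.0), eps=0.05))
```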
Discrete Event Simulation - 10 Confidence Limits for the Variance. Assume we have run our simulator and collected a number of samples, all of size n. Each of these samples lets us compute a sample variance s². The population itself has an unknown variance σ². Can we determine a confidence interval for σ²? We recall that the distribution of the sample variances satisfies the condition that (n - 1)s²/σ² is a χ² variable with n - 1 degrees of freedom.
Discrete Event Simulation - 10 We want to solve the inequality χ²(1 - α/2, n-1) ≤ (n - 1)s²/σ² ≤ χ²(α/2, n-1) to get (n - 1)s²/χ²(α/2, n-1) ≤ σ² ≤ (n - 1)s²/χ²(1 - α/2, n-1). Taking square roots, we have confidence intervals for standard deviations. The text has a somewhat garbled formula for the probability estimate for a binomial distribution...
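A short Python sketch of this variance interval (not from the slides); the five observations reused here are the first five treated yields from the corn example introduced on the next slide.

```python
import numpy as np
from scipy import stats

def variance_confidence_interval(sample, alpha=0.05):
    """Two-sided CI for the population variance of a normal population."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    s2 = x.var(ddof=1)
    lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
    return lower, upper

lo, hi = variance_confidence_interval([6.2, 5.7, 6.5, 6.0, 6.3])
print(lo, hi, np.sqrt(lo), np.sqrt(hi))   # square roots give the CI for sigma
```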
Discrete Event Simulation - 10 We introduce an example that finds a confidence interval for the difference of two means. Assume we have 20 plots of land yielding corn, with yields measured in bushels. Half of the plots are treated with a fertilizer and half are untreated.
Treated:   6.2  5.7  6.5  6.0  6.3  5.8  5.7  6.0  6.0  5.8
Untreated: 5.6  5.9  5.6  5.7  5.8  5.7  6.0  5.5  5.7  5.5
We assume the population variances to be the same.
Discrete Event Simulation - 10 Under the null hypothesis of equal means, the pooled two-sample statistic is t = (x̄ - ȳ) / (sp·√(1/nx + 1/ny)), where sp² = ((nx - 1)sx² + (ny - 1)sy²) / (nx + ny - 2). The competing hypothesis that μx > μy, with a right tail of 0.005, has critical value t = 2.878 (18 degrees of freedom); the computed statistic exceeds it, so the result is significant and the null hypothesis has to be rejected. If the assumptions of normality and equality of variances are reasonable, we would like a confidence interval for the difference of the means. With α = 0.05: (x̄ - ȳ) ± t(0.025, 18)·sp·√(1/10 + 1/10).
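A sketch of the same comparison in Python (scipy’s pooled two-sample t test plus the confidence interval by hand); the numbers it prints need not match the slide’s hand computation exactly.

```python
import numpy as np
from scipy import stats

treated   = np.array([6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8])
untreated = np.array([5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5])

# Pooled two-sample t test assuming equal population variances.
t_stat, p_value = stats.ttest_ind(treated, untreated, equal_var=True)
print(t_stat, p_value)

# 95% confidence interval for the difference of the means.
nx, ny = treated.size, untreated.size
sp2 = ((nx - 1) * treated.var(ddof=1) + (ny - 1) * untreated.var(ddof=1)) / (nx + ny - 2)
half = stats.t.ppf(0.975, nx + ny - 2) * np.sqrt(sp2 * (1 / nx + 1 / ny))
diff = treated.mean() - untreated.mean()
print(diff - half, diff + half)
```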
Discrete Event Simulation - 10 Let us now estimate a confidence interval for the variance.
Treated:   6.2  5.7  6.5  6.0  6.3  5.8  5.7  6.0  6.0  5.8
Untreated: 5.6  5.9  5.6  5.7  5.8  5.7  6.0  5.5  5.7  5.5
We will assume, for the sake of computation, that the two populations are merged into one, so we can think of having 20 individual observations. We start with an initial sample of 5 observations and compute the successive confidence intervals as observations are added.
Discrete Event Simulation - 10 α = 0.05
Observation(s) added       n    s²    χ²(.025, n-1)  χ²(.975, n-1)    L      U     U - L
6.2 5.7 6.5 6.0 6.3        5   .005      11.1            .484       .002   .041    .039
5.6                        6   .180      12.8            .831       .070   1.08    1.01
5.9                        7   .045      14.4           1.24        .019   .218    .200
5.6                        8   .180      16.0           1.69        .079   .746    .667
5.7                        9   .125      17.5           2.18        .057   .459    .402
5.8                       10   .080      19.0           2.70        .038   .267    .229
5.8                       11   .080      20.5           3.25        .039   .246    .207
5.7                       12   .125      21.9           3.82        .063   .360    .298
6.0                       13   .020      23.3           4.40        .010   .054    .044
6.0                       14   .020      24.7           5.01        .011   .052    .041
5.8                       15   .080      26.1           5.63        .043   .199    .156
5.7                       16   .125      27.5           6.26        .068   .299    .231
6.0                       17   .020      28.8           6.91        .011   .046    .035
5.5                       18   .245      30.2           7.56        .138   .551    .413
5.7                       19   .125      31.5           8.23        .071   .273    .202
5.5                       20   .245      32.8           8.91        .142   .523    .381
Discrete Event Simulation - 10 Taking square roots of the successive bounds, we obtain the corresponding confidence intervals for the standard deviation.
Discrete Event Simulation - 10 The conclusion is probably unclear, except that one would suspect that the hypothesis of equal variances might be unwarranted. We probably should have checked for equal variances even before checking for equality of the means; the only hypothesis we need is that the two populations (not samples) are normally distributed, and there is no good reason to doubt that (at least on a first pass). The test to use is the F-test: one can show that the quotient of the two sample variances is an F-distributed variable.
Discrete Event Simulation - 10 The sample variance of the first population is s₁² = 0.098, while that of the second population is s₂² = 0.036. Their quotient is s₁²/s₂² = 2.745. The 95th percentile of the F distribution with (9, 9) degrees of freedom is about 3.18, so we cannot reject the hypothesis of equal variances (at the 95% level). One might suspect that the difference in the means of the two populations, coupled with the fact that the first ten observations were all from one population while the second ten were from the other, might have had the effect of confusing any attempt at sequentially improving the confidence interval for the standard deviation. Or maybe not...
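A minimal sketch of the F-test on the same data (not from the slides); the helper name is made up.

```python
import numpy as np
from scipy import stats

def f_test_equal_variances(x, y, alpha=0.05):
    """One-sided F test: reject equal variances if s_x^2 / s_y^2 exceeds the critical F."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    f = x.var(ddof=1) / y.var(ddof=1)
    f_crit = stats.f.ppf(1 - alpha, dfn=x.size - 1, dfd=y.size - 1)
    return f, f_crit, f > f_crit

treated   = [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8]
untreated = [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5]
print(f_test_equal_variances(treated, untreated))
```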
Discrete Event Simulation - 10 Initial Conditions. There is some voodoo here. The book states: “Choosing a reasonable set of starting conditions should not be difficult for an existing system.” This is a reasonable assumption if there are no delays in the system - i.e. if packets take no time to travel from one end of a wire to the other, if switch processing time is 0, etc. A system with delays (and all systems have delays at some scale) must be properly “loaded”: queues must be established, groups of packets must be traveling along fibres, etc. This is not easy to set up, and the more complex the system modeled, the more difficult this “preloading” of the simulator will be.
Discrete Event Simulation - 10 One of the current problems for the Internet is the determination of traffic characteristics: many teams have taken extensive histories of the traffic in their networks. The problem then becomes: how do we characterize that traffic in such a way that we can generate synthetic traffic with the same characteristics? This requires identifying a small set of parameters that characterize the traffic, together with the system parameters that are affected by (or affect) the traffic.
Discrete Event Simulation - 10 The reason for searching for a small set should be clear: a small set should allow for computationally cheaper traffic generation; it should also allow easier interpretation and hypothesis formation - both extremely important activities in the effective use of simulation tools. In the past, a reasonable technique was to collect data, plot the data, look at the data, and attempt to find a probability distribution with an acceptable fit. That still works: we have all the standard distributions - exponential, uniform, normal, Poisson, binomial, etc. These have, in general, few parameters to determine and easily remembered shapes.
Discrete Event Simulation - 10 The standard technique then remains: A) Collect Data; B) Guess a probability distribution function; C) Use a goodness of fit test to decide whether to accept it or try again. The “Internet Problem” is that traffic does not appear to follow any easily determinable “distribution” over, say, a period of time. There appears to be no clear periodicity, no trends, and, in general, traffic seems to be self-similar at all relevant time scales.
Discrete Event Simulation - 10 Internet Traffic appears to have characteristics common to “fractal” objects - so how does one characterize such fractal objects? One currently fashionable method involves computing a coefficient (or exponent), which attempts to measure the amount of self-similarity via a single number. Whether that is enough to characterize such traffic in a useful manner remains to be seen.
Discrete Event Simulation - 10 Another “best fit” example. Number of computer blockages per day.
Discrete Event Simulation - 10 Can we choose a “well fitting” probability density function? We know of two tests that we can run: the χ² test and the Kolmogorov-Smirnov test. Let’s try a binomial distribution. This is uniquely identified by two parameters: p, the probability of an occurrence, and n, the number of independent trials that make up an event. Matching moments, we know that n·p = sample mean and n·p·(1 - p) = sample variance. Solving gives p = 1 - (sample variance)/(sample mean) and n = (sample mean)/p. So n - necessarily an integer - could be either 6 or 7, with p = 0.580 or p = 0.497, respectively.
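A sketch of this method-of-moments fit in Python. The slide’s raw blockage data is not reproduced here, so the counts in the usage line are made-up placeholders.

```python
import numpy as np

def fit_binomial_moments(counts):
    """Method-of-moments fit of Binomial(n, p): n*p = mean and n*p*(1-p) = variance."""
    x = np.asarray(counts, dtype=float)
    mean, var = x.mean(), x.var(ddof=0)
    p = 1.0 - var / mean                 # variance / mean = 1 - p
    n = mean / p
    # n must be an integer: report the two neighbouring candidates, refitting p from the mean.
    return [(n_int, mean / n_int) for n_int in (int(np.floor(n)), int(np.ceil(n)))]

# Usage with made-up daily blockage counts.
print(fit_binomial_moments([3, 4, 2, 5, 3, 4, 3, 6, 4, 3]))
```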
Discrete Event Simulation - 10 The χ² test:
Cell     Theoretical (n=6)   Theoretical (n=7)   Observed   χ² term (n=6)   χ² term (n=7)
0-1           5.253               6.633              6          0.1062          0.0604
2            16.171              17.201             16          0.0018          0.0836
3            29.77               28.325             30          0.0016          0.0991
4            30.838              28.047             31          0.0008          0.3109
5            16.985              16.542             14          0.5245          0.3906
6-7           3.976               6.252              6          1.0306          0.0102
Totals      103                 103                103          1.673           0.956
There are 6 bins, and thus the number of degrees of freedom is 6 - 1 = 5. χ²(5, 0.95) = 11.1; χ²(5, 0.90) = 9.24; χ²(5, 0.1) = 1.61. There is no reasonable way of rejecting the null hypothesis (the chosen distribution fits) in either case.
Discrete Event Simulation - 10 The Kolmogorov-Smirnov test. The text (here) uses an “up to but not including” convention for the cumulative frequency distributions. Using the actual number of blockages with an “up to and including” convention, the table would start at 0 and stop at 7 rather than starting at 1.
Number of Blockages   S(x) Observed   F(x) (n=7)   |Diff|   F(x) (n=6)   |Diff|
        1                0.0000         0.0081     0.0081      0.0055    0.0055
        2                0.0583         0.0644     0.0061      0.0510    0.0073
        3                0.2135         0.2314     0.0178      0.2080    0.0056
        4                0.5049         0.5064     0.0015      0.4971    0.0077
        5                0.8058         0.7787     0.0276      0.7965    0.0093
        6                0.9417         0.9393     0.0026      0.9614    0.0197
        7                0.9903         0.9924     0.0021      1.0000    0.0097
        8                1.0000         1.0000     0.0000      1.0000    0
Totals                                             0.0658                0.0648
Discrete Event Simulation - 10 The K-S test, at the 95% confidence level, gives a bound for the differences of approximately 1.36/√n = 1.36/√103 ≈ 0.134. Neither value computed on the previous slide exceeds this bound: we cannot reject either of the two choices for n.
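A sketch of the comparison in Python, again with made-up counts since the raw data is not reproduced here. Note that the conventional K-S statistic is the maximum difference between the two CDFs; the table above also reports the sum, so the sketch returns both.

```python
import numpy as np
from scipy import stats

def ks_against_binomial(counts, n, p):
    """Compare the empirical CDF of the counts against a Binomial(n, p) CDF."""
    x = np.asarray(counts)
    values = np.arange(0, n + 1)
    ecdf = np.array([(x <= v).mean() for v in values])   # S(x), "up to and including"
    model = stats.binom.cdf(values, n, p)                 # F(x)
    diffs = np.abs(ecdf - model)
    return diffs.max(), diffs.sum()

data = [3, 4, 2, 5, 3, 4, 3, 6, 4, 3]   # placeholder blockage counts
print(ks_against_binomial(data, n=7, p=0.497))
print(ks_against_binomial(data, n=6, p=0.580))
```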
Discrete Event Simulation - 10 Run Lengths. If we want to evaluate some parameter to a given precision with a given probability, what can we do? If we just run the simulation, we can check, for example, whether the true population mean falls within a certain confidence interval around the sample mean at a given confidence level. If we succeed, did we waste resources in obtaining a larger than necessary sample? If we didn’t succeed, must we run a new, larger set of simulations to get a large enough sample? Can we determine what a large enough sample is? Caveat: we saw earlier that the sample variance might well increase with the size of the sample.
Discrete Event Simulation - 10 Suppose our simulator has been run J times and an output variable a has been measured, providing us with a set of values {a1, a2, …, aJ}. If this output variable is a random variable, it has a cumulative distribution function F(a), an expected value E(a) and a variance σ²(a) - all, most likely, unknown to us. The Central Limit Theorem says that the sample mean is normally distributed around the population mean, with a variance that decreases inversely with the size of the sample. This can be interpreted as saying that, given the size of a confidence interval and a probability, we can determine how many runs are needed for the sample mean to estimate the population mean to that precision.
Discrete Event Simulation - 10 Tchebycheff’s Inequality. Let X be a random variable with mean μ = E(X) and variance σ² = Var(X). Then, for any t > 0, P(|X - μ| ≥ t) ≤ σ²/t². Since the random variable of interest is the sample mean ā of J runs, this becomes P(|ā - E(a)| ≥ D) ≤ σ²(a)/(J·D²). We don’t know the population variance, but we do know the sample variance - and so we use it.
Discrete Event Simulation - 10 The formula becomes P(|ā - E(a)| ≥ D) ≤ s²(a)/(J·D²). The idea is now quite simple: A) pick an initial value for D > 0 (the half-width of the confidence interval), one for ε > 0 (the acceptable probability of error), and one for J (the size of the initial run); run J1 = J repetitions of the experiment and collect {a1, a2, …, aJ1}. B) If s²(a)/(J1·D²) ≤ ε, you are done.
Discrete Event Simulation - 10 C) If not, find J2 such that s²(a)/(J2·D²) ≤ ε, i.e. J2 ≥ s²(a)/(ε·D²), and run the simulator for J2 - J1 more iterations, obtaining a total of {a1, a2, …, aJ1, aJ1+1, …, aJ2} samples. Recompute the bound with the new sample variance to get s²(a)/(J2·D²). If this satisfies the desired inequality, you are done; otherwise repeat the process to compute a J3, J4, etc.
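A minimal sketch of steps A)-C), assuming a hypothetical run_simulation() callable that performs one run; the parameter names (delta for D, eps for ε) and defaults are illustrative only.

```python
import numpy as np

def chebyshev_run_length(run_simulation, delta, eps, j_initial=20, j_max=100000):
    """Keep adding runs until s^2 / (J * delta^2) <= eps (Chebyshev bound on the mean)."""
    results = [run_simulation() for _ in range(j_initial)]
    while True:
        j = len(results)
        s2 = np.var(results, ddof=1)
        if s2 / (j * delta ** 2) <= eps or j >= j_max:
            return np.mean(results), j
        # Next run length suggested by the current variance estimate (step C).
        j_next = min(max(int(np.ceil(s2 / (eps * delta ** 2))), j + 1), j_max)
        results.extend(run_simulation() for _ in range(j_next - j))

rng = np.random.default_rng(1)
print(chebyshev_run_length(lambda: rng.exponential(1.0), delta=0.1, eps=0.05))
```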
Discrete Event Simulation - 10 Plotting the results from the earlier example tells us that stopping the first time the inequality is satisfied may be premature...
Discrete Event Simulation - 10 Variance Reduction. 1) Stratified Sampling. This technique arises from the attempt to force the occurrence of “rare” events, while taking into account actual probabilities of occurrence. The example in the text points out that a random set of simulation runs may leave substantial gaps in the coverage: it may turn out that random generation of synthetic data may fail to generate data in a number of regions of interest. Solution: make sure that all regions of interest are represented in the data. This ASSUMES that the distribution of the input data IS KNOWN.
Discrete Event Simulation - 10 How? Separate the range of input values into M contiguous non-overlapping regions. Each region Ri has a total probability pi, with p1 + p2 + … + pM = 1. We decide how many runs to complete in each region - each a random run with data in the region - and then we combine the results. 1) How do we set up ni runs in each region? Region Ri will have a lower bound xi and an upper bound xi+1 (these could be multi-dimensional vectors: the changes should be obvious). We can generate a random number in the desired range by r’i = F(xi) + (F(xi+1) - F(xi))·ri, where ri is a uniform random number in the range [0,1), and F is the cumulative distribution function (so pi = F(xi+1) - F(xi)).
Discrete Event Simulation - 10 Apply the inverse-function technique discussed in a previous lecture to compute the correct x-value. 2) How do we combine the results so that credit is apportioned correctly? Assume the total number of runs is N, with ni runs in each region, so that n1 + n2 + … + nM = N. Assume further that we are interested in estimating the mean of the population by estimating the sample mean. Let x̄i be the mean in each region. The sample mean is then given by x̄ = Σi (ni/N)·x̄i. This value is clearly the same as the usual sample mean - the difference is that we have NOT taken a random sample at all: the only randomness is within the subpopulations.
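A minimal sketch of both steps for an exponential input distribution (anticipating the example a few slides ahead); the stratum boundaries and run counts are just illustrative arguments.

```python
import numpy as np

def stratified_exponential_sample(boundaries, runs_per_stratum, rng):
    """Stratified sampling from Exp(1) via the inverse CDF x = -ln(1 - u).

    boundaries: stratum edges in probability space, e.g. [0.0, 0.64, 0.96, 1.0].
    """
    strata = []
    for lo, hi, n_i in zip(boundaries[:-1], boundaries[1:], runs_per_stratum):
        u = lo + (hi - lo) * rng.random(n_i)   # uniform within [F(x_i), F(x_{i+1}))
        strata.append(-np.log(1.0 - u))        # inverse transform for the exponential
    return strata

rng = np.random.default_rng(2)
edges = [0.0, 0.64, 0.96, 1.0]
strata = stratified_exponential_sample(edges, [4, 4, 2], rng)
p = np.diff(edges)
print(sum(p_i * s.mean() for p_i, s in zip(p, strata)))   # weighted stratified mean
```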
Discrete Event Simulation - 10 Since we claim that this method will reduce the variance of the sample mean (and therefore be “more accurate”), let us follow through some algebra. The general variance of the sample mean of N independent observations is σ²/N. Since the region means are all independent, the variance of the stratified sample mean is Var(x̄) = Σi (ni/N)²·σi²/ni = Σi ni·σi²/N², where σi² is the variance of the ith subpopulation. It should also be easy to see, if μ denotes the population mean and μi denotes the mean of the ith subpopulation, that μ = Σi pi·μi and σ² = Σi pi·σi² + Σi pi·μi² - μ².
Discrete Event Simulation - 10 Eliminating μ² in the second equation by squaring the first and substituting, we obtain σ² = Σi pi·σi² + Σi pi·(μi - μ)². Dividing by N, we have σ²/N = Σi pi·σi²/N + Σi pi·(μi - μ)²/N, and therefore σ²/N = Σi ni·σi²/N² + [Σi pi·σi²/N - Σi ni·σi²/N²] + Σi pi·(μi - μ)²/N, where the first term on the right is exactly the stratified variance.
Discrete Event Simulation - 10 Choosing representative sampling (ni = pi·N) provides us with a clear possibility for variance reduction, since the bracketed term then vanishes: the only extra catch is that the subpopulation means must differ from the overall population mean, otherwise nothing is gained. It should also be clear that representative sampling is not necessary: it suffices to make sampling choices that leave the sum of the last two terms positive. Let’s look at the textbook example.
Discrete Event Simulation - 10 Assume we have an exponential distribution with mean 1, F(x) = 1 - e^(-x), except that we don’t know it and we are trying to estimate its mean from 10 samples.
Iteration   Random Number ri   Observation xi = -ln(1 - ri)
 1              0.495              0.684
 2              0.335              0.408
 3              0.791              1.568
 4              0.469              0.633
 5              0.279              0.328
 6              0.698              1.199
 7              0.013              0.014
 8              0.761              1.433
 9              0.290              0.343
10              0.693              1.183
Mean                               0.7793
Discrete Event Simulation - 10 So the sample mean is 0.7793, substantially less than 1. Its variance is 0.282 - and standard deviation 0.531. The variance of the sample mean is thus (approximately) 0.282/10 = 0.0282. The problem is that the random number generator produces decent pseudo-random numbers, but we have no guarantee that they will be “well spread out”. No numbers were generated between 0.013 and 0.279, and none were generated above 0.791. What can we do about it? We can run a lot of iterations of the simulator, hoping that randomness will eventually cover all regions; or we can take matters into our own hands: separate [0, 1] into strata and make sure that each stratum has values in it.
Discrete Event Simulation - 10
Stratum   Portion of Distribution   Stratum Random Number    Sample Size   Weight Info
1         0 ≤ F(x) ≤ 0.64           r’ = 0 + 0.64·r          4             p1 = 0.64, n1 = 4
2         0.64 < F(x) ≤ 0.96        r’ = 0.64 + 0.32·r       4             p2 = 0.32, n2 = 4
3         0.96 < F(x) ≤ 1           r’ = 0.96 + 0.04·r       2             p3 = 0.04, n3 = 2
This simply makes sure that we check the contents of the right tail.
Discrete Event Simulation - 10
Stratum    i    Random Number ri   Stratum Random Number r’i   Observation x’i = -ln(1 - r’i)   Stratum Probability pi   Stratum Events ni
1          1        0.495                 0.317                        0.381                           0.64                   4
           2        0.335                 0.215                        0.242
           3        0.791                 0.507                        0.707
           4        0.469                 0.300                        0.357
2          5        0.279                 0.729                        1.306                           0.32                   4
           6        0.698                 0.864                        1.995
           7        0.013                 0.644                        1.033
           8        0.761                 0.884                        2.154
3          9        0.290                 0.9716                       3.561                           0.04                   2
          10        0.693                 0.9877                       4.398
Mean: 0.64·0.422 + 0.32·1.622 + 0.04·3.980 ≈ 0.948
Discrete Event Simulation - 10 Notice that we have not taken the stratified mean as defined earlier (with weights ni/N) - we have used another formula that suggests the proper weight to give each of the subpopulation means. It uses the weights of representative sampling (the stratum probabilities pi), coupled with the stratum sample means. No use is made of the size of each stratum sample.
Discrete Event Simulation - 10 Recall that the stratified variance is given by Σi ni·si²/N², where si² is the population variance of the ith stratum - which we can only approximate by the sample variance of the stratum. We compute this from the stratum samples and compare it with the value 0.0282 obtained for the crude estimate on a previous slide. This gave a much better estimate of the mean and a tighter error, but reconciling this formula with the one used by the text is not straightforward.
Discrete Event Simulation - 10 If we use the probabilities pi instead, the variance estimate becomes Σi pi²·si²/ni. At this point, the simple variance-reduction theory presented here and the formulae introduced by the text agree only in the case where the number of samples in each stratum is exactly proportional to the probability of a sample from that stratum appearing in a random drawing. If not, it is not clear how the two numbers can be made to agree...
Discrete Event Simulation - 10 2) Antithetic Variates. This is based on the earlier observation that using negatively correlated sample values reduces the variance of their mean - a correlation coefficient ρ = -1 gives the maximum variance reduction. The example in the text reflects the earlier discussion about the generation of such variates.
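A minimal sketch of the idea (not the text’s example): each uniform draw u is paired with 1 - u, both are pushed through the inverse CDF, and the pair averages are used as the estimate.

```python
import numpy as np

def antithetic_mean_estimate(inverse_cdf, pairs, rng):
    """Estimate E[X] using antithetic pairs (u, 1-u) pushed through the inverse CDF."""
    u = rng.random(pairs)
    x = inverse_cdf(u)                    # ordinary draws
    x_anti = inverse_cdf(1.0 - u)         # negatively correlated companions
    return np.mean(0.5 * (x + x_anti))    # average of pair means

rng = np.random.default_rng(3)
exp_inverse_cdf = lambda u: -np.log(1.0 - u)   # Exp(1)
print(antithetic_mean_estimate(exp_inverse_cdf, pairs=5, rng=rng))
```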
Discrete Event Simulation - 10 3) Importance Sampling. This technique appears to be a variant of stratified sampling, where the weights are chosen in a manner that reflects the relationship between the actual probability distribution of the population and the probability distribution implied by the choice of data points. The example given involves attempting to find the mean queue waiting time for a population of computer jobs, divided between compute-bound jobs and I/O-bound ones. It is also known that about 80% of the jobs are compute bound, while 20% are I/O bound. The variation in waiting time for I/O-bound jobs is also very large.
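The slide describes the technique only informally; a standard importance-sampling estimator (sample from a proposal density g, reweight by f/g) can be sketched as follows, with a toy exponential target rather than the text’s queueing example. All names here are illustrative.

```python
import numpy as np

def importance_sampling_mean(h, f_pdf, g_pdf, g_sampler, n, rng):
    """Estimate E_f[h(X)] by sampling from g and reweighting by f(x)/g(x)."""
    x = g_sampler(n, rng)
    weights = f_pdf(x) / g_pdf(x)
    return np.mean(h(x) * weights)

# Toy usage: estimate the mean of Exp(1) while oversampling the tail with Exp(mean 2).
rng = np.random.default_rng(4)
f_pdf = lambda x: np.exp(-x)                          # target density
g_pdf = lambda x: 0.5 * np.exp(-0.5 * x)              # heavier-tailed sampling density
g_sampler = lambda n, rng: rng.exponential(2.0, n)    # draws from the sampling density
print(importance_sampling_mean(lambda x: x, f_pdf, g_pdf, g_sampler, 1000, rng))
```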