810 likes | 1.12k Views
AN INTRODUCTION TO STATISTICAL ANALYSIS OF SIMULATION OUTPUTS. Sample Statistics. If x 1 , x 2 , …, x n are n observations of the value of an unknown quantity X , they constitute a sample of size n for the population on which X is defined. Sample mean Sample variance.
E N D
AN INTRODUCTION TO STATISTICAL ANALYSISOF SIMULATION OUTPUTS
Sample Statistics • If x1, x2, …, xn are n observations of the value of an unknown quantity X, they constitute a sample of size n for the population on which X is defined. • Sample mean • Sample variance
Central Limit Theorem • If the n mutually independent random variables x1, x2, …, xn have the same distribution, and if their mean m and their variance s2 exist then …
Central Limit Theorem • The random variable is distributed according to the standard normal distribution (zero mean and unit variance).
Estimating a mean • Assume that we have a sample x1, x2, …, xn consisting of n independent * observations of a given population • The sample mean xbar is an unbiased estimator of the mean m of the population *This is the critical assumptionWithout it, we cannot apply the formula
Large populations • For large values of n, the (1-)% confidence interval for m is given by • withfor the standard normal distribution
Explanations • 1 – a expressed in percent is the level of the confidence interval • 90% means a = 0.10 • 95% means a = 0.05 • 99% means a = 0.01 • a is the error probability • Probability that the true mean falls outside the confidence interval
95% Confidence Intervals • For =.05, za/2=1.96 • Example: • If = 35, s= 4 and n = 100 • The 95% confidence interval for m is 35 ± 1.96x4/10 = 35 ± 7.84
When s is not known • We can replace s in the preceding formula by the standard-deviation s of the sample • When n < 30, we must read the value of za/2 from a table of Student's t-distribution with n - 1 degrees of freedom • When n 30, we can use the standard values
Confidence Intervals in CSIM • CSIM can automatically compute confidence intervals for the mean value of any table, qtable and so on. • For everything but boxes • xyx->confidence(); • For the elapsed times in a box • bd->time_confidence(); • For the population of a box • bd->_number_confidence(); • We get 90, 95 and 99 percent confidence intervals
Confidence Intervals in CSIM • We get 90, 95 and 98 percent confidence intervals • Computed using batch means method • See next section • But only for the mean
Estimating a proportion • A proportion p represents the probabilityP(X) for some fixed threshold • 97% of our customers have to wait less than one minute • Confidence intervals for proportions are much easier to compute than confidence intervals for quantiles
Assumptions • Assume that we have n independent observations x1, x2, …, xn of a given population variable X and that this variable has a continuous distribution • Let p represent the proportion we want to estimate, say P(X) • Let k represent the number of observations that are .
Basic property • If p represent the proportion we want to estimate, say P(X), and k represent the number of observations that are • The rv k is distributed according to a binomial distribution with mean np and variance p(1 – p) • The rv k/n is distributed according to a binomial distribution with mean p and variance p(1 – p)/n
The formula • When n > 29.the distribution of k/n is approximately normal • We then have
The problem • All statistical analysis techniques we have discussed assume that sample values are mutually independent • Generally false for quantities such as • Waiting times, response times, … • Tend instead to be autocorrelated • When the waiting lines are long, everybody wait a long time
Traditional solution • Keep the measurements sufficiently apart • Sample them every T minutes apart • Not practical • Would require very long simulations
Three good solutions • Batch means • Regenerative method • Time series analysis
Batch means • We group consecutive observations into “batches” • We compute the means of these batches • We observe that autocorrelation among batch means decreases with size of batches • When size increases, each batch includes more observations that are far apart
Example • We collected the following values: • 4, 3, 3, 4, 5, 5, 3, 2, 2, 3 • We group them into two batches of five observations: • 4, 3, 3, 4, 5 and 5, 3, 2, 2, 3 • The batch means are: • 3.8 and 3
Batch means in CSIM • CSIM uses fixed-size batches • To compute confidence intervals • To control the duration of a simulation(run-length control)
Regenerative method • Most systems with queues go through states that return it to an state identical to its original state • The system regenerates itself • Examples: • Whenever a disk array is brought back to its original state • Whenever a camper rental agency has all its campers available
Key idea • Define a regeneration interval as an interval between two consecutive regeneration points: • Observations collected during the same regeneration interval can be correlated • Observations collected during different regeneration intervals are independent
Application • We group together observations that occur within the same regeneration interval • We compute the means of these groups of observations • These group means are independent from each other
Limitations of the approach • Not general: • System must go through regeneration points • System must be idle • Leads to complex computations • We rarely have exactly the same number of observations in two different regenerations intervals
Time series analysis • Treats consecutive observations as elements of a time series • Estimates autocorrelation among the elements of a time series • Includes this autocorrelation in the computation of all confidence intervals
Objective • Accuracy of confidence intervals increases with duration of simulation • The 1/n factor • We would like to be able to stop the simulation once a given accuracy level has been reached for the confidence interval of a specific measurement
The CSIM solution (I) • Specify • Quantity of interest • Mean value from a table, a qtable, … • A relative accuracy • Maximum relative error • A confidence level (say, 0.90 to 0.99) • A maximum simulation duration • In seconds of CPU time
The CSIM solution (II) • void table::run_length(double accuracy,double conf_level, double max_time) • void qtable::run_length(double accuracy,double conf_level, double max_time) • void meter::run_length(double accuracy,double conf_level, double max_time) • void box::time_run_length(double accuracy,double conf_level, double max_time) • void box::number_run_length(double accuracy,double conf_level, double max_time)
The CSIM solution (III) • Example: • thistable->run_length(.01, .95, 500) • Specifies than we want an error less than 1 percent for 95% confidence interval of mean of thistable • Stops simulation after 500 seconds • thisbox->:time_run_length(.01, .99, 500) • Same for time average of a box
The CSIM solution (IV) • Replace termination test by • converged.wait();
Example (I) • The campers • We want • A maximum error of 1 percent (0.01) • For the 95 percent confidence interval • Of average number of rented campers • A maximum simulation time of 100 s • Will use number aspect of agency box
Example (II) • Add • agency->number_confidence() after activation of box agency in csim process
Example (III) • Put main loop • while(simtime() < DURATION) { hold(exponential(MIART); customer();} in a separate arrivals process • Make it an infinite loop
Example (IV) • Add • converged.wait(); after call to arrivals process • arrivals():converged.wait(); • Best way to let sim process generate customers and wait for terminationin parallel
Example (V) • The new arrivals process void arrivals() { process(“arrivals”); // REQUIRED for(;;) { // forever loop hold(exponential(MIART)); customer(); } // forever} // arrivals
Warnings • Confidence intervals do not take into account model inaccuracies • While the batch means method eliminates most effects of measurement autocorrelation, it is not always 100% effective • The max_time parameter of the run_length() will not necessarily stop the simulation just after the specified CPU time • Like the emergency brake of a train
Objective • Partition processes into different classes • Low priority • High priority • Obtain separate statistics for each process priority class
Declaring and initializing process classes • To declare a dynamic process class: • process_class *c; • To initialize a process class before it can be used in any other statement. • c = new process_class("low priority")
To assign a priority class • Add inside the process • c->set_process_class(); • Processes that have not been assigned a process class belong to the “default” process class
Reporting • Can use • report() • report_classes()
Other options • Can change the name of a process class: • c->set_name("high priority"); • Can reset statistics associated with a process • c ->reset(); • Can do the same for all process classes: • reset_process_classes(); • Can delete a dynamic process class: • delete c;
Background • We need to distinguish between • Truly random numbers • Obtained through observations of a physical random process • Rolling dices, roulette • Atomic decay • Pseudo-random numbers • Obtained through arithmetic operations
Example • Linear Congruential Generators (LCG) • Easy to implement and fast • Defined by the recurrence relation: rn+1 = (a rn + c ) mod m where • r1, r2, … are the random values • m is the "modulus“ • r0 is the seed
Two realizations • GCC family of compilers • m = 232a = 69069 c = 5 • Microsoft Visual/Quick C/C++ • m = 232a = 214013 c = 2531011
Problems with pseudorandom number generators • Much shorter periods for some seed states • Lack of uniformity of distribution • Correlation of successive values • …
Better RNGs • Use the Mersenne twister • Period is 219937 - 1 • Blum-Blum-Schub