Output Data Analysis - Single System
We have a working simulator, generating data. What do we do with the data?

First Possibility: set it up with "typical input conditions" and "typical parameters" - all of which we more or less know how to do (by now) - and let it go. At the end of the run, if the results look good, recommend the system (or system changes); if the results look bad, send the system designers back to the drawing board. This is where some research papers stop...

Second Possibility: you remember that just about all games of chance favor the house. If we all really believed that as a deterministic statement, nobody would gamble - yet, many of us gamble because we believe that, somehow, the odds favor at least an occasional "turn of luck", and that we might be there to benefit. This is another way of saying that a single run means absolutely nothing...
We will have to carry out a number of runs, and we will have to be careful about not making too many unwarranted assumptions about independence. Let Y1, Y2, …, Ym be the output variables obtained from a single run of length m. These could be the number of cars arriving at a gasoline station during the j-th hour, 1 ≤ j ≤ m; the number of people passing through customs during the j-th minute; the number of loaves of bread baked during the j-th hour; the Geiger counter clicks during the j-th second, etc. One of the problems with these variables is that they are neither independent nor identically distributed - we should expect them to be highly correlated (a busy hour is usually followed or preceded by a busy hour; a high reading on the Geiger counter is usually followed and preceded by a high reading; etc.). This means that none of our previous results on confidence intervals for the mean (or any other statistic) can be applied directly, because they required independence.
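The correlation within a single run can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: the simulator is a hypothetical single-server queue with exponential interarrival and service times, Yi is the delay in queue of the i-th customer, and the function names and rates are made up for the example.

```python
import random

def simulate_run(m, seed, arrival_rate=0.9, service_rate=1.0):
    """Return Y_1..Y_m, the delays in queue of the first m customers.

    Hypothetical model (an assumption for this sketch): single server,
    exponential interarrival and service times, system starts empty.
    """
    rng = random.Random(seed)
    ys, delay = [], 0.0  # the first customer never waits
    for _ in range(m):
        ys.append(delay)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # Lindley recursion: the next delay depends on the current one
        delay = max(0.0, delay + service - interarrival)
    return ys

def lag1_autocorrelation(ys):
    """Sample correlation between consecutive observations Y_i and Y_(i+1)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys)
    cov = sum((ys[i] - mean) * (ys[i + 1] - mean) for i in range(n - 1))
    return cov / var

run = simulate_run(m=5000, seed=1)
rho1 = lag1_autocorrelation(run)
print(f"lag-1 autocorrelation within one run: {rho1:.3f}")
```

For a heavily loaded queue such as this one, the printed value should be far from zero, which is why IID-based confidence intervals should not be applied to the raw observations of one run.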
We need some methods that will give us IID samples: run the simulation n times, using a different random seed for the input variable generators in each run, but with the same initial conditions. We obtain a table of values:

y11, y12, …, y1i, …, y1m
y21, y22, …, y2i, …, y2m
…
yn1, yn2, …, yni, …, ynm

where yji denotes the i-th observation in the j-th run. Although the problems with the row values are the same as before, the i-th column values should be IID observations of the random variable Yi. We will use this idea of independence across runs to derive some data analysis methods: for example, the column average Ȳi(n) = (y1i + y2i + … + yni) / n is an unbiased estimate of E(Yi).
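The table of independent runs and the column average can be sketched as follows. This is a minimal illustration, not the authors' simulator: the queue model, the function names, the parameter values, and the use of a normal-approximation interval (the 1.96 factor) are all assumptions made for the example.

```python
import random
import statistics

def simulate_run(m, seed, arrival_rate=0.9, service_rate=1.0):
    """One run of length m of a hypothetical single-server queue (assumed model).

    Returns the delays in queue Y_1..Y_m, starting from an empty system.
    """
    rng = random.Random(seed)  # a different seed for every run
    ys, delay = [], 0.0        # the same initial conditions for every run
    for _ in range(m):
        ys.append(delay)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        delay = max(0.0, delay + service - interarrival)
    return ys

n, m = 200, 50
table = [simulate_run(m, seed=j) for j in range(n)]  # row j holds run j

i = 20                                    # 1-based column index, as in the text
column = [row[i - 1] for row in table]    # n observations of Y_i, one per run
ybar = statistics.fmean(column)           # column average, estimates E(Y_i)
half_width = 1.96 * statistics.stdev(column) / n ** 0.5  # normal approximation

print(f"E(Y_{i}) is roughly {ybar:.3f} +/- {half_width:.3f}")
```

Because each entry of the column comes from a separate run with its own seed, the usual IID formulas apply to the column even though they do not apply along a row.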
One of the first problems we have to deal with is that of determining the difference between transient behavior and steady-state (time-independent) behavior. Since the system starts "empty", it will take some time before it reaches any kind of "normal" (read: steady-state) behavior. During this period of time our results will be, in some sense at least, "unrepresentative". Let I denote the initial conditions of the system. For the output stochastic process, let Fi(y|I) = P(Yi ≤ y|I), for i = 1, 2, …, denote the probability that the event {Yi ≤ y} occurs given the initial conditions I. Fi(y|I) is called the transient distribution of the output process at time i for the initial conditions I. Fi(y|I) will, in general, be different for each i and for each set of initial conditions I. The next slide shows some possible distributions for several values of i.
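One way to get a feel for the transient distributions is to estimate Fi(y|I) empirically: for a fixed i, take the fraction of independent runs whose i-th observation is at most y. The sketch below assumes a hypothetical single-server queue started empty (that is the initial condition I here); the model, names, and parameter values are illustrative assumptions only.

```python
import random

def simulate_run(m, seed, arrival_rate=0.9, service_rate=1.0):
    """Delays in queue Y_1..Y_m for a hypothetical queue that starts empty."""
    rng = random.Random(seed)
    ys, delay = [], 0.0
    for _ in range(m):
        ys.append(delay)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        delay = max(0.0, delay + service - interarrival)
    return ys

def empirical_cdf(table, i, y):
    """Estimate F_i(y | I) as the fraction of runs with Y_i <= y (i is 1-based)."""
    return sum(1 for row in table if row[i - 1] <= y) / len(table)

n, m = 500, 40
table = [simulate_run(m, seed=j) for j in range(n)]

# Print the estimated transient distribution at a few points y, for several i.
for i in (1, 5, 20, 40):
    estimates = [round(empirical_cdf(table, i, y), 3) for y in (0.0, 1.0, 5.0)]
    print(f"i = {i:2d}: F_i(0), F_i(1), F_i(5) are about {estimates}")
```

In this assumed model the first customer never waits, so the estimate for i = 1 is 1 at every y ≥ 0, while later columns spread probability over larger delays: the distribution visibly changes with i.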
If Fi(y|I) → F(y) as i → ∞, for all y and any initial conditions I, the function F(y) is called the steady-state distribution of the output process Y1, Y2, … This steady-state distribution is attained only in the limit, but we can usually identify an integer k such that the distribution functions of Yk, Yk+1, … are (and remain) sufficiently close so that, for our purposes, they can be replaced by a single distribution, the limit one. Furthermore, the random variables Yk, Yk+1, … will not be independent, but they will approximately constitute a covariance-stationary stochastic process. Example:
The examples (and theoretical considerations) show that the steady-state distribution does not depend on the initial conditions, although the rate of convergence of the distributions of the Yi's does.
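The independence from initial conditions can be illustrated with a small experiment. The sketch below is an assumption-laden example, not taken from the slides: it uses a hypothetical single-server queue with arrival rate 0.5 and service rate 1.0, compares an empty start with a start in which the first customer already faces a delay of 5.0, and uses the known mean delay in queue for that model, λ / (μ(μ - λ)) = 1.0, as the steady-state reference.

```python
import random
import statistics

def simulate_run(m, seed, initial_delay, arrival_rate=0.5, service_rate=1.0):
    """Delays Y_1..Y_m of a hypothetical queue; initial_delay plays the role of I."""
    rng = random.Random(seed)
    ys, delay = [], initial_delay
    for _ in range(m):
        ys.append(delay)
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        delay = max(0.0, delay + service - interarrival)
    return ys

def column_means(n, m, initial_delay, seed_offset):
    """Average each column over n independent runs to estimate E(Y_i | I)."""
    table = [simulate_run(m, seed_offset + j, initial_delay) for j in range(n)]
    return [statistics.fmean([row[i] for row in table]) for i in range(m)]

n, m = 2000, 150
means_empty = column_means(n, m, initial_delay=0.0, seed_offset=0)
means_loaded = column_means(n, m, initial_delay=5.0, seed_offset=100_000)
steady_state_mean = 0.5 / (1.0 * (1.0 - 0.5))  # lambda / (mu * (mu - lambda))

for i in (1, 5, 20, 150):
    print(f"i = {i:3d}: empty start {means_empty[i - 1]:.3f}, "
          f"loaded start {means_loaded[i - 1]:.3f}, "
          f"steady state {steady_state_mean:.3f}")
```

The early columns differ sharply between the two starts, while the late columns should both settle near the same value; how quickly each start gets there is what depends on I.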