COMP155 Computer Simulation October 1, 2008 Lecture 10: Probability and Statistics (part 2)
This Week • Review of probability and statistics needed to understand simulation • follow Appendix C in Arena text • Outline • Monday (C.1, C.2): • Probability – basic ideas, terminology • Random variables, joint distributions • Today (C.3-C.5): • Sampling • Statistical inference – point estimation, confidence intervals, hypothesis testing
Sampling • Statistical analysis: purpose is to estimate or infer something about a large population • population is a set of data points • population is too large to look at completely, so we only look at a sample from the population • if the sample is randomly selected from the population, the distribution of the sample should approximate the distribution of the population, and more closely as the sample size grows • in practice: determine a PMF or PDF for a sample and assume that distribution holds for the entire population
Sampling • A random sample is a set of independent and identically distributed (IID) observations X1, X2, …, Xn from the population • Input modeling: • observations come from the real world • Arena’s input analyzer can be used to determine the distribution function • Output analysis: • observations are the results of multiple runs/replications of the simulation • Arena’s output analyzer can be used to characterize the output population from the observations.
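Estimating Distribution from Samples • Samples: X1, X2, …, Xn • Assuming a normal distribution, compute: • sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ • sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ • These statistics are themselves random variables with their own sampling distributions; the sample mean is normal when the data are normal, and approximately normal for large n otherwise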
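As a small illustration of the two statistics on this slide, the sketch below computes the sample mean and the sample variance (with the usual n - 1 divisor) by hand and checks them against Python's standard `statistics` module. The five observations are made up for the example and do not come from the lecture.

```python
import statistics

# Hypothetical observations, e.g. average waiting times (minutes) from five runs.
sample = [4.0, 7.0, 5.0, 6.0, 8.0]

n = len(sample)
x_bar = sum(sample) / n                                 # sample mean
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)    # sample variance, n - 1 divisor

# The standard library computes the same quantities.
assert abs(x_bar - statistics.mean(sample)) < 1e-12
assert abs(s2 - statistics.variance(sample)) < 1e-12

print(x_bar, s2)   # 6.0 2.5
```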
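Sampling Distributions • If X1, X2, …, Xn are IID with mean μ and variance σ², then $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$ • If the underlying distribution of X is normal, then the distribution of $\bar{X}$ is also normal.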
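For IID observations with mean μ and variance σ², the sample mean has expectation μ and variance σ²/n. The sketch below checks this empirically by drawing many samples and looking at the spread of their means. The function name `sample_means` and the values μ = 10, σ = 2 are assumptions chosen only for the demonstration.

```python
import random
import statistics

def sample_means(n, reps, mu=10.0, sigma=2.0, seed=1):
    """Return `reps` sample means, each computed from n IID Normal(mu, sigma) draws."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(reps)]

for n in (5, 20, 80):
    means = sample_means(n, reps=4000)
    # Columns: n, average of the sample means, their variance, theoretical sigma^2 / n
    print(n,
          round(statistics.fmean(means), 3),
          round(statistics.variance(means), 4),
          2.0 ** 2 / n)
```

The observed variance of the sample means should shrink roughly in proportion to 1/n, which is why more observations give a more precise estimate of the mean.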
Point Estimation • Point estimates are estimates of population distribution parameters (μ, σ², …) • Properties of point estimates • Unbiased: E(estimate) = parameter • Efficient: Var(estimate) is lowest among competing point estimators • Consistent: Var(estimate) decreases (usually to 0) as the sample size increases
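The "unbiased" property can be seen numerically. The sketch below averages two variance estimators over many small samples: one divides the sum of squared deviations by n - 1, the other by n. The function name, the sample size of 5, and σ = 3 (true variance 9) are assumptions picked for illustration.

```python
import random

def average_variance_estimates(n=5, reps=20000, sigma=3.0, seed=7):
    """Average the (n - 1)-divisor and n-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / reps, total_biased / reps

unbiased, biased = average_variance_estimates()
print(unbiased, biased)   # true variance is 9.0
```

The n - 1 version should average out near 9, while the n version should sit near 9 × (n - 1)/n = 7.2, systematically too low.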
Confidence Intervals • A confidence interval quantifies the likely imprecision in a point estimator • An interval that contains (covers) the unknown population parameter with some specified probability • Called a 100(1 – α)% confidence interval for the parameter • Example: 87 < μ < 123 with probability 95% • The value of μ is in (87, 123) with 95% confidence • We’ll leave the computation of confidence intervals to a statistics course … or to Arena’s output analyzer tool.
Confidence Intervals in Simulation • Run simulation replications, get results • View each replication of the simulation as a data point • Form a confidence interval • The confidence interval tells you how close you are to getting the “true” expected output (what you’d get by averaging an infinite number of replications)
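A minimal sketch of the replication idea, assuming ten replications and the standard t-based interval for a mean (sample mean ± t × s / √n). The function `run_replication` is a hypothetical stand-in for a real simulation run, not anything from Arena, and the critical value 2.262 is the t-table entry for 95% confidence with 9 degrees of freedom, hardcoded because the Python standard library has no t distribution.

```python
import math
import random
import statistics

T_CRIT_95_DF9 = 2.262   # t-table value: 95% confidence, 10 - 1 = 9 degrees of freedom

def run_replication(rng):
    """Hypothetical stand-in for one simulation run:
    the average of 50 exponential service times with mean 5."""
    return statistics.fmean([rng.expovariate(1 / 5.0) for _ in range(50)])

rng = random.Random(42)
results = [run_replication(rng) for _ in range(10)]   # one data point per replication

x_bar = statistics.fmean(results)
s = statistics.stdev(results)
half_width = T_CRIT_95_DF9 * s / math.sqrt(len(results))
low, high = x_bar - half_width, x_bar + half_width

print(f"95% CI: {x_bar:.2f} +/- {half_width:.2f}  ->  ({low:.2f}, {high:.2f})")
```

Running more replications shrinks the half-width, which is the usual way to tighten the estimate of the expected output.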
Hypothesis Tests • A hypothesis test is used to test some assertion about the population or its parameters • With sampling, we don’t get a true/false result, only evidence that points one way or another • Null hypothesis (H0) – what is to be tested • Alternate hypothesis (H1 or HA) – denial of H0 • H0: μ = 6 vs. H1: μ ≠ 6 • H0: σ < 10 vs. H1: σ ≥ 10 • H0: μ1 = μ2 vs. H1: μ1 ≠ μ2 • Develop a decision rule to decide on H0 or H1 based on sample data
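One possible decision rule for a hypothesis of the form "mean equals 6" versus "mean differs from 6" is a large-sample two-sided z-test, sketched below. This is only one of several reasonable tests (a t-test would be more appropriate for small samples); the function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def two_sided_z_test(sample, mu0, alpha=0.05):
    """Large-sample test of H0: mean = mu0 against H1: mean != mu0.
    Returns (z statistic, p-value, reject_H0)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical data: 50 observations centred on 6.5.
sample = [6.5 + 0.1 * ((i % 5) - 2) for i in range(50)]

print(two_sided_z_test(sample, mu0=6.0))   # strong evidence against mean = 6
print(two_sided_z_test(sample, mu0=6.5))   # no evidence against mean = 6.5
```

Note that failing to reject H0 is not proof that H0 is true; the sample only provides evidence one way or the other.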
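Errors in Hypothesis Testing • Type I error: rejecting H0 when it is true; its probability is α • Type II error: failing to reject H0 when it is false; its probability is β • 1 – α is the confidence level of the corresponding confidence interval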
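The meaning of α can be checked by simulation: if the null hypothesis is true and we test at level α, we should wrongly reject it in about a fraction α of repeated experiments. The sketch below assumes a two-sided z-test with known σ; the function name and all parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, trials=4000, mu0=6.0, sigma=2.0, seed=11):
    """Fraction of trials that reject H0: mean = mu0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 holds by construction
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))   # sigma treated as known
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

print(type_i_error_rate())   # should be close to alpha = 0.05
```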
Hypothesis Testing in Simulation • Input side • Specify input distributions to drive the simulation • Collect real-world data on corresponding processes • “Fit” a probability distribution to the observed real-world data • Test H0: the data are well represented by the selected distribution • Output side • Have two or more “competing” designs modeled • Test H0: all designs perform the same on output, or test H0: one design is better than another • Selection of a “best” model scenario
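On the input side, one common way to test whether data are well represented by a fitted distribution is a chi-square goodness-of-fit test. The sketch below is not Arena's Input Analyzer; it is a hand-rolled illustration that fits an exponential distribution, uses five equiprobable bins, and compares the statistic with the chi-square table value for 5 - 1 - 1 = 3 degrees of freedom (one degree is lost for the estimated mean). The data are generated, not real observations.

```python
import math
import random

CHI2_CRIT_95_DF3 = 7.815   # chi-square table value: 95th percentile, 3 degrees of freedom

def chi_square_exponential(data, k=5):
    """Chi-square statistic for H0: data are exponential, using k equiprobable bins."""
    n = len(data)
    mean = sum(data) / n                     # fitted parameter, estimated from the data
    # Interior bin edges chosen so each bin has probability 1/k under the fitted model.
    edges = [-mean * math.log(1 - i / k) for i in range(1, k)]
    counts = [0] * k
    for x in data:
        counts[sum(x > e for e in edges)] += 1   # index = number of edges below x
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts

# Hypothetical "real-world" data: 200 interarrival times with mean 4.
rng = random.Random(2008)
interarrivals = [rng.expovariate(1 / 4.0) for _ in range(200)]

stat, counts = chi_square_exponential(interarrivals)
decision = "reject H0" if stat > CHI2_CRIT_95_DF3 else "fail to reject H0"
print(counts, round(stat, 2), decision)
```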