
INDE 2333 ENGINEERING STATISTICS I GOODNESS OF FIT


suki
  1. INDE 2333 ENGINEERING STATISTICS I GOODNESS OF FIT University of Houston Dept. of Industrial Engineering Houston, TX 77204-4812 (713) 743-4195

  2. AGENDA • Chi-square goodness of fit test

  3. GOODNESS OF FIT TESTS • Used to determine whether a sample could have come from a distribution with the specified parameters • Commonly used to determine whether data are normally distributed • Many tests, such as the ones we have been using, require normally distributed data • If the data are not normally distributed, non-parametric tests must be used (next subject in the course) • Also used to choose input distributions in system modeling • Are the times between customer or job arrivals exponentially distributed? • What distribution do service times follow? • According to what distribution do failures occur?

  4. GOODNESS OF FIT TESTS • Based on a comparison between • Observed data • Theoretical (expected) data • The comparison uses a set of intervals, or cells • Each cell has a lower and an upper boundary value • The boundaries are determined by • The theoretical distribution • The number of observations in the sample • 2 different approaches…

  5. TWO DIFFERENT APPROACHES • Approach 1 • Used in the book • Equal interval approach • No cell or group of cells can have fewer than 5 expected observations • Approach 2 • Used in other books • Equiprobable approach • Number of cells = Int(obs/5), not to exceed 100, so that the expected number of observations in each cell is at least 5 • Expected number of obs in each cell = obs / cells • More statistically robust
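The cell-count rule for the equiprobable approach can be sketched in a few lines. The function name `equiprobable_cells` and its parameter names are illustrative assumptions, not part of the slides; the sample size of 43 is the one used in Example 2 of the deck.

```python
def equiprobable_cells(n_obs, min_expected=5, max_cells=100):
    """Return (number of cells, expected observations per cell).

    Implements cells = Int(obs / 5), capped at 100, as stated on the slide.
    The function and parameter names are hypothetical.
    """
    cells = min(max_cells, n_obs // min_expected)
    if cells < 1:
        raise ValueError("too few observations for even one cell")
    return cells, n_obs / cells


cells, expected = equiprobable_cells(43)
print(cells, expected)  # 8 5.375
```

Integer division plays the role of Int( ), so 43 observations give 8 cells rather than 8.6.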

  6. HYPOTHESIS TEST PROCEDURE • Identify Ho and Ha • Determine the level of significance (generally 0.05 or 0.01) • Determine the “critical value” criterion from the level of significance • Calculate the “test statistic” • Make the decision • Fail to reject Ho • Reject Ho

  7. HYPOTHESES • Ho • The sample could have come from a distribution with the specified parameters • Ha • The sample could not have come from a distribution with the specified parameters

  8. CRITICAL VALUE • Chi-square distribution chart • One-sided test • Alpha typically 0.05 • Degrees of freedom • # of cells - # of parameters estimated from the sample - 1 • The -1 is always used because the sample size n is known • Note: parameters that are specified rather than estimated from the sample do not reduce the number of degrees of freedom in the above equation
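As a rough alternative to reading the chi-square chart, the critical value can be computed numerically. The sketch below is an assumption-laden illustration, not the course's method: it evaluates the chi-square CDF with a standard series for the incomplete gamma function and finds the right-tail critical value by bisection. All function names are hypothetical.

```python
import math


def degrees_of_freedom(cells, estimated_params):
    """# of cells - # of parameters estimated from the sample - 1."""
    return cells - estimated_params - 1


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Value with right-tail probability alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# 8 cells with 2 parameters estimated from the sample, as in Example 2 of the deck.
df = degrees_of_freedom(cells=8, estimated_params=2)
print(df, round(chi2_critical(0.05, df), 3))  # should agree with the table value 11.070
```

If the parameters are specified rather than estimated, pass `estimated_params=0`, which leaves cells - 1 degrees of freedom.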

  9. CHI-SQUARE for a particular number of degrees of freedom • [Figure: chi-square density f(X^2) plotted against X^2 starting at 0; the right-tail probability beyond the critical value is alpha, typically 0.05]

  10. TEST STATISTIC
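The formula on this slide did not survive in the transcript. The standard chi-square goodness-of-fit statistic, which is presumably what the slide showed, is given below. The symbols are assumptions chosen here, not taken from the slide: $o_i$ is the observed count in cell $i$, $e_i$ is the expected count in cell $i$, and $k$ is the number of cells.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}
```

Each cell contributes its squared deviation scaled by the expected count, so large gaps between observed and expected counts push the statistic toward the right tail.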

  11. DECISION • Cannot reject • Test statistic is less than the critical value • Sample could have come from a distribution with the specified parameters • Reject • Test statistic is greater than the critical value • Sample could not have come from a distribution with the specified parameters
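The decision rule above can be sketched as follows. The function names are hypothetical, and the statistic shown is the standard sum of (observed - expected)^2 / expected over the cells, which is assumed rather than quoted from the deck. The two calls at the end use the statistic and critical-value pairs quoted in the deck's two examples.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(test_statistic, critical_value):
    """Reject Ho only when the statistic exceeds the critical value."""
    if test_statistic > critical_value:
        return "reject Ho"
    return "cannot reject Ho"


print(decide(6.749, 16.919))  # cannot reject Ho
print(decide(2.581, 11.070))  # cannot reject Ho
```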

  12. EXAMPLE 1: EQUAL INTERVAL APPROACH • 400 five-minute intervals were observed for air traffic control messages • At alpha = 0.01, can the number of messages per interval be considered to follow a Poisson distribution with a mean of 4.6? • Approach • The lambda parameter of 4.6 is given • Use the Poisson probability table for lambda = 4.6 • Multiply each probability by 400 to obtain the expected observations • Compare the actual observations to the expected observations
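The expected counts in this approach can be reproduced without a printed table. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the choice of a single open-ended tail cell for "10 or more" messages is an illustration rather than something the transcript specifies. The observed counts are not in the transcript, so none are used here.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_counts(n_obs, lam, max_k):
    """Expected counts for k = 0 .. max_k - 1, plus one tail cell 'max_k or more'."""
    probs = [poisson_pmf(k, lam) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # remaining probability goes to the tail cell
    return [n_obs * p for p in probs]


expected = expected_counts(400, 4.6, 10)
for k, e in enumerate(expected):
    label = f"{k}" if k < 10 else "10+"
    print(label, round(e, 2))
```

The first cell (0 messages) has an expected count of about 4.02, which is below 5, so under the equal interval rule it would have to be pooled with a neighboring cell before the comparison is made.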

  13. HYPOTHESES • Ho: • Poisson distribution with a mean of 4.6 • Ha: • Not a Poisson distribution with a mean of 4.6

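  14. CHI-SQUARE for 10-1 = 9 degrees of freedom • [Figure: chi-square density f(X^2) plotted against X^2 starting at 0, with the right-tail probability alpha = 0.01 marked beyond a critical value of 16.919] • Note: 16.919 is the table value for alpha = 0.05 with 9 degrees of freedom; the table value for alpha = 0.01 is 21.666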

  15. TEST STATISTIC

  16. DECISION • Test statistic of 6.749 is less than the critical value of 16.919 (the alpha = 0.05 table value for 9 degrees of freedom; the alpha = 0.01 value, 21.666, is larger still) • Cannot reject Ho of the distribution being Poisson with a mean of 4.6 • There is not enough evidence at an alpha level of 0.01 to reject the claim that the data came from a Poisson distribution with a mean of 4.6

  17. EXAMPLE 2: EQUIPROBABLE APPROACH • Were the scores from an INDE 2333 exam normally distributed? • Sample statistics • Mean = 71.95 • Std = 11.93 • N = 43

  18. HYPOTHESES • Ho • The sample could have come from a normally distributed population with a mean of 71.95 and a std of 11.93 • Ha • The sample could not have come from a normally distributed population with a mean of 71.95 and a std of 11.93

  19. CRITICAL VALUE • Chi-square distribution chart • One-sided test • Alpha = 0.05 • Degrees of freedom • The sample size is 43 • Want the maximum number of cells, not to exceed 100, with a minimum expected number of observations of 5 per cell • 43/5 = 8.6, so use 8 cells • With 8 cells, the expected number of observations per cell is 43/8 = 5.375 • Degrees of freedom = number of cells - number of parameters estimated from the sample - 1 • Degrees of freedom = 8 - 2 - 1 = 5

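  20. CHI-SQUARE for 5 degrees of freedom • [Figure: chi-square density f(X^2) plotted against X^2 starting at 0; right-tail probability alpha = 0.05 beyond the critical value 11.070]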

  21. TEST STATISTIC

  22. CELL BOUNDARIES • To calculate the observed values in each cell, we must determine the actual x boundaries of the 8 equiprobable cells • For normal distributions • Look up the z value corresponding to each cumulative probability • Boundary = mean + std * z
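The boundary calculation can be sketched with Python's standard library, where `NormalDist.inv_cdf` returns mean + std * z for a given cumulative probability. The function names are hypothetical, and the rule that a value exactly on a boundary falls in the upper cell is an assumption made here; the exam scores themselves are not in the transcript, so only the boundaries are computed.

```python
from bisect import bisect_right
from statistics import NormalDist


def equiprobable_boundaries(mean, std, cells):
    """Interior boundaries that split a normal distribution into equiprobable cells."""
    dist = NormalDist(mean, std)
    # Cumulative probabilities 1/cells, 2/cells, ..., (cells - 1)/cells.
    return [dist.inv_cdf(i / cells) for i in range(1, cells)]


def count_in_cells(data, boundaries):
    """Observed count per cell; a value equal to a boundary goes to the upper cell."""
    counts = [0] * (len(boundaries) + 1)
    for x in data:
        counts[bisect_right(boundaries, x)] += 1
    return counts


boundaries = equiprobable_boundaries(71.95, 11.93, 8)
print([round(b, 2) for b in boundaries])
```

Eight cells need only seven interior boundaries, because the first and last cells extend to minus and plus infinity. Passing the actual scores to `count_in_cells` would give the observed counts to compare against the expected 5.375 per cell.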

  23. CALCULATING OBSERVATIONS

  24. CALCULATING TEST STATISTIC

  25. DECISION • 2.581 < 11.070 • Cannot reject Ho • There is not enough evidence to reject the claim that the test scores are normally distributed with a mean of 71.95 and std of 11.93

  26. IN EXCEL • FREQUENCY function • Data_array, bins_array • Range (array) operation • Enter with CTRL-SHIFT-ENTER • NORMINV function • Probability, mean, std • CHIINV function • Probability, df
