
Comparing Distributions III: Chi squared test, ANOVA By Peter Woolf (pwoolf@umich)

Comparing Distributions III: Chi squared test, ANOVA. By Peter Woolf (pwoolf@umich.edu), University of Michigan. Michigan Chemical Process Dynamics and Controls Open Textbook, version 1.0. Creative Commons.


Presentation Transcript


  1. Comparing Distributions III: Chi squared test, ANOVA. By Peter Woolf (pwoolf@umich.edu), University of Michigan. Michigan Chemical Process Dynamics and Controls Open Textbook, version 1.0. Creative Commons.

  2. Scenario: You have two parallel processes, Unit 1 and Unit 2, that carry out the same reaction using very similar equipment. Question: Are these units actually behaving the same or not?

  3. Approach: (1) Gather data on yield from both units. A plot of the data does not clearly show any difference.

  4. Approach: (1) Gather data on yield from both units. (2) Perform statistical analysis: • Fisher’s exact test (requires binning) • Chi squared test (requires binning) • ANOVA (works directly on the data)

  5. Binning Data: Data reduction by reassigning data into windows (here, HIGH and LOW).

  6. Binning Data: Data reduction by reassigning data into windows (here, HIGH and LOW). Choosing a binning strategy: • Assign to bins that naturally appear, such as groupings or important thresholds (e.g. yield>50 is profitable, so this is a natural window) • If multiple windows appear, assign multiple bins • If no natural bins appear, choose equally sized bins or above/below average. Bin in Excel with IF..THEN statements.
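
The threshold strategy on this slide can be sketched in Python as a minimal illustration. Only the yield>50 cutoff comes from the slide; the yield values and the names `bin_yield`, `contingency_row` and `THRESHOLD` are hypothetical.

```python
from collections import Counter

THRESHOLD = 50.0  # natural window from the slide: yield > 50 is profitable

def bin_yield(value, threshold=THRESHOLD):
    """Assign one continuous yield measurement to the HIGH or LOW bin."""
    return "HIGH" if value > threshold else "LOW"

def contingency_row(yields, threshold=THRESHOLD):
    """Count how many measurements from one unit fall in each bin."""
    counts = Counter(bin_yield(y, threshold) for y in yields)
    return {"LOW": counts["LOW"], "HIGH": counts["HIGH"]}

# Hypothetical yield measurements, for illustration only
unit1 = [48.2, 55.1, 61.0, 49.9, 52.3]
unit2 = [45.0, 47.7, 50.0, 58.4, 44.1]

table = {"Unit 1": contingency_row(unit1), "Unit 2": contingency_row(unit2)}
print(table)
```

The two rows of `table` are the rows of the contingency table that Fisher’s exact and chi squared tests take as input.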

  7. As mentioned in the last lecture, we can use Fisher’s exact test to calculate a p-value: the probability of finding this configuration, or a more extreme one, at random. For Fisher’s exact and Chi squared tests, create a contingency table.

     Contingency table:

               Low   High   Total
     Unit 1     53     97     150
     Unit 2     82     68     150
     Total     135    165     300

  8. Observed configuration:

               Low   High   Total
     Unit 1     53     97     150
     Unit 2     82     68     150
     Total     135    165     300

     “More extreme” configuration:

               Low   High   Total
     Unit 1     52     98     150
     Unit 2     83     67     150
     Total     135    165     300

     “Most extreme” configuration:

               Low   High   Total
     Unit 1      0    150     150
     Unit 2    135     15     150
     Total     135    165     300
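
As a rough sketch of how the “observed”, “more extreme” and “most extreme” configurations combine into a one-sided Fisher’s exact p-value, the Python below sums the hypergeometric probability of every table from the observed one to the most extreme one, with row and column totals held fixed. The function names are hypothetical and this is an illustration under those assumptions, not the procedure used to produce the slides.

```python
from math import comb

def table_probability(k, n_unit1, n_high, n_total):
    """Hypergeometric probability that Unit 1 has exactly k HIGH counts
    when all row and column totals are held fixed."""
    n_low = n_total - n_high
    return comb(n_high, k) * comb(n_low, n_unit1 - k) / comb(n_total, n_unit1)

def fisher_one_sided(k_observed, n_unit1, n_high, n_total):
    """Sum the probabilities of the observed table and of every more
    extreme table (more HIGH counts in Unit 1)."""
    k_max = min(n_unit1, n_high)
    return sum(table_probability(k, n_unit1, n_high, n_total)
               for k in range(k_observed, k_max + 1))

# Observed table: Unit 1 has 97 HIGH out of 150; 165 HIGH among 300 in total
p_more_extreme = fisher_one_sided(97, 150, 165, 300)
print(f"P(observed or more extreme) = {p_more_extreme:.4g}")
```

The loop runs from 97 HIGH counts in Unit 1 (observed) through 98 (“more extreme”) up to 150 (“most extreme”), which is the tail the next slide labels “More extreme”.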

  9. [Figure: probability of each configuration vs. # changes away from observed. The most likely cases if this were a random sample lie far from the observed case.] More extreme = 0.0005; less extreme = 0.9995; total area = 1.0. Conclusion: • The units are behaving differently. IDEA! The distance between the observed case and the most likely case if random is far, so can we just use that?

  10. IDEA! The distance between the observed case and the most likely case if random is far, so can we just use that? If this distance is “big” then the observed case is unusual. What is this point? [Figure: probability of each configuration vs. # changes away from observed.]

  11. What is this point? It is the most likely case if random. What is the distance between these two cases? The chi squared statistic. But the distance depends on the magnitude of the counts, so normalize it.

     Observed case:

               Low   High   Total
     Unit 1     53     97     150
     Unit 2     82     68     150
     Total     135    165     300

     Most likely case if random:

               Low                     High                    Total
     Unit 1    150*(135/300) = 67.5    150*(165/300) = 82.5      150
     Unit 2    150*(135/300) = 67.5    150*(165/300) = 82.5      150
     Total     135                     165                       300

  12. For this case, the chi squared statistic compares the observed case (Unit 1: 53 Low, 97 High; Unit 2: 82 Low, 68 High) with the most likely case if random (each unit: 150*(135/300) = 67.5 Low, 150*(165/300) = 82.5 High). Okay.. So what? What is the p-value?
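
The formula images from these slides did not survive in the transcript. As a reconstruction under that caveat, the standard Pearson chi squared statistic sums the squared distance between each observed count and its expected count, normalized by the expected count. The symbols $O_i$ and $E_i$ are assumed notation, not taken from the slides. Applied to the observed and expected tables for this case, it gives the value 11.33 that slide 13 passes to “chidist”:

```latex
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}
       = \frac{(53 - 67.5)^2}{67.5} + \frac{(97 - 82.5)^2}{82.5}
       + \frac{(82 - 67.5)^2}{67.5} + \frac{(68 - 82.5)^2}{82.5}
       \approx 11.33
```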

  13. For this case, the chi squared statistic is 11.33. The chi squared statistic has a known distribution that can be looked up or found in Excel using “chidist” with 1 degree of freedom: =chidist(11.33,1)=0.00076. This can be done in a more automated way in Excel using “chitest”. For this case chitest & Fisher’s exact agree.
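
For readers working outside Excel, here is a minimal Python sketch of the same calculation. It assumes the standard Pearson form of the statistic and uses the fact that, for 1 degree of freedom only, the upper tail of the chi squared distribution equals `erfc(sqrt(x / 2))`. The function names are hypothetical.

```python
from math import erfc, sqrt

def expected_counts(table):
    """Expected counts if rows and columns are independent:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def p_value_1dof(chi2):
    """Upper-tail probability for chi squared with 1 degree of freedom
    (valid for a 2 by 2 table only)."""
    return erfc(sqrt(chi2 / 2.0))

observed = [[53, 97],   # Unit 1: Low, High
            [82, 68]]   # Unit 2: Low, High
chi2 = chi_squared(observed)
p = p_value_1dof(chi2)
print(f"chi squared = {chi2:.2f}, p = {p:.5f}")
```

`p_value_1dof` plays the role of `chidist(x, 1)`; larger tables have more degrees of freedom and need a general chi squared distribution function instead.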

  14. Chi squared test vs. Fisher’s exact • For a random null, Fisher’s exact will always yield a correct result • Chi squared test is often easier to carry out (the math is easier) • Chi squared will give incorrect results when (a) fewer than 20 samples are present, or (b) there are between 20 and 40 samples and one expected number is 5 or below. In the small-sample example shown, chitest says the result is 2x more significant; the error is due to the small sample effect.
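
The two failure conditions on this slide can be written as a quick check to run before trusting a chi squared result. This is a sketch of the slide’s rule of thumb only; the function name and the small example table are hypothetical.

```python
def chi_squared_looks_safe(table):
    """Apply the slide's rule of thumb to a table of observed counts.
    Returns False in the cases where the slide says chi squared gives
    incorrect results, so Fisher's exact test should be used instead."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n < 20:
        return False  # fewer than 20 samples
    smallest_expected = min(r * c / n for r in row_totals for c in col_totals)
    if n <= 40 and smallest_expected <= 5:
        return False  # 20 to 40 samples with an expected count of 5 or below
    return True

print(chi_squared_looks_safe([[53, 97], [82, 68]]))  # the 300-sample table
print(chi_squared_looks_safe([[3, 7], [8, 2]]))      # hypothetical small table
```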

  15. Chi squared test vs. Fisher’s exact (continued) • Chi squared test is easy to do for larger contingency tables and when the expected distribution is not random. • This can be done with a Fisher’s-like test, but the math gets much harder. Example: 3 by 3 contingency table with a model for expectations. Observed is close to the expected, but far from random.
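
The slide’s 3 by 3 data is not in the transcript, so the sketch below uses made-up counts purely to show the mechanics: the same statistic is computed once against a model’s expected counts and once against the counts expected if rows and columns were unrelated. All numbers and names are hypothetical, and no p-value is computed because the degrees of freedom depend on how the model was obtained.

```python
def chi_squared_against(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

def independence_expected(observed):
    """Expected counts if rows and columns were unrelated (the random case)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical observed counts and hypothetical model expectations
observed = [[30, 5, 5],
            [5, 30, 5],
            [5, 5, 30]]
model_expected = [[28, 6, 6],
                  [6, 28, 6],
                  [6, 6, 28]]

chi2_vs_model = chi_squared_against(observed, model_expected)
chi2_vs_random = chi_squared_against(observed, independence_expected(observed))
print(f"vs model: {chi2_vs_model:.2f}, vs random: {chi2_vs_random:.2f}")
```

A small statistic against the model and a large one against the random case is the pattern the slide describes as close to the expected but far from random.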

  16. Approach: (1) Gather data on yield from both units. (2) Perform statistical analysis: • Fisher’s exact test (requires binning) • Chi squared test (requires binning) • ANOVA (works directly on the data)

  17. ANOVA: Analysis of Variance. A method to compare continuous measurements to determine if they are sampled from the same or different distributions. For a single factor ANOVA, we assume that each observation in each class can be modeled as: Observation = overall mean + class effect + random error. In the study we are following in this class, the class effect would be the effect of unit 1 or unit 2. ANOVA can be easily done in Excel using Tools->Data Analysis->ANOVA.
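
To make the model “Observation = overall mean + class effect + random error” concrete, here is a minimal Python sketch of the single factor F statistic: variation of the class means around the overall mean, divided by the leftover variation within each class. The yield values and function name are hypothetical. Turning F into a p-value needs the F distribution, which Excel reports directly; `scipy.stats.f_oneway` is one option if SciPy is available.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a single factor ANOVA.
    groups is a list of lists, one list of measurements per class."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Class effect: how far each class mean sits from the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Random error: scatter of each observation around its own class mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical yields, for illustration only
unit1 = [52.1, 49.8, 55.3, 51.0]
unit2 = [47.2, 50.5, 46.9, 48.8]
print(f"F = {one_way_anova_f([unit1, unit2]):.2f}")
```

Unlike the Fisher’s exact and chi squared examples, this works directly on the continuous measurements, with no binning step.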

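  18. 1 way ANOVA. Key value: the p-value here tells the probability of seeing a difference at least this large between the units (groups) if they were in fact the same. A small p-value suggests the units differ.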

  19. 2 way ANOVA with replicates. Scenario: Testing three units in triplicate, each with three different control architectures: Feedback (FB), Model predictive control (MPC), and a cascade architecture. In each case we measure the yield. Questions: Do the units significantly differ? Do the control architectures significantly differ? Tools->Data Analysis->ANOVA: Two factor with replication

  20. 2 way ANOVA with replicates • Controllers (samples) have a significant effect • Columns (units) don’t have a significant effect • Part of the output (marked “??”) looks like an error, and may be why we get a negative F value and no p-value
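
As a cross-check on spreadsheet output like this, the two factor decomposition can be sketched by hand in Python. This assumes the textbook balanced design (the same number of replicates in every cell). The function name and yield values are hypothetical and do not reproduce the slide’s data.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F statistics for a balanced two factor ANOVA with replication.
    cells[i][j] lists the replicates for row level i and column level j."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]
    ss_rows = b * n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * n * sum((m - grand) ** 2 for m in col_means)
    ss_inter = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                       for i in range(a) for j in range(b))
    ss_within = sum((x - cell_means[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in cells[i][j])
    ms_within = ss_within / (a * b * (n - 1))
    return {
        "rows": (ss_rows / (a - 1)) / ms_within,
        "columns": (ss_cols / (b - 1)) / ms_within,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_within,
    }

# Hypothetical yields: rows = controllers (FB, MPC), columns = units (1, 2)
yields = [[[50.1, 51.3], [49.8, 50.6]],
          [[55.2, 54.7], [54.9, 56.0]]]
print(two_way_anova_f(yields))
```

Every term here is a sum of squares divided by a positive number of degrees of freedom, so none of these F values can be negative. That is consistent with the slide’s suspicion that a negative F value points to an error in the spreadsheet setup rather than a real result.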

  21. ANOVA • ANOVA tells you if factors are significantly related to an outcome according to a linear model • Nonlinear relationships can be strong, but may appear insignificant in an ANOVA analysis. • ANOVA does not tell you the model parameters. • ANOVA, t-test, and z-test all provide similar kinds of information for different kinds of data.

  22. [Diagram labels: Physical process (Unit 1, Unit 2), Experimental Data, Statistical Analysis.] Results: • Unit 1 is different from unit 2 • This difference is clearer in the binned data (chi squared and Fisher’s < ANOVA)

  23. Take Home Messages • Chi squared tests are analogous to Fisher’s exact tests, but are generally easier to calculate • Chi squared tests fail when sample sizes are small • ANOVA determines if lists of continuous measurements are likely the same or different • ANOVA can determine the significance of a set of factors on the measurements

  24. The following pages have additional examples of ChemE applications of ANOVA analyses

  25. Solution approach: two factor ANOVA. Factor 1: Farm. Factor 2: Shipper. See if a factor has a significant p-value.

  26. Looking at averages and ranges, it looks like shipper Rex has a somewhat worse record than Ned. The farms have some variation, but it is small. This said, both shippers will bring wheat with moths, but Rex will bring more.

  27. 1) Import data into Excel. 2) Select Tools->Data Analysis->ANOVA: Two factor with Replication. Conclusion: the factor “shipper” has a significant influence on the moth probability, with a p-value of 0.03.

  28. ANOVA- ChemE examples How does temperature affect yield?

  29. ANOVA- ChemE examples Do both temperature and concentration affect yield?

  30. ANOVA- ChemE examples How can controlling v4 and v2 differently affect process profitability? Example from 2006 controls wiki: http://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_taguchi_methods:_one_and_two_way_layouts

  31. [Data table] How can controlling v4 and v2 differently affect process profitability? Example from 2006 controls wiki: http://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_taguchi_methods:_one_and_two_way_layouts

  32. [Data table and ANOVA output] How can controlling v4 and v2 differently affect process profitability? Example from 2006 controls wiki: http://controls.engin.umich.edu/wiki/index.php/Design_of_experiments_via_taguchi_methods:_one_and_two_way_layouts
