
02_AnovaReviewedited

Basics of ANOVA

Charlton1

Presentation Transcript


  1. Analysis of Variance: One-way ANOVA

  2. Module Objectives
     - Review the concepts of Analysis of Variance
       - Sum of Squares
       - Mean Square Error
     - Demonstrate and practice calculating the ANOVA table
       - Manually
       - Minitab
     - Practice ANOVA
       - Exercises
       - Homework

  3. DMAIC Roadmap for Breakthrough
     [Figure: roadmap across the Define, Measure, Analyze, Improve, and Control phases]
     Activities shown on the roadmap: Project Scope & Problem Validation; Failure Modes & Effects Analysis; Design & Execute an Experiment; Optimize & Refine Solutions; Refine the Project; Process Maps & Simplification; Problem Statement; Control X's & Monitor Y's; ID Variation: Graphical Analysis; Define Y=f(x); C&E for Variable Reduction; Project Metrics; Recommend Changes; Close & Hand-Off Project; ID Variation: Statistical Analysis; Measurement Capability; Objective Statement(s); Data Collection Systems; Plan for DOE; Team Members; Process Capability.

  4. Mathematical Tools & Data Types
     A menu of Six Sigma tools, by type of independent variable (X) and response variable (Y):
     - Discrete X, discrete Y: Contingency Tables
     - Discrete X, continuous Y: Analysis of Variance
     - Continuous X, discrete Y: Logistic Regression
     - Continuous X, continuous Y: Linear & Multiple Regression

  5. Single X, Single Y Hypothesis Tests
     Each row: example; data description; null hypothesis; hypothesis test; tool examples.
     - Printer Model (X1) vs. Service Level (Y): 1 discrete X, continuous Y; Ho: μ = Target; 1-sample t-test; time series charts, histogram, capability analysis
     - Printer Models (X1, X2) vs. Fuser Life (Y): 2 levels of discrete X, continuous Y; Ho: μ1 = μ2; 2-sample t-test, paired t-test; scatter plots, dot plots
     - Monday-Friday (X1, X2, X3, X4, X5) vs. Sales (Y): 3+ levels of discrete X, continuous Y; Ho: μ1 = … = μk; one-way ANOVA; box plots, main effects plots, interaction plots, Pareto charts
     - Printer cartridge defect rate (p) vs. target: 1 discrete X, discrete Y; Ho: p = p0; 1 proportion test; p-value
     - Comparison of quality levels of two products: 2 levels of discrete X, discrete Y; Ho: p1 - p2 = p0; 2 proportions test; difference, confidence interval, p-value
     - Observed events vs. expected events: 3+ levels of discrete X, discrete Y; Ho: p1 = … = pk; chi-square, Analysis of Means; contingency table
     - Temperature vs. Pressure: continuous X, continuous Y; Ho: slope = 0; simple regression; fitted line plot, fits and residuals

  6. Testing Means
     - Testing for one mean (compare μ with μtarget):
       - Z-test for large samples or when σ is known
       - t-test for smaller samples or when σ is unknown
       - Ho: μ = μtarget; Ha: μ ≠ μtarget, μ < μtarget, or μ > μtarget
     - Testing for two means:
       - 2-sample t-test
       - Paired t-test
       - Ho: μ1 = μ2; Ha: μ1 ≠ μ2, μ1 < μ2, or μ1 > μ2
     - Testing for three or more means:
       - One-way ANOVA
       - Ho: μ1 = μ2 = μ3 = …; Ha: at least one μ is different from the others

  7. ANOVA Model
     - Mathematical model for ANOVA: $y_{ij} = \mu + \tau_j + \varepsilon_{ij}$, where
       - $y_{ij}$ = a single response from Treatment j
       - $\mu$ = overall mean
       - $\tau_j$ = the contribution from Treatment j
       - $\varepsilon_{ij}$ = random error
     - Hypotheses:
       - Mathematical: $H_0: \tau_j = 0$ for all j; $H_a: \tau_j \neq 0$ for at least one j
       - Conventional translation: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$; $H_a$: at least one $\mu_j$ is different
     - ANOVA is a model for discrete inputs.
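
To make the model concrete, here is a minimal sketch using the Server_Speed data from the example later in this module. It assumes the usual estimates: $\mu$ by the grand mean and each $\tau_j$ by the group mean minus the grand mean. The function and variable names are illustrative only.

```python
from statistics import mean

# Server_Speed data from the example: transmission time (ms) per server.
data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}


def estimate_effects(groups):
    """Estimate mu by the grand mean and tau_j by (group mean - grand mean)."""
    all_values = [y for ys in groups.values() for y in ys]
    mu_hat = mean(all_values)
    tau_hat = {name: mean(ys) - mu_hat for name, ys in groups.items()}
    return mu_hat, tau_hat


mu_hat, tau_hat = estimate_effects(data)
print(f"mu_hat = {mu_hat:.2f}")
for name, tau in tau_hat.items():
    print(f"tau_{name} = {tau:+.2f}")

# With these estimates, the effects weighted by group size sum to zero.
weighted_sum = sum(len(data[name]) * tau for name, tau in tau_hat.items())
print(f"sum of n_j * tau_j = {weighted_sum:.1e}")
```

The estimated effects (about -10.56, +1.78, and +8.78 ms) are the same "Average - Grand Average" values that appear in the SS_Between worksheet on slide 18.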

  8. Solution: Analysis of Variance
     - ANOVA is an extension (generalization) of the 2-sample t-test.
     - ANOVA is a method of detecting differences among the means of multiple samples.
     - Why is it called Analysis of Variance?
       - ANOVA is the mathematics behind the intuitive evaluation.
       - ANOVA compares/analyzes variances:
         - Variance within a group
         - Variance between groups
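
One way to see that ANOVA generalizes the 2-sample t-test: with exactly two groups, the one-way ANOVA F statistic equals the square of the pooled-variance 2-sample t statistic. The sketch below checks this on Servers A and B from the example data. The helper names `pooled_t` and `one_way_f` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

a = [66, 67, 74, 73, 75, 64]  # Server A times (ms)
b = [85, 85, 76, 82, 79, 86]  # Server B times (ms)


def pooled_t(x, y):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def one_way_f(*groups):
    """One-way ANOVA F statistic: MS_between / MS_within."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


t = pooled_t(a, b)
f = one_way_f(a, b)
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
```

With three or more groups there is no single t statistic, but the F ratio still applies, which is why ANOVA is used there.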

  9. Analysis of Variance: General Recipe
     1. State the practical problem.
     2. State the null hypothesis.
     3. State the alternate hypothesis.
     4. Do the model assumptions hold?
     5. Construct the Analysis of Variance table.
     6. Do the assumptions for the errors hold (residual analysis)?
     7. Interpret the p-value (or the F-statistic) for the factor effect (significant if p < α).
     8. Calculate %SS for the factor and error terms.
     9. Translate the conclusion into practical terms.

  10. An ANOVA Calculation Example (Server_Speed.mtw)
     - Are 3 email servers performing the same?
     - The performance of three email servers was tested by measuring the time (in ms) each took to send a 125 KB message.
     - Transmission times for 6 emails were measured for each server.
     - Use ANOVA and the following data to detect differences among the servers. Use α = 0.05.
       Server A: 66 67 74 73 75 64
       Server B: 85 85 76 82 79 86
       Server C: 91 93 88 87 90 86

  11. Graphical Exploration: Minitab
     [Figures: Boxplot of Time vs Server; Individual Value Plot of Time vs Server (Time from 60 to 95 ms for Servers A, B, C)]
     - What conclusions can you draw?

  12. ANOVA Assumptions
     - Each sample is an independent, random sample.
       - Independent: the selection of any sample does not depend on whether any other sample is selected.
       - Random: all members of the population have an equal chance of being selected.
     - The measurements within each group are normally distributed and have equal variances.
       - This applies only to the within-group variation, not to the between-group variation.
     - The variances for each group (treatment) are equal.

  13. Normality Testing for ANOVA
     [Figures: normal probability plots of Time_A, Time_B, and Time_C]
     - Time_A: Mean 69.83, StDev 4.708, N 6, AD 0.407, P-Value 0.230
     - Time_B: Mean 82.17, StDev 3.971, N 6, AD 0.364, P-Value 0.304
     - Time_C: Mean 89.17, StDev 2.639, N 6, AD 0.178, P-Value 0.859
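
The AD and P-Value entries come from the Anderson-Darling normality test. The sketch below computes the statistic with the mean and standard deviation estimated from the sample. It converts the statistic to a p-value with the D'Agostino & Stephens small-sample approximation. Whether Minitab uses exactly this approximation is an assumption, but for Time_A the sketch gives AD ≈ 0.407 and p ≈ 0.23, in line with the plot.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(xs):
    """Anderson-Darling normality test, mean and sigma estimated from the data.

    Returns (A2, p_value). The p-value uses the D'Agostino & Stephens
    approximation applied to the adjusted statistic A2* (assumption).
    """
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    cdf = [dist.cdf(y) for y in sorted(xs)]
    # Sum over i = 1..n of (2i - 1) * [ln F(y_i) + ln(1 - F(y_{n+1-i}))]
    s = sum((2 * i + 1) * (log(cdf[i]) + log(1 - cdf[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    a2_star = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a2_star >= 0.6:
        p = exp(1.2937 - 5.709 * a2_star + 0.0186 * a2_star ** 2)
    elif a2_star > 0.34:
        p = exp(0.9177 - 4.279 * a2_star - 1.38 * a2_star ** 2)
    elif a2_star > 0.2:
        p = 1 - exp(-8.318 + 42.796 * a2_star - 59.938 * a2_star ** 2)
    else:
        p = 1 - exp(-13.436 + 101.14 * a2_star - 223.73 * a2_star ** 2)
    return a2, p


data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}
for name, xs in data.items():
    a2, p = anderson_darling_normal(xs)
    print(f"Time_{name}: AD = {a2:.3f}, p = {p:.3f}")
```

A large p-value (above α = 0.05) means there is no evidence against normality for that server.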

  14. A Variation Estimate: Sum of Squares
     Notation: $\bar{\bar{y}}$ is the grand mean and $\bar{y}_j$ is the mean of group j.
     - The variability of n sample measurements about their mean can be measured using the sum of squared deviations from the grand mean:
       $SS_{Total} = \sum_j \sum_i (y_{ij} - \bar{\bar{y}})^2$
     - The variability of each group mean as compared to the grand mean is:
       $SS_{Between} = \sum_j \sum_i (\bar{y}_j - \bar{\bar{y}})^2 = \sum_j n_j (\bar{y}_j - \bar{\bar{y}})^2$
     - Likewise, the sum of squared deviations within each group is:
       $SS_{Within} = \sum_j \sum_i (y_{ij} - \bar{y}_j)^2$
     - According to the model: $SS_{Total} = SS_{Between} + SS_{Within}$

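  15. The Sum of Squares Model
     - Putting it all together, for k groups with $n_j$ observations in group j:
       $\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{\bar{y}})^2 = \sum_{j=1}^{k} n_j(\bar{y}_j - \bar{\bar{y}})^2 + \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_j)^2$
       i.e. $SS_{Total} = SS_{Between} + SS_{Within}$
     [Figure: scatter of individual observations illustrating the decomposition]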
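
A quick numerical check of the identity $SS_{Total} = SS_{Between} + SS_{Within}$ on the server data, written directly from the three definitions. The function name is illustrative.

```python
from statistics import mean

data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}


def sums_of_squares(groups):
    """Return (SS_Total, SS_Between, SS_Within) for a one-way layout."""
    values = [y for ys in groups.values() for y in ys]
    grand = mean(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return ss_total, ss_between, ss_within


ss_t, ss_b, ss_w = sums_of_squares(data)
print(f"SS_Total={ss_t:.1f} SS_Between={ss_b:.1f} SS_Within={ss_w:.1f}")
# SS_Total=1374.3 SS_Between=1149.8 SS_Within=224.5
```

The unrounded values are the ones Minitab reports in its ANOVA output; the hand calculation in the following slides rounds them to 1374, 1150, and 224.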

  16. The ANOVA Table: Degrees of Freedom
     Columns: Source; Degrees of Freedom; Sum of Squares; Mean Square; F-Statistic
     - Between (or Factor): k - 1; $SS_{Between}$; $s^2_{Between} = SS_{Between}/(k-1)$; $F = s^2_{Between}/s^2_{Within}$
     - Within (or Error): n - k; $SS_{Within}$; $s^2_{Within} = SS_{Within}/(n-k)$
     - Totals: n - 1; $SS_{Total}$
     - k is the number of groups (or levels); n is the total number of samples.
     - For the server example (k = 3, n = 18): d.f. (between) = 2, d.f. (within) = 15, d.f. (total) = 17

  17. The ANOVA Table: Sum of Squares
     Columns: Source; Degrees of Freedom; Sum of Squares; Mean Square; F-Statistic
     - Between (or Factor): 2; $SS_{Between}$; $s^2_{Between} = SS_{Between}/(k-1)$; $F = s^2_{Between}/s^2_{Within}$
     - Within (or Error): 15; $SS_{Within}$; $s^2_{Within} = SS_{Within}/(n-k)$
     - Totals: 17; $SS_{Total}$
     - $SS_{Between} = \sum_j n_j (\bar{y}_j - \bar{\bar{y}})^2$
     - $SS_{Within} = \sum_j \sum_i (y_{ij} - \bar{y}_j)^2$
     - $SS_{Total} = \sum_j \sum_i (y_{ij} - \bar{\bar{y}})^2$

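  18. The ANOVA Table: SS_Between
     - Fill in the blanks to calculate the Sum of Squares (Between): $SS_{Between} = \sum_j n_j (\bar{y}_j - \bar{\bar{y}})^2$, with grand average $\bar{\bar{y}} = 80.4$
     - Server A (66, 67, 74, 73, 75, 64): column sum 419; average (sum/n_j) 69.8; average - grand average -10.6; squared 111; multiplied by n_j = 6: 669
     - Server B (85, 85, 76, 82, 79, 86): column sum 493; average 82.2; average - grand average 1.78; squared 3.16; multiplied by 6: 19
     - Server C (91, 93, 88, 87, 90, 86): column sum 535; average 89.2; average - grand average 8.78; squared 77; multiplied by 6: 462
     - Sum: SS_Between = 1150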
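
The fill-in worksheet can be reproduced column by column. This sketch prints one row per server with the same steps as the slide (sum, average, difference from the grand average, square, times n_j); the column labels are illustrative.

```python
from statistics import mean

data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}

grand = mean([y for ys in data.values() for y in ys])
ss_between = 0.0
print(f"{'':>6}{'Sum':>7}{'Avg':>8}{'Avg-GA':>8}{'Sq':>8}{'n*Sq':>8}")
for name, ys in data.items():
    avg = sum(ys) / len(ys)          # column average (Sum / n_j)
    diff = avg - grand               # average minus grand average
    sq = diff ** 2                   # square the difference
    contribution = len(ys) * sq      # multiply by the sample size n_j
    ss_between += contribution
    print(f"{name:>6}{sum(ys):>7}{avg:>8.1f}{diff:>8.2f}{sq:>8.2f}{contribution:>8.1f}")
print(f"Grand average = {grand:.1f}, SS_Between = {ss_between:.1f}")
```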

  19. The ANOVA Table: SS_Total
     - Fill in the blanks to calculate the Sum of Squares (Total): $SS_{Total} = \sum_j \sum_i (y_{ij} - \bar{\bar{y}})^2$, with grand average $\bar{\bar{y}} = 80.4$
     - Server A: data 66, 67, 74, 73, 75, 64; data minus grand average -14.4, -13.4, -6.4, -7.4, -5.4, -16.4; squared 207, 179, 41, 55, 29, 269
     - Server B: data 85, 85, 76, 82, 79, 86; data minus grand average 4.6, 4.6, -4.4, 1.6, -1.4, 5.6; squared 21, 21, 19, 2.6, 1.9, 31
     - Server C: data 91, 93, 88, 87, 90, 86; data minus grand average 10.6, 12.6, 7.6, 6.6, 9.6, 5.6; squared 113, 159, 58, 44, 92, 31
     - Sum of the squared differences: SS_Total = 1374

  20. The ANOVA Table: SS_Within
     Columns: Source; Degrees of Freedom; Sum of Squares; Mean Square; F-Statistic
     - Between (or Factor): 2; $SS_{Between}$; $s^2_{Between} = SS_{Between}/(k-1)$; $F = s^2_{Between}/s^2_{Within}$
     - Within (or Error): 15; $SS_{Within}$; $s^2_{Within} = SS_{Within}/(n-k)$
     - Totals: 17; $SS_{Total}$
     - Rearranging the Sum of Squares model equation $SS_{Total} = SS_{Between} + SS_{Within}$ gives:
       $SS_{Within} = SS_{Total} - SS_{Between} = 1374 - 1150 = 224$

  21. The ANOVA Table: Mean Square & F_Calc
     Columns: Source; Degrees of Freedom; Sum of Squares; Mean Square; F-Statistic
     - Between (or Factor): 2; 1150; 575; 38.33
     - Within (or Error): 15; 224; 15
     - Totals: 17; 1374
     - $s^2_{Between} = SS_{Between}/df_{Between} = 1150/2 = 575$
     - $s^2_{Within} = SS_{Within}/df_{Within} = 224/15 \approx 15$
     - $F_{Calc} = s^2_{Between}/s^2_{Within} = 575/15 = 38.33$
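
The 38.33 on this slide comes from rounding $s^2_{Within}$ to 15 before dividing. This sketch repeats the arithmetic with the unrounded sums of squares from the Minitab output (1149.8 and 224.5), which reproduces Minitab's F of 38.41. The function name is hypothetical.

```python
def mean_squares_and_f(ss_between, ss_within, k, n):
    """Return (MS_between, MS_within, F) for k groups and n total observations."""
    ms_between = ss_between / (k - 1)  # df_between = k - 1
    ms_within = ss_within / (n - k)    # df_within = n - k
    return ms_between, ms_within, ms_between / ms_within


ms_b, ms_w, f = mean_squares_and_f(1149.8, 224.5, k=3, n=18)
print(f"MS_Between={ms_b:.1f}  MS_Within={ms_w:.2f}  F={f:.2f}")
# MS_Between=574.9  MS_Within=14.97  F=38.41

# Slide arithmetic: MS_Within rounded to 15 first
print(f"575 / 15 = {575 / 15:.2f}")
```

Either way F is far above the critical value, so the rounding does not change the conclusion.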

  22. The ANOVA Table: Complete
     Columns: Source; Degrees of Freedom; Sum of Squares; Mean Square; F-Statistic
     - Between (or Factor): 2; 1150; 575; 38.33
     - Within (or Error): 15; 224; 15
     - Totals: 17; 1374

  23. Comparing Variances
     - What statistic is used to test variances?
       - Hint: it's a ratio...
       - The F-statistic: $F_{Calc} = s^2_{Between}/s^2_{Within}$
     - ANOVA is an F-test.
       - ANOVA tests whether the two estimates of the population variance are the same.
       - It is a one-sided test, i.e., it tests whether the variance between groups is greater than the variance within groups.
       - Degrees of freedom, numerator (between): number of groups - 1
       - Degrees of freedom, denominator (within): number of samples - number of groups

  24. Finding F_Critical
     - In Minitab select Calc > Probability Distributions > F…, enter df_Between (numerator) and df_Within (denominator), and find the inverse cumulative probability at 1 - α.
     - Minitab output: P( X <= x ) = 0.9500 at x = 3.68232
     - Accept or reject Ho?
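
When the numerator has exactly 2 degrees of freedom (three groups, as here), the F distribution has a closed-form upper tail, $P(F > x) = (1 + 2x/df_{Within})^{-df_{Within}/2}$. Both the critical value and the p-value can then be computed without a statistics package. This sketch relies on that special case only; the function names are hypothetical. For other numerator df, use Minitab or a statistics library.

```python
def f2_sf(x, df_within):
    """P(F > x) for an F distribution with numerator df = 2 (exact closed form)."""
    return (1 + 2 * x / df_within) ** (-df_within / 2)


def f2_critical(alpha, df_within):
    """Upper-alpha critical value of F(2, df_within), i.e. the inverse of f2_sf."""
    return df_within / 2 * (alpha ** (-2 / df_within) - 1)


alpha, df_within = 0.05, 15
f_crit = f2_critical(alpha, df_within)
f_calc = 38.41  # F from the Minitab output
p_value = f2_sf(f_calc, df_within)

print(f"F_crit = {f_crit:.5f}")
print(f"p-value = {p_value:.2e}")
print("Reject Ho" if f_calc > f_crit else "Fail to reject Ho")
```

F_crit agrees with Minitab's 3.68232, and the p-value is on the order of one in a million, which Minitab displays as 0.000.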

  25. Step 1: An ANOVA Example (Minitab), Server_Speed.mtw
     - Are 3 email servers performing the same?
     - The performance of three email servers was tested by measuring the time (in ms) each took to send a 125 KB message.
     - Transmission times for 6 emails were measured for each server.
     - Use ANOVA and the following data to detect differences among the servers. Use α = 0.05.
       Server A: 66 67 74 73 75 64
       Server B: 85 85 76 82 79 86
       Server C: 91 93 88 87 90 86
     - Step 1: State the practical problem:
       - Do the servers transmit emails at different speeds?

  26. Steps 2 & 3: Ho and Ha
     - Step 2: What is the null hypothesis? $H_0: \mu_{Server\,A} = \mu_{Server\,B} = \mu_{Server\,C}$
       - Interpretation: all servers send emails at the same speed.
     - Step 3: What is the alternate hypothesis? $H_a$: at least one $\mu$ is different.
       - Interpretation: at least one of the servers is different from the others.

  27. Step 4: Do the Assumptions Hold?
     - Independent, random sample
       - Emails chosen randomly from a larger group (but all were 125 KB)
     - Data are normal, with equal variances
       - Run a normality test in Minitab
       - Run a Test for Equal Variances in Minitab
       - Use file Server_Speed.mtw
     - Results:
       - Normality test p-values: 0.230 (A), 0.304 (B), 0.859 (C)
       - Test for Equal Variances p-value: 0.480
     - Are the assumptions valid?
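
A sketch of one equal-variance test, Bartlett's test. It computes the statistic from the group variances and takes the p-value from the chi-square distribution with k - 1 = 2 df, whose upper tail is simply exp(-T/2). The sketch gives p ≈ 0.480, matching the slide, which suggests that figure comes from Bartlett's test. Treat that as an inference, not a statement of what Minitab reports. The closed-form p-value is valid only for exactly three groups.

```python
from math import exp, log
from statistics import variance

data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}


def bartlett_test(*groups):
    """Bartlett's test for equal variances. Returns (T, p_value).

    T is approximately chi-square with k - 1 df; the closed-form p-value
    used here is valid only when k - 1 == 2 (three groups).
    """
    k = len(groups)
    if k != 3:
        raise ValueError("closed-form p-value assumes exactly 3 groups")
    ns = [len(g) for g in groups]
    vs = [variance(g) for g in groups]
    n_total = sum(ns)
    pooled = sum((n - 1) * v for n, v in zip(ns, vs)) / (n_total - k)
    numerator = (n_total - k) * log(pooled) - sum((n - 1) * log(v) for n, v in zip(ns, vs))
    correction = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (n_total - k)) / (3 * (k - 1))
    t_stat = numerator / correction
    p_value = exp(-t_stat / 2)  # chi-square(2) upper tail: P(X > t) = exp(-t / 2)
    return t_stat, p_value


t_stat, p = bartlett_test(*data.values())
print(f"Bartlett T = {t_stat:.3f}, p = {p:.3f}")
```

A p-value well above α = 0.05 gives no evidence that the server variances differ. Bartlett's test is itself sensitive to non-normality, which is why the normality check comes first.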

  28. Step 5: Construct the ANOVA Table
     - In Minitab select Stat > ANOVA > One-way…
     - Why not use Anova (Unstacked)?
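
The One-way dialog works with stacked data: one response column and one factor column, matching the "Time versus Server" title of the output in Step 7. This sketch shows what stacking the three unstacked server columns means. The `stack` helper and the column names `time` and `server` are hypothetical.

```python
# Unstacked layout: one column of times (ms) per server, as on the data slide.
unstacked = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}


def stack(columns):
    """Convert {level: [responses]} into parallel response and factor columns."""
    time, server = [], []
    for level, values in columns.items():
        time.extend(values)                 # response column
        server.extend([level] * len(values))  # factor column, one label per value
    return time, server


time, server = stack(unstacked)
for s, t in zip(server, time):
    print(s, t)
```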

  29. Minitab ANOVA Output
     Source  DF      SS     MS      F      P
     Server   2  1149.8  574.9  38.41  0.000
     Error   15   224.5   15.0
     Total   17  1374.3
     - Do these values look familiar? (Compare with the hand-calculated table in slide 22.)
     - Accept or reject H0?

  30. Minitab ANOVA Output (continued)
     - Minitab displays the group means and standard deviations (sigmas).
     - Minitab also displays a graphical representation of the confidence intervals based on the pooled standard deviation.
       - Overlapping intervals are probably not significantly different.
       - Which intervals are different in this example?
     Individual 95% CIs For Mean Based on Pooled StDev
     Level  N    Mean  StDev
     A      6  69.833  4.708
     B      6  82.167  3.971
     C      6  89.167  2.639
     [Character plot of the three intervals on an axis from 70.0 to 91.0]
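
A sketch of how such intervals can be computed. It assumes each interval is mean ± t(0.975, 15) · S / √n_j. S is the pooled standard deviation, √(SS_Within / (n - k)), and 15 is the error df. The t critical value 2.131 is hard-coded from a t table because the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}
T_975_DF15 = 2.131  # two-sided 95% t critical value for 15 df (t table)

k = len(data)
n_total = sum(len(ys) for ys in data.values())
assert n_total - k == 15, "T_975_DF15 assumes 15 error degrees of freedom"
ss_within = sum((y - mean(ys)) ** 2 for ys in data.values() for y in ys)
pooled_sd = sqrt(ss_within / (n_total - k))  # Minitab's S

intervals = {}
for name, ys in data.items():
    m = mean(ys)
    half_width = T_975_DF15 * pooled_sd / sqrt(len(ys))
    intervals[name] = (m - half_width, m + half_width)
    print(f"{name}: mean={m:.3f} sd={stdev(ys):.3f} "
          f"95% CI=({m - half_width:.2f}, {m + half_width:.2f})")
```

Because all intervals use the same pooled S and equal n_j, they all have the same width (about ±3.37 ms here).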

  31. Step 6: Do the Error Assumptions Hold?
     - What are the error assumptions?
       - Errors of the model are independent.
         - Without any trends, patterns, or obvious outliers
       - Errors of the model are normally distributed around zero.
         - Normally distributed: normality test
         - Around zero: a mathematical certainty due to the nature of the model, $y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
     - How do we test the error assumptions?
       - With a Minitab analysis of residuals and fits
       - Fit: the calculated value if the model were true. For one-way ANOVA: $\text{Fit} = \mu + \tau_j$
       - Residual: the raw data point (measurement) minus the fit, $e_{ij} = y_{ij} - (\mu + \tau_j)$
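
With the usual estimates, the fit $\mu + \tau_j$ is simply the group mean, so each residual is a data point minus its group mean. That is why residuals are centered on zero "by construction". The sketch below computes fits and residuals for the server data. Their standard deviation, √(SS_Within / (n - 1)), should match the StDev of 3.634 shown for RESI1 on slide 34.

```python
from statistics import mean, stdev

data = {
    "A": [66, 67, 74, 73, 75, 64],
    "B": [85, 85, 76, 82, 79, 86],
    "C": [91, 93, 88, 87, 90, 86],
}

fits, residuals = [], []
for ys in data.values():
    fit = mean(ys)  # estimated mu + tau_j equals the group mean
    for y in ys:
        fits.append(fit)
        residuals.append(y - fit)  # residual = raw data point - fit

# Residuals sum to zero within every group, hence overall.
print(f"sum of residuals = {sum(residuals):.1e}")
print(f"StDev of residuals = {stdev(residuals):.3f}")
```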

  32. Residual Analysis
     - In Minitab select Stat > ANOVA > One-Way > Graphs > Residual Plots > Four in one…

  33. Residual Analysis Output
     [Figure: Residual Plots for Time, four panels: Normal Probability Plot (Percent vs. Residual); Residuals Versus the Fitted Values; Histogram (Frequency vs. Residual); Residuals Versus the Order of the Data (Residual vs. Observation Order)]
     - Do the residuals look normal?
     - Are the residuals random about zero, with no trends?
     - Look for trends in sample order or outliers.
     - Do the error assumptions hold?

  34. Normality Test for Residuals
     [Figure: normal probability plot of RESI1]
     - Mean 3.157968E-15 (effectively zero), StDev 3.634, N 18, AD 0.495, P-Value 0.187

  35. Step 7: Interpret the p-value
     One-way ANOVA: Time versus Server
     Source  DF      SS     MS      F      P
     Server   2  1149.8  574.9  38.41  0.000
     Error   15   224.5   15.0
     Total   17  1374.3
     S = 3.869   R-Sq = 83.66%   R-Sq(adj) = 81.49%
     - Accept or reject H0? Reject H0: the p-value (reported as 0.000) is far below α = 0.05.
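
The summary line under the table can be recomputed from the table itself. S is the square root of the error mean square. R-Sq is SS_Server / SS_Total, the same quantity as the factor's %SS in Step 8. R-Sq(adj) replaces the sums of squares with mean squares. A short sketch using the printed (rounded) table values:

```python
from math import sqrt

# Values from the Minitab one-way ANOVA table above.
ss_factor, ss_error, ss_total = 1149.8, 224.5, 1374.3
df_error, df_total = 15, 17

s = sqrt(ss_error / df_error)                                 # pooled standard deviation
r_sq = ss_factor / ss_total                                   # share of variation explained
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)  # df-adjusted R-Sq
print(f"S = {s:.3f}  R-Sq = {r_sq:.2%}  R-Sq(adj) = {r_sq_adj:.2%}")
# S = 3.869  R-Sq = 83.66%  R-Sq(adj) = 81.49%
```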

  36. Steps 8 & 9: %SS Contribution & Conclusions
     - Step 8: What is the %SS contribution?
       - A quantity that measures the contribution of each individual source of variation to the total variation.
       - $\%SS_{Between} = \%SS_{Factor} = SS_{Between}/SS_{Total} = 1150/1374 \approx 0.84$, or 84%
       - $\%SS_{Within} = \%SS_{Error} = SS_{Within}/SS_{Total} = 224/1374 \approx 0.16$, or 16%
     - Step 9: Practical interpretation
       - 84% of the variation in the time to send the emails is explained by the different servers.
       - Server A transmits email the fastest (lowest mean time, 69.8 ms).
