Analysis of Variance (ANOVA)

• A single-factor ANOVA can be used to compare more than two means. For example, suppose a manufacturer of paper used for grocery bags is concerned about the tensile strength of the paper. Product engineers believe that tensile strength is a function of the hardwood concentration and want to test several concentrations for the effect on tensile strength.
• If there are 2 different hardwood concentrations (say, 5% and 15%), then a z-test or t-test is appropriate:
  H0: μ1 = μ2
  H1: μ1 ≠ μ2

Comparing More Than Two Means
• What if there are 3 different hardwood concentrations (say, 5%, 10%, and 15%)?
  H0: μ1 = μ2    H0: μ1 = μ3    H0: μ2 = μ3
  H1: μ1 ≠ μ2    H1: μ1 ≠ μ3    H1: μ2 ≠ μ3
• How about 4 different concentrations (say, 5%, 10%, 15%, and 20%)? All of the above, PLUS
  H0: μ1 = μ4    H0: μ2 = μ4    H0: μ3 = μ4
  H1: μ1 ≠ μ4    H1: μ2 ≠ μ4    H1: μ3 ≠ μ4
• What about 5 concentrations? 10?

Comparing Multiple Means - Type I Error
• Suppose α = 0.05, so for a single test P(Type I error) = 0.05 and (1 – α) = P(fail to reject H0 | H0 is true) = 0.95
• Conducting multiple t-tests increases the probability of at least one Type I error
• The greater the number of t-tests, the greater the error probability. Treating the tests as independent, the probability that m tests produce no Type I error is (0.95)^m:
  • 4 tests: (0.95)^4 = 0.815
  • 5 tests: (0.95)^5 = 0.774
  • 10 tests (every pair among 5 concentrations): (0.95)^10 = 0.599
• Making the comparisons simultaneously (as in an ANOVA) keeps the overall Type I error probability at 0.05
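The growth in error probability can be checked numerically. The sketch below counts the pairwise tests needed for a given number of concentrations and evaluates (1 – α)^m. It treats the m tests as independent, which is only an approximation because pairwise t-tests share data. The function names are illustrative, not from the slides.

```python
from math import comb


def pairwise_tests(levels: int) -> int:
    """Number of distinct pairs among `levels` treatment means."""
    return comb(levels, 2)


def prob_no_type1_error(tests: int, alpha: float = 0.05) -> float:
    """P(no Type I error in any test), treating the tests as independent."""
    return (1 - alpha) ** tests


def familywise_error(tests: int, alpha: float = 0.05) -> float:
    """P(at least one Type I error) across all tests."""
    return 1 - prob_no_type1_error(tests, alpha)


for levels in (3, 4, 5, 10):
    m = pairwise_tests(levels)
    print(
        f"{levels:>2} concentrations -> {m:>2} pairwise tests, "
        f"P(no Type I error) = {prob_no_type1_error(m):.3f}, "
        f"P(at least one) = {familywise_error(m):.3f}"
    )
```

The number of pairs grows quadratically with the number of concentrations, which is why running every pairwise t-test quickly becomes unreliable.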

Analysis of Variance (ANOVA) Terms
• Independent variable: that which is varied
  • Treatment
  • Factor
  • Level: the selected categories of the factor
  • In a single-factor experiment there are a levels
• Dependent variable: the measured result
  • Observations
  • Replicates
  • (N observations in the total experiment)
• Randomization: performing experimental runs in random order so that other factors don’t influence results.

The Experimental Design
• Suppose a manufacturer is concerned about the tensile strength of the paper used to produce grocery bags. Product engineers believe that tensile strength is a function of the hardwood concentration and want to test several concentrations for the effect on tensile strength. Six specimens were made at each of the 4 hardwood concentrations (5%, 10%, 15%, and 20%). The 24 specimens were tested in random order on a tensile test machine.
• Terms
  • Factor: Hardwood Concentration
  • Levels: 5%, 10%, 15%, 20%
  • a = 4 (number of levels)
  • n = 6 (replicates per level)
  • N = an = 24 (total observations)
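A randomized run order for the 24 specimens could be generated as in the sketch below. The seed and the variable names are assumptions made for illustration. The slides do not say how the order was actually produced.

```python
import random

levels = ["5%", "10%", "15%", "20%"]
replicates = 6

# One entry per specimen: (hardwood concentration, replicate number).
runs = [(level, rep) for level in levels for rep in range(1, replicates + 1)]

# A fixed seed is used only so the example is reproducible.
rng = random.Random(2024)
run_order = rng.sample(runs, k=len(runs))  # random permutation of all runs

for position, (level, rep) in enumerate(run_order, start=1):
    print(f"run {position:>2}: concentration {level}, specimen {rep}")
```

Shuffling the full list of specimens, rather than testing one concentration at a time, spreads any drift in the test machine across all levels.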

The Results and Partial Analysis
• The experimental results consist of 6 observations at each of 4 levels for a total of N = 24 observations. To begin the analysis, we calculate the average and total for each level.

To determine if there is a difference in the response at the 4 levels:
• Calculate sums of squares
• Calculate degrees of freedom
• Calculate mean squares
• Calculate the F statistic
• Organize the results in the ANOVA table
• Conduct the hypothesis test

Calculate the sums of squares
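The raw observations and the sum-of-squares formulas are not reproduced in this text. As a rough check, the standard single-factor identities can be applied to the per-level means and standard deviations listed in the computer output later in the deck: SS_treat = n Σ (ȳi – ȳ)² and SS_error = (n – 1) Σ si². Because those summary values are rounded to three decimals, the results agree with the slide values (382.7917 and 130.1667) only approximately.

```python
# Per-level summary statistics copied from the deck's computer output
# (rounded, so the sums of squares below are approximate).
n = 6  # observations per level
means = {"5%": 10.000, "10%": 15.667, "15%": 17.000, "20%": 21.167}
stdevs = {"5%": 2.828, "10%": 2.805, "15%": 1.789, "20%": 2.639}

# With equal n per level, the grand mean is the plain average of level means.
grand_mean = sum(means.values()) / len(means)

# Between-level (treatment) sum of squares: n * sum_i (ybar_i - ybar)^2
ss_treat = n * sum((m - grand_mean) ** 2 for m in means.values())

# Within-level (error) sum of squares: sum_i (n - 1) * s_i^2
ss_error = (n - 1) * sum(s ** 2 for s in stdevs.values())

# The total sum of squares is the sum of the two components.
ss_total = ss_treat + ss_error

print(f"SS_treat = {ss_treat:.2f}")
print(f"SS_error = {ss_error:.2f}")
print(f"SS_total = {ss_total:.2f}")
```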

Additional Calculations
Calculate Degrees of Freedom (a = 4 levels, n = 6 observations per level)
  df_treat = a – 1 = 3
  df_error = a(n – 1) = 20
  df_total = an – 1 = 23
Mean Square, MS = SS/df
  MS_treat = 382.7917 / 3 = 127.5972
  MS_error (MSE) = 130.1667 / 20 = 6.508333
Calculate F = MS_treat / MS_error = 127.5972 / 6.508333 = 19.61
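These calculations can be reproduced in a few lines. The sketch below starts from the two sums of squares given on the slide and prints the results in the usual ANOVA table layout. The variable names and the print formatting are illustrative choices.

```python
a, n = 4, 6  # number of levels, observations per level
ss_treat, ss_error = 382.7917, 130.1667  # sums of squares from the slide

# Degrees of freedom
df_treat = a - 1
df_error = a * (n - 1)
df_total = a * n - 1

# Mean squares and the F statistic
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f_stat = ms_treat / ms_error

print(f"{'Source':<10}{'SS':>10}{'df':>4}{'MS':>10}{'F':>8}")
print(f"{'Treatment':<10}{ss_treat:>10.4f}{df_treat:>4d}{ms_treat:>10.4f}{f_stat:>8.2f}")
print(f"{'Error':<10}{ss_error:>10.4f}{df_error:>4d}{ms_error:>10.4f}")
print(f"{'Total':<10}{ss_treat + ss_error:>10.4f}{df_total:>4d}")
```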

Organizing the Results
• Build the ANOVA table
• Determine significance
  • fixed α-level → compare Fcalc to the critical value Fα, a–1, a(n–1)
  • p-value → find the p associated with this F with degrees of freedom a–1, a(n–1)
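Both approaches need the upper tail of the F distribution. In practice a statistics package would supply it (for example `scipy.stats.f.sf`). The sketch below is a standard-library alternative that uses the identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1·f), where I_x is the regularized incomplete beta function, evaluated with Simpson's rule. The function name `f_upper_tail` and the choice α = 0.05 are assumptions for illustration.

```python
from math import exp, lgamma


def f_upper_tail(f_value: float, df1: int, df2: int, steps: int = 2000) -> float:
    """P(F > f_value) for an F(df1, df2) random variable.

    Assumes f_value > 0, df2 >= 2, and an even number of steps, so that the
    integrand is finite on the whole integration interval [0, x].
    """
    p, q = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)

    def integrand(t: float) -> float:
        return t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)

    # Composite Simpson's rule for the incomplete beta integral on [0, x].
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    # Divide by the complete beta function B(p, q) to regularize.
    log_beta = lgamma(p) + lgamma(q) - lgamma(p + q)
    return integral / exp(log_beta)


alpha = 0.05
f_calc = 19.61
p_value = f_upper_tail(f_calc, 3, 20)
print(f"p-value = {p_value:.2e}")
print(f"reject H0 at alpha = {alpha}: {p_value < alpha}")
```

The same function can confirm a tabulated critical value: the tail area beyond the table entry F0.05,3,20 = 3.10 should be close to 0.05.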

Conduct the Hypothesis Test
• Null Hypothesis: The mean tensile strength is the same for each hardwood concentration.
• Alternative Hypothesis: The mean tensile strength differs for at least one hardwood concentration.
• Compare Fcrit to Fcalc
• Draw the graphic
• State your decision with respect to the null hypothesis
• State your conclusion based on the problem statement

Hypothesis Test Results
• Null Hypothesis: The mean tensile strength is the same for each hardwood concentration.
• Alternative Hypothesis: The mean tensile strength differs for at least one hardwood concentration.
• Fcrit is less than Fcalc (at α = 0.05, F0.05,3,20 = 3.10 < 19.61)
• Draw the graphic
• Reject the null hypothesis
• Conclusion: The mean tensile strength differs for at least one hardwood concentration.

Post-hoc Analysis: “Hand Calculations”
• Model: yij = μ + αi + εij
• Calculate and check residuals, eij = yij – ŷij (observed minus fitted value; under this model the fitted value is the treatment mean ȳi)
  • plot residuals vs treatments
  • normal probability plot
• Perform ANOVA and determine if there is a difference in the means
• If the decision is to reject the null hypothesis, identify which means are different using Tukey’s procedure

Graphical Methods - Computer
Individual 95% CIs For Mean Based on Pooled StDev

Level   N    Mean   StDev  +---------+---------+---------+---------
5%      6  10.000   2.828  (----*----)
10%     6  15.667   2.805                (----*-----)
15%     6  17.000   1.789                    (----*-----)
20%     6  21.167   2.639                             (-----*----)
                           +---------+---------+---------+---------
                           8.0      12.0      16.0      20.0
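The intervals in this output can be approximated by hand. The sketch below pools the four standard deviations (a simple average of the variances, which is valid here because every level has the same N) and builds mean ± t·s_pooled/√n. The critical value t(0.025, 20 df) = 2.086 is taken from a standard t table and is not stated in the deck.

```python
from math import sqrt

n = 6  # observations per level
means = {"5%": 10.000, "10%": 15.667, "15%": 17.000, "20%": 21.167}
stdevs = {"5%": 2.828, "10%": 2.805, "15%": 1.789, "20%": 2.639}

# Pooled variance: with equal n this is the average of the level variances.
pooled_var = sum(s ** 2 for s in stdevs.values()) / len(stdevs)
pooled_sd = sqrt(pooled_var)

# Assumed table value: t(0.025) with a(n - 1) = 20 degrees of freedom.
T_CRIT = 2.086
half_width = T_CRIT * pooled_sd / sqrt(n)

intervals = {level: (m - half_width, m + half_width) for level, m in means.items()}

print(f"pooled StDev = {pooled_sd:.3f}")
for level, (lower, upper) in intervals.items():
    print(f"{level:>4}: ({lower:6.3f}, {upper:6.3f})")
```

The pooled variance comes out close to the mean square error from the ANOVA calculations (about 6.51), which is what the pooled estimate represents.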

Numerical Methods - Computer
• Tukey’s test
• Duncan’s Multiple Range test
• Easily performed in Minitab
• Tukey 95% Simultaneous Confidence Intervals (partial results)

10% subtracted from:
        Lower  Center   Upper  ----+---------+---------+---------+-----
15%    -2.791   1.333   5.458            (-----*-----)
20%     1.376   5.500   9.624                  (-----*-----)
                               ----+---------+---------+---------+-----
                                 -7.0       0.0       7.0      14.0
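The partial output is enough to extend the comparison to every pair of levels. In the sketch below the interval half-width is read directly from the output (center minus lower, 1.333 – (–2.791) = 4.124) rather than computed from a studentized-range table. The implied critical value q is back-calculated and should be treated as an assumption, as should the MSE of 6.508333 carried over from the ANOVA calculations.

```python
from itertools import combinations
from math import sqrt

n = 6
mse = 6.508333  # mean square error from the ANOVA calculations
means = {"5%": 10.000, "10%": 15.667, "15%": 17.000, "20%": 21.167}

# Half-width of each Tukey interval, read from the deck's output.
half_width = 1.333 - (-2.791)

# Back-calculated studentized-range value: half_width = q * sqrt(MSE / n).
q_implied = half_width / sqrt(mse / n)

results = {}
for (first, mean_first), (second, mean_second) in combinations(means.items(), 2):
    diff = mean_second - mean_first  # "first subtracted from second"
    lower, upper = diff - half_width, diff + half_width
    # The pair differs if the interval excludes zero.
    significant = lower > 0 or upper < 0
    results[(first, second)] = (diff, lower, upper, significant)

print(f"implied q = {q_implied:.2f}")
for (first, second), (diff, lower, upper, significant) in results.items():
    verdict = "different" if significant else "not different"
    print(f"{second} - {first}: {diff:6.3f}  ({lower:6.3f}, {upper:6.3f})  {verdict}")
```

Under these assumptions only the 10% and 15% means fail to separate, which matches the one interval in the output that contains zero.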
