Topic-11 Analysis of Variance
Analysis of Variance
What is “Analysis of Variance” (ANOVA)?
* Variance is a very important numerical feature of a data set (second only to the mean). It measures how the data are spread around the mean, and hence how much the individual items in the data set differ from one another.
* As such, understanding and testing the variances of different data sets is sometimes essential in a statistical study:
-- Are the variances of two populations (from which two samples have been taken) statistically equal or not? Or
-- When two or more population means are to be compared simultaneously, do the populations have equal variances?
* The tests for these questions use the F-distribution for the test statistic; the technique built on it is called Analysis of Variance.
* The F-distribution has the following major characteristics:
-- It is a family of curves, each determined by two degrees of freedom (numerator and denominator), one from each sample.
-- It is a continuous distribution, non-negative and positively skewed.
-- F-values range from 0 to ∞; as F → ∞, the curve approaches the X-axis.
Testing Two Population Variances
Example-1: Two packaging machines work on the same job order, each packing 1,000 units of 1 lb bags. A sample is taken from each machine, and the two samples are found to have statistically identical means but different sample variances (s1², s2²).
- Question: Can we say that the two machines are working at the same quality level? (i.e., are the population variances the same?)
Example-2: Students from two universities took a national statistics test. The test scores of 100 students are sampled from each university, and the two samples are found to have statistically identical means (e.g., 80.4 vs. 80.6) but different sample variances (s1², s2²).
- Question: Can we say that the academic performance of the students of the two universities in this test is basically the same? (i.e., are the population variances the same?) Or
- Question: Can we assume that the two population variances are the same, so that we can test whether the two population means are statistically the same or not? (Validating an assumption.)
Testing Hypothesis with F-Distributions
Application Case-1: Two-tailed (or one-tailed) test for two population variances (σ1², σ2²) with small samples (n1, n2 < 25), known sample variances (s1², s2²) and known sample sizes (n1, n2): (5-Step Procedure)
* Set: H0: σ1² = σ2² and H1: σ1² ≠ σ2² (for a two-tailed test), or H0: σ1² ≤ σ2² and H1: σ1² > σ2² (for a one-tailed test).
* Test Statistic (F): F = Larger{s1², s2²} / Smaller{s1², s2²}
* Decision Rule: If computed F > critical value, reject H0; otherwise, do not reject H0 (the evidence is insufficient to support H1).
Critical Value: From the F-table by: (1) the significance level (α/2 for a two-tailed test, α for a one-tailed test) and (2) the two degrees of freedom [df = (n1 - 1) for the sample in the numerator and df = (n2 - 1) for the sample in the denominator]. (Examples)
Note: From the definition above, the computed F-value ranges from 1 to ∞. When F = 1 (s1² = s2²), H0 cannot be rejected; only when the F-value exceeds the critical value for the selected significance level can H0 be statistically rejected.
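The test statistic and decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `variance_ratio_test` is made up here, the critical value is assumed to be looked up by hand in an F-table, and the numbers are borrowed from the stockbroker example later in this deck (standard deviations 3.9 and 3.5, critical value 3.68).

```python
def variance_ratio_test(s1, s2, critical_f):
    """Return (F, reject_h0) for the two-variance F-test.

    s1, s2     -- sample standard deviations (must be positive)
    critical_f -- critical value taken from an F-table (assumed input)
    """
    if s1 <= 0 or s2 <= 0:
        raise ValueError("standard deviations must be positive")
    var1, var2 = s1 ** 2, s2 ** 2
    # The larger sample variance goes in the numerator, so F >= 1.
    f_value = max(var1, var2) / min(var1, var2)
    # Reject H0 only when F exceeds the tabulated critical value.
    return f_value, f_value > critical_f


# Hypothetical usage with the oil vs. utility stock figures.
f_value, reject = variance_ratio_test(3.9, 3.5, critical_f=3.68)
print(round(f_value, 4), reject)
```

Passing the critical value in as an argument keeps the sketch free of any statistics library; a package with F-distribution quantiles could supply it instead.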
ANOVA: Key Assumptions
ANOVA (Analysis of Variance) is designed to use the F-value to test whether two (or more) samples come from populations with equal means, under the following assumptions:
* The population(s) is/are normally distributed with equal variances (i.e., equal standard deviations).
* Samples are randomly selected and independent of each other.
Note: ANOVA was originally designed to test the effect of a specific treatment (a cause of a specific change, such as a new training program for employees in a plant, or a new medicine for a selected group of patients). That is, among the sample groups selected for testing, one (or more) group is given this specific treatment while the other groups are not. If a later ANOVA test indicates a statistically significant difference between the group means (with treatment vs. without treatment, e.g., mean test scores after training or average health improvement after taking a specific medicine), it is viewed as statistical evidence for the effectiveness of the “treatment” being tested.
ANOVA: Testing Procedures (Similar 5-Step Procedure)
Step-1: Set Null and Alternative Hypotheses (assuming k means):
* H0: µ1 = µ2 = µ3 = ... = µk and H1: At least one mean is different.
Step-2: Select the Level of Significance (α = 0.01, 0.05, or 0.10).
Step-3: Compute the Test Statistic F, under the following assumptions:
* Data are interval-level or above and samples are random.
* Population(s) are normal with equal variances.
* Test Statistic (F) is the ratio of two variance estimates:
F = (between-sample variance)/(within-sample variance) = [SST/(k-1)]/[SSE/(N-k)] = MSTR/MSE, where
(k-1): df of the numerator, k - the number of treatments (i.e., sample groups),
(N-k): df of the denominator, N - the total number of observations,
SST - the “Sum of Squares, Treatments”,
SSE - the “Sum of Squares, Error”,
MSTR - the “Mean Square between Treatments”,
MSE - the “Mean Square Error”.
Step-4: Decision Rule - Based on the given significance level (α) and the two degrees of freedom [(k-1) and (N-k)], the critical F-value is determined from the F-distribution tables.
Step-5: Construct an ANOVA Table and Arrive at a Decision: Let Tc be the column total, nc the number of observations in each column, ΣX the sum of all observations, and N the total number of observations. Then calculate the F-value with the following formulas:
SST = Σ[Tc²/nc] - (ΣX)²/N
SSE = Σ(X²) - Σ[Tc²/nc]
SS(Total) = Σ(X²) - (ΣX)²/N
Then, if computed F > critical F-value, reject H0; otherwise, do not reject H0. (Examples)
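As a rough illustration of Step-5, the sketch below computes the sums of squares from column totals and then forms F = MSTR/MSE. The function name `one_way_anova` and the three samples are assumptions: the data are illustrative numbers, not taken from the slides, chosen so that the results agree with the SST = 76.25 and SSE = 9.75 quoted in the restaurant example.

```python
def one_way_anova(groups):
    """Sums of squares and F for a one-way ANOVA (column-total formulas)."""
    k = len(groups)                                    # number of treatments
    n_total = sum(len(g) for g in groups)              # N
    sum_x = sum(sum(g) for g in groups)                # sum of all X
    sum_x_sq = sum(x * x for g in groups for x in g)   # sum of all X squared
    correction = sum_x ** 2 / n_total                  # (sum X)^2 / N
    column_term = sum(sum(g) ** 2 / len(g) for g in groups)  # sum of Tc^2/nc
    sst = column_term - correction
    sse = sum_x_sq - column_term
    ss_total = sum_x_sq - correction
    mstr = sst / (k - 1)
    mse = sse / (n_total - k)
    return {"SST": sst, "SSE": sse, "SS_total": ss_total,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse,
            "df": (k - 1, n_total - k)}


# Illustrative samples (hypothetical, see the note above).
samples = [[13, 12, 14, 12], [10, 12, 13, 11], [18, 16, 17, 17, 17]]
result = one_way_anova(samples)
print(round(result["SST"], 2), round(result["SSE"], 2), round(result["F"], 2))
```

The computed F is then compared with the critical value for df = (k-1, N-k) read from an F-table.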
Inferences about Treatment Means: A Follow-Up Analysis in ANOVA
* When H0 (all means are equal) is rejected, H1 (at least one treatment mean differs) is accepted. A natural follow-up question (if of interest) is then: which one, and by how much?
* Several procedures can be used to identify such a difference, such as a “confidence interval” (C.I.). The t-distribution is used in calculating such a C.I., as given below:
(X̄1 - X̄2) ± t √[MSE (1/n1 + 1/n2)]
where t is obtained from the t-table with (N - k) degrees of freedom, MSE = SSE/(N-k), and n1 and n2 are the sizes of the two samples being compared.
* Reach a Decision: If the C.I. calculated above contains zero (i.e., lower limit < 0 and upper limit > 0), there is no statistically significant difference between the two treatment means. Otherwise, if the C.I. does not include zero (i.e., the lower and upper limits are on the same side of zero), there is a statistically significant difference between the two treatment means compared.
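The interval and the decision rule can be tried out with a short sketch. The helper name `mean_difference_ci` is hypothetical and the t-value is assumed to come from a t-table; the figures are those quoted in the restaurant example later in this deck (means 17 and 12.75, sample sizes 5 and 4, MSE = 0.975, t = 2.228).

```python
import math


def mean_difference_ci(mean1, mean2, n1, n2, mse, t_value):
    """Confidence interval for the difference between two treatment means."""
    # Margin of error: t * sqrt(MSE * (1/n1 + 1/n2)).
    margin = t_value * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin


low, high = mean_difference_ci(17, 12.75, 5, 4, mse=0.975, t_value=2.228)
# The means differ significantly when zero lies outside the interval.
differs = low > 0 or high < 0
print(round(low, 2), round(high, 2), differs)
```

Both limits come out positive here, which agrees with the conclusion reached in that example.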
* From the computer printout of statistics software (e.g., Minitab), the conclusion can be reached by observation:
- If the C.I.s of two (or more) means overlap, there is no significant difference between those means.
- If the C.I.s of two (or more) means share no common area (i.e., do not overlap), those means are significantly different.
* Such a follow-up analysis is a careful step-by-step process, needed only when H0 (all means are equal) has been rejected and the difference(s) are of interest in the study. (Examples)
Two-Factor ANOVA: Investigating Two Factors
* ANOVA is a procedure to study the variations, and their causes, within a data set (from an experiment). When only the treatment effects (as “one” factor) are studied, we consider the variations resulting from the treatment (e.g., different test scores as a result of a new training program) as the main “effect”, and all the variations that can be attributed to other factors (e.g., test timing) are considered “random errors”. This is One-Factor ANOVA.
* If the variations caused by another factor (in addition to the “main” effect from the treatment) are also under consideration (this factor is called a “blocking variable”), then two-factor ANOVA should be used to examine both the “treatment” and “blocking” effects. Such a two-factor ANOVA can be performed in most modern statistical software packages. The required formula is as below:
Let SSB represent the sum of squares for the blocks, where
SSB = Σ[Br²/k] - (ΣX)²/N
Br is the total of block r, k is the number of observations in each block (one per treatment), ΣX is the sum of all observations, and N is the total number of observations. The error sum of squares then becomes SSE = SS(Total) - SST - SSB.
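A minimal sketch of the two-factor calculation follows, assuming one observation per block and treatment combination. The function name `two_factor_anova` and the table are assumptions: the data are illustrative numbers, not taken from the slides, chosen so that the sums of squares agree with those quoted in Example-3 (SST = 62.53, SSB = 33.73, SSE = 43.47).

```python
def two_factor_anova(table):
    """table[r][c] is the observation for block r under treatment c."""
    b = len(table)        # number of blocks
    k = len(table[0])     # number of treatments
    n_total = b * k
    values = [x for row in table for x in row]
    correction = sum(values) ** 2 / n_total                  # (sum X)^2 / N
    ss_total = sum(x * x for x in values) - correction
    col_totals = [sum(row[c] for row in table) for c in range(k)]
    sst = sum(t * t for t in col_totals) / b - correction    # treatments
    ssb = sum(sum(row) ** 2 for row in table) / k - correction  # blocks
    sse = ss_total - sst - ssb                               # what is left
    df_t, df_b = k - 1, b - 1
    df_e = df_t * df_b
    mse = sse / df_e
    return {"SS_total": ss_total, "SST": sst, "SSB": ssb, "SSE": sse,
            "df": (df_t, df_b, df_e),
            "F_treatment": (sst / df_t) / mse,
            "F_block": (ssb / df_b) / mse}


# Illustrative data (hypothetical): rows are blocks, columns are treatments.
output = [[31, 25, 35], [33, 26, 33], [28, 24, 30], [30, 29, 28], [28, 26, 27]]
result = two_factor_anova(output)
print(round(result["F_treatment"], 2), round(result["F_block"], 2))
```

Each F-value is compared with its own critical value: df = (k-1, error df) for the treatments and df = (b-1, error df) for the blocks.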
Summary
* In most research projects, the differences among the levels of the “blocking” variable are not of interest and there is no need for a hypothesis test on them. Only if there is significant interest in such a difference are two sets of null and alternative hypotheses tested in the same experiment (a two-factor experiment).
* In most cases, the calculations required for an ANOVA (or many other statistical tests) can be carried out by statistical software packages (such as Minitab and SPSS). The skill to use such programs has become more and more important in modern business and managerial positions.
Example-1
• A stockbroker at Critical Securities reported that the mean rate of return on a sample of 10 oil stocks was 12.6 percent, with a standard deviation of 3.9 percent. The mean rate of return on a sample of 8 utility stocks was 10.9 percent, with a standard deviation of 3.5 percent. At the 0.05 significance level, can we conclude that there is more variation in the oil stocks?
• Step 1: State the null and the alternative hypotheses.
H0: σo² ≤ σu² H1: σo² > σu²
• Step 2: State the decision rule.
H0 is rejected if F > 3.68, df = (9, 7), α = 0.05.
• Step 3: Compute the value of the test statistic.
F = (3.9)²/(3.5)² = 1.2416.
• Step 4: What is the decision on H0?
H0 is not rejected. There is insufficient evidence to claim more variation in the oil stocks.
Example-2
• Whitte Restaurants specialize in meals for senior citizens and families. Wendy Whitte, President, recently developed a new meat loaf dinner. Before making it a part of the regular menu, she decides to test it in several of her restaurants. She would like to know if there is a difference in the mean number of dinners sold per day at the Maumee, Rossford, and Point Place restaurants for a sample of days. At the 0.05 significance level, can she conclude that there is a difference in the mean number of meat loaf dinners sold per day at the three restaurants?
From the previous table:
• SST = 76.25; SSE = 9.75
• Test statistic: F = [76.25/2] / [9.75/10] = 39.1026
• Step 1: State the null and the alternative hypotheses.
H0: µ1 = µ2 = µ3 H1: Not all the means are the same.
• Step 2: State the decision rule.
H0 is rejected if F > 4.10, df = (2, 10).
• Step 3: Compute the value of the test statistic.
F = 39.10 (Verify)
• Step 4: What is the decision on H0?
H0 is rejected. There is a difference in the mean number of dinners sold.
• Develop a 95% confidence interval for the difference in the mean number of meat loaf dinners sold in Maumee and Point Place. Can Ms. Whitte conclude that there is a difference between the two restaurants?
(17 - 12.75) ± 2.228 √[0.975 (1/4 + 1/5)]
4.25 ± 1.48 → (2.77, 5.73)
• These restaurants differ, since both limits are positive. Observe that the confidence intervals do not overlap in the MINITAB output.
Example-3
• The Brunner Manufacturing Company operates 24 hours a day, five days a week. The workers rotate shifts each week. Management is interested in whether there is a difference in the number of units produced when the employees work on various shifts. A sample of five workers is selected and their output recorded on each shift. At the 0.05 significance level, can we conclude there is a difference in the mean production by shift or by employee? The table is shown in the next slide.
Example-3 (Contd.)
• State the null and the alternative hypotheses for the treatment effect.
H0: µ1 = µ2 = µ3 H1: Not all means are equal.
• State the decision rule.
H0 is rejected if F > 4.46, df = (2, 8).
• Compute the various sums of squares.
SS(total) = 139.73, SST = 62.53, SSB = 33.73, SSE = 43.47. df(block) = 4, df(treatment) = 2, df(error) = 8.
• Compute the test statistic value.
F = [62.53/2]/[43.47/8] = 5.75
• What is the decision on H0?
H0 is rejected. There is a difference in the average number of units produced on the different shifts.
• State the null and the alternative hypotheses for the block effect.
H0: µ1 = µ2 = µ3 = µ4 = µ5 H1: Not all means are equal.
• State the decision rule.
H0 is rejected if F > 3.84, df = (4, 8).
• Compute the test statistic value.
F = [33.73/4]/[43.47/8] = 1.55
• What is the decision on H0?
H0 is not rejected. There is no significant difference in the average number of units produced by the different employees.
Exercises: Type II (ANOVA)
A. Steele Electric Products, Inc. assembles electrical components for stereo equipment. For the last 10 days Mark Nagy has averaged 9 rejects, with a standard deviation of 2 rejects per day. Debbie Richmond averaged 8.5 rejects, with a standard deviation of 1.5 rejects, over the same period. At the 0.05 significance level, can we conclude that there is more variation in the number of rejects per day attributed to Mark?
B. The following data are the tuition charges ($1,000) for a sample of private colleges in various regions of the United States. At the 0.05 significance level, can we conclude there is a difference in the mean tuition rates for the various regions?
• State the null and the alternate hypotheses.
• What is the decision rule?
• Develop an ANOVA table. What is the value of the test statistic?
• What is your decision regarding the null hypothesis?
• Could there be a significant difference between the mean tuition in the Northeast and that of the West? If so, develop a 95 percent confidence interval for that difference.
Instruction-7: How to run <Minitab> for ANOVA Testing
1. You have saved dataset-1 on your disk (named <Data1.mtw>). Place this disk into your computer.
2. Start the <Minitab> program and read your data file into the program:
• Go to <file>, then select <open worksheet> and click. On the next screen, select the Drive X (where your data file is stored), and click [open]. Then click [OK] on the next screen.
• (Now, Data1 is shown in the [worksheet] window.)
3. For Question-8, Case-1, Project Handout:
• For Question (a):
• Go to <Stat> ---- <ANOVA> ---- <One-Way>
• On the next screen:
• Type <C1> in the [Response] box and <C4> in the [Factor] box, then click [OK].
• You will see the printout as below.
• Repeat the above for Question (b), except using <C7> in the [Factor] box.
• Repeat the above for Question (c), except using <C6> in the [Factor] box.
====================================
a) One-way Analysis of Variance

Analysis of Variance for Price
Source     DF        SS        MS        F        P
Pool        1     19955     19955     9.75    0.002
Error     103    210812      2047
Total     104    230768

                               Individual 95% CIs For Mean
                               Based on Pooled StDev
Level     N     Mean    StDev  -----+---------+---------+---------+-
0        38   202.80    33.71  (---------*---------)
1        67   231.49    50.57                        (------*-------)
                               -----+---------+---------+---------+-
Pooled StDev =    45.24            195       210       225       240
=====================================
===========================================
b) One-way Analysis of Variance

Analysis of Variance for Price
Source     DF        SS        MS        F        P
Garage      1     63914     63914    39.45    0.000
Error     103    166853      1620
Total     104    230768

                               Individual 95% CIs For Mean
                               Based on Pooled StDev
Level     N     Mean    StDev  --+---------+---------+---------+----
0        34   185.45    28.00   (----*-----)
1        71   238.18    44.88                         (---*---)
                               --+---------+---------+---------+----
Pooled StDev =    40.25         175       200       225       250
====================================
c) One-way Analysis of Variance

Analysis of Variance for Price
Source     DF        SS        MS        F        P
Twnship     4     13263      3316     1.52    0.201
Error     100    217505      2175
Total     104    230768

                               Individual 95% CIs For Mean
                               Based on Pooled StDev
Level     N     Mean    StDev  -+---------+---------+---------+-----
1        15   196.91    35.78  (---------*--------)
2        20   227.45    44.19                (-------*-------)
3        25   228.79    48.65                 (-------*------)
4        29   216.93    49.98             (------*------)
5        16   231.40    48.80                (---------*--------)
                               -+---------+---------+---------+-----
Pooled StDev =    46.64        175       200       225       250
==================================
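As a quick cross-check of the three printouts, the MS, F and pooled standard deviation columns can be rebuilt from the DF and SS columns alone. The sketch below is only an illustration in Python, not part of Minitab; the helper name `anova_row` is made up, and the inputs are the DF and SS values shown in the printouts.

```python
import math


def anova_row(df_factor, ss_factor, df_error, ss_error):
    """Rebuild MS, F and the pooled standard deviation from DF and SS."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {"MS": ms_factor,
            "MSE": ms_error,
            "F": ms_factor / ms_error,
            # The pooled StDev is the square root of the error mean square.
            "pooled_sd": math.sqrt(ms_error)}


# (DF factor, SS factor, DF error, SS error) copied from printouts a), b), c).
printouts = {
    "Pool": (1, 19955, 103, 210812),
    "Garage": (1, 63914, 103, 166853),
    "Twnship": (4, 13263, 100, 217505),
}
for factor, values in printouts.items():
    row = anova_row(*values)
    print(factor, round(row["F"], 2), round(row["pooled_sd"], 2))
```

The rebuilt values can be compared with the F and Pooled StDev figures printed for each factor.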