440 likes | 590 Views
統計學 Spring 2004. 授課教師:統計系余清祥 日期:2004年3月30日 第八週:變異數分析與實驗設計. Chapter 13 Analysis of Variance and Experimental Design. An Introduction to Analysis of Variance Analysis of Variance: Testing for the Equality of k Population Means
E N D
統計學 Spring 2004 授課教師:統計系余清祥 日期:2004年3月30日 第八週:變異數分析與實驗設計
Chapter 13 Analysis of Variance and Experimental Design • An Introduction to Analysis of Variance • Analysis of Variance: Testing for the Equality of k Population Means • Multiple Comparison Procedures • An Introduction to Experimental Design • Completely Randomized Designs • Randomized Block Design
An Introduction to Analysis of Variance • Analysis of Variance(ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies. • We want to use the sample results to test the following hypotheses. H0: 1=2=3=. . . = k Ha: Not all population means are equal • If H0 is rejected, we cannot conclude that all population means are different. • Rejecting H0 means that at least two population means have different values.
Assumptions for Analysis of Variance • For each population, the response variable is normally distributed. • The variance of the response variable, denoted 2, is the same for all of the populations. • The observations must be independent.
Analysis of Variance:Testing for the Equality of K Population Means • Between-Samples Estimate of Population Variance • Within-Samples Estimate of Population Variance • Comparing the Variance Estimates: The F Test • The ANOVA Table
Between-Samples Estimateof Population Variance • A between-samples estimate of 2 is called the mean square between(MSB). • The numerator of MSB is called the sum of squares between(SSB). • The denominator of MSB represents the degrees of freedom associated with SSB. _ =
Within-Samples Estimateof Population Variance • The estimate of 2 based on the variation of the sample observations within each sample is called the mean square within(MSW). • The numerator of MSW is called the sum of squares within(SSW). • The denominator of MSW represents the degrees of freedom associated with SSW.
Comparing the Variance Estimates: The F Test • If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSB/MSW is an F distribution with MSB d.f. equal to k - 1 and MSW d.f. equal to nT - k. • If the means of the k populations are not equal, the value of MSB/MSW will be inflated because MSB overestimates 2. • Hence, we will reject H0 if the resulting value of MSB/MSW appears to be too large to have been selected at random from the appropriate F distribution.
Test for the Equality of k Population Means • Hypotheses H0: 1=2=3=. . . = k Ha: Not all population means are equal • Test Statistic F = MSB/MSW • Rejection Rule Reject H0 if F > F where the value of F is based on an F distribution with k - 1 numerator degrees of freedom and nT - 1 denominator degrees of freedom.
Sampling Distribution of MSTR/MSE • The figure below shows the rejection region associated with a level of significance equal to where F denotes the critical value. Do Not Reject H0 Reject H0 MSTR/MSE F Critical Value
The ANOVA Table Source of Sum of Degrees of Mean Variation Squares Freedom Squares F TreatmentSSTR k - 1 MSTR MSTR/MSE Error SSE nT - k MSE TotalSST nT - 1 SST divided by its degrees of freedom nT - 1 is simply the overall sample variance that would be obtained if we treated the entire nT observations as one data set.
Example: Reed Manufacturing • Analysis of Variance J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit). A simple random sample of 5 managers from each of the three plants was taken and the number of hours worked by each manager for the previous week is shown on the next slide.
Example: Reed Manufacturing • Analysis of Variance Plant 1 Plant 2 Plant 3 Observation Buffalo Pittsburgh Detroit 1 48 73 51 2 54 63 63 3 57 66 61 4 54 64 54 5 62 74 56 Sample Mean 55 68 57 Sample Variance26.0 26.5 24.5
Example: Reed Manufacturing • Analysis of Variance • Hypotheses H0: 1= 2= 3 Ha: Not all the means are equal where: 1 = mean number of hours worked per week by the managers at Plant 1 2 = mean number of hours worked per week by the managers at Plant 2 3 = mean number of hours worked per week by the managers at Plant 3
Example: Reed Manufacturing • Analysis of Variance • Mean Square Between Since the sample sizes are all equal x = (55 + 68 + 57)/3 = 60 SSB = 5(55 - 60)2 + 5(68 - 60)2 + 5(57 - 60)2 = 490 MSB = 490/(3 - 1) = 245 • Mean Square Within SSW = 4(26.0) + 4(26.5) + 4(24.5) = 308 MSW = 308/(15 - 3) = 25.667 =
Example: Reed Manufacturing • Analysis of Variance • F - Test If H0 is true, the ratio MSB/MSW should be near 1 since both MSB and MSW are estimating 2. If Ha is true, the ratio should be significantly larger than 1 since MSB tends to overestimate 2. • Rejection Rule Assuming = .05, F.05 = 3.89 (2 d.f. numerator, 12 d.f. denominator). Reject H0 if F > 3.89
Example: Reed Manufacturing • Analysis of Variance • Test Statistic F = MSB/MSW = 245/25.667 = 9.55 • Conclusion F = 9.55 > F.05 = 3.89, so we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.
Example: Reed Manufacturing • Analysis of Variance • ANOVA Table Source of Sum of Degrees of Mean Variation Squares Freedom Square F Treatments 490 2 245 9.55 Error308 12 25.667 Total798 14
Multiple Comparison Procedures Suppose that analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher’s least significance difference (LSD) procedure can be used to determine where the differences occur.
Fisher’s LSD Procedure • Hypotheses H0: i = j Ha: ij • Test Statistic • Rejection Rule Reject H0 if t < -ta/2 or t > ta/2 where the value of ta/2 is based on a t distribution with nT - k degrees of freedom.
Fisher’s LSD ProcedureBased on the Test Statistic xi - xj _ _ • Hypotheses H0: i = j Ha: ij • Test Statistic xi - xj • Rejection Rule Reject H0 if |xi - xj| > LSD where _ _ _ _
Example: Reed Manufacturing • Fisher’s LSD Assuming = .05, • Hypotheses (A)H0: 1 = 2 Ha: 12 • Test Statistic |x1 - x2| = |55 - 68| = 13 • Conclusion The mean number of hours worked at Plant 1 is not equal to the mean number worked at Plant 2. _ _
Example: Reed Manufacturing • Fisher’s LSD • Hypotheses (B) H0: 1 = 3 Ha: 13 • Test Statistic |x1 - x3| = |55 - 57| = 2 • Conclusion There is no significant difference between the mean number of hours worked at Plant 1 and the mean number of hours worked at Plant 3. _ _
Example: Reed Manufacturing • Fisher’s LSD • Hypotheses (C) H0: 2 = 3 Ha: 23 • Test Statistic |x2 - x3| = |68 - 57| = 11 • Conclusion The mean number of hours worked at Plant 2 is not equal to the mean number worked at Plant 3. _ _
An Introduction to Experimental Design • Statistical studies can be classified as being either experimental or observational. • In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest. • In an observational study, no attempt is made to control the factors. • Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.
An Introduction to Experimental Design • A factor is a variable that the experimenter has selected for investigation. • A treatment is a level of a factor. • Experimental unitsare the objects of interest in the experiment. • A completely randomized designis an experimental design in which the treatments are randomly assigned to the experimental units. • If the experimental units are heterogeneous, blocking can be used to form homogeneous groups, resulting in a randomized block design.
Completely Randomized Designs • Between-Treatments Estimate of Population Variance • Within-Treatments Estimate of Population Variance • Comparing the Variance Estimates: The F Test • The ANOVA Table • Pairwise Comparisons
Between-Treatments Estimateof Population Variance • In the context of experimental design, the between-samples estimate of 2 is referred to as the mean square due to treatments(MSTR). • It is the same as what we previously called mean square between (MSB). • The formula for MSTR is • The numerator is called the sum of squares due to treatments(SSTR). • The denominator k - 1 represents the degrees of freedom associated with SSTR. _ =
Within-Treatments Estimateof Population Variance • The second estimate of 2, the within-samples estimate, is referred to as the mean square due to error(MSE). • It is the same as what we previously called mean square within (MSW). • The formula for MSE is • The numerator is called the sum of squares due to error(SSE). • The denominator nT - k represents the degrees of freedom associated with SSE.
ANOVA Table for aCompletely Randomized Design Source of Sum of Degrees of Mean Variation Squares Freedom Squares F TreatmentsSSTR k - 1 ErrorSSE nT - k TotalSST nT - 1
Example: Home Products, Inc. Home Products, Inc. is considering marketing a long- lasting car wax. Three different waxes (Type 1, Type 2, and Type 3) have been developed. In order to test the durability of these waxes, 5 new cars were waxed with Type 1, 5 with Type 2, and 5 with Type 3. Each car was then repeatedly run through an automatic carwash until the wax coating showed signs of deterioration. The number of times each car went through the carwash is shown on the next slide. Home Products, Inc. must decide which wax to market. Are the three waxes equally effective?
Example: Home Products, Inc. Wax Wax Wax Observation Type 1 Type 2 Type 3 1 48 73 51 2 54 63 63 3 57 66 61 4 54 64 54 5 62 74 56 Sample Mean 55 68 57 Sample Variance26.0 26.5 24.5
Example: Home Products, Inc. • Completely Randomized Design • Hypotheses H0: 1=2=3 Ha: Not all the means are equal where: 1 = mean number of washes for Type 1 wax 2 = mean number of washes for Type 2 wax 3 = mean number of washes for Type 3 wax
Example: Home Products, Inc. • Completely Randomized Design • Mean Square Between Treatments Since the sample sizes are all equal x = (x1 + x2 + x3)/3 = (55 + 68 + 57)/3 = 60 SSTR = 5(55 - 60)2 + 5(68 - 60)2 + 5(57 - 60)2 = 490 MSTR = 490/(3 - 1) = 245 • Mean Square Error SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308 MSE = 308/(15 - 3) = 25.667 _ _ _ =
Example: Home Products, Inc. • Completely Randomized Design • Rejection Rule Assuming = .05, F.05 = 3.89 (2 d.f. numerator and 12 d.f. denominator). Reject H0 if F > 3.89. • Test Statistic F = MSTR/MSE = 245/25.667 = 9.55 • Conclusion Since F = 9.55 > F.05 = 3.89, we reject H0. The mean number of carwashes are not the same for all three waxes.
Example: Home Products, Inc. • Completely Randomized Design • ANOVA Table Source of Sum of Degrees of Mean Variation Squares Freedom Squares F Treatments490 2 245 9.55 Error 308 12 25.667 Total798 14
Randomized Block Design • The ANOVA Procedure • Computations and Conclusions
The ANOVA Procedure • The ANOVA procedure for the randomized block design requires us to partition the sum of squares total (SST) into three groups: sum of squares due to treatments, sum of squares due to blocks, and sum of squares due to error. • The formula for this partitioning is SST = SSTR + SSBL + SSE • The total degrees of freedom, nT - 1, are partitioned such that k - 1 degrees of freedom go to treatments, b - 1 go to blocks, and (k - 1)(b - 1) go to the error term.
ANOVA Table for aRandomized Block Design Source of Sum of Degrees of Mean Variation Squares Freedom Squares F TreatmentsSSTR k - 1 BlocksSSBL b - 1 ErrorSSE (k - 1)(b - 1) TotalSST nT - 1
Example: Eastern Oil Co. Eastern Oil has developed three new blends of gasoline and must decide which blend or blends to produce and distribute. A study of the miles per gallon ratings of the three blends is being conducted to determine if the mean ratings are the same for the three blends. Five automobiles have been tested using each of the three gasoline blends and the miles per gallon ratings are shown on the next slide.
Example: Eastern Oil Co. Automobile Type of Gasoline (Treatment) Blocks (Block)Blend XBlend YBlend Z Means 1 31 30 30 30.333 2 30 29 29 29.333 3 29 29 28 28.667 4 33 31 29 31.000 5 26 25 26 25.667 Treatment Means 29.8 28.8 28.4
Example: Eastern Oil Co. • Randomized Block Design • Mean Square Due to Treatments The overall sample mean is 29. Thus, SSTR = 5[(29.8 - 29)2 + (28.8 - 29)2 + (28.4 - 29)2] = 5.2 MSTR = 5.2/(3 - 1) = 2.6 • Mean Square Due to Blocks SSBL = 3[(30.333 - 29)2 + . . . + (25.667 - 29)2] = 51.33 MSBL = 51.33/(5 - 1) = 12.8 • Mean Square Due to Error SSE = 62 - 5.2 - 51.33 = 5.47 MSE = 5.47/[(3 - 1)(5 - 1)] = .68
Example: Eastern Oil Co. • Randomized Block Design • Rejection Rule Assuming = .05, F.05 = 4.46 (2 d.f. numerator and 8 d.f. denominator). Reject H0 if F > 4.46. • Test Statistic F = MSTR/MSE = 2.6/.68 = 3.82 • Conclusion Since 3.82 < 4.46, we cannot reject H0. There is not sufficient evidence to conclude that the miles per gallon ratings differ for the three gasoline blends.