1. 2007 Pearson Education Chapter 5: Hypothesis Testing and Statistical Inference
2. Hypothesis Testing Hypothesis testing involves drawing inferences about two contrasting propositions (hypotheses) relating to the value of a population parameter, one of which is assumed to be true in the absence of contradictory data.
We seek evidence to determine if the hypothesis can be rejected; if not, we can only assume it to be true but have not statistically proven it true.
3. Hypothesis Testing Procedure Formulate the hypothesis
Select a level of significance, which defines the risk of drawing an incorrect conclusion that a true hypothesis is false
Determine a decision rule
Collect data and calculate a test statistic
Apply the decision rule and draw a conclusion
4. Hypothesis Formulation Null hypothesis, H0: a statement that is accepted as correct
Alternative hypothesis, H1: a proposition that must be true if H0 is false
Formulating the correct set of hypotheses depends on the burden of proof: what you wish to prove statistically should be H1
Tests involving a single population parameter are called one-sample tests; tests involving two populations are called two-sample tests.
5. Types of Hypothesis Tests One Sample Tests
H0: population parameter ≥ constant vs.
H1: population parameter < constant
H0: population parameter ≤ constant vs.
H1: population parameter > constant
H0: population parameter = constant vs.
H1: population parameter ≠ constant
Two Sample Tests
H0: population parameter (1) - population parameter (2) ≥ 0 vs.
H1: population parameter (1) - population parameter (2) < 0
H0: population parameter (1) - population parameter (2) ≤ 0 vs.
H1: population parameter (1) - population parameter (2) > 0
H0: population parameter (1) - population parameter (2) = 0 vs.
H1: population parameter (1) - population parameter (2) ≠ 0
6. Four Outcomes The null hypothesis is actually true, and the test correctly fails to reject it.
The null hypothesis is actually false, and the hypothesis test correctly reaches this conclusion.
The null hypothesis is actually true, but the hypothesis test incorrectly rejects it (Type I error).
The null hypothesis is actually false, but the hypothesis test incorrectly fails to reject it (Type II error).
7. Quantifying Outcomes Probability of Type I error (rejecting H0 when it is true) = α = level of significance
Probability of correctly failing to reject H0 = 1 - α = confidence coefficient
Probability of Type II error (failing to reject H0 when it is false) = β
Probability of correctly rejecting H0 when it is false = 1 - β = power of the test
8. Decision Rules Compute a test statistic from sample data and compare it to the hypothesized sampling distribution of the test statistic
Divide the sampling distribution into a rejection region and non-rejection region.
If the test statistic falls in the rejection region, reject H0 (concluding that H1 is true); otherwise, fail to reject H0
9. Rejection Regions
10. Hypothesis Tests and Spreadsheet Support
11. Hypothesis Tests and Spreadsheet Support (contd)
12. One Sample Tests for Means Standard Deviation Unknown Example hypothesis
H0: μ ≥ μ0 versus H1: μ < μ0
Test statistic: t = (x̄ - μ0) / (s/√n)
Reject H0 if t < -tn-1,α
13. Example For the Customer Support Survey.xls data, test the hypotheses
H0: mean response time ≥ 30 minutes
H1: mean response time < 30 minutes
Sample mean = 21.91; sample standard deviation = 19.49; n = 44 observations
Reject H0 because t = -2.75 < -t43,0.05 = -1.6811
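The rejection decision above can be reproduced in a short Python sketch; scipy supplies the t critical value, and the summary statistics are taken from the slide:

```python
from math import sqrt
from scipy.stats import t

# Summary statistics from the Customer Support Survey example
xbar, s, n = 21.91, 19.49, 44
mu0, alpha = 30, 0.05

t_stat = (xbar - mu0) / (s / sqrt(n))   # one-sample t statistic
t_crit = -t.ppf(1 - alpha, df=n - 1)    # lower-tail critical value -t(43, 0.05)
reject = t_stat < t_crit                # True: -2.75 < -1.6811
```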
14. PHStat Tool: t-Test for Mean PHStat menu > One Sample Tests > t-Test for the Mean, Sigma Unknown
15. Results
16. Using p-Values p-value = probability of obtaining a test statistic value equal to or more extreme than that obtained from the sample data when H0 is true
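As a minimal sketch, the p-value for the lower-tail t-test from the Customer Support Survey example can be computed directly; rejecting whenever the p-value is below α gives the same decision as the critical-value rule:

```python
from math import sqrt
from scipy.stats import t

# Lower-tail p-value for the Customer Support Survey t-test
t_stat = (21.91 - 30) / (19.49 / sqrt(44))
p_value = t.cdf(t_stat, df=43)   # P(T <= t_stat) when H0 is true

reject = p_value < 0.05          # same conclusion as the critical-value rule
```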
17. One Sample Tests for Proportions Example hypothesis
H0: p ≥ p0 versus H1: p < p0
Test statistic: z = (p̂ - p0) / √(p0(1 - p0)/n)
Reject if z < -zα
18. Example For the Customer Support Survey.xls data, test the hypothesis that the proportion of overall quality responses in the top two boxes is at least 0.75
H0: p ≥ 0.75
H1: p < 0.75
Sample proportion = 0.682; n = 44
The test statistic is z = -1.04; for a level of significance of 0.05, the critical value of z is -1.645. Because -1.04 > -1.645, we cannot reject the null hypothesis
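The proportion test can be verified with a small Python sketch (scipy supplies the normal critical value; the sample figures are from the slide):

```python
from math import sqrt
from scipy.stats import norm

phat, p0, n, alpha = 0.682, 0.75, 44, 0.05

z = (phat - p0) / sqrt(p0 * (1 - p0) / n)   # one-sample z statistic
z_crit = -norm.ppf(1 - alpha)               # -1.645 for alpha = 0.05
reject = z < z_crit                         # False: z is about -1.04
```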
19. PHStat Tool: One Sample z-Test for Proportions PHStat > One Sample Tests > z-Tests for the Proportion
20. Results
21. Type II Errors and the Power of a Test The probability of a Type II error, β, and the power of the test (1 - β) cannot be chosen by the experimenter.
The power of the test depends on the true value of the population mean, the level of significance used, and the sample size.
A power curve shows (1 - β) as a function of the true mean μ1.
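A power curve for the lower-tail z-test can be sketched as follows; the numbers (μ0 = 30, σ = 19.49, n = 44) are illustrative values borrowed from the survey example, and σ is treated as known for simplicity:

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 30, 19.49, 44, 0.05   # illustrative values
z_alpha = norm.ppf(1 - alpha)

def power(mu1):
    """P(reject H0: mu >= mu0) when the true mean is mu1."""
    return norm.cdf((mu0 - mu1) / (sigma / sqrt(n)) - z_alpha)

# Power rises as the true mean moves farther below mu0
curve = [power(m) for m in (30, 27, 24, 21)]
```

At μ1 = μ0 the power equals α, as it must; plotting power(μ1) over a grid of μ1 values reproduces the power curve shown on the next slide.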
22. Example Power Curve
23. Two Sample Tests for Means Standard Deviation Known Example hypothesis
H0: μ1 - μ2 ≥ 0 versus H1: μ1 - μ2 < 0
Test statistic: z = (x̄1 - x̄2) / √(σ1²/n1 + σ2²/n2)
Reject if z < -zα
24. Two Sample Tests for Means Sigma Unknown and Equal Example hypothesis
H0: μ1 - μ2 ≤ 0 versus H1: μ1 - μ2 > 0
Test statistic: t = (x̄1 - x̄2) / (sp√(1/n1 + 1/n2)), where sp² is the pooled variance estimate
Reject if t > tn1+n2-2,α
25. Two Sample Tests for Means Sigma Unknown and Unequal Example hypothesis
H0: μ1 - μ2 = 0 versus H1: μ1 - μ2 ≠ 0
Test statistic: t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2), with degrees of freedom df given by the Welch approximation
Reject if t > tdf,α/2 or t < -tdf,α/2
26. Excel Data Analysis Tool: Two Sample t-Tests Tools > Data Analysis > t-test: Two Sample Assuming Unequal Variances, or t-test: Two Sample Assuming Equal Variances
Enter range of data, hypothesized mean difference, and level of significance
Tool allows you to test H0: μ1 - μ2 = d
Output is provided for upper-tail test only
For lower-tail test, change the sign on t Critical one-tail, and subtract P(T<=t) one-tail from 1.0 for correct p-value
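Outside Excel, the same two-sample test can be run with scipy; the data below are hypothetical, and the lower-tail p-value is obtained from the two-sided one by the same kind of tail adjustment the Excel output requires:

```python
from scipy import stats

# Hypothetical samples (not from the textbook data)
a = [23.1, 19.8, 25.0, 21.4, 22.7, 20.3]
b = [26.5, 24.9, 27.2, 25.8, 23.9, 26.1]

t_stat, p_two = stats.ttest_ind(a, b, equal_var=False)  # Welch's unequal-variance t-test

# Lower-tail test H1: mean(a) - mean(b) < 0
p_lower = p_two / 2 if t_stat < 0 else 1 - p_two / 2
```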
27. PHStat Tool: Two Sample t-Tests PHStat > Two Sample Tests > t-Test for Differences in Two Means
Test assumes equal variances
Must compute and enter the sample mean, sample standard deviation, and sample size
28. Comparison of Excel and PHStat Results Lower-Tail Test
29. Two Sample Test for Means With Paired Samples Example hypothesis
H0: average difference = 0 versus
H1: average difference ≠ 0
Test statistic: t = D̄ / (sD/√n), where D̄ and sD are the mean and standard deviation of the n paired differences
Reject if t > tn-1,α/2 or t < -tn-1,α/2
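A minimal paired-test sketch with hypothetical before/after data; scipy's ttest_rel implements the paired t-test on the differences and reports a two-sided p-value:

```python
from scipy import stats

# Hypothetical before/after measurements on the same 8 units
before = [10.2, 9.8, 11.1, 10.5, 9.9, 10.8, 10.1, 10.4]
after  = [10.0, 9.9, 10.8, 10.2, 9.7, 10.5, 10.2, 10.1]

t_stat, p_two = stats.ttest_rel(before, after)  # paired t-test on the differences
reject = p_two < 0.05
```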
30. Two Sample Tests for Proportions Example hypothesis
H0: p1 - p2 = 0 versus H1: p1 - p2 ≠ 0
Test statistic: z = (p̂1 - p̂2) / √(p̄(1 - p̄)(1/n1 + 1/n2)), where p̄ is the pooled proportion
Reject if z > zα/2 or z < -zα/2
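A sketch of the two-proportion z-test with hypothetical counts; the pooled proportion is used in the standard error because the two proportions are equal under H0:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical counts: x successes out of n trials in each sample
x1, n1 = 45, 100
x2, n2 = 30, 100
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2

z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
reject = abs(z) > norm.ppf(1 - alpha / 2)   # two-tailed decision
```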
31. Hypothesis Tests and Confidence Intervals If a 100(1 - α)% confidence interval contains the hypothesized value, then we would not reject the null hypothesis based on this value with a level of significance α.
Example hypothesis
H0: μ ≥ μ0 versus H1: μ < μ0
If a 100(1 - α)% confidence interval does not contain μ0, then we can reject H0
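The duality can be checked numerically; this sketch builds a two-sided 95% confidence interval from the survey summary statistics and tests H0: μ = 30, the two-sided counterpart of the slide's hypothesis:

```python
from math import sqrt
from scipy.stats import t

# Two-sided 95% confidence interval for the mean response time
xbar, s, n, alpha = 21.91, 19.49, 44, 0.05
half_width = t.ppf(1 - alpha / 2, df=n - 1) * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)

# For a two-sided test of H0: mu = 30, reject iff 30 falls outside the CI
mu0 = 30
reject = not (ci[0] <= mu0 <= ci[1])
```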
32. F-Test for Differences in Two Variances Hypothesis
H0: σ1² - σ2² = 0 versus H1: σ1² - σ2² ≠ 0
Test statistic: F = s1²/s2²
Assume s1² > s2²
Reject if F > Fα/2,n1-1,n2-1 (see Appendix A.4)
Assumes both samples drawn from normal distributions
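A minimal sketch of the variance-ratio test with hypothetical sample variances (scipy supplies the F critical value; the larger variance goes in the numerator):

```python
from scipy.stats import f

# Hypothetical sample variances, labeled so that s1_sq > s2_sq
s1_sq, n1 = 25.0, 16
s2_sq, n2 = 9.0, 21
alpha = 0.10

F = s1_sq / s2_sq                                 # test statistic
F_crit = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)     # upper alpha/2 critical value
reject = F > F_crit
```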
33. Excel Data Analysis Tool: F-Test for Equality of Variances Tools > Data Analysis > F-test for Equality of Variances
Specify data ranges
Use α/2 for the significance level!
If the variance of Variable 1 is greater than the variance of variable 2, the output will specify the upper tail; otherwise, you obtain the lower tail information.
34. PHStat Tool: F-Test for Differences in Variances PHStat menu > Two Sample Tests > F-test for Differences in Two Variances
Compute and enter sample standard deviations
Enter the significance level α, not α/2 as in Excel
35. Excel and PHStat Results
36. Analysis of Variance (ANOVA) Compare the means of m different groups (factors) to determine if all are equal
H0: μ1 = μ2 = ... = μm
H1: at least one mean is different from the others
37. ANOVA Theory nj = number of observations in sample j
SST = total variation in the data
SSB = variation between groups
SSW = variation within groups
SST = SSB + SSW
38. ANOVA Test Statistic MSB = SSB/(m - 1)
MSW = SSW/(n - m)
Test statistic: F = MSB/MSW
Has an F-distribution with m-1 and n-m degrees of freedom
Reject H0 if F > Fα,m-1,n-m
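scipy's f_oneway performs the same single-factor ANOVA as Excel's tool; a sketch with hypothetical data for three groups:

```python
from scipy.stats import f_oneway

# Hypothetical observations for m = 3 groups
g1 = [20.1, 19.8, 21.0, 20.5, 19.9]
g2 = [22.4, 23.1, 22.8, 23.5, 22.0]
g3 = [20.3, 20.0, 20.8, 19.7, 20.6]

F, p = f_oneway(g1, g2, g3)   # F = MSB/MSW and its p-value
reject = p < 0.05             # at least one group mean differs
```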
39. Excel Data Analysis Tool for ANOVA Tools > Data Analysis > ANOVA: Single Factor
40. ANOVA Results
41. ANOVA Assumptions The m groups or factor levels being studied represent populations whose outcome measures are
Randomly and independently obtained
Are normally distributed
Have equal variances
Violation of these assumptions can affect the true level of significance and power of the test.
42. Nonparametric Tests Used when assumptions (usually normality) are violated. Examples:
Wilcoxon rank sum test for testing difference between two medians
Kruskal-Wallis rank test for determining whether multiple populations have equal medians.
Both supported by PHStat
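A sketch of the Kruskal-Wallis test in scipy with hypothetical data; because it works on ranks, no normality assumption is needed:

```python
from scipy.stats import kruskal

# Hypothetical samples from three populations
g1 = [12, 15, 14, 11, 13]
g2 = [22, 25, 24, 21, 23]
g3 = [12, 16, 14, 13, 15]

H, p = kruskal(g1, g2, g3)   # tests equality of the population medians
reject = p < 0.05
```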
43. Tukey-Kramer Multiple Comparison Procedure ANOVA cannot identify which means may differ from the rest
PHStat menu > Multiple Sample Tests > Tukey-Kramer Multiple Comparison Procedure
44. Chi-Square Test for Independence Test whether two categorical variables are independent
H0: the two categorical variables are independent
H1: the two categorical variables are dependent
45. Example Is gender independent of holding a CPA in an accounting firm?
46. Chi-Square Test for Independence Test statistic: χ² = Σ (fo - fe)²/fe, summed over all cells, where fo is the observed frequency and fe is the expected frequency under independence
Reject H0 if χ² > χ²α,(r-1)(c-1)
PHStat tool available in Multiple Sample Tests menu
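scipy's chi2_contingency runs the same test directly from a contingency table; the gender-by-CPA counts below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical gender x CPA contingency table (observed counts)
#                     CPA  no CPA
observed = np.array([[30, 70],    # male
                     [20, 80]])   # female

chi2, p, dof, expected = chi2_contingency(observed)
# dof = (r-1)(c-1) = 1; expected counts are computed assuming independence
```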
47. Example
48. PHStat Procedure Results
49. Design of Experiments A test or series of tests that enables the experimenter to compare two or more methods to determine which is better, or determine levels of controllable factors to optimize the yield of a process or minimize the variability of a response variable.
50. Factorial Experiments All combinations of levels of each factor are considered. With m factors at k levels, there are k^m experiments.
Example: Suppose that temperature and reaction time are thought to be important factors in the percent yield of a chemical process. Currently, the process operates at a temperature of 100 degrees and a 60 minute reaction time. In an effort to reduce costs and improve yield, the plant manager wants to determine if changing the temperature and reaction time will have any significant effect on the percent yield, and if so, to identify the best levels of these factors to optimize the yield.
51. Designed Experiment Analyze the effect of two levels of each factor (for instance, temperature at 100 and 125 degrees, and time at 60 and 90 minutes)
The different combinations of levels of each factor are commonly called treatments.
52. Treatment Combinations
53. Experimental Results
54. Main Effects Measures the difference in the response that results from different factor levels
Calculations
Temperature effect = (Average yield at high level) - (Average yield at low level)
= (B + D)/2 - (A + C)/2
= (90.5 + 81)/2 - (84 + 88.5)/2
= 85.75 - 86.25 = -0.5 percent.
Reaction time effect = (Average yield at high level) - (Average yield at low level)
= (C + D)/2 - (A + B)/2
= (88.5 + 81)/2 - (84 + 90.5)/2
= 84.75 - 87.25 = -2.5 percent.
55. Interactions When the effect of changing one factor depends on the level of other factors.
When interactions are present, we cannot estimate response changes by simply adding main effects; the effect of one factor must be interpreted relative to levels of the other factor.
56. Interaction Calculations Take the average response when both factors are at the same (high or low) levels and subtract the average response when the factors are at opposite levels.
Temperature-Time Interaction
= (Average yield, both factors at same level) - (Average yield, both factors at opposite levels)
= (A + D)/2 - (B + C)/2
= (84 + 81)/2 - (90.5 + 88.5)/2 = 82.5 - 89.5 = -7.0 percent
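The main-effect and interaction arithmetic can be checked directly; A, B, C, D are the four treatment yields from the slides (A = low temperature/low time, D = high/high):

```python
# Percent yields at the four treatment combinations
A, B, C, D = 84.0, 90.5, 88.5, 81.0

temp_effect = (B + D) / 2 - (A + C) / 2   # temperature main effect
time_effect = (C + D) / 2 - (A + B) / 2   # reaction-time main effect
interaction = (A + D) / 2 - (B + C) / 2   # temperature-time interaction
```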
57. Graphical Illustration of Interactions
58. Two-Way ANOVA Method for analyzing variation in a 2-factor experiment
SST = SSA + SSB + SSAB + SSW
where
SST = total sum of squares
SSA = sum of squares due to factor A
SSB = sum of squares due to factor B
SSAB = sum of squares due to interaction
SSW = sum of squares due to random variation (error)
59. Mean Squares MSA = SSA/(r - 1)
MSB = SSB/(c - 1)
MSAB = SSAB/(r - 1)(c - 1)
MSW = SSW/rc(k - 1),
where k = number of replications of each treatment combination.
60. Hypothesis Tests Compute F statistics by dividing each mean square by MSW.
F = MSA/MSW tests the null hypothesis that means for each treatment level of factor A are the same against the alternative hypothesis that not all means are equal.
F = MSB/MSW tests the null hypothesis that means for each treatment level of factor B are the same against the alternative hypothesis that not all means are equal.
F = MSAB/MSW tests the null hypothesis that the interaction between factors A and B is zero against the alternative hypothesis that the interaction is not zero.
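The mean-square and F computations can be sketched with hypothetical sums of squares for a 2 x 2 design with k = 3 replications (all numbers below are made up for illustration):

```python
# Hypothetical two-way ANOVA inputs
r, c, k = 2, 2, 3                            # factor A levels, factor B levels, replications
SSA, SSB, SSAB, SSW = 12.0, 4.5, 20.0, 16.0  # made-up sums of squares

MSA = SSA / (r - 1)
MSB = SSB / (c - 1)
MSAB = SSAB / ((r - 1) * (c - 1))
MSW = SSW / (r * c * (k - 1))

F_A, F_B, F_AB = MSA / MSW, MSB / MSW, MSAB / MSW   # compare to F critical values
```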
61. Excel Anova: Two-Factor with Replication
62. Results