Hypothesis Testing Math 1680
Overview • Introduction • One-Sample z Tests and t Tests • Two-Sample z Tests • Chi-Squared Tests • Summary
Introduction • Very often, we can model a chance process and use that model to predict results • Sometimes we get a result that seems far off the prediction • An important question is how likely the observed result would be if the chance model were correct • Hypothesis tests offer an answer
Introduction • In a hypothesis test, an observed result is compared with the expected result from an appropriate chance model • We assume that the chance model is correct; this assumption is the null hypothesis, or H0 • We usually want to reject the null hypothesis in favor of some alternative explanation
One-Sample z Tests and t Tests • For example, consider a coin with heads and tails on it • You flip the coin 40 times and find you get 25 heads • Assuming the coin is fair (note that this is the null hypothesis), how many heads would you expect to get? 40 × 0.5 = 20 • How far off should you expect to be? SE = √(40 × 0.5 × 0.5) ≈ 3.16 heads • Is this difference significant?
One-Sample z Tests and t Tests • Recall that with enough flips, the number of heads is approximately normal (via the Central Limit Theorem) • The center of the curve is the expected number of heads • Approximately what is the probability of getting 25 or more heads in 40 flips? • Keep in mind that we need to use 24.5 instead of 25 when we standardize! • This number is the P-value: P ≈ 7.74%
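As a quick illustration (not part of the original slides), the coin calculation can be reproduced in a few lines of Python; SciPy is assumed for the normal-curve lookup.

```python
# A sketch of the coin-flip z test above (Python and SciPy are assumptions,
# not part of the course materials).
from math import sqrt
from scipy.stats import norm

n, observed_heads = 40, 25
p_fair = 0.5                                 # null hypothesis: the coin is fair

expected = n * p_fair                        # 20 heads
se = sqrt(n * p_fair * (1 - p_fair))         # about 3.16 heads

z = (observed_heads - 0.5 - expected) / se   # 24.5: the continuity correction
p_value = norm.sf(z)                         # right-tail area under the normal curve

print(f"z = {z:.2f}, P-value = {p_value:.2%}")   # z ~ 1.42, P ~ 7.7%
```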
One-Sample z Tests and t Tests • Most scientists will say that a P-value of less than 5% is statistically significant • This is usually good enough evidence to reject the null hypothesis • A P-value of less than 1% is highly significant • The null hypothesis should almost certainly be rejected • Bear in mind that these numbers are arbitrary cutoffs
One-Sample z Tests and t Tests • Since the P-value for the coin is about 7.7%, the result is not statistically significant • However, since the P-value is fairly close to 5%, it may be worth flipping the coin another 40 times and compiling the results to try another test
One-Sample z Tests and t Tests • The previous example illustrates a one-sample z test • We only had one sample and wanted to compare it against a chance model • Since the variable was approximately normal, we used a z score to find the P-value
One-Sample z Tests and t Tests • We were comparing the null hypothesis of flipping a fair coin to the alternative that the coin was biased in favor of heads • We were only looking at the right tail of the curve (one-tailed) • We could also compare against the coin being biased in either direction • We would then look at both tails (two-tailed)
One-Sample z Tests and t Tests • To perform a one-sample z test on the result of an experiment… • State the null hypothesis and an alternative hypothesis • Compute the expected value and standard error for the result using the model from the null hypothesis • Use a normal approximation to find the P-value • If the P-value is less than 5%, the result is significant enough to reject the null hypothesis
One-Sample z Tests and t Tests • In many cities, chlorine is added to drinking water to remove microbes • A typical recommended concentration is 3 ppm (parts per million) • A reservoir technician wants to determine if the chlorine concentration is low enough to warrant adding more to the water • She takes 50 samples from the reservoir outlet and measures the concentration • She finds that the average concentration is 2.6 ppm with an SD of 0.9 ppm • State the null hypothesis and the alternative hypothesis, and find the P-value for the observation • Should the technician restock the reservoir? H0: the reservoir already has enough chlorine in it. HA: the reservoir needs more chlorine. P ≈ 0.1%, so the reservoir should be restocked.
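A minimal sketch of the chlorine test in the same style, again assuming SciPy; the inputs are the summary statistics from the slide.

```python
# A sketch of the chlorine one-sample z test (summary statistics from the slide).
from math import sqrt
from scipy.stats import norm

n, mean, sd = 50, 2.6, 0.9
target = 3.0                      # null hypothesis: the concentration is 3 ppm

se = sd / sqrt(n)                 # standard error of the sample average
z = (mean - target) / se          # about -3.14
p_value = norm.cdf(z)             # left tail: is the concentration too low?

print(f"z = {z:.2f}, P-value = {p_value:.2%}")   # P ~ 0.08%, i.e. about 0.1%
```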
One-Sample z Tests and t Tests • Sometimes our sample is too small to justify a normal approximation • An engineer working for a steel manufacturer wants to determine the strength of the steel beams the company produces • He places 10 steel bars in a machine and pulls them until they deform • If the sample is too small, you cannot use a z test to check the result!
One-Sample z Tests and t Tests • Instead, we use a t test • A t distribution with m degrees of freedom is used in place of the normal curve • The degrees of freedom will be the number of measurements – 1 in this context • The distribution can be found on page A-106 of the text
One-Sample z Tests and t Tests • Since the sample is small, we have to adjust the SD of the measurements to reflect the true standard error: SD⁺ = SD × √(n / (n − 1)), where n is the number of measurements • After adjusting the SD, calculate the SE in the usual way: SE = SD⁺ / √n
One-Sample z Tests and t Tests • Once the EV and SE are calculated, standardize the observed result to get the t score • Look this value up in the t table • Find what range of t scores your t score would be in to estimate the P-value
One-Sample z Tests and t Tests • An engineer working for a steel manufacturer wants to determine the strength of the steel beams the company produces • The type of steel he is checking is rated to have a tensile strength of 7,525 psi (pounds per square inch) • He places 10 steel bars in a machine and pulls them until they deform • The average tension at deformation was 7,486 psi with an SD of 47 psi • Make a t test to determine if the steel is up to specifications: P ≈ 1.7%, the steel is inferior
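A sketch of this t test in Python, using the adjusted SD described above and SciPy's t distribution for the lookup (the printed t table gives the same range).

```python
# A sketch of the steel-beam t test, using the adjusted SD and SciPy's
# t distribution for the P-value lookup.
from math import sqrt
from scipy.stats import t

n, mean, sd = 10, 7486, 47
rated = 7525                          # null hypothesis: tensile strength is 7,525 psi

sd_plus = sd * sqrt(n / (n - 1))      # adjusted SD for a small sample
se = sd_plus / sqrt(n)                # about 15.7 psi
t_score = (mean - rated) / se         # about -2.49
p_value = t.cdf(t_score, df=n - 1)    # left tail, 9 degrees of freedom

print(f"t = {t_score:.2f}, P-value = {p_value:.1%}")   # P ~ 1.7%
```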
One-Sample z Tests and t Tests • An engineer working on an air-launched guided missile is testing the missile's accuracy • His goal is for the missile to strike within 10 m of its target • Because each missile costs $2 million, the engineer only gets to test-fire five missiles • The missiles strike at distances of 9.2 m, 10.4 m, 11.7 m, 9.6 m, and 10.2 m • Are the missiles ready for mass production? P ≈ 63%, the missiles are good enough
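For raw data like this, SciPy's built-in one-sample t test can run the whole calculation; this sketch uses its default two-tailed P-value, which is consistent with the 63% quoted above.

```python
# A sketch using SciPy's built-in one-sample t test on the raw strike distances.
from scipy.stats import ttest_1samp

distances = [9.2, 10.4, 11.7, 9.6, 10.2]   # strike distances in meters
target = 10.0                              # null hypothesis: average distance is 10 m

result = ttest_1samp(distances, popmean=target)   # two-tailed by default
print(f"t = {result.statistic:.2f}, P-value = {result.pvalue:.0%}")   # P ~ 63%
```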
Two-Sample z Tests • Sometimes we are interested in comparing two averages against each other • If the chance model predicts the averages to be the same, their difference should be 0 • This is going to be the null hypothesis • We can run a z (or t) test on the difference of the two observed averages and compare it to the null hypothesis
Two-Sample z Tests • The expected difference between the two averages is just the difference between the expected averages • The null hypothesis predicts this to be 0 • The standard error for the difference is SE_diff = √(SE₁² + SE₂²), where SE₁ and SE₂ are the standard errors of the two averages (SE = SD / √n for each sample)
Two-Sample z Tests • To perform a two-sample z test, standardize the observed difference according to the expected value and standard error for the difference to get the z score • Then look the z score up in the normal table to find the P-value
Two-Sample z Tests • You have two graders for this course • To ensure they are grading consistently, I compare the averages from each of their groups • On HW 5, one group of 38 students had an average of 47.7 with an SD of 13.8 • On HW 5, the other group of 39 students had an average of 46.0 with an SD of 14.8 • Make a z test and determine if there was a significant difference between the two groups: P ≈ 62%, the graders were consistent
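A sketch of the grader comparison using the standard error for a difference; the exact P-value comes out near 60%, in line with the 62% quoted above.

```python
# A sketch of the two-sample z test for the two grading groups.
from math import sqrt
from scipy.stats import norm

n1, mean1, sd1 = 38, 47.7, 13.8
n2, mean2, sd2 = 39, 46.0, 14.8

se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)      # SE for the difference of the averages
z = (mean1 - mean2) / se_diff                  # expected difference under H0 is 0
p_value = 2 * norm.sf(abs(z))                  # two-tailed

print(f"z = {z:.2f}, P-value = {p_value:.0%}") # z ~ 0.52, P ~ 60%
```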
Two-Sample z Tests • Another useful property of two-sample z tests is that they can be used to determine the significance of the difference between treatment and control groups in studies • This is how we know if the studies from Chapters 1 and 2 show statistically significant results • Compare the treatment group's average/percentage to that of the control group
Two-Sample z Tests • (Hypothetical) One high school does a study to see if there is a link between playing music and better grades in high school • The administrators compare the GPAs (for that year) of students who were enrolled in a music course (such as band, choir, etc.) with those of students not enrolled in any such class • Make a two-sample z test • Do music students really have higher GPAs? z = 18.756, P ≈ 0%; music students do have higher GPAs
Chi-Squared Tests • Sometimes we need to compare the sample distribution to the predicted distribution • A gambler observes throws of a die to determine if the die is fair • After 48 throws, he tallies how many times each face came up • Is the die fair?
Chi-Squared Tests • In this case, we are comparing the observations against the null hypothesis that each outcome is equally likely • In 48 throws, we would expect to see 8 of each value come up
Chi-Squared Tests • To get an idea of how far off each observed frequency (OF) is from the expected frequency (EF), we calculate (OF − EF)² / EF for each possible value in the distribution • By summing up these values, we obtain the χ² (chi-square) statistic • For the die example, χ² = 9.75
Chi-Squared Tests • Once we have the χ² statistic, we can look it up in a χ² table with m degrees of freedom • The degrees of freedom will be the number of values in the distribution – 1 in this context • The distribution can be found on page A-107 of the text
Chi-Squared Tests • Since there are six possible values in the die-rolling distribution, there are 6 – 1 = 5 degrees of freedom in the χ² distribution • Look up the χ² value of 9.75 in the row for 5 degrees of freedom • The table tells us that the P-value is between 5% and 10%, so the result is not statistically significant • However, the rolls came heavy on 4 and light on 3 • 3 and 4 are on opposite sides of the die • Perhaps it would be good to observe more throws and retest
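The slide's table of die counts is not reproduced here, so the counts below are hypothetical, chosen only so the χ² statistic comes out near 9.75; the sketch shows how SciPy's goodness-of-fit routine would handle such data.

```python
# A sketch of the goodness-of-fit test for the die.  The counts are hypothetical
# (the slide's table isn't reproduced here), chosen so chi-squared is near 9.75.
from scipy.stats import chisquare

observed = [8, 8, 3, 15, 6, 8]        # hypothetical counts for faces 1-6 (48 throws)
expected = [48 / 6] * 6               # fair die: 8 of each face

result = chisquare(observed, f_exp=expected)   # 6 - 1 = 5 degrees of freedom
print(f"chi2 = {result.statistic:.2f}, P-value = {result.pvalue:.1%}")   # P between 5% and 10%
```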
Chi-Squared Tests • A programmer designing a random number generator needs to ensure that the numbers are uniformly distributed between 0 and 1 • “Uniformly distributed” means that each number between 0 and 1 is equally likely to be generated • She generates 1,000 numbers and groups them into class intervals based on their first digit after the decimal
Chi-Squared Tests • She tallies the counts in each of the ten class intervals • Are the numbers close enough to uniform, or should the programmer adjust the generator? Use a χ² test with 9 degrees of freedom: χ² = 98.94, P ≈ 0%. The generator is certainly not uniform.
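A sketch of how the programmer might run this check in Python; it draws its own 1,000 numbers rather than using the slide's tallies, so the resulting χ² will differ from 98.94.

```python
# A sketch of the programmer's check: draw 1,000 numbers, bin them by the first
# digit after the decimal point, and test against the uniform expectation.
# (Uses Python's own generator, not the slide's data, so the result will differ.)
import random
from scipy.stats import chisquare

draws = [random.random() for _ in range(1000)]
counts = [0] * 10
for x in draws:
    counts[int(x * 10)] += 1          # first digit after the decimal point: 0-9

result = chisquare(counts, f_exp=[100] * 10)   # 10 - 1 = 9 degrees of freedom
print(f"chi2 = {result.statistic:.2f}, P-value = {result.pvalue:.1%}")
```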
Chi-Squared Tests • Another use for the χ² test is to determine if two variables are independent • If two variables are independent, then the distribution of one variable should look the same within each category of the other • The χ² test tells us if two distributions look the same
Chi-Squared Tests • To calculate the expected frequency in a block: • Divide the total number of cases in that block's row by the grand total of the table • Multiply this proportion by the total number of cases in that block's column
Chi-Squared Tests • Worked example: a table of two variables with 4 rows and 2 columns; the row totals are 760, 1613, 1218, and 572, the column totals are 1156 and 3007, and the grand total is 4163 • Expected frequency for the block in the first row of the first column: (1156/4163)(760) ≈ 211 • Expected frequency for the block in the second row of that column: (1156/4163)(1613) ≈ 448
Chi-Squared Tests • To find how far off each observed frequency is from the expected frequency, use the formula (OF − EF)² / EF • To get the value of χ², add up all of these terms • In this case, χ² ≈ 14.3
Chi-Squared Tests • The number of degrees of freedom will be (number of rows – 1)(number of columns – 1) • In this case, there are (4 – 1)(2 – 1) = 3 degrees of freedom • The last step is to estimate the P-value by finding the range in the table which covers χ² = 14.3 for 3 degrees of freedom • The table tells us P < 1%, meaning that we can say the variables are not independent
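A sketch of the independence test done by hand in Python, mirroring the steps above; the cell counts are hypothetical since the slide's table is not reproduced here, so only the procedure carries over.

```python
# A sketch of a chi-squared test of independence on a 4 x 2 table, following the
# steps above.  The cell counts are hypothetical; only the procedure matters here.
from scipy.stats import chi2

observed = [
    [210, 550],      # 4 rows for one variable,
    [450, 1160],     # 2 columns for the other
    [340, 880],
    [160, 410],
]

grand_total = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, of in enumerate(row):
        ef = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi2_stat += (of - ef) ** 2 / ef                   # (OF - EF)^2 / EF

dof = (len(observed) - 1) * (len(observed[0]) - 1)         # (4 - 1)(2 - 1) = 3
p_value = chi2.sf(chi2_stat, dof)                          # right tail of the chi-squared curve
print(f"chi2 = {chi2_stat:.1f}, df = {dof}, P-value = {p_value:.1%}")
```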
Summary • When we want to show that a result was not likely to occur by pure chance, we can use a hypothesis test to validate our claim • A hypothesis test takes as a null hypothesis some chance model which could describe the situation
Summary • The goal of the researcher is to reject the null hypothesis • This is accomplished by finding a P-value that is small enough to be considered “significant” • P-values less than 5% are generally considered statistically significant • The observed result was very unlikely to occur by pure chance
Summary • To compare a sample average or percentage against a chance model, use a z test (if sample is large enough) or a t test (if sample is small) • To compare the averages or percentages from two different samples, use a z (or t) test for the difference between the averages/percentages
Summary • To compare two entire distributions, use a χ² test • The null hypothesis for a χ² test is that the distributions being compared are the same • A χ² test can also be used to check if two variables are independent
Summary • Remember that all of these hypothesis tests can only give the researcher the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true • Even if a result is significant, the test cannot tell us which alternative explanation is correct • Proposing a viable alternative is a task for the researcher