Statistics for Business (ENV) Chapter 9 INTRODUCTION TO HYPOTHESIS TESTING
Hypothesis Testing 9.1 Null and Alternative Hypotheses and Errors in Testing 9.2 z Tests about a Population with known σ 9.3 t Tests about a Population with unknown σ
Hypothesis testing-1 Researchers usually collect data from a sample and then use the sample data to help answer questions about the population. Hypothesis testing is an inferential statistical process that uses the limited information in the sample data to reach a general conclusion about the population.
Hypothesis testing-2 • A hypothesis test is a formalized procedure that follows a standard series of operations. • In this way, researchers have a standardized method for evaluating the results of their research studies.
The basic experimental situation for using hypothesis testing is presented here. It is assumed that the parameter is known for the population before treatment. The purpose of the experiment is to determine whether or not the treatment has an effect. Is the population mean after treatment the same as or different from the mean before treatment? A sample is selected from the treated population to help answer this question.
Procedures of hypothesis-testing
1. First, we state a hypothesis about a population. Usually the hypothesis concerns the value of a population parameter. For example, we might hypothesize that the mean IQ for UIC students is μ = 110.
2. Next, we obtain a random sample from the population. For example, we might select a random sample of n = 100 UIC students.
3. Finally, we compare the sample data with the hypothesis. If the data are consistent with the hypothesis, we will conclude that the hypothesis is reasonable. But if there is a big discrepancy between the data and the hypothesis, we will decide that the hypothesis is wrong.
Null and Alternative Hypotheses • The null hypothesis, denoted H0, is a statement of the basic proposition being tested. It generally represents the status quo (a statement of “no effect” or “no difference”, or a statement of equality) and is not rejected unless there is convincing sample evidence that it is false. • The alternative (or scientific) hypothesis, denoted Ha (or H1), is a statement opposing the null hypothesis that will be accepted only if there is convincing sample evidence that it is true. • These two hypotheses are mutually exclusive and exhaustive.
Alpha level of .05 -- the probability of rejecting the null hypothesis when it is true is no more than 5%.
The locations of the critical region boundaries for three different levels of significance
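The boundaries in a figure like this can be recomputed from the standard normal distribution. The sketch below is only illustrative: the three alpha levels (.05, .01, .001) are an assumption, since the figure itself is not reproduced here, and the helper name `two_tailed_critical_z` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1


def two_tailed_critical_z(alpha):
    """Return the positive boundary z* such that P(|Z| > z*) = alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)


# Assumed alpha levels; the slide's own three levels are not recoverable.
for alpha in (0.05, 0.01, 0.001):
    z_star = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha}: reject H0 if z < -{z_star:.2f} or z > +{z_star:.2f}")
```

A smaller alpha pushes the boundaries farther into the tails, so stronger evidence is needed to reject H0.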
Example: Alcohol appears to be involved in a variety of birth defects, including low birth weight and retarded growth. A researcher would like to investigate the effect of prenatal alcohol on birth weight. A random sample of n = 16 pregnant rats is obtained. The mother rats are given daily doses of alcohol. At birth, one pup is selected from each litter to produce a sample of n = 16 newborn rats. The average weight for the sample is 15 grams. The researcher would like to compare the sample with the general population of rats. It is known that regular newborn rats (not exposed to alcohol) have an average weight of μ = 18 grams. The distribution of weights is normal with σ = 4.
1. State the hypotheses. The null hypothesis states that exposure to alcohol has no effect on birth weight (H0: μ = 18 grams). The alternative hypothesis states that alcohol exposure does affect birth weight (H1: μ ≠ 18 grams).
2. Select the level of significance (alpha). We will use an alpha level of .05. That is, we accept at most a 5% risk of committing a Type I error, i.e., of rejecting the null hypothesis when it is true.
3. Set the decision criteria by locating the critical region. For a two-tailed test with alpha = .05, the critical region is z < -1.96 or z > +1.96.
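4. Collect data and compute sample statistics. The sample mean is converted to a z-score, which is our test statistic: z = (x̄ - μ) / (σ/√n) = (15 - 18) / (4/√16) = -3.00.
5. Arrive at a decision. Reject the null hypothesis: z = -3.00 lies beyond the critical boundary of -1.96, so the sample provides evidence that prenatal alcohol exposure affects birth weight.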
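The arithmetic for this example can be checked with a few lines of Python. This is a minimal sketch using the numbers given in the example (μ = 18, σ = 4, n = 16, sample mean 15); it assumes a two-tailed test, because the alternative hypothesis is non-directional, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 18          # population mean for untreated newborn rats (grams)
sigma = 4         # known population standard deviation
n = 16            # sample size
sample_mean = 15  # mean weight of the alcohol-exposed sample
alpha = 0.05

standard_error = sigma / sqrt(n)               # 4 / 4 = 1 gram
z = (sample_mean - mu0) / standard_error       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed boundary (assumed)
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical z = +/-{z_crit:.2f}, reject H0: {reject_h0}")
```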
Step 1: State the null and alternate hypotheses. Null Hypothesis H0: a statement about the value of a population parameter (μ and σ), written with an “=” sign; say, “μ = 2” or “μ ≥ 2”. Alternative Hypothesis H1: a statement that is accepted if H0 is false, written without an “=” sign; say, “μ ≠ 2” or “μ < 2”.
3 hypotheses about means. Step 1: State the null and alternate hypotheses. There are three possibilities regarding means, where μ0 is a constant; the null hypothesis always contains equality.
• H0: μ = μ0, H1: μ ≠ μ0
• H0: μ ≤ μ0, H1: μ > μ0
• H0: μ ≥ μ0, H1: μ < μ0
Step 2: Select a Level of Significance, α
Level of significance: the maximum allowable probability of making a Type I error, that is, of rejecting a true null hypothesis.
• Type I error: H0 is actually true but you reject it (false positive).
• Type II error: H0 is false but you accept it (false negative).
Step 2: Select a Level of Significance, α (risk table)
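One way to see what "the maximum probability of a Type I error" means is to simulate many samples from a population in which H0 really is true and count how often a two-tailed z test rejects it. The sketch below is illustrative only: the population values (μ = 18, σ = 4, n = 16) are hypothetical, and the fixed seed is there just to make the run repeatable.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical population in which H0 (mu = mu0) is actually true.
mu0, sigma, n, alpha = 18, 4, 16, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
trials = 20_000

false_positives = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if abs(z) > z_crit:  # H0 rejected even though it is true
        false_positives += 1

type_i_rate = false_positives / trials
print(f"Estimated Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The estimated rate should land close to alpha, which is the sense in which alpha caps the risk of a false positive.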
Step 3: Select the test statistic. A test statistic is used to determine whether the result of the research study (the difference between the sample mean and the population mean) is more than would be expected by chance alone. For the time being we will consider only the z and t statistics, since our hypotheses are about the population mean.
Test Statistic • The term test statistic simply indicates that the sample mean is converted into a single, specific statistic that is used to test the hypotheses. • The z-score statistic that is used in the hypothesis test is the first specific example of what is called a test statistic. • We will introduce several other test statistics that are used in a variety of different research situations later.
Step 4: Formulate the decision rule. The decision rule is determined by the level of significance. Reject H0 if:
• H0: μ ≤ μ0: computed z > critical z
• H0: μ ≥ μ0: computed z < -critical z
• H0: μ = μ0: computed z > critical z or computed z < -critical z
Critical value: the dividing point between the region where H0 is rejected and the region where H0 is not rejected, determined by the level of significance. From the table, with statistic z, a one-tailed test and significance level 0.05, we find the critical value 1.65. For H0: μ ≤ μ0, reject if z > critical z.
One-Tailed Test of Significance. If H0: μ ≤ μ0 is true, it is very unlikely that the computed z value is so large.
Reject H0: μ ≥ μ0 if computed z < -critical z. If H0: μ ≥ μ0 is true, it is very unlikely that the computed z value (from the sample mean) is so small.
Two-Tailed Tests of Significance. If H0: μ = μ0 is true, it is very unlikely that the computed z value is extremely large or small.
Step 5: Make a decision. Reject H0 if the computed test statistic falls in the rejection region; otherwise, do not reject H0.
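The five steps can be collected into one small helper. This is a sketch, not a reference implementation: the function name `z_test`, its parameters, and the numbers in the usage lines are all hypothetical, and it assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z test for a mean with known sigma.

    tail: "upper" (H1: mu > mu0), "lower" (H1: mu < mu0), "two" (H1: mu != mu0).
    Returns (z, critical_z, reject_h0), with critical_z given as a positive value.
    """
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    if tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
        reject = z > crit
    elif tail == "lower":
        crit = std_normal.inv_cdf(1 - alpha)      # all of alpha in the left tail
        reject = z < -crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between both tails
        reject = abs(z) > crit
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return z, crit, reject


# Hypothetical usage: H0: mu <= 50 against H1: mu > 50.
z, crit, reject = z_test(sample_mean=52, mu0=50, sigma=10, n=100, tail="upper")
print(f"z = {z:.2f}, critical z = {crit:.3f}, reject H0: {reject}")
```

Returning the critical value alongside the decision keeps the comparison in Step 4 visible instead of hiding it behind a single yes/no answer.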
Example One Tailed (Upper Tailed) • An insurance company is reviewing its current policy rates. When originally setting the rates, they believed that the average claim amount was $1,800. They are concerned that the true mean is actually higher than this, because they could potentially lose a lot of money. They randomly select 40 claims and calculate a sample mean of $1,950. Assuming that the population standard deviation of claims is $500, and using a level of significance of 0.05, test to see if the insurance company should be concerned. Step 1: Set the null and alternative hypotheses. H0: μ ≤ 1800, H1: μ > 1800.
Example One Tailed (Upper Tailed) Step 2: Calculate the test statistic. z = (x̄ - μ0) / (σ/√n) = (1950 - 1800) / (500/√40) ≈ 1.897. Step 3: Set Rejection Region. Because this is an upper-tailed test, we need to put all of alpha in the right tail. Thus, R : Z > 1.645
Example One Tailed (Upper Tailed) Step 4: Conclude. We can see that z = 1.897 > 1.645, thus our test statistic is in the rejection region. Therefore we reject the null hypothesis in favor of the alternative. At the 0.05 level, the sample provides evidence that the mean claim amount is higher than $1,800, so the insurance company has reason to be concerned about its current rates.
Example: One Tailed (Lower Tailed) Trying to encourage people to stop driving to campus, the university claims that on average it takes people 30 minutes to find a parking space on campus. John does not think it takes so long to find a spot. He calculated the mean time to find a parking space on campus for the last five times and found it to be 20 minutes. Assuming that the time it takes to find a parking spot is normally distributed, and that the population standard deviation = 6 minutes, perform a hypothesis test with level of significance alpha = 0.10 to see if his claim is correct.
Example: One Tailed (Lower Tailed) Step 1: Set the null and alternative hypotheses. H0: μ ≥ 30, H1: μ < 30. Step 2: Calculate the test statistic. z = (x̄ - μ0) / (σ/√n) = (20 - 30) / (6/√5) ≈ -3.727. Step 3: Set Rejection Region. Because this is a lower-tailed test, we need to put all of alpha in the left tail. Thus, R : Z < -1.28
Example: One Tailed (Lower Tailed) Step 4: Conclude. We can see that z = -3.727 < -1.28, thus our test statistic is in the rejection region. Therefore we reject the null hypothesis in favor of the alternative. We conclude that the mean is significantly less than 30, so the data support John's claim that the mean time to find a parking space is less than 30 minutes.
Example: Two Tailed. A sample of 40 sales receipts from a grocery store has mean = $137, and the population standard deviation = $30.2. Use these values to test whether or not the mean sales at the grocery store differ from $150, with level of significance alpha = 0.01. Step 1: Set the null and alternative hypotheses. H0: μ = 150, H1: μ ≠ 150. Step 2: Calculate the test statistic. z = (x̄ - μ0) / (σ/√n) = (137 - 150) / (30.2/√40) ≈ -2.722.
Example: Two Tailed Step 3: Set Rejection Region. Because this is a two-tailed test, we need to put half of alpha in the left tail and the other half in the right tail. Thus, R : Z < -2.58 or Z > 2.58. Step 4: Conclude. We see that Z = -2.722 < -2.58, thus our test statistic is in the rejection region. Therefore we reject the null hypothesis in favor of the alternative. We can conclude that the mean is significantly different from $150; the sample provides evidence that the mean sales at the grocery store is not $150.
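The two-tailed grocery example can be reproduced numerically. The sketch below uses the values stated in the example (sample mean 137, hypothesized mean 150, σ = 30.2, n = 40, alpha = 0.01); the two-tailed p-value is an addition not shown on the slide, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma, n, alpha = 137, 150, 30.2, 40, 0.01
std_normal = NormalDist()

z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # half of alpha in each tail
p_value = 2 * std_normal.cdf(-abs(z))         # area in both tails beyond |z|
reject_h0 = abs(z) > z_crit

print(f"z = {z:.3f}, critical z = +/-{z_crit:.3f}")
print(f"two-tailed p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing |z| with the critical value and comparing the p-value with alpha are two views of the same decision, so they always agree.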
Example: credit manager. Lisa, the credit manager, wants to check if the mean monthly unpaid balance is more than $400. The level of significance she set is .05. A random check of 172 unpaid balances revealed the sample mean to be $407. The population standard deviation is known to be $38. Should Lisa conclude that the population mean is greater than $400, or is it reasonable to assume that the difference of $7 ($407 - $400) is due to chance?
Example: Lisa, the credit manager
Step 1 H0: µ ≤ $400 H1: µ > $400
Step 2 The significance level is .05.
Step 3 Since σ is known, we can find the test statistic z: z = (x̄ - µ0) / (σ/√n) = (407 - 400) / (38/√172) ≈ 2.42.
Step 4 H0 is rejected if z > 1.65 (since α = 0.05).
Step 5 Make a decision and interpret the results. (Next page)
Step 5 Make a decision and interpret the results. The p-value is .0078 for a one-tailed test. • Computed z of 2.42 > critical z of 1.65, • p of .0078 < α of .05. Reject H0. We can conclude that the mean unpaid balance is greater than $400.
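The one-tailed p-value quoted here is the area under the standard normal curve to the right of the computed z. A minimal sketch, using the numbers from the example; rounding z to two decimals before the lookup is an assumption meant to mimic reading a printed z table.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma, n, alpha = 407, 400, 38, 172, 0.05
std_normal = NormalDist()

z = (sample_mean - mu0) / (sigma / sqrt(n))
z_table = round(z, 2)                   # assumed: table lookup at two decimals
p_value = 1 - std_normal.cdf(z_table)   # upper-tail area, since H1: mu > 400
reject_h0 = p_value < alpha

print(f"z = {z:.4f} (table value {z_table}), one-tailed p = {p_value:.4f}")
print(f"reject H0: {reject_h0}")
```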
Limitation of z-scores in hypothesis testing • The limitation of z-scores in hypothesis testing is that the population standard deviation (or variance) must be known. • What if you don’t know the µ and σ of the population? • Answer: use the sample variability instead
Sample variance: s² = sum of squared deviations / (n - 1) = sum of squared deviations / df = SS/df. Since you must know the sample mean before you can compute the sample variance, this places a restriction on sample variability such that only n - 1 scores in a sample are free to vary. The value n - 1 is called the degrees of freedom (or df) for the sample variance.
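As a quick illustration of SS/df, the sketch below computes the sample variance by hand and compares it with Python's built-in `statistics.variance`, which also divides by n - 1. The five scores are hypothetical data chosen only to keep the arithmetic easy.

```python
from statistics import fmean, variance

scores = [2, 4, 4, 5, 10]  # hypothetical sample

mean = fmean(scores)                        # sample mean
ss = sum((x - mean) ** 2 for x in scores)   # SS: sum of squared deviations
df = len(scores) - 1                        # degrees of freedom, n - 1
s_squared = ss / df                         # sample variance, SS/df

print(f"mean = {mean}, SS = {ss}, df = {df}, s^2 = {s_squared}")
print(f"statistics.variance agrees: {abs(s_squared - variance(scores)) < 1e-12}")
```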
z statistic (σ known): z = (x̄ - μ) / (σ/√n). t statistic (σ unknown): t = (x̄ - μ) / (s/√n), where s is the sample standard deviation. If you select all the possible samples of a particular size (n), the set of all possible t statistics will form a t distribution. Good for: (i) large samples (n > 30), whether or not the underlying distribution is Normal; (ii) small samples (n < 30) when the underlying distribution is Normal.
Distributions of the t statistic for different values of degrees of freedom are compared to a normal distribution.
The t distribution with df = 3. Note that 5% of the distribution is located in each tail, t > 2.353 and t < -2.353, so 10% is in the two tails combined.
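The tail area beyond t = 2.353 can be checked without a statistics library by integrating the t density numerically. This is a rough sketch: the helper names `t_pdf` and `upper_tail_area` are hypothetical, and Simpson's rule is simply one convenient integration method, not something prescribed by the slides.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail_area(t_value, df, steps=2000):
    """Approximate P(T > t_value) for t_value >= 0.

    Integrates the density over [0, t_value] with Simpson's rule (steps must
    be even) and subtracts the result from 0.5, the area to the right of zero.
    """
    h = t_value / steps
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


one_tail = upper_tail_area(2.353, df=3)
print(f"P(T > 2.353) with df = 3: {one_tail:.4f}")
print(f"Both tails combined:      {2 * one_tail:.4f}")
```

Integrating over the bounded interval from 0 to t avoids having to truncate the heavy right tail, which matters for small df.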
The label on Fries’ Catsup indicates that the bottle contains 16 ounces of catsup. A sample of 36 bottles from last hour’s production revealed a mean weight of 16.12 ounces per bottle and a sample standard deviation of 0.5 ounces. At the 0.05 significance level, test whether the process is out of control. That is, can we conclude that the mean amount per bottle is different from 16 ounces?
Step 1 State the null and the alternative hypotheses. H0: μ = 16 H1: μ ≠ 16
Step 2 Select the significance level. The significance level is .05.
Step 3 Select the test statistic. Since the population s.d. is unknown, the test statistic is t, computed with the sample s.d.: (16.12 - 16) / (0.5/√36) = 1.44. Because the sample size is large enough (n = 36), its distribution is close to the standard normal, so z critical values are used.
Step 4 State the decision rule. Reject H0 if z > 1.96 or z < -1.96 (since α = 0.05).
Step 5 Make a decision and interpret the results. (Next page)
Step 5: Make a decision and interpret the results. The p-value is .1499 for a two-tailed test. • Computed z of 1.44 < critical z of 1.96, • p of .1499 > α of .05. Do not reject the null hypothesis. We cannot conclude that the mean is different from 16 ounces.
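The catsup numbers can be reproduced as follows. This sketch follows the slide in using the standard normal distribution as a large-sample approximation to the t distribution (n = 36); an exact t-based p-value with df = 35 would be slightly larger, because the t distribution has heavier tails, and it is not computed here. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, s, n, alpha = 16.12, 16, 0.5, 36, 0.05
std_normal = NormalDist()

# Test statistic uses the sample s.d. because sigma is unknown.
t_stat = (sample_mean - mu0) / (s / sqrt(n))

# Large-sample normal approximation, as on the slide (an approximation to t).
z_crit = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
reject_h0 = abs(t_stat) > z_crit

print(f"test statistic = {t_stat:.2f}, critical value = +/-{z_crit:.2f}")
print(f"two-tailed p-value (normal approx.) = {p_value:.4f}, reject H0: {reject_h0}")
```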