Introduction to Statistical Inferences: Estimation and Hypothesis Testing

Chapter 8 ~ Introduction to Statistical Inferences $E = z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$

Chapter Goals • Learn the basic concepts of estimation and hypothesis testing • Consider questions about a population mean using two methods that assume the population standard deviation is known • Consider: what value or interval of values can we use to estimate a population mean? • Consider: is there evidence to suggest the hypothesized mean is incorrect?

8.1 ~ The Nature of Estimation • Discuss estimation more precisely • What makes a statistic good? • Assume the population standard deviation, $\sigma$, is known throughout this chapter • Concentrate on learning the procedures for making statistical inferences about a population mean $\mu$

Point Estimate for a Parameter
Point Estimate for a Parameter: The value of the corresponding statistic
• Example: $\bar{x} = 14.7$ is a point estimate (single number value) for the mean $\mu$ of the sampled population
• How good is the point estimate? Is it high? Or low? Would another sample yield the same result?
Note: The quality of an estimation procedure is enhanced if the sample statistic is both less variable and unbiased

Unbiased Statistic Unbiased Statistic: A sample statistic whose sampling distribution has a mean value equal to the value of the population parameter being estimated. A statistic that is not unbiased is a biased statistic. • Example: The figures on the next slide illustrate the concept of being unbiased and the effect of variability on a point estimate Assume A is the parameter being estimated

Illustrations
[Figure: panels labeled negative bias (underestimate), unbiased (on-target estimate), and positive bias (overestimate), shown with high variability and with low variability]

Notes
1. The sample mean, $\bar{x}$, is an unbiased statistic because the mean value of the sampling distribution is equal to the population mean: $\mu_{\bar{x}} = \mu$
2. Sample means vary from sample to sample. We don’t expect the sample mean to be exactly equal to the population mean $\mu$.
3. We do expect the sample mean to be close to the population mean
4. Since closeness is measured in standard deviations, we expect the sample mean to be within 2 standard deviations (of the sampling distribution of $\bar{x}$) of the population mean

Important Definitions
Interval Estimate: An interval bounded by two values and used to estimate the value of a population parameter. The values that bound this interval are statistics calculated from the sample that is being used as the basis for the estimation.
Level of Confidence $1 - \alpha$: The probability that the sample to be selected yields an interval that includes the parameter being estimated
Confidence Interval: An interval estimate with a specified level of confidence

Summary
• Use the point estimate $\bar{x}$ as the central value of an interval
• Since the sample mean ought to be within 2 standard deviations of the population mean (95% of the time), we can find the bounds to an interval centered at $\bar{x}$: $\bar{x} - 2(\sigma_{\bar{x}})$ to $\bar{x} + 2(\sigma_{\bar{x}})$
• To construct a confidence interval for a population mean $\mu$, use the CLT
• The level of confidence for the resulting interval is approximately 95%, or 0.95
• We can be more accurate in determining the level of confidence
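The "approximately 95%" figure for a 2-standard-deviation interval can be checked against the standard normal distribution. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and do not come from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a normally distributed sample mean lands within
# 2 standard deviations (of its sampling distribution) of the population mean.
coverage_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Number of standard deviations needed for exactly 95% coverage.
z_95 = std_normal.inv_cdf(0.975)

print(f"P(-2 < z < 2) = {coverage_2sd:.4f}")
print(f"z for 95% coverage = {z_95:.2f}")
```

The multiplier 2 is a convenient rounding; the value 1.96 is the confidence coefficient that the later slides use for 95% confidence.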

Illustration
[Figure: distribution of $\bar{x}$, with $\mu$ and $\bar{x}$ marked]
• The interval $\bar{x} - 2\sigma_{\bar{x}}$ to $\bar{x} + 2\sigma_{\bar{x}}$ is an approximate 95% confidence interval for the population mean $\mu$ based on this $\bar{x}$

8.2 ~ Estimation of Mean $\mu$ ($\sigma$ Known) • Formalize the interval estimation process as it applies to estimating the population mean $\mu$ based on a random sample • Assume the population standard deviation $\sigma$ is known • The assumptions are the conditions that need to exist in order to correctly apply a statistical procedure

The Assumption...
The assumption for estimating the mean $\mu$ using a known $\sigma$: The sampling distribution of $\bar{x}$ has a normal distribution
Assumption satisfied by:
1. Knowing that the sampled population is normally distributed, or
2. Using a large enough random sample (CLT)
Note: The CLT may be applied to smaller samples (for example, n = 15) when there is evidence to suggest a unimodal distribution that is approximately symmetric. If there is evidence of skewness, the sample size needs to be much larger.

The $1 - \alpha$ Confidence Interval of $\mu$
• A $1 - \alpha$ confidence interval for $\mu$ is found by $\bar{x} - z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$
Notes:
1. $\bar{x}$ is the point estimate and the center point of the confidence interval
2. $z(\alpha/2)$: confidence coefficient, the number of multiples of the standard error needed to construct an interval estimate of the correct width to have a level of confidence $1 - \alpha$

Notes Continued
3. $\sigma/\sqrt{n}$: standard error of the mean. The standard deviation of the distribution of $\bar{x}$
4. $z(\alpha/2)\,(\sigma/\sqrt{n})$: maximum error of estimate E. One-half the width of the confidence interval (the product of the confidence coefficient and the standard error)
5. $\bar{x} - z(\alpha/2)\,(\sigma/\sqrt{n})$: lower confidence limit (LCL); $\bar{x} + z(\alpha/2)\,(\sigma/\sqrt{n})$: upper confidence limit (UCL)

The Confidence Interval
A Five-Step Model:
1. Describe the population parameter of concern
2. Specify the confidence interval criteria
a. Check the assumptions
b. Identify the probability distribution and the formula to be used
c. Determine the level of confidence, $1 - \alpha$
3. Collect and present sample information
4. Determine the confidence interval
a. Determine the confidence coefficient
b. Find the maximum error of estimate
c. Find the lower and upper confidence limits
5. State the confidence interval

Example
• Example: The weights of full boxes of a certain kind of cereal are normally distributed with a standard deviation of 0.27 oz. A sample of 18 randomly selected boxes produced a mean weight of 9.87 oz. Find a 95% confidence interval for the true mean weight of a box of this cereal.
Solution:
1. Describe the population parameter of concern: The mean, $\mu$, weight of all boxes of this cereal
2. Specify the confidence interval criteria
a. Check the assumptions: The weights are normally distributed, so the distribution of $\bar{x}$ is normal
b. Identify the probability distribution and formula to be used: Use the standard normal variable z with $\sigma$ = 0.27
c. Determine the level of confidence, $1 - \alpha$: The question asks for 95% confidence: $1 - \alpha$ = 0.95

Solution Continued
3. Collect and present information: The sample information is given in the statement of the problem. Given: n = 18; $\bar{x}$ = 9.87
4. Determine the confidence interval
a. Determine the confidence coefficient: The confidence coefficient is found using Table 4B:
$1 - \alpha$     0.75   0.80   0.90   0.95   0.98   0.99
$z(\alpha/2)$    1.15   1.28   1.65   1.96   2.33   2.58

Solution Continued
b. Find the maximum error of estimate: Use the maximum error part of the formula for a CI: $E = z(\alpha/2)\,\frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{0.27}{\sqrt{18}} = 0.1247$
c. Find the lower and upper confidence limits: Use the sample mean and the maximum error: $\bar{x} - z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$
9.87 - 0.1247 to 9.87 + 0.1247
9.7453 to 9.9947
9.75 to 9.99
5. State the confidence interval: 9.75 to 9.99 is a 95% confidence interval for the true mean weight, $\mu$, of cereal boxes
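The arithmetic in steps 4b and 4c can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `z_confidence_interval` and its parameter names are hypothetical, and the confidence coefficient is passed in as read from the table rather than computed.

```python
from math import sqrt

def z_confidence_interval(x_bar, sigma, n, z):
    """Return (E, lower limit, upper limit) for a mean when sigma is known.

    z is the confidence coefficient z(alpha/2), taken from a table.
    """
    max_error = z * sigma / sqrt(n)  # E = z(alpha/2) * sigma / sqrt(n)
    return max_error, x_bar - max_error, x_bar + max_error

# Cereal example: n = 18, x-bar = 9.87, sigma = 0.27, 95% confidence (z = 1.96)
E, lcl, ucl = z_confidence_interval(x_bar=9.87, sigma=0.27, n=18, z=1.96)
print(f"E = {E:.4f}, interval = {lcl:.4f} to {ucl:.4f}")
```

The same function applies to any example in this section by changing the four inputs.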

Example
• Example: A random sample of the test scores of 100 applicants for clerk-typist positions at a large insurance company showed a mean score of 72.6. Determine a 99% confidence interval for the mean score of all applicants at the insurance company. Assume the standard deviation of test scores is 10.5.
Solution:
1. Parameter of concern: The mean test score, $\mu$, of all applicants at the insurance company
2. Confidence interval criteria
a. Assumptions: The distribution of the variable, test score, is not known. However, the sample size is large enough (n = 100) so that the CLT applies
b. Probability distribution: standard normal variable z with $\sigma$ = 10.5
c. The level of confidence: 99%, or $1 - \alpha$ = 0.99

Solution Continued
3. Sample information: Given: n = 100 and $\bar{x}$ = 72.6
4. The confidence interval
a. Confidence coefficient: $z(\alpha/2) = z(0.005) = 2.58$
b. Maximum error: $E = z(\alpha/2)\,(\sigma/\sqrt{n}) = (2.58)(10.5/\sqrt{100}) = 2.709$
c. The lower and upper limits: 72.6 - 2.709 = 69.891 to 72.6 + 2.709 = 75.309
5. Confidence interval: With 99% confidence we say, “The mean test score is between 69.9 and 75.3”, or “69.9 to 75.3 is a 99% confidence interval for the true mean test score”
Note: The confidence is in the process. 99% confidence means: if we conduct the experiment over and over, and construct lots of confidence intervals, then about 99% of the confidence intervals will contain the true mean value $\mu$.
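The note that "the confidence is in the process" can be illustrated by simulation. The sketch below is built on assumptions that are not in the slides: it pretends the population is normal with a true mean of 72.6 and $\sigma$ = 10.5, draws many samples of size 100, and counts how often the 99% interval contains the true mean. The function name `coverage_rate`, the number of trials, and the seed are all hypothetical choices.

```python
import random
from math import sqrt

def coverage_rate(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    max_error = z * sigma / sqrt(n)    # same E for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        # Draw one random sample and compute its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Does this sample's interval capture the true mean?
        if sample_mean - max_error <= mu <= sample_mean + max_error:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=72.6, sigma=10.5, n=100, z=2.58, trials=2000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

Any single interval either contains $\mu$ or it does not; the long-run fraction of intervals that do is what should land near 0.99.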

Sample Size
• Problem: Find the sample size necessary in order to obtain a specified maximum error and level of confidence (assume the standard deviation is known)
$E = z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}$
Solve this expression for n: $n = \left[\frac{z(\alpha/2) \cdot \sigma}{E}\right]^2$

Example
• Example: Find the sample size necessary to estimate a population mean to within 0.5 with 95% confidence if the standard deviation is 6.2
Solution: $n = \left[\frac{z(\alpha/2) \cdot \sigma}{E}\right]^2 = \left[\frac{(1.96)(6.2)}{0.5}\right]^2 = [24.304]^2 = 590.684$
Therefore, n = 591
Note: When solving for sample size n, always round up to the next larger integer (Why?)
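The sample-size formula and the round-up rule translate directly into code. A minimal sketch follows; the function name `required_sample_size` and its parameter names are hypothetical, not from the slides.

```python
from math import ceil

def required_sample_size(z, sigma, max_error):
    """Return (unrounded n, n rounded up) for a target maximum error E."""
    raw_n = (z * sigma / max_error) ** 2   # n = [z(alpha/2) * sigma / E]^2
    return raw_n, ceil(raw_n)              # always round up, never down

raw_n, n = required_sample_size(z=1.96, sigma=6.2, max_error=0.5)
print(f"unrounded n = {raw_n:.3f}, required n = {n}")
```

Rounding up is the safe direction: a smaller sample would make $\sigma/\sqrt{n}$ larger, so the maximum error would slightly exceed the target of 0.5.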

8.3 ~ The Nature of Hypothesis Testing • Formal process for making an inference • Consider many of the concepts of a hypothesis test and look at several decision-making situations • The entire process starts by identifying something of concern and then formulating two hypotheses about it

Hypothesis Hypothesis: A statement that something is true Statistical Hypothesis Test: A process by which a decision is made between two opposing hypotheses. The two opposing hypotheses are formulated so that each hypothesis is the negation of the other. (That way one of them is always true, and the other one is always false). Then one hypothesis is tested in hopes that it can be shown to be a very improbable occurrence thereby implying the other hypothesis is the likely truth.

Null & Alternative Hypothesis There are two hypotheses involved in making a decision: Null Hypothesis, Ho: The hypothesis to be tested. Assumed to be true. Usually a statement that a population parameter has a specific value. The “starting point” for the investigation. Alternative Hypothesis, Ha: A statement about the same population parameter that is used in the null hypothesis. Generally this is a statement that specifies the population parameter has a value different, in some way, from the value given in the null hypothesis. The rejection of the null hypothesis will imply the likely truth of this alternative hypothesis.

Notes
1. Basic idea: proof by contradiction. Assume the null hypothesis is true and look for evidence to suggest that it is false
2. Null hypothesis: the status quo. A statement about a population parameter that is assumed to be true
3. Alternative hypothesis: also called the research hypothesis. Generally, what you are trying to prove. We hope experimental evidence will suggest the alternative hypothesis is true by showing the unlikeliness of the truth of the null hypothesis

Example • Example: Suppose you are investigating the effects of a new pain reliever. You hope the new drug relieves minor muscle aches and pains longer than the leading pain reliever. State the null and alternative hypotheses. Solutions: • Ho: The new pain reliever is no better than the leading pain reliever • Ha: The new pain reliever lasts longer than the leading pain reliever

Example • Example: You are investigating the presence of radon in homes being built in a new development. If the mean level of radon is greater than 4 then send a warning to all home owners in the development. State the null and alternative hypotheses. Solutions: • Ho: The mean level of radon for homes in the development is 4 (or less) • Ha: The mean level of radon for homes in the development is greater than 4

Hypothesis Test Outcomes
                       Null Hypothesis
Decision               True                       False
Fail to reject Ho      Type A correct decision    Type II error
Reject Ho              Type I error               Type B correct decision
Type A correct decision: Null hypothesis true, decide in its favor
Type B correct decision: Null hypothesis false, decide in favor of alternative hypothesis
Type I error: Null hypothesis true, decide in favor of alternative hypothesis
Type II error: Null hypothesis false, decide in favor of null hypothesis

Example • Example: A calculator company has just received a large shipment of parts used to make the screens on graphing calculators. They consider the shipment acceptable if the proportion of defective parts is 0.01 (or less). If the proportion of defective parts is greater than 0.01 the shipment is unacceptable and returned to the manufacturer. State the null and alternative hypotheses, and describe the four possible outcomes and the resulting actions that would occur for this test. Solutions: • Ho: The proportion of defective parts is 0.01 (or less) • Ha: The proportion of defective parts is greater than 0.01

Fail To Reject Ho
Null Hypothesis Is True: Type A correct decision
Truth of situation: The proportion of defective parts is 0.01 (or less)
Conclusion: It was determined that the proportion of defective parts is 0.01 (or less)
Action: The calculator company received parts with an acceptable proportion of defectives
Null Hypothesis Is False: Type II error
Truth of situation: The proportion of defective parts is greater than 0.01
Conclusion: It was determined that the proportion of defective parts is 0.01 (or less)
Action: The calculator company received parts with an unacceptable proportion of defectives

Reject Ho
Null hypothesis is true: Type I error
Truth of situation: The proportion of defectives is 0.01 (or less)
Conclusion: It was determined that the proportion of defectives is greater than 0.01
Action: Send the shipment back to the manufacturer. The proportion of defectives is acceptable
Null hypothesis is false: Type B correct decision
Truth of situation: The proportion of defectives is greater than 0.01
Conclusion: It was determined that the proportion of defectives is greater than 0.01
Action: Send the shipment back to the manufacturer. The proportion of defectives is unacceptable

Errors
Error in Decision               Type   Probability
Rejection of a true Ho          I      $\alpha$
Failure to reject a false Ho    II     $\beta$
Correct Decision                Type   Probability
Failure to reject a true Ho     A      $1 - \alpha$
Rejection of a false Ho         B      $1 - \beta$
Notes:
1. The type II error sometimes results in what represents a lost opportunity
2. Since we make a decision based on a sample, there is always the chance of making an error
Probability of a type I error = $\alpha$
Probability of a type II error = $\beta$

Notes
1. Would like $\alpha$ and $\beta$ to be as small as possible
2. $\alpha$ and $\beta$ are inversely related (for a fixed sample size)
3. Usually set $\alpha$ (and don’t worry too much about $\beta$). Why?
4. Most common values for $\alpha$ and $\beta$ are 0.01 and 0.05
5. $1 - \beta$: the power of the statistical test. A measure of the ability of a hypothesis test to reject a false null hypothesis
6. Regardless of the outcome of a hypothesis test, we never really know for sure if we have made the correct decision

Interrelationship
[Figure labels: P(type I error) = $\alpha$, P(type II error) = $\beta$]
Interrelationship between the probability of a type I error ($\alpha$), the probability of a type II error ($\beta$), and the sample size (n)
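The interrelationship among $\alpha$, $\beta$, and n can be made concrete with a small calculation. The sketch below rests on assumptions that are not in the slides: a one-tailed test of Ho: $\mu$ = 24 ($\geq$) against Ha: $\mu$ < 24 with $\sigma$ = 0.2, and a hypothetical true mean of 23.9 under which $\beta$ is evaluated. It uses the equivalent cutoff form of the decision rule (reject Ho when the sample mean falls at or below a cutoff, which matches a p-value at or below $\alpha$). The function name `type_ii_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """beta for a left-tailed z test of Ho: mu = mu0 when the true mean is mu_true."""
    std_error = sigma / sqrt(n)
    # Ho is rejected when x-bar is at or below this cutoff (p-value <= alpha).
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * std_error
    # beta = P(fail to reject Ho) = P(x-bar > cutoff) under the true mean.
    return 1 - NormalDist(mu_true, std_error).cdf(cutoff)

for alpha in (0.05, 0.01):
    for n in (10, 40, 160):
        beta = type_ii_error(mu0=24, mu_true=23.9, sigma=0.2, n=n, alpha=alpha)
        print(f"alpha = {alpha:.2f}, n = {n:3d}, beta = {beta:.4f}")
```

For a fixed n, lowering $\alpha$ raises $\beta$; for a fixed $\alpha$, increasing n lowers $\beta$.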

Level of Significance & Test Statistic
Level of Significance, $\alpha$: The probability of committing the type I error
Test Statistic: A random variable whose value is calculated from the sample data and is used in making the decision fail to reject Ho or reject Ho
Notes:
• The value of the test statistic is used in conjunction with a decision rule to determine fail to reject Ho or reject Ho
• The decision rule is established prior to collecting the data and specifies how you will reach the decision

The Conclusion
a. If the decision is reject Ho, then the conclusion should be worded something like, “There is sufficient evidence at the $\alpha$ level of significance to show that . . . (the meaning of the alternative hypothesis)”
b. If the decision is fail to reject Ho, then the conclusion should be worded something like, “There is not sufficient evidence at the $\alpha$ level of significance to show that . . . (the meaning of the alternative hypothesis)”
Notes:
• The decision is about Ho
• The conclusion is a statement about Ha
• There is always the chance of making an error

8.4 ~ Hypothesis Test of Mean $\mu$ ($\sigma$ known): A Probability-Value Approach • The concepts and much of the reasoning behind hypothesis tests are given in the previous sections • Formalize the hypothesis test procedure as it applies to statements concerning the mean $\mu$ of a population ($\sigma$ known): a probability-value approach

The Assumption...
The assumption for hypothesis tests about a mean $\mu$ using a known $\sigma$: The sampling distribution of $\bar{x}$ has a normal distribution
Recall:
1. The distribution of $\bar{x}$ has mean $\mu$
2. The distribution of $\bar{x}$ has standard deviation $\sigma/\sqrt{n}$
Hypothesis test:
1. A well-organized, step-by-step procedure used to make a decision
2. Probability-value approach (p-value approach): a procedure that has gained popularity in recent years. Organized into five steps.

The Probability-Value Hypothesis Test
A Five-Step Procedure:
1. The Set-Up
a. Describe the population parameter of concern
b. State the null hypothesis (Ho) and the alternative hypothesis (Ha)
2. The Hypothesis Test Criteria
a. Check the assumptions
b. Identify the probability distribution and the test statistic formula to be used
c. Determine the level of significance, $\alpha$
3. The Sample Evidence
a. Collect the sample information
b. Calculate the value of the test statistic
4. The Probability Distribution
a. Calculate the p-value for the test statistic
b. Determine whether or not the p-value is smaller than $\alpha$
5. The Results
a. State the decision about Ho
b. State a conclusion about Ha

Example
• Example: A company advertises the net weight of its cereal is 24 ounces. A consumer group suspects the boxes are underfilled. They cannot check every box of cereal, so a sample of cereal boxes will be examined. A decision will be made about the true mean weight based on the sample mean. State the consumer group’s null and alternative hypotheses. Assume $\sigma$ = 0.2
Solution:
1. The Set-Up
a. Describe the population parameter of concern: The population parameter of interest is the mean $\mu$, the mean weight of the cereal boxes

Solution Continued
b. State the null hypothesis (Ho) and the alternative hypothesis (Ha): Formulate two opposing statements concerning $\mu$
Ho: $\mu = 24$ ($\geq$) (the mean is at least 24)
Ha: $\mu < 24$ (the mean is less than 24)
Note: The trichotomy law from algebra states that two numerical values must be related in exactly one of three possible relationships: <, =, or >. All three of these possibilities must be accounted for between the two opposing hypotheses in order for the hypotheses to be negations of each other.

Possible Statements of Null & Alternative Hypotheses Notes: • The null hypothesis will be written with just the equal sign (a value is assigned) • When equal is paired with less than or greater than, the combined symbol is written beside the null hypothesis as a reminder that all three signs have been accounted for in these two opposing statements.

Examples
• Example: A freezer is set to cool food to 10°. If the temperature is higher, the food could spoil, and if the temperature is lower, the freezer is wasting energy. Random freezers are selected and tested as they come off the assembly line. The assembly line is stopped if there is any evidence to suggest improper cooling. State the null and alternative hypotheses.
Solution: Ho: $\mu = 10$ and Ha: $\mu \neq 10$
• Example: An automobile manufacturer claims a new model gets at least 27 miles per gallon. A consumer group disputes this claim and would like to show the mean miles per gallon is lower. State the null and alternative hypotheses.
Solution: Ho: $\mu = 27$ ($\geq$) and Ha: $\mu < 27$

Common Phrases & Their Negations
Ho: ($\geq$)            Negation
at least                less than
no less than            less than
not less than           less than
Ho: ($\leq$)            Negation
at most                 more than
no more than            more than
not greater than        greater than
Ho: (=)                 Negation
is                      is not
not different from      different from
same as                 not same as

Example Continued
• Example Continued: Weight of cereal boxes
Recall: Ho: $\mu = 24$ ($\geq$) (at least 24), Ha: $\mu < 24$ (less than 24)
2. The Hypothesis Test Criteria
a. Check the assumptions: The weight of cereal boxes is probably mound shaped. A sample size of 40 should be sufficient for the CLT to apply. The sampling distribution of the sample mean can be expected to be normal.
b. Identify the probability distribution and the test statistic to be used: To test the null hypothesis, ask how many standard deviations away from $\mu$ is the sample mean
test statistic: $z^* = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Solution Continued
c. Determine the level of significance: Let $\alpha$ = 0.05
3. The Sample Evidence
a. Collect the sample information: A random sample of 40 cereal boxes is examined: $\bar{x}$ = 23.95 and n = 40
b. Calculate the value of the test statistic ($\sigma$ = 0.2): $z^* = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{23.95 - 24}{0.2/\sqrt{40}} = -1.5811$
4. The Probability Distribution
a. Calculate the p-value for the test statistic

Probability-Value or p-Value
Probability-Value, or p-Value: The probability that the test statistic could be the value it is or a more extreme value (in the direction of the alternative hypothesis) when the null hypothesis is true (Note: the symbol P will be used to represent the p-value, especially in algebraic situations)
$P = P(z < z^*) = P(z < -1.58) = P(z > 1.58) = 0.5000 - 0.4429 = 0.0571$

Solution Continued
b. Determine whether or not the p-value is smaller than $\alpha$: The p-value (0.0571) is greater than $\alpha$ (0.05)
5. The Results
Decision Rule:
a. If the p-value is less than or equal to the level of significance $\alpha$, then the decision must be to reject Ho
b. If the p-value is greater than the level of significance $\alpha$, then the decision must be to fail to reject Ho
a. State the decision about Ho: Decision about Ho: Fail to reject Ho
b. Write a conclusion about Ha: There is not sufficient evidence at the 0.05 level of significance to show that the mean weight of cereal boxes is less than 24 ounces
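The test statistic, p-value, and decision for the cereal example can be reproduced in a few lines. This is a minimal sketch using Python's standard-library `statistics.NormalDist` in place of the printed normal table; the variable names are illustrative and not from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Cereal example: Ho: mu = 24 (>=), Ha: mu < 24, sigma = 0.2, alpha = 0.05
x_bar, mu0, sigma, n, alpha = 23.95, 24.0, 0.2, 40, 0.05

# Test statistic: how many standard errors x-bar lies from the hypothesized mean.
z_star = (x_bar - mu0) / (sigma / sqrt(n))

# Ha points to the left ("less than"), so the p-value is the left-tail area.
p_value = NormalDist().cdf(z_star)

# Decision rule: reject Ho only if the p-value is at most alpha.
decision = "reject Ho" if p_value <= alpha else "fail to reject Ho"
print(f"z* = {z_star:.4f}, p-value = {p_value:.4f}, decision: {decision}")
```

Because the code does not round z* to two decimals before looking up the area, its p-value can differ from the table-based 0.0571 in the fourth decimal place; the decision is the same.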

Notes
• If we fail to reject Ho, there is not sufficient evidence to suggest the null hypothesis is false. This does not mean Ho is true.
• The p-value is the area, under the curve of the probability distribution for the test statistic, that is more extreme than the calculated value of the test statistic.
• There are 3 separate cases for p-values. The direction (or sign) of the alternative hypothesis is the key.
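The three cases for p-values correspond to the three possible signs in the alternative hypothesis (<, >, $\neq$). The sketch below shows one way to express them for a z test statistic; the function name `p_value` and the labels "less", "greater", and "not equal" are hypothetical, chosen only for this illustration.

```python
from statistics import NormalDist

def p_value(z_star, alternative):
    """p-value for a z test statistic, by the direction of Ha."""
    std_normal = NormalDist()
    if alternative == "less":        # Ha: mu < value, area to the left of z*
        return std_normal.cdf(z_star)
    if alternative == "greater":     # Ha: mu > value, area to the right of z*
        return 1 - std_normal.cdf(z_star)
    if alternative == "not equal":   # Ha: mu != value, area in both tails
        return 2 * (1 - std_normal.cdf(abs(z_star)))
    raise ValueError("alternative must be 'less', 'greater', or 'not equal'")

for alt in ("less", "greater", "not equal"):
    print(f"Ha direction: {alt:9s} p-value for z* = -1.58: {p_value(-1.58, alt):.4f}")
```

The same value of z* can lead to different decisions depending on which alternative is being tested, which is why the direction of Ha has to be fixed before the data are collected.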
