Quant 3610 Weber State University Dr. Stephen Hays Russell
Reminder on Grade Cuts • 92.01 - 100 = A • 90 - 92.0 = A- • 87 - 89.99 = B+ • 83 - 86.99 = B • 77 - 82.99 = B- • 74 - 76.99 = C+ • 70 - 73.99 = C • 66 - 69.99 = C- • 64 - 65.99 = D+ • 61 - 63.99 = D • 57 - 60.99 = D- • Below 57.0 = E
Part II: Estimating population means and proportions • Logical estimators: \(\bar{X}\) for \(\mu\), \(s\) for \(\sigma\), \(p\) for \(\pi\)
Making Point Estimates vs. Interval Estimates • Point estimates (a single number from a single sample) • You'll get a different estimate from each new sample! • No probability content! • The benefit of a larger sample size is not explicit • Interval estimates • Explicitly state a probability (the confidence level) • Provide a range of values by wrapping a margin of error around the point estimate of the unknown population parameter
Example: Estimating \(\mu\) • Point estimate: • "From a random sample of n, my estimate of \(\mu\) based upon \(\bar{X}\) is 1260." Logical questions: How sure are you? How far off might this be? • Interval estimate: • "Based upon a random sample of n, I am 95% confident that \(\mu\) lies between 1242 and 1278."
Margin of error • A confidence interval is . . . the point estimate ± a margin of error. Example: 1260 ± 18 = 1242 to 1278 Note: The margin of error is always one-half the width of the interval.
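Constructing Confidence Intervals • Built on \(\bar{X}\) • \(\bar{X}\) must be normally distributed • C.I. \(= \bar{X} \pm Z \cdot \text{S.E.} = \bar{X} \pm Z\sigma_{\bar{X}}\), where \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) • Common "Z's" for confidence intervals: • 90%: 1.645 • 95%: 1.96 • 99%: 2.575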
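To see where the tabled Z values come from, here is a minimal Python sketch using the standard library's `statistics.NormalDist`: the two-sided critical value leaves (1 − confidence)/2 in each tail. The exact 99% value is about 2.5758, which tables round to 2.575 or 2.576.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value: the z that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.4f}")
```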
Example: Suppose a normally distributed population has a mean of 40 and a standard deviation of 46.8. What is an 80% CI if I take a sample of 100 (so \(\sigma_{\bar{X}} = 46.8/\sqrt{100} = 4.68\)) and get an \(\bar{X}\) of • 35? 35 ± 1.2817(4.68) = 29 to 41 • 42? 42 ± 1.2817(4.68) = 36 to 48 • 49? 49 ± 1.2817(4.68) = 43 to 55
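The three intervals can be checked with a short sketch (Python standard library only). It also flags that only the interval built from \(\bar{X}\) = 49 misses the true mean of 40.

```python
import math
from statistics import NormalDist

mu, sigma, n = 40, 46.8, 100            # population values from the example
se = sigma / math.sqrt(n)               # standard error of the mean: 4.68
z = NormalDist().inv_cdf(0.90)          # 80% two-sided leaves 10% in each tail, z ~ 1.2816

results = {}
for xbar in (35, 42, 49):
    lo, hi = xbar - z * se, xbar + z * se
    results[xbar] = (lo, hi)
    print(f"xbar={xbar}: {lo:.1f} to {hi:.1f}, contains mu? {lo <= mu <= hi}")
```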
The Nature of Interval Estimates • Interval width is a function of three things: • The level of confidence • A 95% C.I. (with a Z of 1.96) is wider than a 90% C.I. (with a Z of 1.645). Note: a smaller Z value produces a narrower interval and a more precise estimate, but also implies a smaller degree of confidence in the estimate • The inherent variability in the population (\(\sigma\)) • Sample size (\(n\))
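A small sketch of the width formula \(2Z\sigma/\sqrt{n}\), varying one of the three factors at a time. The baseline σ = 10 and n = 100 are hypothetical values chosen only to make the comparison easy to read.

```python
import math
from statistics import NormalDist

def ci_width(confidence: float, sigma: float, n: int) -> float:
    """Full width of a Z interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = ci_width(0.95, sigma=10, n=100)
print(round(base, 2))                       # 3.92 (baseline)
print(round(ci_width(0.90, 10, 100), 2))    # 3.29: lower confidence, narrower
print(round(ci_width(0.95, 20, 100), 2))    # 7.84: more variability, wider
print(round(ci_width(0.95, 10, 400), 2))    # 1.96: 4x the sample size, half as wide
```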
C.I. Estimates of a Population Mean: LARGE SAMPLES (n > 30) • Use \(\bar{X} \pm Z\sigma_{\bar{X}}\) if \(\sigma\) is known • Use \(\bar{X} \pm Z s_{\bar{X}}\), with \(s_{\bar{X}} = s/\sqrt{n}\), if \(\sigma\) is unknown Example Problem: An economist surveyed 48 heating oil dealers in Connecticut and found an average cost of heating oil per gallon to be 97.80 cents, with a standard deviation of 4.212 cents. Estimate with 90% confidence the mean cost of a gallon of heating oil in this region. Answer: \(97.8 \pm 1.645(4.212/6.9282) = 97.8 \pm 1.00\), where \(6.9282 = \sqrt{48}\)
Do it again with a higher level of confidence: • An economist surveyed 48 heating oil dealers in Connecticut and found an average cost of heating oil per gallon to be 97.80 cents, with a standard deviation of 4.212 cents. Estimate with 99% confidence the mean cost of a gallon of heating oil in this region. Answer: \(97.80 \pm 2.575(4.212/6.9282) = 97.8 \pm 1.565\) We are 90% confident the mean cost is between 96.8 and 98.8 cents. We are 99% confident the mean cost is between 96.235 and 99.365 cents. Is the TRUE MEAN 96.00 cents?
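A small sketch reproducing the 90% and 99% heating-oil intervals from the summary statistics. It computes Z with Python's `statistics.NormalDist` instead of reading a table, which is why its 99% endpoints (96.234 to 99.366) differ in the third decimal from the 2.575-based values.

```python
import math
from statistics import NormalDist

xbar, s, n = 97.80, 4.212, 48       # cents per gallon, from the survey
se = s / math.sqrt(n)                # s stands in for the unknown sigma (large sample)

def z_interval(confidence: float) -> tuple:
    """Large-sample Z interval for the mean heating-oil price."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return xbar - z * se, xbar + z * se

for level in (0.90, 0.99):
    lo, hi = z_interval(level)
    print(f"{level:.0%}: {lo:.3f} to {hi:.3f} cents")
```

Neither interval contains 96.00 cents, which sets up the question on the next slide.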
Is the true mean 96.0 cents? • Try a CDF command, using the standard error 4.212/√48 = .60795: • MTB> cdf 97.80; • SUBC> Normal 96, .60795. • ANSWER: .9985 If the true mean is 96.0 cents, the probability of getting a sample mean as high as ours (97.80 cents or more) from a sample of 48 is 1 − .9985 = .0015, or fifteen in ten thousand.
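The same calculation in Python, as a sketch: `NormalDist(mu, sigma).cdf(x)` plays the role of MINITAB's `cdf` command with the `Normal` subcommand.

```python
from statistics import NormalDist

# Sampling distribution of xbar if the true mean really were 96.0 cents
sampling_dist = NormalDist(mu=96.0, sigma=0.60795)
p_below = sampling_dist.cdf(97.80)   # same quantity as MINITAB's "cdf 97.80"
p_at_least = 1 - p_below             # chance of a sample mean of 97.80 or more
print(round(p_below, 4), round(p_at_least, 4))
```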
Another example problem: H&R Block Company wants to set a new base fee reflecting the average time required to complete a standard federal income tax return. The company randomly sampled 36 standard returns from its recent work and found the returns required an average of 57.8 minutes. The standard deviation of the sample was 10.6 minutes. Estimate with 98 percent confidence the actual mean time required by this firm to complete a standard return. Answer: \(57.8 \pm 2.3263(10.6/6) = 57.8 \pm 4.1\), or 53.7 to 61.9 minutes (here \(6 = \sqrt{36}\)). What could the firm do to tighten this interval without decreasing its level of confidence?
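A sketch of the 98% interval, plus one way to think about the closing question: at a fixed confidence level the margin shrinks in proportion to \(1/\sqrt{n}\). The n = 144 comparison is hypothetical.

```python
import math
from statistics import NormalDist

xbar, s, n = 57.8, 10.6, 36
z = NormalDist().inv_cdf(0.99)        # 98% two-sided leaves 1% in each tail, z ~ 2.3263
margin = z * s / math.sqrt(n)
print(f"{xbar - margin:.1f} to {xbar + margin:.1f} minutes (margin {margin:.2f})")

# Holding confidence at 98%, a larger sample shrinks the margin
for bigger_n in (36, 144):
    print(bigger_n, round(z * s / math.sqrt(bigger_n), 2))
```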
Required Sample Size • When estimating a population mean, a maximum margin of error may be imposed upon you. • Consider: Your firm wants you to estimate with 95% confidence the average selling price of its Z-200 motorcycle at retail dealerships. It wants the margin of error to be no greater than $100. The standard deviation in selling price is known to be $120. How many dealerships should be surveyed?
Required Sample Size • When estimating a population mean, a maximum margin of error may be imposed upon you. • Consider: Your firm wants you to estimate with 95% confidence the average selling price of its Z-200 motorcycle at retail dealerships. It wants the margin of error to be no greater than $100. The standard deviation in selling price is known to be $250. How many dealerships should be surveyed? C.I. \(= \bar{X} \pm Z \cdot \sigma/\sqrt{n}\). What is the margin of error in this equation?
Plug in values and solve for n: Imposed margin of error = 100, so \(100 = 1.96 \times 250/\sqrt{n}\), which gives \(\sqrt{n} = 4.9\) and \(n = 24.01\); round up to 25. Note: If \(\sigma\) is unknown, use a preliminary sample and compute s as a surrogate for \(\sigma\), or …
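Solving \(E = Z\sigma/\sqrt{n}\) for n gives \(n = (Z\sigma/E)^2\), always rounded up. A minimal sketch; `required_n` is a hypothetical helper name. It covers both the $250 version of the motorcycle problem and the $120 variant on the earlier copy of the slide.

```python
import math

def required_n(z: float, sigma: float, margin: float) -> int:
    """Smallest n with z * sigma / sqrt(n) <= margin; round UP, never down."""
    return math.ceil((z * sigma / margin) ** 2)

print(required_n(1.96, 250, 100))   # sigma = $250: 24.01 rounds up to 25
print(required_n(1.96, 120, 100))   # sigma = $120: 5.53 rounds up to 6
```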
Try this problem: • IHC needs to estimate the average net earnings of orthopedic surgeons nationwide with 99% confidence and with a margin of error no greater than $10,000. Their judgment is that the range of salaries would be from a low of $100,000 to a high of $650,000. How large a sample is required? Answer: 1,254 • Estimate \(\sigma\) from the range: (H – L)/4 = 137,500 • Solve \(10{,}000 = 2.575 \times 137{,}500/\sqrt{n}\), giving \(n = 1{,}253.6\), rounded up to 1,254 What would the required sample size drop to if the confidence level were lowered to 95%? Answer: 727
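A sketch of the range-rule calculation, using the slides' table values of Z so the results match 1,254 and 727.

```python
import math

high, low, margin = 650_000, 100_000, 10_000
sigma_guess = (high - low) / 4          # range rule from the slide: 137,500

sizes = {}
for z in (2.575, 1.96):                 # table Z values for 99% and 95%
    n = math.ceil((z * sigma_guess / margin) ** 2)   # always round UP
    sizes[z] = n
    print(f"Z = {z}: n = {n}")
```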
Small Sample Estimates of \(\mu\) • If the sample size is less than 30, we cannot rely on the Central Limit Theorem to make \(\bar{X}\) normally distributed, so for small samples the parent population itself must be normally distributed. • We must use the t distribution instead of the Z distribution if \(\sigma\) is unknown.
The t distribution • t values are larger than comparable Z values • t values are based upon degrees of freedom • Degrees of freedom refers to the number of independent pieces of information (generally n – 1) • t values approach Z values as sample size increases
The t Distribution Table • The table shows the area under a specified curve, defined by a given number of degrees of freedom, that will lie to the right of a specified value of t.
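Python's standard library has no t distribution, so this sketch computes two-sided critical values by integrating the t density numerically (Simpson's rule) and bisecting for the t value with (1 − confidence)/2 of the area to its right. It is an illustration only; in practice one would use the t table or a statistics package. It reproduces the 2.086 the insurance example uses for df = 20 and shows t values shrinking toward Z = 1.96 as df grows.

```python
import math
from statistics import NormalDist

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for Student's t, via Simpson's rule on [0, x] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps                       # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(confidence: float, df: int) -> float:
    """t value with (1 - confidence)/2 of the area to its right (two-sided CI)."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0                  # wide enough for the df >= 2 cases in these notes
    for _ in range(50):                 # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 20, 30, 100, 1000):
    print(df, round(t_critical(0.95, df), 3))
print("z", round(NormalDist().inv_cdf(0.975), 3))
```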
Confidence Interval using t • Setting: We seek a C.I. for the population mean, we have a sample size of less than 30, we don't know \(\sigma\), and we believe the population is normally distributed. The confidence interval estimate is \(\bar{X} \pm t \cdot s/\sqrt{n}\), with \(n - 1\) degrees of freedom. • Note: The standard error must be multiplied by the finite population correction factor, \(\sqrt{(N-n)/(N-1)}\), if \(n \ge .05N\).
Example problem An insurance company wants to estimate the average claim on its automobile policies. It believes sizes of claims are normally distributed. It uses the last 21 claims as a sample and finds their average to be $657, with a sample standard deviation of $310. Construct a 95% confidence interval for the average claim on all policies in the population.
Example Problem • Parent population IS normally distributed • Sample is small • Population appears to be large • We want a 95% C.I. • Facts: n = 21, \(\bar{X}\) = $657, s = $310 \(\mu = 657 \pm 2.086(310/4.583)\) = $657 ± $141, or $516 to $798 (here \(4.583 = \sqrt{21}\)) Question: Does this CI contain \(\mu\)?
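The interval arithmetic as a short sketch; the t value 2.086 is taken from the t table (95%, df = 20) rather than computed.

```python
import math

n, xbar, s = 21, 657, 310
t = 2.086                           # t table: 95% two-sided, df = n - 1 = 20
margin = t * s / math.sqrt(n)
print(f"${xbar - margin:.0f} to ${xbar + margin:.0f} (margin ${margin:.0f})")
```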
When to use Z and when to use t • Large sample? Use Z • Small sample and \(\sigma\) is known? Use Z • Small sample and \(\sigma\) is unknown? Use t Note: Some statisticians ALWAYS use t when \(\sigma\) is unknown. Reminder: One cannot work small sample problems when the underlying distribution is not normal.
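The decision rule can be written as a tiny function. This is a hypothetical helper, and treating n ≥ 30 as "large" is an assumption, since the slides use both "n > 30" and "less than 30" as the boundary. It encodes the slide's rule, not the "always use t" variant in the note.

```python
def choose_distribution(n: int, sigma_known: bool, population_normal: bool) -> str:
    """Return "Z" or "t" per the slide's rule; assumed cutoff: n >= 30 is large."""
    if n >= 30:
        return "Z"
    if not population_normal:
        # Small samples from a non-normal population are outside these methods
        raise ValueError("small sample from a non-normal population")
    return "Z" if sigma_known else "t"

print(choose_distribution(48, sigma_known=False, population_normal=False))  # heating oil
print(choose_distribution(21, sigma_known=False, population_normal=True))   # insurance claims
```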
MINITAB for estimating means • The U.S. Department of Transportation seeks to estimate the national average price of regular unleaded gasoline as of January 20, 2003. They survey 60 randomly selected gas stations nationwide. Based upon these 60 data points, what is the point estimate, what is the sample standard deviation, what is a 95% interval estimate, what is the margin of error, and what is the probability that the national average for a gallon of gas on January 20 is greater than $1.51?
Answers: • Point estimate: $1.4837 • Sample standard deviation: $0.1055 • 95% CI: $1.457 to $1.5104 • Margin of error: $0.0267 • Probability that \(\mu\) > 1.5104 is .025 Question: Is this analysis valid since we have no evidence that gasoline prices follow a normal distribution?
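The 60 raw prices are not reproduced in these notes, so this sketch works only from the reported summary statistics. Using Z with s in place of σ, as the large-sample rule suggests, reproduces the reported interval and margin of error.

```python
import math
from statistics import NormalDist

xbar, s, n = 1.4837, 0.1055, 60     # reported point estimate, sample sd, sample size
z = NormalDist().inv_cdf(0.975)     # 95% two-sided
margin = z * s / math.sqrt(n)
print(f"{xbar - margin:.4f} to {xbar + margin:.4f}, margin {margin:.4f}")
```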
Estimates of the Difference Between Two Means • Example: What is the difference in mean tread life between Goodyear Steel-Belted Radial tires (population A) and Firestone Steel-Belted Radial tires (population B)? • The parameter of interest is \(\mu_A - \mu_B\) • The unbiased estimator is \(\bar{X}_A - \bar{X}_B\) • In these types of problems we must differentiate between independent and dependent (or "paired") samples
Independent versus Dependent Samples • Independent samples: Selection of one sample (from population A) is not related to the selection of the other sample (from population B). • Dependent samples: Each observation in Sample A is logically paired with an observation in Sample B; the paired observations share some common characteristic that ties them together. • For independent samples, \(n_A\) and \(n_B\) need not be equal. For dependent samples, \(n_A\) must equal \(n_B\) by design.
Examples • I put Goodyear tires on 40 cars and Firestone tires on 40 cars and monitor tread life. Independent or dependent samples? • I put Goodyear tires on the left side of 36 cars and Firestone tires on the right side of these cars. Independent or dependent (paired) samples? • I survey the price of skim milk and the price of whole milk at 26 separate grocery stores. Independent or dependent samples? • I compare mortgage lending rates at 32 Utah credit unions and at 19 Utah banks. Independent or dependent? • To assess the effectiveness of a speed reading program, I evaluate the 28 class members' reading speed before they start the course and after they complete the course. Paired samples?
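Comparing two population means • Independent samples: \(\mu_A - \mu_B = (\bar{x}_A - \bar{x}_B) \pm z \cdot \text{S.E.}\), where for large samples S.E. \(= \sqrt{s_A^2/n_A + s_B^2/n_B}\) • Dependent samples: \(\mu_A - \mu_B = \bar{D} \pm z \cdot \text{S.E.}\), where S.E. \(= s_D/\sqrt{n}\) and \(s_D\) is the standard deviation of the paired differences • For small samples use the t table instead of Z. MINITAB uses only t for ANY comparison of two population means.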
Difference in Methodology • Independent samples: Sample A = 118, 120, 116, 119, 122, 120 (\(\bar{X}_A\) = 119.2); Sample B = 104, 114, 121, 115, 118 (\(\bar{X}_B\) = 114.4). The R.V. here is \(\bar{X}_A - \bar{X}_B\). • Dependent samples:
Sample A | Sample B | Difference
122 | 120 | 2
120 | 114 | 6
117 | 119 | -2
121 | 118 | 3
119 | 120 | -1
\(\bar{D}\) = 1.6. The R.V. here is \(\bar{D}\).
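Computing both methods from the slide's data makes the difference concrete. As a sketch: the independent-samples standard error here is the unpooled form \(\sqrt{s_A^2/n_A + s_B^2/n_B}\), and with samples this small a t multiplier, not Z, would be needed to turn either standard error into an interval.

```python
import math
from statistics import mean, stdev

# Independent samples: the two groups need not be the same size (6 vs 5 here)
a = [118, 120, 116, 119, 122, 120]
b = [104, 114, 121, 115, 118]
diff_means = mean(a) - mean(b)
se_indep = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))

# Dependent (paired) samples: work only with the per-pair differences
pairs = [(122, 120), (120, 114), (117, 119), (121, 118), (119, 120)]
d = [x - y for x, y in pairs]
d_bar = mean(d)
se_paired = stdev(d) / math.sqrt(len(d))

print(round(mean(a), 1), round(mean(b), 1), round(diff_means, 2), round(se_indep, 3))
print(d, d_bar, round(se_paired, 3))
```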
Comparing two population means • A special consideration when dealing with small independent samples is whether or not the two populations under study have the same variance. CONSIDER: • IQ scores for Harvard students versus IQ scores for students at WSU • IQ scores for males versus IQ scores for females.
Summary Considerations Comparing Two Population Means • Independent samples? • Large? Use Z • Small? Use t. You must also address whether or not the two populations have equal variances. • Dependent samples? • Large? Use Z • Small? Use t (MINITAB always uses t for two populations.)
Example problems • An economist wants to know if hourly labor rates at automotive garages differ between Utah and Idaho. He surveys 32 garages in Utah and 30 garages in Idaho. He seeks both a point estimate and a 95% interval estimate of \(\mu_{\text{Utah}} - \mu_{\text{Idaho}}\). Answer: Point estimate: $4.48 Interval estimate at 95%: $1.28 to $7.70 What is the margin of error? It's half the width of the interval: $4.48 ± $3.21
Example problems • Ten infants were involved in a study to compare the effectiveness of two medications for the treatment of diaper rash. For each afflicted baby, Desitin was applied to the left cheek and A&D ointment was applied to the right cheek. The number of hours for the rash to disappear was recorded for each medication and each infant. For a 99% level of confidence, is there sufficient evidence to conclude that a difference exists in the effectiveness of these medications? Assume normal distributions. Answer: Point estimate: 2.6 hours Interval estimate at 99%: .266 to 4.934 hours What is the margin of error? It's half the width of the interval: 2.6 ± 2.334 hours
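Since the point estimate sits at the center of the interval and the margin of error is half its width, both can be read off any reported interval. A minimal sketch with a hypothetical helper; the Utah/Idaho midpoint comes out as 4.49 rather than 4.48 because the published endpoints are rounded.

```python
def midpoint_and_margin(lower: float, upper: float) -> tuple:
    """Point estimate = midpoint of the interval; margin of error = half its width."""
    return (lower + upper) / 2, (upper - lower) / 2

print(midpoint_and_margin(1.28, 7.70))    # Utah - Idaho labor rates, dollars
print(midpoint_and_margin(0.266, 4.934))  # paired diaper-rash differences, hours
```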
A final note . . . Whenever we need to do statistical inference on two population means, we get the strongest tests if we design our experiment with large, paired samples. The next-best option is large, independent samples. If we don't have the luxury of large samples, small paired samples are best. If we can't obtain paired data, the next-strongest tests come from small independent samples with the assumption of equal variances. BUT . . . The weakest tests come from small, independent samples drawn from populations with unequal variances.
Interval Estimates of a Population Proportion • Can't do proportion problems with a small sample • Confidence interval for a population proportion, large sample [\(n\pi \ge 5\) and \(n(1-\pi) \ge 5\)]: \(p \pm Z\sqrt{p(1-p)/n}\)
Example: • Altius Insurance Company now offers a discount on group health policies to companies with at least 90% nonsmoking employees. Huntsman Chemicals (with 5213 employees) wants the discount. Altius surveys 200 randomly chosen Huntsman employees and finds that 174 are non-smokers. Huntsman is denied the discount. Construct a 95% CI and see if you can defend Huntsman's request for a discount. • ANSWER: \(p = 174/200 = .87\); 95% C.I.: \(.87 \pm 1.96(.02378)\) = .823 to .917
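A sketch of the Altius calculation. It shows that .90 falls inside the interval, so the sample does not rule out that at least 90% of Huntsman's employees are nonsmokers.

```python
import math
from statistics import NormalDist

x, n = 174, 200
p = x / n                                  # 0.87
se = math.sqrt(p * (1 - p) / n)            # ~0.02378
z = NormalDist().inv_cdf(0.975)            # 95% two-sided
lo, hi = p - z * se, p + z * se
print(f"{lo:.3f} to {hi:.3f}; contains 0.90? {lo <= 0.90 <= hi}")
```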
Computing Sample Size Given a desired margin of error \(E\) and the "Z" associated with the desired level of confidence, solve for n: \(n = Z^2\pi(1-\pi)/E^2\) • What do you use for \(\pi\)? • Use the results of a preliminary sample • Use the \(\pi\) that is postulated in the problem, or • Use .5 (a conservative approach that assures that your margin of error will not be exceeded).
Example • A congressional committee wanted to estimate the proportion of companies that pay the entire cost of employee health care coverage. Determine the number of companies that would have to be sampled to be 96% confident that the sample proportion would be off from the true value by no more than .03. • Solution: Using the conservative \(\pi = .5\) and Z = 2.054, \(n = 2.054^2(.5)(.5)/.03^2 = 1171.9\), so n = 1172
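A sketch of the calculation, assuming the conservative π = .5 (the problem gives no preliminary estimate) and computing Z for 96% confidence with `NormalDist` instead of a table.

```python
import math
from statistics import NormalDist

confidence, margin = 0.96, 0.03
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # ~2.054
pi_guess = 0.5                                        # conservative: maximizes pi * (1 - pi)
n = math.ceil(z ** 2 * pi_guess * (1 - pi_guess) / margin ** 2)
print(round(z, 3), n)
```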
MINITAB Examples: • The National Transportation Safety Board conducted a study of truck drivers killed in highway accidents. They found that 24 of 185 drivers tested positive for alcohol. Obtain a 90 percent confidence interval for the true percentage of truck driver deaths in which the truck driver had a positive level of alcohol. Answer: 9% to 17% • During the testing of two new drugs for hypertension, a group of 644 patients received Drug Y and 28 reported headaches as a side effect. Another group of 207 patients received Drug Z and 14 reported headaches as a side effect. At a 95% level of confidence, is there a statistically significant difference in the proportions of patients who reported headaches as a side effect from taking these medications? Answer: No, because the C.I. includes zero.
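Both MINITAB answers can be approximated with the large-sample normal formulas in a short Python sketch. Using the normal approximation throughout is an assumption: MINITAB's one-proportion command may default to an exact method, so its endpoints can differ slightly. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def z_for(confidence: float) -> float:
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def one_prop_ci(x: int, n: int, confidence: float) -> tuple:
    """Large-sample (normal approximation) interval for a single proportion."""
    p = x / n
    m = z_for(confidence) * math.sqrt(p * (1 - p) / n)
    return p - m, p + m

def two_prop_ci(x1: int, n1: int, x2: int, n2: int, confidence: float) -> tuple:
    """Large-sample interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    m = z_for(confidence) * se
    return (p1 - p2) - m, (p1 - p2) + m

truck = one_prop_ci(24, 185, 0.90)
drugs = two_prop_ci(28, 644, 14, 207, 0.95)   # Drug Y minus Drug Z
print([round(v, 3) for v in truck])
print([round(v, 4) for v in drugs], "includes 0?", drugs[0] <= 0 <= drugs[1])
```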
Another Example: • One of America’s most controversial politicians is Hillary Rodham Clinton. She is either admired or greatly despised. The Republican National Committee believes her support is stronger among women. The RNC randomly sampled 982 men and 373 were found to think favorably of Ms. Clinton. The RNC also sampled 950 women and 401 were found to admire Ms. Clinton. • What is a point estimate of the difference in the proportion of women who admire her as compared to men? • What is a 95% interval estimate of the difference? • Do you conclude that there exists a statistically significant difference in admiration levels among men and women? Why?
The point estimate of the difference (women minus men) in the proportion who think highly of Mrs. Clinton is 4.2 percentage points. • This is not statistically significant if we design our experiment to be wrong 5 out of 100 times (a 95% C.I.), because that interval includes zero. We would say that the 4.2% difference is not big enough to rule out that this sample result occurred by chance. • This is statistically significant if we design our experiment to be wrong 6 out of 100 times (a 94% C.I.), because that interval excludes zero. We'd conclude that, based upon sample information, in the population as a whole men and women do view Mrs. Clinton differently.
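The 95% versus 94% contrast can be checked directly. A sketch using the large-sample interval for a difference in proportions with the unpooled standard error; `diff_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def diff_ci(x1: int, n1: int, x2: int, n2: int, confidence: float) -> tuple:
    """Large-sample interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

women, men = (401, 950), (373, 982)
point = women[0] / women[1] - men[0] / men[1]   # ~0.042
ci95 = diff_ci(*women, *men, 0.95)
ci94 = diff_ci(*women, *men, 0.94)
print(round(point, 3), [round(v, 4) for v in ci95], [round(v, 4) for v in ci94])
```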
One last example: • In a random survey of Utah drivers, 61 out of 90 male drivers considered themselves aggressive at the wheel. A similar survey of women drivers resulted in 31 out of 106 describing themselves as aggressive drivers. • What is the point estimate of the difference in proportions of men and women drivers in Utah who think they drive aggressively? 67.8% − 29.2% ≈ 38.5% (computed from the unrounded proportions) • Estimate with 95% confidence the difference in the proportions of men and women drivers in Utah who think they drive aggressively. Answer: 25.6% to 51.5%
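The same large-sample formula for a difference in proportions reproduces this interval; a minimal sketch.

```python
import math
from statistics import NormalDist

men_p, women_p = 61 / 90, 31 / 106
se = math.sqrt(men_p * (1 - men_p) / 90 + women_p * (1 - women_p) / 106)
z = NormalDist().inv_cdf(0.975)           # 95% two-sided
diff = men_p - women_p
lo, hi = diff - z * se, diff + z * se
print(f"point {diff:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```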