Week 8 Confidence Intervals for Means and Proportions
Inference • Data are a single sample • Interested in underlying population, not specific sample • Sample gives information about population • Randomness of the sample implies uncertainty • This is called inference about the population
Types of inference • Focus on value of population parameter • e.g. mean or proportion (probability) • Estimation • What is the value of the parameter? • Hypothesis testing • Is the parameter equal to a specific value (usually zero)?
Point estimate • To estimate population parameter, use corresponding sample statistic • e.g. estimate the population mean by the sample mean • Likely to be an error in estimate • e.g. error = sample mean minus population mean • How big is error likely to be?
Error distribution • Error is random • Simulation from an ‘approx’ population could build up error distribution • Shows how large error from actual sample data is likely to be
Example • Silkworm survival after arsenic poisoning • How long will 1/4 survive? • What is upper quartile?
Simulation • Approx population (same mean & sd as data) • Target = UQ from normal = 293.3 sec
Simulation (cont) • Sample UQs ≠ target • Simulation shows error distribution • Error in estimate (292 sec) unlikely to be more than 10 sec.
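The simulation idea above can be sketched in Python. This is only an illustration: the slides do not give the silkworm data, so the mean, standard deviation and sample size below are hypothetical placeholders, and `simulate_uq_errors` is a name made up for this sketch.

```python
import random
import statistics

def simulate_uq_errors(mean, sd, n, reps=2000, seed=1):
    """Simulate errors of the sample upper quartile (UQ).

    Samples are drawn from a normal 'approx' population; the target is
    the UQ of that normal distribution.
    """
    rng = random.Random(seed)
    target = statistics.NormalDist(mean, sd).inv_cdf(0.75)
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        sample_uq = statistics.quantiles(sample, n=4)[2]  # third cut point = UQ
        errors.append(sample_uq - target)
    return target, errors

# Hypothetical population values (NOT the silkworm data from the slides).
target, errors = simulate_uq_errors(mean=250.0, sd=40.0, n=80)
print("target UQ:", round(target, 1))
print("typical size of error (sd of errors):", round(statistics.stdev(errors), 1))
```

The spread of `errors` plays the role of the error distribution: it indicates how far a single sample UQ is likely to be from the target.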
Error distn for proportion • Simulation is not needed
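Standard error of proportion • Approx error distn is normal • bias = 0 • standard error = $\sqrt{p(1-p)/n}$, using the sample proportion p in place of the unknown population proportion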
Teens and interracial dating 1997 USA Today/Gallup Poll of teenagers across country: 57% of the 497 teens who go out on dates say they’ve been out with someone of another race or ethnic group. • Point estimate: p = 0.57 • Bias = 0 • Standard error = $\sqrt{0.57 \times 0.43 / 497}$ = 0.022
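As a quick check of the arithmetic for this poll, here is a minimal sketch assuming the usual formula sqrt(p(1 - p)/n) for the standard error of a sample proportion. The function name `se_proportion` is chosen for this illustration only.

```python
import math

def se_proportion(p, n):
    """Approximate standard error of a sample proportion p from n observations."""
    return math.sqrt(p * (1 - p) / n)

# Poll values from the slide: p = 0.57, n = 497
se = se_proportion(0.57, 497)
print(round(se, 3))  # 0.022
```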
Error distn (interracial dating) • Error distn is approx normal with μ = 0, σ = 0.022 • Error in estimate, p = 0.57, is • unlikely to be more than 0.05 • almost certainly less than 0.07 [Figure: normal error distn, axis marked at 0, ±0.022, ±0.044, ±0.066]
Interval estimates Survey 150 randomly selected students and 41% think marijuana should be legalized. If we report between 33% and 49% of all students at the college think that marijuana should be legalized, how confident can we be that we are correct? Confidence interval: an interval of estimates that is ‘likely’ to capture the population value.
95% confidence interval • Legalise? p = 0.41, n = 150 • Standard error = $\sqrt{0.41 \times 0.59 / 150}$ = 0.0402 • 70-95-100 rule of thumb • Prob(|error| < 2 x 0.0402) is approx 95% • We are 95% confident that the population proportion is between 0.41 – 0.0804 and 0.41 + 0.0804 • 95% Conf Interval: 0.33 to 0.49
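The "estimate ± 2 s.e." interval for a proportion can be sketched as follows. The helper name `ci_proportion` and its `multiplier` parameter are assumptions made for this example.

```python
import math

def ci_proportion(p, n, multiplier=2.0):
    """Approximate 95% CI for a proportion: p +/- multiplier * s.e."""
    se = math.sqrt(p * (1 - p) / n)
    return p - multiplier * se, p + multiplier * se

# Survey from the slide: p = 0.41, n = 150
low, high = ci_proportion(0.41, 150)
print(round(low, 2), round(high, 2))  # 0.33 0.49
```

Passing `multiplier=1.96` gives the slightly narrower interval based on the exact normal percentile.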
Interpreting 95% C.I. • Confidence interval is function of sample data • Random • It may not include the population parameter (the population proportion here) • In repeated samples, about 95% of CIs calculated as described will include the parameter • We therefore say we are 95% confident that our single CI will include it
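The "repeated samples" interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the true proportion 0.4 and sample size 150 are arbitrary illustrative values, and `coverage` is a hypothetical helper, not something from the slides.

```python
import math
import random

def coverage(true_p, n, reps=2000, seed=1):
    """Fraction of simulated 'p +/- 2 s.e.' intervals that include true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p = successes / n
        se = math.sqrt(p * (1 - p) / n)
        if p - 2 * se <= true_p <= p + 2 * se:
            hits += 1
    return hits / reps

# Illustrative values only; the result should be close to 0.95.
print(coverage(true_p=0.4, n=150))
```

Any single interval either includes the true value or does not; the 95% describes how often the procedure succeeds over many samples.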
Teens and interracial dating 1997 USA Today/Gallup Poll of teenagers across country: 57% of the 497 teens who go out on dates say they’ve been out with someone of another race or ethnic group. • Point estimate: p = 0.57 • Standard error = 0.022 • 95% C.I. is 0.57 - 0.044 to 0.57 + 0.044, i.e. 0.526 to 0.614 • We would prefer more decimals!
Teens and interracial dating • 95% C.I. is 0.526 to 0.614 • We do not know whether the population proportion is between 0.526 and 0.614 • However 95% of CIs calculated in this way will include it • We are therefore 95% confident that the population proportion is in (0.526, 0.614)
St error & width of 95% C.I. • Smallest s.e. and C.I. width when: • n is large • p is close to 0 or 1 • Biggest s.e. and C.I. width when: • n is small • p is close to 0.5
Margin of error • Public opinion polls usually estimate several popn proportions. • Each has its own “± 2 s.e.” describing accuracy • n = 350
Margin of error (cont) • n = 350 • Maximum possible “± 2 s.e.” (which occurs when p = 0.5) is the “Margin of error” for the poll
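A minimal sketch of the margin-of-error calculation, assuming the "± 2 s.e." convention used in these slides and that the widest interval occurs at p = 0.5. The function names are chosen for this example.

```python
import math

def two_se(p, n):
    """The '+/- 2 s.e.' half-width for a sample proportion p."""
    return 2 * math.sqrt(p * (1 - p) / n)

def margin_of_error(n):
    """Largest possible '+/- 2 s.e.' for a poll of size n (at p = 0.5)."""
    return two_se(0.5, n)

# The half-width grows as p moves towards 0.5.
for p in (0.1, 0.3, 0.5):
    print(p, round(two_se(p, 350), 3))

print(round(margin_of_error(350), 3))  # 0.053
```

With p = 0.5 the formula simplifies to 1/sqrt(n), so a poll of 350 people has a margin of error of roughly 5 percentage points.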
Requirements for C.I. • Sample should be randomly selected from population • “Large” sample size: at least 10 successes and 10 failures (though some say only 5 of each are needed) • If the population is finite, it should be at least 10 times the sample size
Case Study: Nicotine Patches vs Zyban Study (New England Journal of Medicine, 3/4/99) • 893 participants randomly allocated to four treatment groups: placebo, nicotine patch only, Zyban only, and Zyban plus nicotine patch. • Participants blinded: • all used a patch (nicotine or placebo) • all took a pill (Zyban or placebo). • Treatments used for nine weeks.
Nicotine Patches vs Zyban (cont) Conclusions: • Zyban is effective (no overlap of Zyban and not-Zyban CIs) • Nicotine patch is not particularly effective (overlap of patch and no-patch CIs)
Error distribution for mean • Again, a simulation is unnecessary to find the error distribution (approx)
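Standard error of mean • Approx error distn is normal • bias = 0 • standard error = $\sigma/\sqrt{n}$, estimated by $s/\sqrt{n}$ where s is the sample standard deviation
Mean hours watching TV Poll: Class of 175 students. In a typical day, about how much time do you spend watching television? • Summary: n = 175, Mean = 2.09, Median = 2.000, StDev = 1.644 • Point estimate: sample mean = 2.09 hours • Bias = 0 • Standard error = $1.644/\sqrt{175}$ = 0.124 hours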
Standard devn & standard error • Sample standard deviation • is approx the population standard deviation • stays similar if n increases • Standard error of mean • is usually less than the standard deviation • decreases as n increases Don’t get mixed up between the two!
Error distn (hours watching TV) • Error distn is approx normal with μ = 0, σ = 0.124 • Error in estimate, sample mean = 2.09 hours, is • unlikely to be more than 0.25 hrs • almost certainly less than 0.4 hrs [Figure: normal error distn, axis marked at 0, ±0.124, ±0.248, ±0.372]
General form for 95% C.I. • If • error distn is normal • zero bias & we can find s.e. • then Prob(error is within ± 2 s.e.) is approx 0.95 [Figure: error distn with axis marked in s.e. units] • 95% confidence interval: estimate ± 2 s.e. • 95% confidence interval: estimate ± 1.96 s.e. (if really sure error distn is normal)
95% confidence interval • Mean hrs watching TV? Sample mean = 2.09 hrs, n = 175 • 70-95-100 rule of thumb • Prob(|error| < 2 x 0.124) is approx 95% • We are 95% confident that the population mean is between 2.09 – 0.248 and 2.09 + 0.248 • 95% C.I.: 1.84 to 2.34 hours
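The TV-hours interval can be reproduced with a short sketch, assuming the standard error of the mean is estimated by s/sqrt(n) and using the "± 2 s.e." rule of thumb. `ci_mean` is a name invented for this illustration.

```python
import math

def ci_mean(mean, sd, n, multiplier=2.0):
    """Approximate 95% CI for a population mean from summary statistics."""
    se = sd / math.sqrt(n)  # estimated standard error of the mean
    return mean - multiplier * se, mean + multiplier * se, se

# Summary statistics from the slide: n = 175, mean = 2.09, sd = 1.644
low, high, se = ci_mean(2.09, 1.644, 175)
print(round(se, 3))                   # 0.124
print(round(low, 2), round(high, 2))  # 1.84 2.34
```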
Requirements for C.I. • Sample should be randomly selected from population • “Large” sample size: n > 30 is often recommended • If the population is finite, it should be at least 10 times the sample size
Problem with small n • Known population standard deviation • C.I. works fine • Unknown population standard deviation (sample standard deviation used instead) • Variable width • Less likely to include the population mean • Confidence level less than 95%
C.I. for mean, small n • Solution is to replace 1.96 (or 2) by a bigger number. • Look up t-tables with (n - 1) ‘degrees of freedom’
Example: Mean Forearm Length Data: From random sample of n = 9 men: 25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0 (cm) • Mean = 25.5, s.e. = 0.507 • df = 8, t8 = 2.31 • 95% C.I.: 25.5 ± 2.31(0.507) => 25.5 ± 1.17 => 24.33 to 26.67 cm
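The forearm calculation can be checked with the sketch below. The t multiplier 2.31 is typed in from t-tables (8 degrees of freedom), as on the slide, rather than computed; `t_interval` is a hypothetical helper name.

```python
import math
import statistics

def t_interval(data, t_multiplier):
    """CI for the mean: sample mean +/- t * s / sqrt(n).

    t_multiplier must be looked up in t-tables with (n - 1) degrees of freedom.
    """
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return mean - t_multiplier * se, mean + t_multiplier * se, se

forearm_cm = [25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0]
low, high, se = t_interval(forearm_cm, t_multiplier=2.31)  # df = 8
print(round(se, 3))                   # 0.507
print(round(low, 2), round(high, 2))  # 24.33 26.67
```

Using 2 instead of 2.31 would give a narrower interval whose confidence level is below 95% for a sample this small.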
Which Students Sleep More? Q: How many hours of sleep did you get last night, to the nearest half hour?
Class                      n    Mean  StDev  SE Mean
Stat 10 (stat literacy)    25   7.66  1.34   0.27
Stat 13 (stat methods)     148  6.81  1.73   0.14
• Notes: • CI for Stat 10 is wider (smaller sample size) • Two intervals do not overlap
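The two notes can be checked numerically. This is a sketch under assumptions: the t multipliers 2.064 (24 df) and 1.976 (147 df) are taken from t-tables and are not given in the slides, and the helper names are invented for this example.

```python
import math

def ci_from_summary(mean, sd, n, t_multiplier):
    """CI for a mean from summary statistics: mean +/- t * sd / sqrt(n)."""
    half_width = t_multiplier * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# t multipliers are assumed values from t-tables (df = 24 and df = 147).
stat10 = ci_from_summary(7.66, 1.34, 25, t_multiplier=2.064)
stat13 = ci_from_summary(6.81, 1.73, 148, t_multiplier=1.976)

print([round(x, 2) for x in stat10])
print([round(x, 2) for x in stat13])
print("Stat 10 interval is wider:", (stat10[1] - stat10[0]) > (stat13[1] - stat13[0]))
print("Intervals overlap:", intervals_overlap(stat10, stat13))
```

The gap between the two intervals is small, so the conclusion is sensitive to rounding and to the multiplier used.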
Interpreting 95% C.I. • Confidence interval is function of sample data • Random • It may not include the population parameter (the population mean here) • In repeated samples, about 95% of CIs calculated as described will include the parameter • We therefore say we are 95% confident that our single CI will include it