Chapter 9: Estimation Using a Single Sample (Confidence Intervals)
Inferential Statistics • Our study of confidence intervals begins our study of inferential statistics • In inferential statistics, our objective is to learn about a population from a sample of data • We use a sample of data to decrease our uncertainty about the population the sample was drawn from • More specifically, we’ll be using samples of data to estimate unknown population parameters such as the population mean $\mu$ and the population proportion $\pi$.
Point Estimates • A single number derived from a sample of data (a statistic) that represents a plausible value for a population parameter • First, we decide which statistic is appropriate. We then collect a random sample of data. The computed statistic is our point estimate, for example $\bar{x}$ as a point estimate for $\mu$
More than one choice • Suppose we are interested in the proportion of American voters who support gay marriage • The natural statistic is the sample proportion, $p$, as an estimate for $\pi$ • Sometimes there’s more than one choice • The sample mean, trimmed mean, or median could each serve as a point estimate for the population mean • How do you choose? Choose the statistic that tends, on average, to be closest to the true value.
Biased and Unbiased Statistics • When there’s more than one choice, we want the statistic that is most accurate • Sampling distributions of statistics tell us how accurate a statistic is for estimating a population parameter • A statistic whose sampling distribution is centered on the parameter we’re trying to estimate is called unbiased • The two unbiased statistics we’ll be studying are the sample mean and the sample proportion
Accuracy of Point Estimates • Even if we select an unbiased statistic, how accurate is the single number we calculate? • Remember sampling variability? • Example: samples of 50 from a normal distribution • An unbiased statistic has no systematic tendency to underestimate or overestimate the parameter, and if it also has a small standard deviation, its estimates will be relatively close to the true value
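The "samples of 50 from a normal distribution" example can be mimicked with a short simulation. This is a sketch under assumed values: the population is taken to be normal with hypothetical $\mu = 100$ and $\sigma = 15$, and the function name `simulate_sample_means` is illustrative only. It draws many samples of size 50 and shows that the sample means center on $\mu$ (unbiased) with a spread near $\sigma/\sqrt{50}$.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n=50, reps=2000, seed=1):
    """Draw `reps` samples of size n from Normal(mu, sigma); return the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: mu = 100, sigma = 15
means = simulate_sample_means(mu=100, sigma=15)
print(round(statistics.fmean(means), 2))  # close to 100: no systematic bias
print(round(statistics.stdev(means), 2))  # close to 15 / sqrt(50), about 2.12
```

Each individual mean still misses 100 by a little; that sample-to-sample scatter is the sampling variability the slide refers to.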
Confidence Intervals • How accurate a point estimate is depends on which sample you happen to draw from the population • While the point estimate from an unbiased statistic may be our best single-number guess, it’s not the only plausible value • An alternative to a single-number estimate is a range of values, an interval, that we are very confident contains the true value • This type of estimate is called a confidence interval
Definition of Confidence Interval • An interval of plausible values for a population characteristic (a parameter). It is constructed so that, with a chosen degree of confidence, the value of the parameter will be captured in the interval
Confidence Interval • Confidence Interval = Statistic ± (Critical Value) × (Statistic Std Dev) • Statistic: the point estimate • Statistic Std Dev: the standard deviation of the statistic’s sampling distribution • Critical Value: determined by the associated confidence level • The confidence level measures how much confidence we have in the method used to construct the CI • It is not our confidence in any particular interval
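The general recipe "statistic ± critical value × standard deviation of the statistic" can be written as a small helper. This is a minimal sketch; the function name `confidence_interval` and the numbers in the example are hypothetical.

```python
def confidence_interval(statistic, critical_value, statistic_sd):
    """Return (lower, upper) = statistic ± critical_value * statistic_sd."""
    bound = critical_value * statistic_sd  # half-width of the interval
    return statistic - bound, statistic + bound

# Hypothetical numbers: statistic 0.40, critical value 1.96, standard deviation 0.02
print(confidence_interval(0.40, 1.96, 0.02))  # roughly (0.3608, 0.4392)
```

Every interval in this chapter has this shape; only the statistic, its standard deviation, and the source of the critical value change.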
Basic Concept of CI • We start with the sampling distribution of the statistic we are using • We will be using sampling distributions that are well approximated by a normal distribution • We take a sample and calculate a point estimate, a statistic (unbiased) from that sample
Continuing … • From what we know about normal distributions, about 95% of the statistics calculated from random samples will fall within 2 standard deviations of the mean of the sampling distribution • For an unbiased statistic, the mean of the sampling distribution is the population parameter • If the statistic is within about 2 sd of the sampling distribution’s mean 95% of the time, then the interval Statistic ± Critical Value × Statistic Std Dev will capture the mean of the sampling distribution (the parameter) 95% of the time
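The "95% of the time" claim is about the method, and a simulation makes that concrete. This sketch assumes a normal population with hypothetical $\mu = 50$, $\sigma = 10$ (treated as known) and samples of size 25. It builds the interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ from many samples and counts how often the interval captures $\mu$. The name `coverage_rate` is illustrative.

```python
import random
import statistics

def coverage_rate(mu=50.0, sigma=10.0, n=25, reps=4000, z=1.96, seed=2):
    """Fraction of intervals xbar ± z * sigma / sqrt(n) that capture the true mean mu."""
    rng = random.Random(seed)
    sd_of_mean = sigma / n ** 0.5  # standard deviation of the sampling distribution of xbar
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Does this particular interval contain the parameter?
        if xbar - z * sd_of_mean <= mu <= xbar + z * sd_of_mean:
            hits += 1
    return hits / reps

print(coverage_rate())  # should be close to 0.95
```

Any single interval either contains $\mu$ or it does not; the 95% describes the long-run success rate of the procedure.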
More … • The width of the interval is adjusted by selecting a different confidence level • Typical confidence levels are 90%, 95% and 99% • The endpoints are found by multiplying the critical value (determined by the confidence level) by the standard deviation of the sampling distribution (the sd of the statistic), then subtracting this product from, and adding it to, the statistic
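The critical values for the usual confidence levels come from the standard normal distribution: put half of the leftover area in each tail and read off the matching $z$ value. This sketch uses Python's standard-library `statistics.NormalDist` in place of a printed table; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided standard normal critical value z* for a given confidence level."""
    tail_area = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

This prints 1.645, 1.96 and 2.576 (rounded). Higher confidence means a larger critical value and therefore a wider interval.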
Large Sample Confidence Interval for a Population Proportion • The parameter of interest is the population proportion $\pi$ • The statistic used is the sample proportion $p$ • Why "large sample"? From the last chapter, when the sample is large, the sampling distribution of $p$ is approximately normal • How large is large? • We know $\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
Large Sample Confidence Interval for a Population Proportion • Calculate the sample proportion $p$ from a random sample • Estimate the standard deviation of $p$ by the standard error $\sqrt{\frac{p(1-p)}{n}}$ ($\pi$ is unknown, so $p$ takes its place) • Choose a confidence level, say 95% • Determine the critical value • Use the standard normal table: 1.96 • Calculate your confidence interval • Confidence Interval = Statistic ± Critical Value × Statistic Std Dev, which here is $p \pm 1.96\sqrt{\frac{p(1-p)}{n}}$
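The steps above can be sketched as one function. Two assumptions to flag: the survey counts in the example are hypothetical, and the large-sample check uses the rule of thumb $np \ge 10$ and $n(1-p) \ge 10$, which many introductory texts use but which is not stated on these slides. The name `proportion_ci` is illustrative.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample CI for a population proportion: p ± z * sqrt(p(1 - p) / n)."""
    p = successes / n
    # Assumed rule of thumb for "large": n*p >= 10 and n*(1 - p) >= 10
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("sample too small for the large-sample interval")
    std_error = math.sqrt(p * (1 - p) / n)  # estimated standard deviation of p
    bound = z * std_error
    return p - bound, p + bound

# Hypothetical survey: 520 of 1000 respondents answer "yes"
low, high = proportion_ci(520, 1000)
print(round(low, 3), round(high, 3))  # 0.489 0.551
```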
Let’s do an example • Pg 453 Problem # 9.14
In summary • The large-sample confidence interval for $\pi$ • $p$ is the sample proportion from a random sample • The sample size, $n$, is large • The CI is $p \pm (z \text{ critical value})\sqrt{\frac{p(1-p)}{n}}$ • The desired confidence level determines which critical value is used • Note: This method is not appropriate for small samples
Choosing the Sample Size • Terminology: Bound • Consider the statistic an estimate of the parameter • Consider ‘critical value × standard deviation’ the bound on the error of your estimate • For a population proportion, Confidence Interval = Statistic ± Critical Value × Statistic Std Dev, so at 95% confidence the bound is $B = 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$
Finding appropriate sample size • Before you do a study, you may be asked to estimate a particular parameter to a certain degree of accuracy • The question then is: how big a sample should I take to get that accuracy at a certain confidence level? • We use the bound to determine the sample size: solving $B = 1.96\sqrt{\frac{\pi(1-\pi)}{n}}$ for $n$ gives $n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2$ • But the population parameter is unknown, so we make a reasonable estimate of $\pi$, or use 0.5 as a conservative estimate (since $\pi(1-\pi)$ is largest at $\pi = 0.5$) • Example: pg 454, 9.25
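Solving the 95% bound $B = 1.96\sqrt{\pi(1-\pi)/n}$ for $n$ gives $n = \pi(1-\pi)(1.96/B)^2$, rounded up to the next whole number. A minimal sketch follows; the function name and the bounds in the example are hypothetical, and `pi_guess` defaults to the conservative 0.5.

```python
import math

def sample_size_proportion(bound, pi_guess=0.5, z=1.96):
    """Smallest n with z * sqrt(pi(1 - pi) / n) <= bound; pi_guess = 0.5 is the conservative choice."""
    n = pi_guess * (1 - pi_guess) * (z / bound) ** 2
    return math.ceil(n)  # always round up so the bound is met

print(sample_size_proportion(0.05))                # 385
print(sample_size_proportion(0.05, pi_guess=0.2))  # 246: a smaller n when pi is believed to be near 0.2
```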
Confidence Interval for Population Mean • We’ll look at these cases: • Population standard deviation $\sigma$ is known • Large sample, or small sample but population is approx normal • Population standard deviation $\sigma$ is unknown • Large sample, or small sample but population is approx normal
Sampling Distribution of the Sample Mean • When the population is normal, the sampling distribution is normal regardless of sample size • When the population is not normal, the sampling distribution is normal if the sample size is large (CLT).
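Confidence Interval for Population Mean, $\sigma$ Known • $\bar{x}$ is the sample mean from a random sample • The sample size is large or the population is approximately normal • The population standard deviation $\sigma$ is known • The CI is $\bar{x} \pm (z \text{ critical value})\frac{\sigma}{\sqrt{n}}$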
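When $\sigma$ is known, the interval is $\bar{x} \pm z^*\,\sigma/\sqrt{n}$. This sketch takes summary values rather than raw data; the summary numbers and the name `mean_ci_sigma_known` are hypothetical, and the critical value comes from the standard-library `NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, confidence=0.95):
    """CI for mu when sigma is known: xbar ± z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    bound = z_star * sigma / math.sqrt(n)
    return xbar - bound, xbar + bound

# Hypothetical summary: xbar = 50, sigma = 10, n = 25
low, high = mean_ci_sigma_known(50, 10, 25)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```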
Sampling Distribution of the Sample Mean, $\sigma$ Unknown • When the population is normal, the sampling distribution is normal regardless of sample size • When the population is not normal, the sampling distribution is normal if the sample size is large (CLT)
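Confidence Interval for Population Mean, $\sigma$ Unknown • $\bar{x}$ is the sample mean from a random sample • The sample size is large or the population is approximately normal • The population standard deviation is unknown • The CI is $\bar{x} \pm (t \text{ critical value})\frac{s}{\sqrt{n}}$, with df = $n - 1$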
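With $\sigma$ unknown, $s$ replaces $\sigma$ and a t critical value replaces $z^*$. In this sketch the data are hypothetical ($n = 10$, so df = 9) and the t critical value is passed in, as you would after looking it up in a t table; 2.262 is the standard 95% two-sided value for df = 9. The name `mean_ci_sigma_unknown` is illustrative.

```python
import math
from statistics import fmean, stdev

def mean_ci_sigma_unknown(data, t_star):
    """CI for mu when sigma is unknown: xbar ± t* * s / sqrt(n).

    t_star must be the critical value for df = len(data) - 1.
    """
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    bound = t_star * s / math.sqrt(n)
    return xbar - bound, xbar + bound

# Hypothetical sample of n = 10 (df = 9); 95% t critical value for df = 9 is 2.262
data = [12, 15, 11, 14, 13, 16, 12, 14, 13, 10]
low, high = mean_ci_sigma_unknown(data, t_star=2.262)
print(round(low, 2), round(high, 2))  # 11.69 14.31
```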
Student’s t-Distribution • Recall that the standard normal distribution is a bell-shaped distribution with $\mu = 0$ and $\sigma = 1$ • The t-distribution is also bell-shaped and centered on 0 • There are many t-distributions, distinguished by their degrees of freedom, which here is $n - 1$ • Each t-curve is a little more spread out than the z-curve, but as the degrees of freedom increase, the t-curve approaches the z-curve.
Student’s t-Distribution • Recall from our study of sampling distributions the properties of the sampling distribution of $\bar{x}$ • When the population standard deviation is not known and is estimated by $s$, the standardized statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t-distribution with $n - 1$ degrees of freedom (exactly so when the population is normal) • This distribution gives critical values a little larger than the standard normal’s, because not knowing the population standard deviation introduces a little more uncertainty
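To see t critical values shrink toward $z^* \approx 1.96$ as the degrees of freedom grow, the sketch below computes them numerically using only the standard library: it integrates the t density with Simpson's rule and finds the quantile by bisection. This numerical approach is a substitute chosen for illustration, not the textbook's method; in practice you would read the value from a t table or call a library function such as SciPy's `scipy.stats.t.ppf`. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value t*, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)
for df in (4, 9, 29, 120):
    print(df, round(t_critical(0.95, df), 3), "vs z* =", round(z_star, 3))
```

The printed t values should match a standard t table (2.776, 2.262, 2.045, 1.980) and sit just above 1.96.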
t-Distribution Table • Appendix III in the back of your textbook
Choosing the Sample Size • When estimating the population mean using a large sample, or a small sample from a normal population, the bound on the error of estimation associated with a 95% confidence level is $B = 1.96\frac{\sigma}{\sqrt{n}}$, so the required sample size is $n = \left(\frac{1.96\,\sigma}{B}\right)^2$ • Since the population standard deviation is usually unknown, we can • Make a best guess • Divide the range by 4
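Solving $B = 1.96\,\sigma/\sqrt{n}$ for $n$ gives $n = (1.96\,\sigma/B)^2$, rounded up. This sketch plugs in the "range divided by 4" guess for $\sigma$; the expected range (40 to 120) and the bound $B = 5$ are hypothetical, as is the name `sample_size_mean`.

```python
import math

def sample_size_mean(bound, sigma_guess, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. (z * sigma / B)^2 rounded up."""
    return math.ceil((z * sigma_guess / bound) ** 2)

# Hypothetical study: values expected to range from 40 to 120, desired bound B = 5
sigma_guess = (120 - 40) / 4  # range / 4 = 20
print(sample_size_mean(5, sigma_guess))  # 62
```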
Degrees of Freedom • The number of independent pieces of information that go into the estimate of a parameter • The number of values in the calculation of a statistic that are free to vary • The number of independent pieces of information that go into an estimate minus the number of parameters estimated
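A small illustration of "free to vary": deviations from the sample mean always sum to zero, so once $n - 1$ of them are known, the last one is forced. That is why $s$ (which uses these deviations) has $n - 1$ degrees of freedom. The data and the helper name `last_deviation` are hypothetical.

```python
from statistics import fmean

def last_deviation(first_deviations):
    """Given n - 1 deviations from the sample mean, the n-th is forced: deviations sum to 0."""
    return -sum(first_deviations)

data = [4.0, 7.0, 9.0, 12.0]           # hypothetical sample, n = 4
xbar = fmean(data)                      # 8.0
deviations = [x - xbar for x in data]   # [-4.0, -1.0, 1.0, 4.0]
print(last_deviation(deviations[:-1]) == deviations[-1])  # True: only 3 deviations were free
```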