SAMPLING DISTRIBUTIONS Chapter 7
7.1 How Likely Are the Possible Values of a Statistic? The Sampling Distribution
Statistic and Parameter • Statistic – numerical summary of sample data: p-hat or xbar • Parameter – numerical summary of a population: µ for example. • In practice, we seldom know parameters, which are estimated using sample data: statistics estimate parameters
Sampling Distributions: Gray Davis • Before counting votes, the proportion in favor of recalling Governor Gray Davis was an unknown parameter • Exit poll of 3160 voters had sample proportion in favor of a recall as 0.54 • Different random sample of 3000 voters would have different sample proportion • Sampling distribution of sample proportion shows all possible values and probabilities for those values
Sampling Distributions • Sampling distribution of statistic is probability distribution that specifies probabilities for possible values the statistic can take • Describe variability that occurs from study to study using statistics to estimate parameters • Help predict how close statistic falls to parameter it estimates
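The idea of a sampling distribution can be illustrated by simulation. The sketch below is only an illustration: the population proportion (0.5), the sample size (500), the number of repeated samples (1000), and the function name are all assumptions chosen for the example, not values from the slides.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Hypothetical settings: p = 0.5, 500 voters per poll, 1000 repeated polls.
props = simulate_sample_proportions(p=0.5, n=500, num_samples=1000)
print("mean of the sample proportions:", round(statistics.mean(props), 3))
print("std. dev. of the sample proportions:", round(statistics.stdev(props), 3))
```

The sample proportions vary from poll to poll, but they cluster around the population proportion. Their spread is what the sampling distribution describes.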
Mean and SD of Sampling Distribution for Proportion For random sample of size n from population with proportion p in a category, the sampling distribution of the proportion of the sample in that category has: • Mean = p • Standard deviation = √(p(1 − p)/n)
The Standard Error To distinguish standard deviation of a sampling distribution from standard deviation of ordinary probability distribution, we refer to it as a standard error
2006 California Election • If population proportion supporting reelection of Schwarzenegger was 0.50, would it have been unlikely to observe the exit-poll sample proportion of 0.565? • Would you be willing to predict that Schwarzenegger would win the election?
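2006 California Election Given exit poll had 2705 people and assuming 50% support, the mean and standard error of the sampling distribution of the sample proportion are: • Mean = p = 0.50 • Standard error = √(0.50 × 0.50 / 2705) ≈ 0.0096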
2006 California Election • Sample proportion of 0.565 is more than six standard errors from expected value of 0.50 • Sample proportion of 0.565 voting for reelection of Schwarzenegger would be very unlikely if population proportion were p = 0.50 or p < 0.50
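The arithmetic behind the "more than six standard errors" statement can be checked with a short sketch. The numbers (n = 2705, assumed p = 0.50, observed 0.565) come from the slides. The function and variable names are illustrative.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


n = 2705           # exit-poll sample size
p_assumed = 0.50   # population proportion assumed for the check
p_hat = 0.565      # observed sample proportion

se = proportion_standard_error(p_assumed, n)
z = (p_hat - p_assumed) / se  # distance from 0.50 in standard errors

print(f"standard error = {se:.4f}")
print(f"standard errors above 0.50 = {z:.2f}")
```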
Population Distribution Population distribution: probability distribution from which we take sample • Values of its parameters are usually unknown – what we’d like to learn about
Data Distribution • Distribution of the sample data that we actually see in practice • Described by statistics • With random sampling, the larger n is, the more closely the data distribution resembles the population distribution
Sampling Distribution • Probability distribution of a sample statistic • With random sampling, provides probabilities for all the possible values of statistic • Key for telling us how close sample statistic falls to corresponding unknown parameter • Standard deviation is called standard error
Clinton vs. Spencer: Senatorial Seat 2006 U.S. Senate election in NY • An exit poll of 1336 voters showed • 67% (895) voted for Clinton • 33% (441) voted for Spencer • When 4.1 million votes tallied • 68% voted for Clinton • 32% voted for Spencer Let X= vote outcome with x=1 for Clinton and x=0 for Spencer
Clinton vs. Spencer: Senatorial Seat • Population distribution is 4.1 million x-values, 32% are 0, and 68% are 1. • Data distribution is 1336 x-values from exit poll, 33% are 0, and 67% are 1. • Sampling distribution of sample proportion is approximately normal with mean p = 0.68 and standard error √(0.68 × 0.32 / 1336) ≈ 0.013 • Only sampling distribution is bell-shaped; others are discrete and concentrated at two values 0 and 1
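As a rough check on this example, the sketch below computes the standard error from the population proportion and asks how far the exit-poll proportion fell from it. All figures are taken from the slides; the variable names are illustrative.

```python
import math

p = 0.68            # population proportion for Clinton (final tally)
n = 1336            # exit-poll sample size
p_hat = 895 / n     # exit-poll proportion for Clinton

se = math.sqrt(p * (1 - p) / n)
z = (p_hat - p) / se  # exit-poll result in standard errors from p

print(f"sample proportion = {p_hat:.3f}")
print(f"standard error    = {se:.4f}")
print(f"z                 = {z:.2f}")
```

The exit-poll proportion lies within one standard error of the population proportion, which is typical behavior for a random sample of this size.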
Sampling Distribution of Sample Mean The sample mean, x̄, is a random variable that varies from sample to sample, whereas the population mean, µ, is fixed.
Sampling Distribution of Sample Mean • Sampling distribution of sample mean for random samples of size n from a population with mean µ and standard deviation σ, has: • Center is the population mean, µ • Spread is the standard error, σ/√n
Pizza Sales Daily sales at a restaurant vary around a mean, µ = $900, with a standard deviation of σ = $300. • What are the center and spread of the sampling distribution of the mean daily sales for a random sample of n days?
Effect of n on the Standard Error • The standard error of the sample mean = σ/√n • As n increases, denominator increases, so s.e. decreases • With larger samples, the sample mean is more likely to be close to the population mean
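To see the effect of n numerically, the sketch below evaluates σ/√n for the pizza-sales standard deviation (σ = $300). The sample sizes are hypothetical and chosen only to show the trend.

```python
import math

sigma = 300.0                    # population standard deviation from the pizza example
sample_sizes = [1, 4, 25, 100]   # hypothetical sample sizes, for illustration only

# Standard error of the sample mean for each sample size.
standard_errors = {n: sigma / math.sqrt(n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(f"n = {n:3d}  standard error = ${se:.2f}")
```

Quadrupling the sample size halves the standard error, because n enters through its square root.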
Central Limit Theorem How does the sampling distribution of the sample mean relate with respect to shape, center, and spread to the probability distribution from which the samples were taken?
Central Limit Theorem (CLT) For random sampling with a large sample size n, sampling distribution of sample mean is approximately normal, no matter what the shape of the original probability distribution
Sampling Distribution of Sample Means • More bell-shaped as n increases • The more skewed the population, the larger n must be to get close to normal • Usually close to normal when n is about 30 or more • Always approximately normal for approximately normal populations
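A small simulation can make these points concrete. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1. That population, the sample sizes (2 and 30), and the helper names are choices made for illustration, not part of the slides.

```python
import random
import statistics


def sample_means(n, num_samples, rng):
    """Means of num_samples random samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(1)  # fixed seed so the run is repeatable
means_small = sample_means(n=2, num_samples=4000, rng=rng)
means_large = sample_means(n=30, num_samples=4000, rng=rng)

print("skewness of sample means, n = 2: ", round(skewness(means_small), 2))
print("skewness of sample means, n = 30:", round(skewness(means_large), 2))
print("std. dev. of sample means, n = 30:", round(statistics.pstdev(means_large), 3))
```

The skewness shrinks toward zero as n grows, and the spread of the sample means for n = 30 comes out close to σ/√n = 1/√30.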
CLT: Making Inferences • For large n, sampling distribution is approximately normal even if population distribution is not • Enables inferences about population means regardless of shape of population distribution
Calculating Probabilities of Sample Means • Milk bottle weights are normally distributed with a mean of 1.1 lbs and σ = 0.20 lbs • What is the probability that the mean of a random sample of 5 bottles will be greater than 0.99 lbs?
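One way to work the milk-bottle question is to standardize 0.99 lbs using the standard error σ/√n and look up the normal tail area. The sketch below uses Python's standard-library `statistics.NormalDist`. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1.1, 0.20, 5     # values from the milk-bottle example
se = sigma / math.sqrt(n)       # standard error of the sample mean

z = (0.99 - mu) / se            # z-score of 0.99 lbs
prob_above = 1 - NormalDist().cdf(z)

print(f"standard error = {se:.4f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 0.99) = {prob_above:.3f}")
```

Because the population itself is normal, the sampling distribution of the mean is normal even for n = 5.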
Calculating Probabilities of Sample Means • Closing prices of stocks have a right skewed distribution with a mean (µ) of $25 and σ= $20. • What is the probability that the mean of a random sample of 40 stocks will be less than $20?
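For the stock-price question the population is right-skewed, so the calculation below relies on the CLT approximation with n = 40. It is a sketch under that assumption, not an exact probability.

```python
import math
from statistics import NormalDist

mu, sigma, n = 25.0, 20.0, 40   # values from the stock-price example

# Approximate sampling distribution of the mean of 40 closing prices.
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
prob_below = sampling_dist.cdf(20.0)

print(f"P(sample mean < $20) is approximately {prob_below:.3f}")
```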
Calculating Probabilities of Sample Means An automobile insurer found repair claims have a mean of $920 and a standard deviation of $870. Suppose the next 100 claims can be regarded as a random sample. • What is the probability that the average of the 100 claims is larger than $900?
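The insurance question follows the same pattern. With n = 100 the standard error is $870/√100 = $87. The sketch below again treats the sampling distribution as approximately normal by the CLT.

```python
import math
from statistics import NormalDist

mu, sigma, n = 920.0, 870.0, 100   # values from the repair-claims example
se = sigma / math.sqrt(n)          # 870 / 10 = 87

prob_above = 1 - NormalDist(mu, se).cdf(900.0)

print(f"standard error = ${se:.0f}")
print(f"P(average claim > $900) is approximately {prob_above:.3f}")
```

Since $900 is only about a quarter of a standard error below the mean, the probability is not far above one half.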
Calculating Probabilities of Sample Means Distribution of actual weights of 8 oz. wedges of cheddar cheese is normal with mean =8.1 oz and standard deviation of 0.1 oz • Find x such that there is only a 10% chance that the average weight of a sample of five wedges will be above x
Calculating Probabilities of Sample Means As before, the distribution of actual weights of 8 oz. wedges of cheddar cheese is normal with mean = 8.1 oz. and standard deviation = 0.1 oz. • Find x such that there is only a 5% chance the average weight of a sample of five wedges will be below x
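Both cheese-wedge questions ask for a cutoff rather than a probability, so they call for the inverse of the normal CDF. The sketch below uses `NormalDist.inv_cdf` from the standard library. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8.1, 0.1, 5   # values from the cheese-wedge examples

# Sampling distribution of the mean weight of five wedges.
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))

upper_cutoff = sampling_dist.inv_cdf(0.90)  # 10% chance the mean is above this
lower_cutoff = sampling_dist.inv_cdf(0.05)  # 5% chance the mean is below this

print(f"10% of sample means fall above {upper_cutoff:.3f} oz")
print(f" 5% of sample means fall below {lower_cutoff:.3f} oz")
```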
Using the CLT to Make Inferences Implications of the CLT: • For large n, sampling distribution of x̄ is approximately normal despite population shape • When approximately normal, x̄ is within 2 standard errors of µ about 95% of the time and almost certainly within 3
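The "2 standard errors" and "3 standard errors" statements correspond to areas under the standard normal curve, which can be checked directly. This is a minimal sketch using the standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Probability of falling within 2 and within 3 standard errors of the mean.
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_3 = standard_normal.cdf(3) - standard_normal.cdf(-3)

print(f"within 2 standard errors: {within_2:.4f}")
print(f"within 3 standard errors: {within_3:.4f}")
```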
Standard Errors in Practice Standard errors have exact values that depend on parameters: • √(p(1 − p)/n) for a sample proportion • σ/√n for a sample mean In practice, parameters are unknown, so we approximate p with p-hat and σ with s
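In code, the substitution amounts to plugging sample statistics into the same formulas. In the sketch below, the proportion example uses the Clinton exit-poll counts from the earlier slide. The `daily_sales` list is hypothetical data invented for illustration, and the function names are not from the slides.

```python
import math
import statistics


def estimated_se_proportion(p_hat, n):
    """Estimated standard error of a sample proportion, with p-hat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def estimated_se_mean(sample):
    """Estimated standard error of a sample mean, with s in place of sigma."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Exit poll from the Clinton vs. Spencer example: 895 of 1336 voters.
se_prop = estimated_se_proportion(895 / 1336, 1336)

# Hypothetical week of daily sales in dollars (illustration only).
daily_sales = [850, 920, 1010, 780, 940, 1100, 700]
se_mean = estimated_se_mean(daily_sales)

print(f"estimated s.e. of sample proportion = {se_prop:.4f}")
print(f"estimated s.e. of sample mean       = {se_mean:.1f}")
```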
Sampling Distribution for a Proportion • Binomial probability distribution is a sampling distribution with x as the number of successes in n independent trials, each with success probability p • Sample proportion (not number) of successes is usually reported; its mean and standard deviation are the binomial formulas divided by n: mean np/n = p and standard deviation √(np(1 − p))/n = √(p(1 − p)/n)
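The link between the binomial count and the sample proportion can be verified numerically. The sketch below reuses n = 1336 and p = 0.68 from the Clinton example purely as convenient inputs. The variable names are illustrative.

```python
import math

n, p = 1336, 0.68   # inputs borrowed from the Clinton vs. Spencer example

# Binomial distribution of the number of successes.
count_mean = n * p
count_sd = math.sqrt(n * p * (1 - p))

# Dividing the count by n gives the sample proportion.
prop_mean = count_mean / n
prop_sd = count_sd / n

print(f"count:      mean = {count_mean:.2f}, sd = {count_sd:.2f}")
print(f"proportion: mean = {prop_mean:.2f}, sd = {prop_sd:.4f}")
```

Dividing the binomial standard deviation by n reproduces the standard error formula for a proportion, √(p(1 − p)/n).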