Chapter 5 Sampling Distributions of Statistics
Basic Terms
• Any quantity computed from values in a sample is called a statistic.
• The observed value of a statistic depends on the particular sample selected from the population; typically, it varies from sample to sample. This variability is called sampling variability.
• The distribution of a statistic is called its sampling distribution.
Example 1
• Consider a population that consists of the numbers 1, 2, 3, 4 and 5, generated in such a manner that the probability of each of those values is 0.2 no matter what the previous selections were. This population could be described as the outcome of a spinner with five equal sectors, shown with its probability distribution next to it. What’s the mean?
[Figure: spinner with the values 1 to 5 and the corresponding probability distribution]
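As a worked answer to the question above, the mean follows directly from the definition of the mean of a discrete distribution. The standard deviation is included as well because it is needed whenever the spread of the sample mean is discussed; the symbols $\mu$ and $\sigma$ are the usual notation for the population mean and standard deviation.

$$\mu = \sum x\,p(x) = 0.2\,(1 + 2 + 3 + 4 + 5) = 3$$

$$\sigma^2 = \sum (x - \mu)^2\,p(x) = 0.2\,(4 + 1 + 0 + 1 + 4) = 2, \qquad \sigma = \sqrt{2} \approx 1.414$$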
Example 1 (continued)
• If the sampling distribution for the means of samples of size two is analyzed, it looks like the following.
[Figure: the sampling distribution of the sample mean for samples of size 2 from U(1,2,3,4,5)]
Other Exact Sampling Distributions of $\bar{X}$
• Bernoulli r.v.s → Binomial (for the sum $n\bar{X}$)
• Normal → Normal
• Exponential → Gamma
• In general, the exact sampling distribution is difficult to derive, especially if the population distribution is unknown.
• But when the sample size n is large, one may approximate the distribution of $\bar{X}$ by a normal distribution using the central limit theorem (CLT).
Example 1 (continued)
• The original distribution and the sampling distribution of means of samples with n = 2 are given below.
[Figure: two probability histograms over the values 1 to 5, titled "Original distribution" and "Sampling distribution n = 2"]
Example 1 (continued)
• Sampling distributions for n = 3 and n = 4 were calculated and are illustrated below.
[Figure: four probability histograms over the values 1 to 5, titled "Original distribution", "Sampling distribution n = 2", "Sampling distribution n = 3" and "Sampling distribution n = 4"]
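The exact sampling distributions in Example 1 can be reproduced by brute-force enumeration, since every ordered sample of size n from the values 1 to 5 is equally likely. The following is a minimal sketch under that assumption; the function name `sampling_distribution_of_mean` is illustrative and does not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

POPULATION = [1, 2, 3, 4, 5]  # each value has probability 0.2


def sampling_distribution_of_mean(n):
    """Return {sample mean: probability} for samples of size n.

    Sampling is with replacement, so all 5**n ordered samples are
    equally likely and probabilities are simple counts over 5**n.
    """
    counts = Counter()
    for sample in product(POPULATION, repeat=n):
        counts[Fraction(sum(sample), n)] += 1
    total = len(POPULATION) ** n
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}


for n in (2, 3, 4):
    dist = sampling_distribution_of_mean(n)
    mean = sum(m * p for m, p in dist.items())
    variance = sum((m - mean) ** 2 * p for m, p in dist.items())
    print(f"n={n}: mean of sample means={mean}, variance={variance}")
    for m, p in dist.items():
        print(f"  mean={float(m):.2f}  P={float(p):.4f}")
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1. For every n the sampling distribution is centered at 3, and its variance works out to 2/n.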
Simulations
To illustrate the general behavior of samples of fixed size n, 10000 samples each of size 30, 60 and 120 were generated from this uniform distribution and the means calculated. Probability histograms were created for each of these (simulated) sampling distributions. Notice all three of these look to be essentially normally distributed. Further, note that the variability decreases as the sample size increases.
[Figure: probability histograms of the simulated means, titled "Means (n=30)", "Means (n=60)" and "Means (n=120)", each on a horizontal axis running from 2 to 4]
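The simulation described above can be sketched in a few lines. This is not the original simulation code (which is not shown in the slides); the seed, the helper name `simulate_means` and the comparison against $\sigma/\sqrt{n}$ (the standard formula for the standard deviation of a sample mean) are assumptions added for illustration.

```python
import random
import statistics
from math import sqrt

random.seed(5)  # arbitrary seed so the run is repeatable

POPULATION = [1, 2, 3, 4, 5]            # discrete uniform population
SIGMA = statistics.pstdev(POPULATION)   # population sd, sqrt(2)


def simulate_means(n, reps=10_000):
    """Draw `reps` samples of size n (with replacement); return their means."""
    return [statistics.fmean(random.choices(POPULATION, k=n)) for _ in range(reps)]


results = {}
for n in (30, 60, 120):
    means = simulate_means(n)
    results[n] = (statistics.fmean(means), statistics.pstdev(means))
    print(f"n={n:3d}: mean of means={results[n][0]:.3f}, "
          f"sd of means={results[n][1]:.3f}, sigma/sqrt(n)={SIGMA / sqrt(n):.3f}")
```

The simulated standard deviations shrink as n grows and should sit close to $\sigma/\sqrt{n}$, which is the numerical counterpart of the narrowing histograms.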
Simulations
To further illustrate the general behavior of samples of fixed size n, 10000 samples each of size 4, 16 and 30 were generated from the positively skewed distribution pictured below. Notice that these sampling distributions are all skewed, but as n increased the sampling distributions became more symmetric and eventually appeared to be almost normally distributed.
[Figure: the population, titled "Skewed distribution", and histograms of the simulated means for n = 4, 16 and 30]
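The fading skewness can be checked numerically. The skewed population used in the slides is not recoverable from the text, so the sketch below substitutes an exponential distribution with rate 1 as a stand-in for a positively skewed population; that choice, the seed and the helper names are assumptions.

```python
import random
import statistics

random.seed(12)  # arbitrary seed so the run is repeatable


def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_means(n, reps=10_000):
    """Means of `reps` samples of size n from an exponential(1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


skew_by_n = {n: skewness(simulate_means(n)) for n in (4, 16, 30)}
for n, g in skew_by_n.items():
    print(f"n={n:2d}: skewness of the simulated sample means = {g:.2f}")
```

A skewness of 0 corresponds to a symmetric distribution. For this stand-in population the skewness of the sample mean stays positive but shrinks toward 0 as n increases, matching the pattern described for the histograms.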
Terminology
Let $\bar{X}$ denote the mean of the observations in a random sample of size n from a population having mean $\mu$ and standard deviation $\sigma$. Denote the mean value of the $\bar{X}$ distribution by $\mu_{\bar{X}}$ and the standard deviation of the $\bar{X}$ distribution by $\sigma_{\bar{X}}$ (called the standard error of the mean), then the rules on the next two slides hold.
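The slides stating the first rules are not part of this text. For reference, the standard results for the mean and standard deviation of the sampling distribution of the sample mean, which the later examples rely on, are given here as a textbook fact rather than as the slides' own wording:

$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

As a quick check with the population 1, 2, 3, 4, 5 (where $\mu = 3$ and $\sigma = \sqrt{2}$), samples of size $n = 2$ give $\mu_{\bar{X}} = 3$ and $\sigma_{\bar{X}} = \sqrt{2}/\sqrt{2} = 1$.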
Central Limit Theorem
Rule 4: When n is sufficiently large (n > 30 is the usual rule of thumb), the sampling distribution of $\bar{X}$ is approximately normal, even when the population distribution is not itself normal.
Illustrations of Sampling Distributions
[Figure: symmetric, normal-like population]
Illustrations of Sampling Distributions
[Figure: skewed population]
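More about the Central Limit Theorem
• As a rule of thumb, the Central Limit Theorem can safely be applied when n exceeds 30.
• If n is large or the population distribution is normal, the standardized variable $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $\mu$ and $\sigma$ are the population mean and standard deviation, has (approximately) a standard normal (Z) distribution.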
Example 2
A food company sells “18 ounce” boxes of cereal. Let X denote the actual amount of cereal in a box. Suppose that X is normally distributed with $\mu = 18.03$ ounces and $\sigma = 0.05$ ounces.
a) What proportion of the boxes will contain less than 18 ounces?
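A worked solution to part a), obtained by standardizing X and reading the standard normal table, might look as follows (the solution on the original slide is not recoverable, so this is a reconstruction from the stated mean and standard deviation):

$$P(X < 18) = P\left(Z < \frac{18 - 18.03}{0.05}\right) = P(Z < -0.60) \approx 0.2743$$

So roughly 27% of individual boxes contain less than 18 ounces.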
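Example 2 (continued)
b) A case consists of 24 boxes of cereal. What is the probability that the mean amount of cereal (per box in a case) is less than 18 ounces?
Because X itself is normally distributed, the distribution of $\bar{X}$ is normal for any sample size (the central limit theorem is not needed here), with mean 18.03 and standard deviation $0.05/\sqrt{24} \approx 0.0102$, so
$$P(\bar{X} < 18) = P\left(Z < \frac{18 - 18.03}{0.05/\sqrt{24}}\right) = P(Z < -2.94) \approx 0.0016$$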
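The contrast between a single box and the mean of a case of 24 boxes can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 18.03, 0.05  # population mean and sd of the fill amount (ounces)

# (a) A single box: X ~ Normal(MU, SIGMA)
p_box = NormalDist(MU, SIGMA).cdf(18)

# (b) Mean of a case of n = 24 boxes: sample mean ~ Normal(MU, SIGMA / sqrt(n))
n = 24
standard_error = SIGMA / sqrt(n)
p_case = NormalDist(MU, standard_error).cdf(18)

print(f"P(X < 18)           = {p_box:.4f}")
print(f"standard error      = {standard_error:.4f}")
print(f"P(sample mean < 18) = {p_case:.4f}")
```

Averaging over 24 boxes shrinks the standard deviation by a factor of $\sqrt{24}$, which is why an underweight case average is far less likely than an underweight single box.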
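Sampling Distribution of Proportions where p = 0.2
[Figure: four histograms of the sampling distribution of the sample proportion, titled "n = 10", "n = 20", "n = 50" and "n = 100", each with 0.2 marked on the horizontal axis]
Let X be the number of successes (S) in a random sample of size n from a population whose proportion of S is p. (The sample proportion of S is then $\hat{p} = X/n$, and X ~ Bin(n, p).)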
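Because X ~ Bin(n, p), the exact sampling distribution of the sample proportion X/n can be tabulated directly from the binomial probabilities. The sketch below does this for p = 0.2 and the four sample sizes in the figure; the function name `phat_distribution` is illustrative, and the comparison value $\sqrt{p(1-p)/n}$ is the standard formula for the standard deviation of a sample proportion, added here as an assumption since it does not appear in the text.

```python
from math import comb, sqrt


def phat_distribution(n, p):
    """Exact distribution of X/n for X ~ Bin(n, p), as {x/n: probability}."""
    return {x / n: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


P = 0.2
summary = {}
for n in (10, 20, 50, 100):
    dist = phat_distribution(n, P)
    mean = sum(v * pr for v, pr in dist.items())
    sd = sqrt(sum((v - mean) ** 2 * pr for v, pr in dist.items()))
    summary[n] = (mean, sd)
    print(f"n={n:3d}: mean={mean:.4f}, sd={sd:.4f}, "
          f"sqrt(p(1-p)/n)={sqrt(P * (1 - P) / n):.4f}")
```

For every n the distribution is centered at p = 0.2, and its spread shrinks as n grows.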
Condition for Use
The further the value of p is from 0.5, the larger n must be for a normal approximation to the sampling distribution of the sample proportion $\hat{p} = X/n$ to be accurate.
Rule of Thumb
If both $np \ge 10$ and $n(1-p) \ge 10$, then it is safe to use a normal approximation.
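The rule of thumb is easy to apply mechanically. The following sketch checks it for p = 0.2 at several sample sizes; the function name `normal_approx_ok` and its `threshold` parameter are hypothetical names chosen for illustration.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both n*p and n*(1-p) must be at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


for n in (10, 20, 50, 100):
    verdict = "ok" if normal_approx_ok(n, 0.2) else "not recommended"
    print(f"n={n:3d}, p=0.2: np={n * 0.2:.0f}, n(1-p)={n * 0.8:.0f} -> {verdict}")
```

With p = 0.2 the condition is first met at n = 50, whereas with p = 0.5 it is already met at n = 20, which illustrates why values of p far from 0.5 call for larger samples.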