STAT131 W8L1 Sampling Distributions, by Anne Porter (alp@uow.edu.au)
Introduction • How big does $\chi^2$ have to be in order for us to conclude that there is a lack of fit? • How big does the difference in mean hand lengths between males and females have to be before we consider it a difference not due to chance? • It is as if by magic, rule,…. • To look at these questions we examine the concept of sampling distributions.
Sampling distributions • When we repeatedly sample from a population, the statistics computed from each sample vary. • This is the nature of sampling (chance). • Repeated samples give rise to a distribution of the statistic calculated for each of the samples, i.e. a sampling distribution.
Sampling distributions These may be determined • Theoretically • Empirically (through simulation)
Sampling Distributions: Formal • The distribution of all possible values that can be assumed by some statistic (e.g. mean, variance), computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic.
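The empirical (simulation) route can be sketched in a few lines. This is a minimal illustration, not part of the lecture: the function name `simulate_sample_means`, the seed and the number of repeats are assumptions, and the population is the seven-value one used in the activity later in the lecture.

```python
import random
from collections import Counter

def simulate_sample_means(population, n, repeats, seed=0):
    """Draw `repeats` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable (assumption)
    return [sum(rng.choices(population, k=n)) / n for _ in range(repeats)]

population = [0, 2, 4, 6, 8, 10, 12]
means = simulate_sample_means(population, n=2, repeats=10_000)

# Tabulate the empirical sampling distribution of the mean as relative frequencies.
counts = Counter(means)
for value in sorted(counts):
    print(f"{value:5.1f}  {counts[value] / len(means):.3f}")
```

With more repeats the relative frequencies settle towards the theoretical sampling distribution, which is what the exhaustive enumeration in the activity gives exactly.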
Distribution of Means • When the mean is calculated for samples of the same size from a population there are some important results.
Central Limit Theorem (Large Sample Normality) • Given a random sample $X_1, X_2, \ldots, X_n$ from any distribution with mean $\mu$ and finite variance $\sigma^2$, then irrespective of the distribution of the parent population, the distribution of the sample mean $\bar{X}$ approaches the shape of a normal distribution when the sample size is large, • with mean $\mu_{\bar{X}} = \mu$ • and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, both of which hold irrespective of sample size.
Activity 1 • Given the population of values (N = 7) with values 0, 2, 4, 6, 8, 10, 12, plot the distribution. How would you describe the shape? • Uniform, except that this is discrete, not continuous. [Figure: frequency against population values; each of 0, 2, 4, 6, 8, 10, 12 has frequency 1]
Activity 1 • How else might we have plotted the population, rather than using frequency? • Use relative frequency, i.e. divide each frequency by the total number of values. [Figure: relative frequency against population values; each of 0, 2, 4, 6, 8, 10, 12 has relative frequency 1/7 ≈ 0.14]
Mean of draw 1 and draw 2 • Find the sampling distribution of all means, for all samples of size 2 (n = 2) that can be drawn with replacement from the population 0, 2, 4, 6, 8, 10, 12. Each entry in the table is the mean of the row value (draw 1) and the column value (draw 2):
Draw 1 \ Draw 2     0    2    4    6    8   10   12
       0            0    1    2    3    4    5    6
       2            1    2    3    4    5    6    7
       4            2    3    4    5    6    7    8
       6            3    4    5    6    7    8    9
       8            4    5    6    7    8    9   10
      10            5    6    7    8    9   10   11
      12            6    7    8    9   10   11   12
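The table of means can also be produced by enumerating every ordered pair of draws. The following is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from itertools import product

population = [0, 2, 4, 6, 8, 10, 12]

# Every ordered pair (draw 1, draw 2), sampling with replacement: 7 x 7 = 49 samples.
samples = list(product(population, repeat=2))
means = [(a + b) / 2 for a, b in samples]

# Print the table of means: rows are draw 1, columns are draw 2.
# All population values are even, so (a + b) // 2 is the exact mean.
print("      " + "".join(f"{b:4d}" for b in population))
for a in population:
    print(f"{a:4d} |" + "".join(f"{(a + b) // 2:4d}" for b in population))
```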
Find the frequency of each mean • Count how often each value occurs in the table of means for draw 1 and draw 2:
Means   0   1   2   3   4   5   6   7   8   9  10  11  12
Freq    1   2   3   4   5   6   7   6   5   4   3   2   1
Plot the frequency of each mean [Figure: frequency (0 to 7) against means of samples of size 2 (0 to 12); the frequencies 1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1 rise to a peak of 7 at a mean of 6]
Find the sampling distribution of the means using relative frequency • Divide each frequency by the total number of samples (49):
Means          0     1     2     3     4     5     6     7     8     9    10    11    12
Frequency      1     2     3     4     5     6     7     6     5     4     3     2     1
Relative freq  1/49  2/49  3/49  4/49  5/49  6/49  7/49  6/49  5/49  4/49  3/49  2/49  1/49
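Counting the means and converting to relative frequencies is easy to check mechanically. A minimal sketch, using exact fractions so that no rounding creeps in (variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 6, 8, 10, 12]
means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

freq = Counter(means)        # frequency of each distinct mean
total = sum(freq.values())   # number of samples of size 2
rel_freq = {m: Fraction(f, total) for m, f in freq.items()}

for m in sorted(freq):
    print(f"mean {int(m):2d}: freq {freq[m]}, rel freq {freq[m]}/{total} = {float(rel_freq[m]):.2f}")
```

The relative frequencies sum to 1, as any probability distribution must.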
Hence we have the sampling distribution of the means:
Means          0     1     2     3     4     5     6     7     8     9    10    11    12
Relative freq  1/49  2/49  3/49  4/49  5/49  6/49  7/49  6/49  5/49  4/49  3/49  2/49  1/49
Decimal        0.02  0.04  0.06  0.08  0.10  0.12  0.14  0.12  0.10  0.08  0.06  0.04  0.02
Plot the sampling distribution of means [Figure: relative frequency (0 to 0.14) against means of samples of size 2 (0 to 12)]
How can you describe the shape of the distribution? • Symmetric and discrete (not bell shaped, but moving in that direction from the original population).
Central Limit Theorem (Large Sample Normality) • Given a random sample $X_1, X_2, \ldots, X_n$ from any distribution with mean $\mu$ and finite variance $\sigma^2$, then irrespective of the distribution of the parent population, the distribution of the sample mean $\bar{X}$ approaches the shape of a normal distribution when the sample size is large, • with mean $\mu_{\bar{X}} = \mu$ • and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, both of which hold irrespective of sample size.
E(X): mean of population • Given the population of values (N = 7) with values 0, 2, 4, 6, 8, 10, 12, find the mean of the population. • What were our x values? 0, 2, 4, 6, 8, 10, 12 • What were the P(X = x)? 1/7 for all x • $E(X) = 0 \times 1/7 + 2 \times 1/7 + 4 \times 1/7 + 6 \times 1/7 + 8 \times 1/7 + 10 \times 1/7 + 12 \times 1/7 = 6$
$E(\bar{X})$: mean of sampling distribution (mean of means) • Given the population of values (N = 7) with values 0, 2, 4, 6, 8, 10, 12, find the mean of the means of all samples of size 2. • What were our $\bar{x}$ values, and what were the $P(\bar{X} = \bar{x})$?
$\bar{x}$               0     1     2     3     4     5     6     7     8     9    10    11    12
$P(\bar{X} = \bar{x})$  1/49  2/49  3/49  4/49  5/49  6/49  7/49  6/49  5/49  4/49  3/49  2/49  1/49
• $E(\bar{X}) = \sum \bar{x}\, P(\bar{X} = \bar{x}) = (0 \times 1 + 1 \times 2 + 2 \times 3 + \cdots + 11 \times 2 + 12 \times 1)/49 = 294/49 = 6$
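As a check on the arithmetic, both expectations can be computed exactly. This is a short sketch with illustrative variable names; it uses exact fractions rather than floating point.

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 6, 8, 10, 12]

# Population mean: each value has probability 1/7.
pop_mean = sum(Fraction(x, len(population)) for x in population)

# Mean of the sampling distribution: each of the 49 ordered samples has probability 1/49.
sample_means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]
mean_of_means = sum(sample_means) / len(sample_means)

print(pop_mean, mean_of_means)
```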
Finding: the mean of the population and the mean of the means are the same • Confirmation from our example: $E(X) = 6$ and $E(\bar{X}) = 6$
Central Limit Theorem (Large Sample Normality) • Given a random sample $X_1, X_2, \ldots, X_n$ from any distribution with mean $\mu$ and finite variance $\sigma^2$, then irrespective of the distribution of the parent population, the distribution of the sample mean $\bar{X}$ approaches the shape of a normal distribution when the sample size is large, • with mean $\mu_{\bar{X}} = \mu$ • and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, both of which hold irrespective of sample size.
Variance population (X) • Given the population of values (N = 7) with values 0, 2, 4, 6, 8, 10, 12, find the variance of the population. • $E(X) = 6$, therefore $(E(X))^2 = 36$, and to find $E(X^2)$ we need:
$x$          0    2    4    6    8    10   12
$x^2$        0    4    16   36   64   100  144
$P(X = x)$   1/7  1/7  1/7  1/7  1/7  1/7  1/7
• $E(X^2) = (0 + 4 + 16 + 36 + 64 + 100 + 144)/7 = 364/7 = 52$ • $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 52 - 36 = 16$
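The same shortcut formula, $\mathrm{Var}(X) = E(X^2) - (E(X))^2$, can be sketched in code for the population. Variable names here are illustrative.

```python
from fractions import Fraction

population = [0, 2, 4, 6, 8, 10, 12]
p = Fraction(1, len(population))            # P(X = x) = 1/7 for every x

e_x = sum(x * p for x in population)        # E(X)
e_x2 = sum(x * x * p for x in population)   # E(X^2)
var_x = e_x2 - e_x ** 2                     # Var(X) = E(X^2) - (E(X))^2

print(e_x, e_x2, var_x)
```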
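Variance sample means • Given the means of samples of size 2, with values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, find the variance of the sample means. • Knowing $E(\bar{X}) = 6$, therefore $(E(\bar{X}))^2 = 36$, find $E(\bar{X}^2)$:
$\bar{x}$               0     1     2     3     4     5     6     7     8     9     10    11    12
$\bar{x}^2$             0     1     4     9     16    25    36    49    64    81    100   121   144
$P(\bar{X} = \bar{x})$  1/49  2/49  3/49  4/49  5/49  6/49  7/49  6/49  5/49  4/49  3/49  2/49  1/49
• $E(\bar{X}^2) = (0 \times 1 + 1 \times 2 + 4 \times 3 + \cdots + 121 \times 2 + 144 \times 1)/49 = 2156/49 = 44$ • $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = 44 - 36 = 8$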
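The variance of the sample means for n = 2 can be checked the same way, by treating each of the 49 ordered samples as equally likely. A minimal sketch with illustrative names:

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4, 6, 8, 10, 12]
n = 2

# Mean of every ordered sample of size n drawn with replacement.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
p = Fraction(1, len(sample_means))           # each sample has probability 1/49

e_m = sum(m * p for m in sample_means)       # E(X-bar)
e_m2 = sum(m * m * p for m in sample_means)  # E(X-bar^2)
var_m = e_m2 - e_m ** 2                      # Var(X-bar)

print(e_m, e_m2, var_m)
```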
Variance of sample means: $\mathrm{Var}(\bar{X}) = 8$ • Variance of population of scores X: $\mathrm{Var}(X) = 16$
Central Limit Theorem (Large Sample Normality) • Given a random sample $X_1, X_2, \ldots, X_n$ from any distribution with mean $\mu$ and finite variance $\sigma^2$, then irrespective of the distribution of the parent population, the distribution of the sample mean $\bar{X}$ approaches the shape of a normal distribution when the sample size is large, • with mean $\mu_{\bar{X}} = \mu$ • and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
Variance of sample means: $\mathrm{Var}(\bar{X}) = \sigma^2/n = 16/2 = 8$ • Variance of population of scores X: $\sigma^2 = 16$
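Standard deviation of sample means: $\sigma_{\bar{X}} = \sqrt{8} \approx 2.83$ • Standard deviation of population scores: $\sigma = \sqrt{16} = 4$ • From the central limit theorem: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 4/\sqrt{2} \approx 2.83$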
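The relation $\sigma_{\bar{X}}^2 = \sigma^2/n$ is not special to n = 2. The sketch below checks it by exhaustive enumeration for a few small sample sizes; the helper name `variance_of_sample_means` and the range of n are assumptions made for illustration.

```python
import math
from fractions import Fraction
from itertools import product

def variance_of_sample_means(population, n):
    """Exact variance of the mean over all ordered samples of size n, drawn with replacement."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    p = Fraction(1, len(means))
    mean = sum(m * p for m in means)
    return sum((m - mean) ** 2 * p for m in means)

population = [0, 2, 4, 6, 8, 10, 12]
sigma2 = variance_of_sample_means(population, 1)  # n = 1 gives the population variance

for n in range(1, 5):
    var_n = variance_of_sample_means(population, n)
    # Columns: n, Var(X-bar), sigma^2 / n, SD(X-bar)
    print(n, var_n, sigma2 / n, round(math.sqrt(var_n), 3))
```

Exhaustive enumeration grows as 7 to the power n, so this approach only suits small n; for larger samples simulation is the practical route.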
Exploring the nature of sampling • Table 1: New Zealand high school rolls (July 1994); task follows
Exercise • Take a random sample of ten observations from this table. • Plot them on graph 1. • Calculate the mean of your ten observations and mark it on graph 1. • Take three random samples of ten observations. • Calculate the mean of each sample. • Calculate the standard deviation of the population (i.e. divisor N). • Plot these as dot plots, boxplots or error bars (mean ± 2 standard deviations) and identify any gaps, clusters, outliers or unusual features.
Notice that when the data in each of the samples are displayed as error bars, there is variation within each sample. However, there is less variation among the means of the samples than within the samples themselves.
Central Limit Theorem (Large Sample Normality) • Given a random sample $X_1, X_2, \ldots, X_n$ from any distribution with mean $\mu$ and finite variance $\sigma^2$, then irrespective of the distribution of the parent population, the distribution of the sample mean $\bar{X}$ approaches the shape of a normal distribution when the sample size is large, • with mean $\mu_{\bar{X}} = \mu$ • and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, both of which hold irrespective of sample size.
When repeatedly sampling from normal populations one will note: • there is variability when we compare samples • the variability between samples is more evident for small samples than for large samples • the main characteristics of the sample summaries, e.g. means, remain similar • outliers, clusters and skewness appear more apparent in small samples, and hence larger samples are less likely to mislead • the larger the sample size, the more stable the sample summaries tend to be.
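The last point, that sample summaries become more stable as the sample size grows, can be illustrated by simulation. This is only a sketch: the normal population (mean 100, standard deviation 15), the sample sizes, the seed and the function name `spread_of_sample_means` are all hypothetical choices, not values from the lecture.

```python
import random
import statistics

def spread_of_sample_means(mu, sigma, n, repeats, seed=1):
    """Standard deviation of the means of `repeats` simulated normal samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# The spread of the sample means should shrink roughly like sigma / sqrt(n).
for n in (5, 20, 80):
    print(n, round(spread_of_sample_means(mu=100, sigma=15, n=n, repeats=2000), 2))
```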
Activity 4 - homework • The mean and known standard deviation of serum iron values for healthy men are 120 and 15 micrograms per 100 ml, respectively. What are the mean and standard deviation of the means for samples of size 49?
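One way to set up this question, assuming the results $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ from the central limit theorem slides, is sketched below. The variable names are illustrative.

```python
import math

mu, sigma, n = 120, 15, 49           # values given in the question

mean_of_means = mu                   # mean of the sampling distribution of the mean
sd_of_means = sigma / math.sqrt(n)   # standard deviation of the sample means: 15 / 7

print(mean_of_means, round(sd_of_means, 3))
```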
Activity 4 - homework • If the mean and known standard deviation of serum iron values for healthy men are 120 and 15 micrograms per 100 ml, respectively, what is the probability that a random sample of 49 normal men will yield a mean between 115 and 125 micrograms per 100 ml?
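Activity 4 - homework • $z = (115 - 120)/(15/\sqrt{49}) = -2.33$ and $z = (125 - 120)/(15/\sqrt{49}) = 2.33$, so $P(115 < \bar{X} < 125) = P(-2.33 < Z < 2.33) \approx 0.98$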
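The z values and the corresponding probability can be reproduced with the standard library's `statistics.NormalDist`. This sketch assumes the sample mean is approximately normal, which the central limit theorem supports for n = 49; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 120, 15, 49
se = sigma / math.sqrt(n)   # standard deviation of the sample mean

# Standardise the two limits for the sample mean.
z_low = (115 - mu) / se
z_high = (125 - mu) / se

# Probability that a standard normal variable falls between the two z values.
std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(prob, 2))
```

Using unrounded z values gives a probability very close to the one read from a table at z = 2.33.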