Chapter 6 Sampling Distributions • 6.1 The Sampling Distribution of the Sample Mean • 6.2 The Sampling Distribution of the Sample Proportion
Introduction • This chapter lays the foundation for the statistical inference techniques in later chapters. • Usually the population of interest is so large that it would be impractical to analyze the entire population. The purpose of statistical inference is to obtain information about the population from information gathered by examining a much smaller sample. • Analysis of the sample provides only estimates of the values of the population characteristics. • A parameter is a numerical characteristic of a population. • With proper sampling methods, the sample results provide “good” estimates of the population characteristics. • In Chapter 1 we discussed various random sampling techniques.
Point Estimators • A point estimator is a sample statistic (i.e., a value computed from sample data) that serves as an estimate of a population parameter. • The sample mean, $\bar{x}$, is the point estimator of the population mean $\mu$. • The sample standard deviation, $s$, is the point estimator of the population standard deviation $\sigma$. • The sample proportion, $\bar{p}$, is the point estimator of the population proportion $p$. • In this chapter we will study the probability distributions of $\bar{x}$ and $\bar{p}$.
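As a small illustration of these point estimators, the sketch below computes $\bar{x}$, $s$, and $\bar{p}$ from a sample. The five records are hypothetical values made up for the illustration; they are not data from the example that follows.

```python
import statistics

# Hypothetical sample of 5 applicants (made-up values):
# each record is (SAT score, wants on-campus housing?)
sample = [(1020, True), (950, False), (1100, True), (980, True), (900, False)]

scores = [score for score, _ in sample]

x_bar = statistics.mean(scores)    # sample mean, point estimate of mu
s = statistics.stdev(scores)       # sample std dev (n - 1 divisor), estimate of sigma
p_bar = sum(1 for _, wants in sample if wants) / len(sample)  # sample proportion

print(x_bar, round(s, 1), p_bar)
```

Each value is computed from the sample alone, which is why it can only estimate the corresponding population parameter.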
Example 6.1 St. Andrew’s College receives 900 applications annually from prospective students. The application forms contain a variety of information including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing. Anderson, Sweeney, and Williams
The director of admissions would like to know the following information: • The average SAT score for the applicants, and • The proportion of applicants that want to live on campus.
There are two separate approaches to solving this problem: 1. Analyze the applications of all past applicants and compute $\mu$ and $p$, or 2. Analyze a sample of past applications and compute point estimates, then use these point estimates to approximate $\mu$ and $p$. The second approach is statistical inference and is far less expensive and time consuming to employ. Moreover, reasonably good estimates can be obtained.
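Sampling Distribution of $\bar{x}$ The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean from all possible samples of size $n$. • Expected value of $\bar{x}$: $E(\bar{x}) = \mu$, or equivalently $\mu_{\bar{x}} = \mu$.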
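Sampling Distribution of $\bar{x}$ • Standard deviation of $\bar{x}$: Finite population: $\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right)$. Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. • A finite population is treated as being infinite if $n/N < .05$ (i.e., the sample size is less than 5% of the population size). • $\sqrt{(N-n)/(N-1)}$ is the finite population correction factor. • $\sigma_{\bar{x}}$ is the standard error of the mean.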
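The two standard error formulas and the 5% rule can be sketched in a few lines. The function name `standard_error` and the numbers passed to it ($\sigma = 80$, $N = 900$, and the two sample sizes) are illustrative assumptions, not part of the slides' derivation.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard deviation of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size, or None for an infinite population
    """
    se = sigma / math.sqrt(n)              # infinite-population formula
    # Apply the finite population correction factor only when the
    # sample is at least 5% of the population (n/N >= .05).
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Illustrative values: n/N = 30/900 is below .05, so no correction is applied.
se_small = standard_error(sigma=80, n=30, N=900)
# n/N = 90/900 = .10, so the correction factor shrinks the standard error.
se_large = standard_error(sigma=80, n=90, N=900)

print(round(se_small, 2), round(se_large, 2))
```

The correction factor is always below 1, so ignoring it for small $n/N$ slightly overstates the standard error rather than understating it.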
Please note that most populations of interest are huge, and therefore the sample size is usually less than 5% of the population size. Therefore we will focus on the infinite population formulas.
Central Limit Theorem For a sufficiently large sample ($n \geq 30$), the population of all possible sample means (i.e., the sampling distribution of $\bar{x}$) is approximately normally distributed with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. Note: When the sample is small ($n < 30$), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal probability distribution.
[Figure: Central Limit Theorem simulation. Source: Bowerman, O’Connell, and Hand.]
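A simulation of this kind can be sketched as follows. This is not the Bowerman, O’Connell, and Hand simulation; it is a minimal stand-in that assumes a right-skewed exponential population with mean 1 and standard deviation 1, and the sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable

n = 30                # size of each sample
num_samples = 5000    # number of repeated samples

# Assumed population: exponential with mean 1 and standard deviation 1,
# which is strongly right-skewed (far from normal).
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)   # should be close to mu = 1
sd_of_means = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)

print("mean of sample means:", round(mean_of_means, 3))
print("std dev of sample means:", round(sd_of_means, 3))
print("sigma / sqrt(n):", round(1 / math.sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed, which is the behavior the theorem describes.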
Example 6.1 Revisited Suppose that further research revealed that the population average SAT score of applicants is 990 (i.e., $\mu = 990$) and the population standard deviation is 80 (i.e., $\sigma = 80$). What is the probability that a sample of 30 applicants will provide an estimate of the population mean SAT score that is within plus or minus 10 of the actual population mean $\mu$? In other words, what is the probability that $\bar{x}$ (the mean of the 30 applications) will be between 980 and 1000?
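Recall from slide 10 (the Central Limit Theorem) that, since $n = 30$, $\bar{x}$ is approximately normally distributed with $E(\bar{x}) = \mu = 990$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$.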
[Figure: Example 6.1 Revisited, sampling distribution of $\bar{x}$ for the SAT scores, centered at $E(\bar{x}) = 990$, with the area between 980 and 1000 to be found. Source: Anderson, Sweeney, and Williams.]
Restate $P(980 < \bar{x} < 1000)$ as $P(-.68 < z < .68)$, since $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ gives $z = \frac{1000 - 990}{14.6} = .68$ and $z = \frac{980 - 990}{14.6} = -.68$. • Using the standard normal probability table we find that the area under the normal curve between the mean and .68 equals .2518. Thus the area under the curve between -.68 and .68 equals 2(.2518) = .5036. The probability is .5036 that the sample mean will be within +/-10 of the actual population mean.
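The same probability can be checked without a table. The sketch below uses `NormalDist` from Python's standard `statistics` module with the example's values $\mu = 990$, $\sigma = 80$, and $n = 30$. Because it does not round $z$ to two decimals, its result differs slightly from the table-based .5036.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 990, 80, 30

se = sigma / sqrt(n)                 # standard error of the mean, about 14.6
z = (1000 - mu) / se                 # about .68 before rounding

# Sampling distribution of the sample mean under the normal approximation
x_bar_dist = NormalDist(mu, se)
prob = x_bar_dist.cdf(1000) - x_bar_dist.cdf(980)

print(round(se, 1), round(z, 2), round(prob, 3))
```

The small gap between the two answers comes entirely from rounding in the table lookup, not from a different method.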
[Figure: Example 6.1 Revisited, sampling distribution of $\bar{x}$ for the SAT scores, centered at 990. The area between 980 and 1000 (z between -.68 and .68) is 2(.2518) = .5036, so $P(980 < \bar{x} < 1000) = .5036$. Source: Anderson, Sweeney, and Williams.]
Sampling Distribution of $\bar{p}$ The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion $\bar{p}$. • The sampling distribution of $\bar{p}$ can be approximated by a normal probability distribution whenever the sample size is large. • The sample size is considered large whenever both of these conditions are satisfied: $np > 5$ and $n(1 - p) > 5$.
$\bar{p}$ is approximately normally distributed with $E(\bar{p}) = p$ and $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$. As previously mentioned, the sample size is usually less than 5% of the population size, so only the infinite population formula is listed here.
Suppose further research also revealed that the population proportion of applicants wanting on-campus housing is .72. What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion? In other words, what is the probability that $\bar{p}$ will be between .67 and .77?
The normal probability distribution is an acceptable approximation because: np = 30(.72) = 21.6 > 5 and n(1 - p) = 30(.28) = 8.4 > 5.
[Figure: Example 6.1, sampling distribution of $\bar{p}$, centered at $E(\bar{p}) = .72$. Source: Anderson, Sweeney, and Williams.]
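$P(.67 < \bar{p} < .77) = P(-.61 < z < .61)$, since $\sigma_{\bar{p}} = \sqrt{\frac{.72(.28)}{30}} = .082$ and $z = \frac{.77 - .72}{.082} = .61$. [Figure: sampling distribution of $\bar{p}$ centered at .72, with the area between .67 and .77 (z between -.61 and .61) to be found. Source: Anderson, Sweeney, and Williams.]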
Using the standard normal probability table we find that the area under the normal curve between the mean and .61 equals .2291. Thus the area under the curve between -.61 and .61 equals 2*.2291 = .4582. The probability is .4582 that the sample proportion will be within +/-.05 of the actual population proportion.
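The proportion calculation can be checked the same way. The sketch below takes $p = .72$ and $n = 30$ from the example, verifies the large-sample conditions, and uses `NormalDist` from Python's standard `statistics` module in place of the table.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.72, 30

# Large-sample conditions for the normal approximation
large_enough = n * p > 5 and n * (1 - p) > 5

se = sqrt(p * (1 - p) / n)           # standard error of the proportion, about .082

# Sampling distribution of the sample proportion under the normal approximation
p_bar_dist = NormalDist(p, se)
prob = p_bar_dist.cdf(0.77) - p_bar_dist.cdf(0.67)

print(large_enough, round(se, 3), round(prob, 3))
```

The result agrees with the table-based .4582 to about three decimal places, because here $z$ is already very close to .61 before rounding.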