QMS 6351 Statistics and Research Methods. Chapter 7: Sampling and Sampling Distributions. Prof. Vera Adamchik. Chapter 7 Outline: Simple random sampling; Point estimation; Introduction to sampling distributions; Sampling distribution of $\bar{x}$; Sampling distribution of $\bar{p}$; Other sampling methods.
QMS 6351 Statistics and Research Methods Chapter 7 Sampling and Sampling Distributions Prof. Vera Adamchik
Chapter 7 Outline • Simple random sampling • Point estimation • Introduction to sampling distributions • Sampling distribution of $\bar{x}$ • Sampling distribution of $\bar{p}$ • Other sampling methods
Statistical inference • The purpose of statistical inference is to obtain information about a population from information contained in a sample. • A population is the set of all the elements of interest in a study. • A sample is a subset of the population.
A parameter is a numerical characteristic of a population. • A sample statistic is a numerical characteristic of a sample. • We will use a sample statistic in order to judge tentatively or approximately the value of the population parameter.
The sample results provide only estimates (that is, rough and approximate values) of the values of the population characteristics. • The reason is simply that the sample contains only a portion of the population. • With proper sampling methods, the sample results will provide “good” estimates of the population characteristics.
Selecting a sample • Sampling from a finite population. Finite populations are often defined by lists such as organization membership rosters, class rosters, inventory product numbers, etc. • Sampling from an infinite population (a process). The population is usually considered infinite if it involves an ongoing process that makes listing or counting every element impossible. Examples include parts being manufactured on a production line and customers entering a store.
Sampling from a finite population • A simple random sample from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected. • Replacing each sampled element before selecting subsequent elements is called sampling with replacement. • Sampling without replacement is the procedure used most often. • In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
Example: St. Andrew’s College St. Andrew’s College received 900 applications for admission in the upcoming year from prospective students. The applicants were numbered, from 1 to 900, as their applications arrived. The Director of Admissions would like to select a simple random sample of 30 applicants.
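Sampling from a finite population using Excel • =RAND() generates a random number greater than or equal to 0 and less than 1. • =RAND()*N generates a random number greater than or equal to 0 and less than N. • =INT(RAND()*900) truncates that number to an integer from 0 to 899. Because the applicants are numbered 1 to 900, =INT(RAND()*900)+1 gives applicant numbers in the required range.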
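The same selection can be sketched outside Excel. The snippet below is a minimal illustration in Python, not part of the original slides; the function name `select_simple_random_sample` and the seed value are assumptions made for the example. It draws 30 distinct applicant numbers from 1 to 900, that is, sampling without replacement.

```python
import random

def select_simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted element numbers drawn without replacement from 1..population_size."""
    rng = random.Random(seed)  # seed is optional; fixing it makes the draw repeatable
    # random.sample never returns the same element twice (sampling without replacement)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical usage for the St. Andrew's example: 30 of 900 applicants
sample_ids = select_simple_random_sample(900, 30, seed=7)
print(sample_ids)
```

Unlike repeated calls to a formula such as =INT(RAND()*900)+1, this approach cannot select the same applicant twice.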
Sampling from an infinite population • In the case of infinite populations, it is impossible to obtain a list of all elements in the population. • The random number selection procedure cannot be used for infinite populations.
Sampling from an infinite population • A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied: • Each element selected comes from the same population. • Each element is selected independently.
Point estimation • In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter. • A point estimate is a statistic computed from a sample that gives a single value for the population parameter. • An estimator is a rule or strategy for using the data to estimate the parameter.
Terminology of point estimation • We refer to $\bar{x}$ as the point estimator of the population mean $\mu$. • We refer to $s$ as the point estimator of the population standard deviation $\sigma$.
Terminology of point estimation • We refer to $\bar{p}$ as the point estimator of the population proportion $p$. • The actual numerical value obtained for $\bar{x}$, $s$, or $\bar{p}$ in a particular sample is called the point estimate of the parameter.
Example: St. Andrew’s College • Recall that St. Andrew’s College received 900 applications from prospective students. The application form contains a variety of information including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing. • At a meeting in a few hours, the Director of Admissions would like to announce the average SAT score and the proportion of applicants that want to live on campus, for the population of 900 applicants.
Example: St. Andrew’s College • However, the necessary data on the applicants have not yet been entered in the college’s computerized database. So, the Director decides to estimate the values of the population parameters of interest based on sample statistics. The sample of 30 applicants selected earlier with Excel’s RAND() function will be used.
Point estimation using Excel • Excel value worksheet (columns A, C, and D; row 1 holds the column headings)
Row   Applicant Number (A)   SAT Score (C)   On-Campus Housing (D)
2     12                     1107            No
3     773                    1043            Yes
4     408                    991             Yes
5     58                     1008            No
6     116                    1127            Yes
7     185                    982             Yes
8     510                    1163            Yes
9     394                    1008            No
Note: Rows 10-31 are not shown.
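Point estimates • $\bar{x}$ as the point estimator of $\mu$: $\bar{x} = \sum x_i / n = 997$ • $s$ as the point estimator of $\sigma$: $s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)} = 75.2$ • $\bar{p}$ as the point estimator of $p$: $\bar{p} = .68$ • Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.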
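To make the calculation concrete, here is a minimal Python sketch (not from the slides) that computes the three point estimates from only the eight worksheet rows visible in the transcript. Because rows 10-31 are not shown, the results will differ from the estimates reported for the full sample of 30; the variable names are illustrative assumptions.

```python
from statistics import mean, stdev

# SAT scores and housing answers from the eight visible worksheet rows only
# (the remaining 22 sampled applicants are not shown in the source)
sat_scores = [1107, 1043, 991, 1008, 1127, 982, 1163, 1008]
wants_housing = ["No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No"]

x_bar = mean(sat_scores)   # point estimate of the population mean
s = stdev(sat_scores)      # sample standard deviation (n - 1 in the denominator)
p_bar = wants_housing.count("Yes") / len(wants_housing)  # sample proportion

print(f"x_bar = {x_bar:.1f}, s = {s:.1f}, p_bar = {p_bar:.3f}")
```

Note that `statistics.stdev` divides by n - 1, which matches the sample standard deviation $s$, whereas `statistics.pstdev` divides by n and would be the choice for a full population.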
Population parameters Once all the data for the 900 applicants were entered in the college’s database, the values of the population parameters of interest were calculated.
Summary of point estimates obtained from a simple random sample
Population Parameter                                 Parameter Value   Point Estimator                                          Point Estimate
$\mu$ = Population mean SAT score                    990               $\bar{x}$ = Sample mean SAT score                        997
$\sigma$ = Population std. deviation for SAT score   80                $s$ = Sample std. deviation for SAT score                75.2
$p$ = Population proportion wanting campus housing   .72               $\bar{p}$ = Sample proportion wanting campus housing     .68
Making inferences about a population mean
Making inferences about a population mean • Population with mean $\mu$ = ? • A simple random sample of n elements is selected from the population. • The sample data provide a value for the sample mean $\bar{x}$. • The value of $\bar{x}$ is used to make inferences about the value of $\mu$.
Population vs sampling distribution • The population distribution is the probability distribution derived from the information on all elements of a population. • The probability distribution of a sample statistic (for example, $\bar{x}$) is called its sampling distribution.
Sampling distribution of $\bar{x}$ • The sampling distribution of the sample mean ($\bar{x}$) is the probability distribution of all possible values of $\bar{x}$. We need to know: • Expected value of $\bar{x}$ • Standard deviation of $\bar{x}$ • Form of the sampling distribution of $\bar{x}$
Mean of the sampling distribution of $\bar{x}$ • The mean of the sampling distribution of $\bar{x}$ is equal to the mean of the population. Thus, $E(\bar{x}) = \mu$.
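Standard deviation of the sampling distribution of $\bar{x}$ • Infinite population (N is unknown), or finite population with $n/N \le 0.05$: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ • Finite population with $n/N > 0.05$: $\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \cdot \frac{\sigma}{\sqrt{n}}$, where $\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor. • $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.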
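The two cases for the standard error can be sketched in a few lines of Python. This is an illustrative helper, not part of the slides; the function name `standard_error_of_mean` and its argument names are assumptions, and the 0.05 threshold is the rule of thumb stated on the slide.

```python
import math

def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size, or None for an infinite (or unknown-size) population
    """
    if N is None or n / N <= 0.05:
        # Infinite population, or the sample is at most 5% of a finite population
        return sigma / math.sqrt(n)
    # Finite population with n/N > 0.05: apply the finite population correction
    fpc = math.sqrt((N - n) / (N - 1))
    return fpc * sigma / math.sqrt(n)

# St. Andrew's example: sigma = 80, n = 30, N = 900, so n/N is about 0.033
print(round(standard_error_of_mean(80, 30, 900), 1))
```

The correction factor is always below 1 when n > 1, so applying it can only shrink the standard error.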
Two important observations 1. The spread of the sampling distribution of $\bar{x}$ is smaller than the spread of the corresponding population distribution. In other words, $\sigma_{\bar{x}} < \sigma$. 2. The standard deviation of the sampling distribution of $\bar{x}$ decreases as the sample size increases.
Form of the sampling distribution of $\bar{x}$ 1. The population has a normal distribution. If the population from which the samples are drawn is normally distributed, then the sampling distribution of the sample mean will also be normally distributed for any sample size.
Form of the sampling distribution of $\bar{x}$ 2. The population is not normally distributed but the sample size is large ($n \ge 30$). According to the Central Limit Theorem, for a large sample size ($n \ge 30$), the sampling distribution of the sample mean is approximately normal, irrespective of the shape of the population distribution. In cases where the population is highly skewed or outliers are present, samples of size 50 or more may be needed.
Form of the sampling distribution of $\bar{x}$ 3. The sample size is small (n < 30) and the population is not normally distributed. Use special statistical procedures.
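The Central Limit Theorem case can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not material from the slides: it uses an exponential population with mean 1 and standard deviation 1 as a stand-in for a skewed population, and the function name `simulate_sample_means` is hypothetical.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from a skewed (exponential) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=30, num_samples=5000)

# Theory: E(x_bar) = mu = 1 and sigma_x_bar = sigma / sqrt(n), about 0.183 for n = 30
print("mean of sample means:     ", round(mean(means), 3))
print("std. dev. of sample means:", round(stdev(means), 3))
```

Plotting a histogram of `means` for several values of n is a natural next step: the shape should look increasingly bell-shaped as n grows, even though the population itself is strongly skewed.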
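Example: St. Andrew’s College • Sampling distribution of $\bar{x}$ for SAT scores: $E(\bar{x}) = \mu = 990$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 80 / \sqrt{30} = 14.6$. [Figure: sampling distribution of $\bar{x}$ for SAT scores]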
Example: St. Andrew’s College • What is the probability that the sample mean will be between 980 and 1000? In other words, what is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within ±10 of the actual population mean? $P(980 \le \bar{x} \le 1000) = ?$
Example: St. Andrew’s College • [Figure: sampling distribution of $\bar{x}$ for SAT scores, centered at 990; the area under the curve between 980 and 1000 is .5034]
Example: St. Andrew’s College • The probability of 0.5034 means that, for a large number of samples of size 30 selected from the population, we can expect the sample mean to be within ±10 of the actual population mean (that is, between 980 and 1000) in 50.34% of all cases, and more than 10 away from it (that is, below 980 or above 1000) in 49.66% of all cases.
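The probability above can be checked numerically. The following Python sketch is illustrative only (the function name `prob_sample_mean_within` is an assumption); it treats the sampling distribution of the sample mean as normal with mean 990 and standard error 80/√30, using `statistics.NormalDist` from the standard library.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_within(mu, sigma, n, margin):
    """P(mu - margin <= x_bar <= mu + margin), assuming x_bar is normally
    distributed with mean mu and standard error sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    dist = NormalDist(mu, standard_error)
    return dist.cdf(mu + margin) - dist.cdf(mu - margin)

# St. Andrew's example: mu = 990, sigma = 80, n = 30, margin of 10 points
p30 = prob_sample_mean_within(990, 80, 30, 10)
print(round(p30, 4))
```

The computed value comes out close to, but not exactly, .5034. The slide's figure is consistent with rounding z to 0.68 before a table lookup, which is presumably how it was obtained; the code uses the unrounded z.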
Relationship between the sample size and the sampling distribution of $\bar{x}$ • Example: St. Andrew’s College • Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.
• $E(\bar{x}) = \mu$ regardless of the sample size. In our example, $E(\bar{x})$ remains at 990. • Whenever the sample size is increased, the standard error of the mean is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased from 14.6 to: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 80 / \sqrt{100} = 8.0$
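Relationship between the sample size and the sampling distribution of $\bar{x}$ • Example: St. Andrew’s College • [Figure: sampling distributions of $\bar{x}$ with n = 30 and with n = 100, both centered at 990; the n = 100 curve is narrower]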
Example: St. Andrew’s College • Recall that when n = 30, $P(980 \le \bar{x} \le 1000) = .5034$. • Now, with n = 100, $P(980 \le \bar{x} \le 1000) = .7888$. • Because the sampling distribution with n = 100 has a smaller standard error, the values of $\bar{x}$ have less variability and tend to be closer to the population mean than the values of $\bar{x}$ with n = 30.
Example: St. Andrew’s College • [Figure: sampling distribution of $\bar{x}$ for SAT scores with n = 100, centered at 990; the area under the curve between 980 and 1000 is .7888]
Example: St. Andrew’s College • The probability of 0.7888 means that, for a large number of samples of size 100 selected from the population, we can expect the sample mean to be within ±10 of the actual population mean (that is, between 980 and 1000) in 78.88% of all cases, and more than 10 away from it (that is, below 980 or above 1000) in 21.12% of all cases.
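For reference, the .7888 figure can be traced through the usual standardization step. The derivation below is a sketch that assumes the sampling distribution is normal and uses the standard error of 8.0 as the slides do; $\Phi$ denotes the standard normal cumulative distribution function, a symbol introduced here for the illustration.

```latex
\begin{align*}
z &= \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{1000 - 990}{8.0} = 1.25 \\
P(980 \le \bar{x} \le 1000) &= P(-1.25 \le z \le 1.25) \\
  &= 2\,\Phi(1.25) - 1 \approx 2(0.8944) - 1 = 0.7888
\end{align*}
```

One caveat: with n = 100 and N = 900, n/N is about 0.11, which exceeds the 0.05 threshold given earlier, so the finite population correction would strictly apply and would yield a somewhat smaller standard error and a somewhat larger probability.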
Making inferences about a population proportion
Making inferences about a population proportion • Population with proportion $p$ = ? • A simple random sample of n elements is selected from the population. • The sample data provide a value for the sample proportion $\bar{p}$. • The value of $\bar{p}$ is used to make inferences about the value of $p$.
Sampling distribution of $\bar{p}$ • The sampling distribution of the sample proportion ($\bar{p}$) is the probability distribution of all possible values of $\bar{p}$. We need to know: • Expected value of $\bar{p}$ • Standard deviation of $\bar{p}$ • Form of the sampling distribution of $\bar{p}$
Mean of the sampling distribution of $\bar{p}$ • The mean of the sampling distribution of $\bar{p}$ is equal to the population proportion. Thus, $E(\bar{p}) = p$.
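Standard deviation of the sampling distribution of $\bar{p}$ • Infinite population (N is unknown), or finite population with $n/N \le 0.05$: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$ • Finite population with $n/N > 0.05$: $\sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \cdot \sqrt{\frac{p(1 - p)}{n}}$ • $\sigma_{\bar{p}}$ is referred to as the standard error of the proportion.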
Form of the sampling distribution of $\bar{p}$ • The sampling distribution of $\bar{p}$ can be approximated by a normal probability distribution whenever the sample size is large. The sample size is considered large whenever the following two conditions are satisfied: $np \ge 5$ and $n(1 - p) \ge 5$.
Form of the sampling distribution of $\bar{p}$ • For values of $p$ near .50, sample sizes as small as 10 permit a normal approximation. • With very small (approaching 0) or very large (approaching 1) values of $p$, much larger samples are needed.
Example: St. Andrew’s College For our example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because: np = 30(.72) = 21.6 > 5 and n(1 - p) = 30(.28) = 8.4 > 5
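The check above, together with the standard error of the proportion, can be sketched in Python. This is an illustration rather than part of the slides: the function name `proportion_sampling_summary` and the `threshold` parameter are assumptions, and the standard error uses the formula without the finite population correction, since n/N = 30/900 is below 0.05.

```python
from math import sqrt

def proportion_sampling_summary(p, n, threshold=5):
    """Return the standard error of the sample proportion and whether the
    normal approximation conditions n*p >= threshold and n*(1-p) >= threshold hold."""
    standard_error = sqrt(p * (1 - p) / n)
    normal_ok = n * p >= threshold and n * (1 - p) >= threshold
    return standard_error, normal_ok

# St. Andrew's example: p = .72, n = 30
se_p, normal_ok = proportion_sampling_summary(0.72, 30)
print(f"standard error = {se_p:.4f}, normal approximation acceptable: {normal_ok}")

# A proportion near 0 with the same n fails the check (here n*p = 0.6)
print(proportion_sampling_summary(0.02, 30)[1])
```

The second call illustrates the earlier remark that values of p close to 0 or 1 need much larger samples before the normal approximation is reasonable.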