STT 350: SURVEY SAMPLING Dr. Cuixian Chen

Chapter 4: Simple Random Sampling (SRS)
Textbook: Elementary Survey Sampling, 7E, Scheaffer, Mendenhall, Ott and Gerow

SRS
• SRS: every sample of size n drawn from a population of size N has the same chance of being selected.
• Use a table of random numbers (Table A.2) or computer software.
• Using the table:
• Assign every sampling unit a number.
• Use the table of random numbers to select the sample.

Example
• In a population of N = 450, select a sample of size 10 using the table of random digits.
• Assume: use Table A.2, second column, and drop the last two digits of each number. (Note: sampling is without replacement, so repeated numbers are skipped.)
• Starting digit value _______
• Ending digit value _______
• Line number started at _______
• Sample digits selected for sample:
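The same selection can be made with software instead of the table. A minimal sketch in R, assuming the sampling units are labeled 1 to 450; the seed value is arbitrary and is there only so the draw can be repeated:

```r
# Label the N = 450 sampling units 1, 2, ..., 450
N <- 450
n <- 10

set.seed(350)                  # arbitrary seed, for reproducibility only
srs <- sample(1:N, size = n)   # sample() draws without replacement by default
sort(srs)
```

Because `sample()` draws without replacement unless told otherwise, no label can appear twice, which matches the rule of skipping repeated numbers in the table.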

Review: Section 3.3 Summarizing Info in Populations and Samples: The Finite Population Case
• If the population is infinitely large, sampling without replacement behaves like sampling with replacement: selecting one observation does not change the probabilities for the next, so the selections are independent.
• However, if the population is finite, the probabilities of selecting elements change as more elements are selected (example: rolling a die versus dealing cards from a standard 52-card deck).

Review: Section 3.3 Sampling distribution from infinite populations
• For randomly selected samples from infinite populations, mathematical properties of expected value can be used to derive the facts that E(y-bar) = μ and V(y-bar) = σ²/n.
• It can also be shown that the variance of the sample mean can be estimated unbiasedly by s²/n.

Estimating population average with SRS chosen w/o replacement in Finite Population (N)
• We use y-bar = Σyi/n to estimate μ (y-bar is an unbiased estimator of μ).
• For an infinite population (or N extremely large):
• We use s² to estimate σ² (unbiased estimator).
• From Chap 3, we know that V(y-bar) = σ²/n.
• But for a finite population, this is more complicated.

Estimating population average with SRS chosen w/o replacement in Finite Population (N)
• We use y-bar = Σyi/n to estimate μ (y-bar is an unbiased estimator of μ).
• We use s² to estimate σ².
• From before, V(y-bar) = σ²/n for an infinite (or extremely large) population, where s² is an unbiased estimator of σ².
• For a finite population, V(y-bar) = ((N-n)/(N-1))(σ²/n).
• In the finite case E(s²) = (N/(N-1))σ², so σ² is replaced by ((N-1)/N)s², and the estimated variance of y-bar becomes (1 - n/N)(s²/n).
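The finite-population formulas can be checked by brute force on a tiny population. The sketch below uses a hypothetical toy population of six values (not from the textbook), lists every possible SRS of size 2, and compares the actual variance of y-bar with the formula value and with the average of the estimated variances:

```r
# Hypothetical toy population (illustration only)
y <- c(1, 2, 3, 4, 5, 6)
N <- length(y)
n <- 2

mu     <- mean(y)
sigma2 <- mean((y - mu)^2)            # population variance with divisor N

# Every possible SRS of size n, one sample per column
all_samples <- combn(y, n)
ybars <- colMeans(all_samples)
s2s   <- apply(all_samples, 2, var)   # sample variances (divisor n - 1)

mean(ybars)                           # matches mu: y-bar is unbiased
true_var    <- mean((ybars - mu)^2)   # actual variance of y-bar over all samples
formula_var <- ((N - n) / (N - 1)) * sigma2 / n
avg_est_var <- mean((1 - n / N) * s2s / n)
c(true_var, formula_var, avg_est_var) # all three agree
```

Since every sample is equally likely under SRS, averaging over the columns is the same as taking an expected value, so the agreement here is exact rather than approximate.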

Bound on the error of estimation for μ with SRS chosen w/o replacement in Finite Population (N)
• For C = 95%, use 2 standard errors as our bound (think of the margin of error, MOE): B = 2*sqrt((1 - n/N)(s²/n)).
• When can the finite population correction (fpc) be dropped? A good rule of thumb is when (1 - n/N) > 0.95, that is, when the sample is less than 5% of the population.
• We want the data to be approximately normal (sometimes transformations can be used; the log transformation is one of the most popular).
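Putting the estimator and its bound together, here is a small sketch of a helper that works from raw sample data. The function name `srs_mean` and the data values are made up for illustration; only the formulas come from the slides:

```r
# Estimate a population mean from an SRS drawn without replacement.
# y: sample values; N: population size.
srs_mean <- function(y, N) {
  n    <- length(y)
  ybar <- mean(y)
  vhat <- (1 - n / N) * var(y) / n   # estimated variance of y-bar, with fpc
  B    <- 2 * sqrt(vhat)             # bound on the error of estimation
  c(estimate = ybar, bound = B, lower = ybar - B, upper = ybar + B)
}

# Hypothetical data: n = 8 values sampled from a population of N = 100
y <- c(12, 15, 9, 14, 11, 13, 10, 16)
res <- srs_mean(y, N = 100)
res
```

Here n/N = 0.08, so (1 - n/N) = 0.92 falls below the 0.95 rule of thumb and the fpc is kept.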

Eg4.2, page 85 (Example 4.1 and Example 4.2)
• a) Estimate μ;
• b) For C = 95%, find the bound on the error of estimation for μ;
• c) Find a 95% CI for μ.

EX4.16, page 104
• a) Estimate μ;
• b) For C = 95%, find the bound on the error of estimation for μ;
• c) Find a 95% CI for μ.

Estimating population total using SRS chosen w/o replacement in Finite Population (N)
• Since an SRS gives every observation the same chance of being selected, we set δi = n/N.
• We use τ-hat to estimate τ (τ-hat = Σyi/δi = N*y-bar is an unbiased estimator of τ).
• Therefore, for a finite population, V(τ-hat) = N²((N-n)/(N-1))(σ²/n).
• When σ² is estimated from s², this becomes:
• estimated variance of τ-hat = N²(1 - n/N)(s²/n)

Bound on the error of estimation
• For C = 95%, use 2 standard errors as our bound (think of MOE): B = 2*sqrt(N²(1 - n/N)(s²/n)).
• Normality is still important here! (Transform if necessary, e.g. with a small sample size and skewed data.)

Eg4.4, page 87
• a) Estimate the total τ;
• b) For C = 95%, find the bound on the error of estimation for τ;
• c) Find a 95% CI for the total τ.

Eg4.4, R codes
# Sample size, population size, sample mean, sample variance
n=50
N=750
ybar=10.31
s2=2.25
# Estimated total and bound on the error of estimation (with fpc)
tau_hat=N*ybar
B=2*sqrt(N^2*(1-n/N)*(s2/n))
print(B)
# 95% CI for the total
tau_hat-B
tau_hat+B

EX4.17, page 104
• a) Estimate the total τ;
• b) For C = 95%, find the bound on the error of estimation for τ;
• c) Find a 95% CI for the total τ.
From above

Selecting Sample Size for μ, using SRS chosen w/o replacement in Finite Population (N)
• Use the variance of y-bar: V(y-bar) = ((N-n)/(N-1))(σ²/n).
• Set B = 2*sqrt(V(y-bar)), which is B = 2*sqrt(((N-n)/(N-1))(σ²/n)), and solve for n, which yields n = (Nσ²)/((N-1)D + σ²), where D = B²/4.
• Since σ² is usually not known, estimate it with s² (or use σ ≈ range/4).

Eg4.5, page 89
• a) Find sample size n.

Eg4.5, R codes
# Bound B, population size N, and a prior value for sigma
B=3
N=1000
sigma=25
D=B^2/4
# Required sample size for estimating the mean; round up to a whole number
n=(N*sigma^2)/((N-1)*D+sigma^2)
print(n)

EX4.23-4.24, page 106
• a) Based on the information from EX4.23, find sample size n for EX4.24.

EX4.23-4.24, R codes
# EX4.23: estimate of the mean and bound on the error of estimation
n=20
N=200
ybar=2.1
s2=0.4^2
mu_hat=ybar
B=2*sqrt((1-n/N)*(s2/n))
print(B)
#################
# EX4.24: sample size for the new bound B
B=1
N=200
sigma=1
D=B^2/4
n=(N*sigma^2)/((N-1)*D+sigma^2)
print(n)

Selecting Sample Size for τ, using SRS chosen w/o replacement in Finite Population (N)
• Set B = 2*sqrt(N²V(y-bar)), which is B = 2*sqrt(N²((N-n)/(N-1))(σ²/n)), and solve for n, which yields n = (Nσ²)/((N-1)D + σ²), where D = B²/(4N²).
• Since σ² is usually not known, estimate it with s² (or use σ ≈ range/4).
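The sample-size formula for a total differs from the one for a mean only in how D is defined. A minimal sketch, with a made-up function name `n_for_total` and illustrative inputs that are not taken from a specific textbook problem:

```r
# Sample size needed to estimate a population total tau within bound B.
# sigma2 is a prior guess at the population variance.
n_for_total <- function(N, sigma2, B) {
  D <- B^2 / (4 * N^2)
  n <- (N * sigma2) / ((N - 1) * D + sigma2)
  ceiling(n)                     # round up to a whole sampling unit
}

# Illustrative inputs: N = 1000 units, sigma^2 guessed as 36, bound B = 1000
n_total <- n_for_total(N = 1000, sigma2 = 36, B = 1000)
n_total
```

Rounding up rather than to the nearest integer keeps the achieved bound at or below the target B.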

Eg4.6, page 90
• a) Find sample size n.

EX4.27-4.28, page 106
• a) Based on the information from EX4.27, find sample size n for EX4.28.

4.5 Estimation of a Population Proportion using SRS chosen w/o replacement in Finite Population (N)
• Eg: Are you Hispanic or not? (YES / NO)
• Define yi = 0 if the unit does not have the characteristic of interest and yi = 1 if it does.
• Then p-hat = Σyi/n (a special case of the sample mean).
• p-hat is an unbiased estimator of p.
• Estimated variance of p-hat (for infinite populations) is p-hat*q-hat/n.
• Estimated variance of p-hat (for finite populations) is (1-n/N)(p-hat*q-hat)/(n-1), where q-hat = 1 - p-hat.
• Bound = 2*sqrt(estimated variance of p-hat)
• Problem 4.14

EX4.14, page 104

EX4.14, R codes
## To find phat and B ##
phat=25/30
N=300
qhat=1-phat
n=30
B=2*sqrt((1-n/N)*(phat)*(qhat)/(n-1))
print(B)
## To find n ##
B=0.05; N=300; D=B^2/4
phat=25/30
qhat=1-phat
n=(N*(phat)*(qhat))/((N-1)*D+phat*qhat)
print(n)

To estimate sample size
• n = Npq/((N-1)D + pq), where D = B²/4
• If p is unknown, then we use p = 0.5 (this maximizes pq and so gives the largest, most conservative n).
• Normality is important here!
• Problem 4.15
• Question: What level of confidence do all the bounds we have looked at so far assume?
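To see how much the p = 0.5 fallback costs, the sketch below wraps the formula in a small helper (the name `n_for_prop` is made up) and compares the conservative answer with one based on a prior estimate. The inputs N = 300, B = 0.05 and p-hat = 25/30 are the ones used in the EX4.14 R code:

```r
# Sample size needed to estimate a proportion within bound B.
# p defaults to 0.5, the conservative choice when nothing is known about p.
n_for_prop <- function(N, B, p = 0.5) {
  D <- B^2 / 4
  q <- 1 - p
  ceiling((N * p * q) / ((N - 1) * D + p * q))   # round up to a whole unit
}

n_conservative <- n_for_prop(N = 300, B = 0.05)              # p unknown
n_with_prior   <- n_for_prop(N = 300, B = 0.05, p = 25 / 30) # prior estimate of p
c(n_conservative, n_with_prior)
```

The conservative choice never gives a smaller n than a prior estimate would, because pq is largest at p = 0.5.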

4.6 Comparing two Estimates

4.6 Comparing two Means
• When comparing means, we consider only the independent-sample case because the dependent case becomes too complicated to handle at this level.
• With fpc: (y-bar1 - y-bar2) ± 2*sqrt((1 - n1/N1)(s1²/n1) + (1 - n2/N2)(s2²/n2))
• Without fpc: (y-bar1 - y-bar2) ± 2*sqrt(s1²/n1 + s2²/n2)
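Both versions of the interval for a difference of two means can be handled by one helper if the population sizes default to infinity, since n/Inf is 0 in R and the fpc then drops out. This is a sketch only: the function name `diff_means_ci` and all the summary statistics below are hypothetical.

```r
# Approximate 95% interval for mu1 - mu2 from two independent SRSs.
# Leave N1 and N2 at Inf to drop the fpc.
diff_means_ci <- function(ybar1, s1, n1, ybar2, s2, n2, N1 = Inf, N2 = Inf) {
  v <- (1 - n1 / N1) * s1^2 / n1 + (1 - n2 / N2) * s2^2 / n2
  d <- ybar1 - ybar2
  B <- 2 * sqrt(v)
  c(diff = d, lower = d - B, upper = d + B)
}

# Hypothetical summary statistics (s1, s2 are standard deviations)
with_fpc <- diff_means_ci(10, 2, 25, 8, 3, 36, N1 = 100, N2 = 400)
no_fpc   <- diff_means_ci(10, 2, 25, 8, 3, 36)
rbind(with_fpc, no_fpc)
```

If the interval excludes 0, the data suggest that the two population means differ; keeping the fpc can only make the interval narrower.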

Eg 4.10, page 95
• Data shown on the next page

Eg 4.10, page 95

Eg 4.10, page 95
• (a) Is there sufficient evidence to conclude that the mean mercury level for lakes of type 1 differs from that for lakes of type 2?
• (c) Is there sufficient evidence to conclude that the mean mercury level for lakes of type 2 differs from that for lakes of type 3?
• Without fpc: (y-bar1 - y-bar2) ± 2*sqrt(s1²/n1 + s2²/n2)

Eg4.10, R codes
# Sample means, sample standard deviations (s1, s2), and sample sizes
ybar1=0.22
ybar2=0.74
s1=0.103
s2=0.583
n1=4
n2=15
# Lower and upper limits for mu1 - mu2, without fpc
(ybar1-ybar2)-2*sqrt((s1^2/n1)+(s2^2/n2))
(ybar1-ybar2)+2*sqrt((s1^2/n1)+(s2^2/n2))

EX4.18, page 104: comparing means
• Without fpc: (y-bar1 - y-bar2) ± 2*sqrt(s1²/n1 + s2²/n2)

4.6 Comparing two Proportions
• When comparing proportions, however, a commonly occurring dependent situation does have a rather simple solution.
• In general, V(p-hat1 - p-hat2) = V(p-hat1) + V(p-hat2) - 2cov(p-hat1, p-hat2).
• For the independent situation, covariance = 0.

Eg4.11, page 98

Eg4.11, R codes
# Independent samples: lower and upper limits for p1 - p2
phat1=0.44
phat2=0.08
n1=600
n2=200
(phat1-phat2)-2*sqrt((phat1)*(1-phat1)/n1+(phat2)*(1-phat2)/n2)
(phat1-phat2)+2*sqrt((phat1)*(1-phat1)/n1+(phat2)*(1-phat2)/n2)
# Dependent proportions from the same sample of size n (covariance term added)
phat1=0.44
phat2=0.52
n=600
(phat1-phat2)-2*sqrt((phat1)*(1-phat1)/n+(phat2)*(1-phat2)/n+2*(phat1*phat2/n))
(phat1-phat2)+2*sqrt((phat1)*(1-phat1)/n+(phat2)*(1-phat2)/n+2*(phat1*phat2/n))

Sampling from Real Population, page 112
• Identify a problem in your own area of interest for which you can actually draw a simple random sample to estimate a population mean, total, or proportion. Clearly define the population and the sampling units and construct a frame. Select a simple random sample from the frame by using the random number table in Appendix A. Then collect the data and make the necessary calculations.
• Some suggested projects are as follows.
• Business: Estimate the average gross income for firms of a certain type in your area or the average amount spent for entertainment among college men.
• Social sciences: Estimate the proportion of registered voters favoring some current political proposal or the average number of people per household for a certain section of your city.

Sampling from Real Population, page 112
• Some suggested projects are as follows.
• Physical sciences: Consider a laboratory experiment such as measuring the tensile strength of wire or the diameter of a machined rod. Take n independent observations on such an experiment and treat them as a simple random sample. Construct an interval estimate of the “population” mean. Here the population is merely conceptual (we could take many measurements of the phenomenon in question), and its mean represents the average strength of wire of this type or the average diameter of the rod.
• Biological sciences: Estimate the average weight of animals fed on a certain diet for a specified time period or the average height of trees in a certain plot. As an example of working with totals instead of means, estimate the total number of insect colonies (of a certain type) infesting a plot. Be careful here in selecting the sampling units and constructing the frame.

Extra Examples
• High school students were asked whether they had lied to a teacher at least once during the past year. The information is presented below.

Lied at least once    Male    Female
Yes                   3228    10295
No                    9659     4620

• Find the estimated difference, by gender, in the proportion who lied to a teacher at least once during the past year. Place a bound on this estimated difference.*
*Source: Moore, McCabe and Craig
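One way to set up this calculation in R is sketched below. It assumes the male and female students can be treated as two independent samples from populations large enough that the fpc is dropped; neither assumption is stated on the slide.

```r
# Counts from the lied-to-a-teacher table
yes <- c(male = 3228, female = 10295)
no  <- c(male = 9659, female = 4620)

n    <- yes + no                  # sample size for each gender
phat <- yes / n                   # proportion who lied at least once

# Estimated difference (male minus female) and its bound, without fpc
d <- phat[["male"]] - phat[["female"]]
B <- 2 * sqrt(sum(phat * (1 - phat) / n))
c(diff = d, bound = B)
```

The sign of the difference depends only on the order of subtraction; the bound is the same either way.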

Extra Multinomial example
• If the statistics are from a multinomial distribution, then cov(p-hat1, p-hat2) = -p1p2/n.
• In a class with 30 students, the table below gives the breakdown of the class:

Freshmen    10
Sophomore    5
Junior       7
Senior       8

• Estimate the difference between the percent Freshmen and the percent Junior, and place a bound on this difference.
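Because both proportions here come from the same 30 students, the covariance term matters: subtracting 2cov = -2*p1*p2/n adds 2*p1*p2/n to the variance. A sketch of the calculation, assuming the class is treated as a multinomial sample of size 30 with no fpc:

```r
# Class breakdown from the table
counts <- c(Freshmen = 10, Sophomore = 5, Junior = 7, Senior = 8)
n    <- sum(counts)
phat <- counts / n

p1 <- phat[["Freshmen"]]
p2 <- phat[["Junior"]]

# Dependent proportions: V(p1-hat - p2-hat) includes + 2*p1*p2/n
d    <- p1 - p2
vhat <- p1 * (1 - p1) / n + p2 * (1 - p2) / n + 2 * p1 * p2 / n
B    <- 2 * sqrt(vhat)
c(diff = d, bound = B)
```

Leaving out the covariance term would understate the bound, since the two class proportions are negatively correlated.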

4.6 Summary: Comparing two Estimates
• When comparing means, we consider only the independent-sample case because the dependent case becomes too complicated to handle at this level.
• When comparing proportions, however, a commonly occurring dependent situation does have a rather simple solution.
