Lecture 5

Lecture 5. 16 March 2018. Recap from last week: poststratification. Other sampling designs: Poisson, Pareto and systematic sampling. Variance estimation. Two-stage, cluster and two-phase sampling. Poststratification to reduce nonresponse bias.

duncant

Presentation Transcript


  1. Lecture 5, 16 March 2018 • Recap from last week: Poststratification • Other sampling designs: Poisson, Pareto and Systematic sampling • Variance estimation • Two-stage, Cluster and Two-phase sampling

  2. Poststratification

  3. Poststratification to reduce nonresponse bias • Poststratification is sometimes used to correct for nonresponse • Choose strata with different response rates • Poststratification amounts to assuming that the response sample in poststratum g is representative for the nonresponse group in the sample from poststratum g
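A minimal Python sketch of the adjustment described on this slide, assuming the population count of each poststratum is known and that respondents in a poststratum stand in for its nonrespondents. The function name, the two poststrata and all numbers are hypothetical illustration values, not taken from the lecture.

```python
def poststratified_total(pop_counts, responses):
    """Estimate a population total from respondents grouped by poststratum.

    pop_counts[g] is the (assumed known) population size of poststratum g;
    responses[g] is the list of y-values observed among respondents in g.
    """
    total = 0.0
    for g, n_g in pop_counts.items():
        ys = responses[g]
        if not ys:
            raise ValueError(f"no respondents in poststratum {g!r}")
        # The respondent mean in g is taken to represent all of poststratum g.
        total += n_g * sum(ys) / len(ys)
    return total


# Hypothetical data: response rates (and y-levels) differ between the groups.
pop_counts = {"young": 600, "old": 400}
responses = {"young": [1, 1, 0, 1], "old": [0, 0, 1, 0, 0]}

estimate = poststratified_total(pop_counts, responses)

# Unadjusted alternative: pooled respondent mean times the population size.
all_ys = [y for ys in responses.values() for y in ys]
naive = sum(pop_counts.values()) * sum(all_ys) / len(all_ys)

print(estimate, naive)
```

In this made-up data the pooled respondent mean gives a smaller total than the poststratified estimate, because the group with the lower y-level is over-represented among respondents relative to its population share.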

  4. Poisson design • Consider once again a sample membership indicator: $Z_i = 1$ if $i \in s$ and $Z_i = 0$ otherwise • Let the probability that $Z_i = 1$ be a general $\pi_i$, and keep the independence between the $Z_i$ by using the same simple list-sequential scheme as we had in Bernoulli sampling • Let $\varepsilon_1, \dots, \varepsilon_N$ be N independent realizations of a unif(0,1) random variable • Element i is now selected if $\varepsilon_i < \pi_i$
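The list-sequential scheme can be sketched in Python as follows. This is an illustration only: the selection rule "select element i when its uniform number is below its inclusion probability" is the usual Poisson sampling rule and is assumed here, and the function name, probabilities and seed are hypothetical.

```python
import random


def poisson_sample(pi, rng):
    """Return the indices selected by a Poisson design.

    pi[i] is the inclusion probability of element i. Each element gets its own
    independent uniform number and is selected when that number is below pi[i].
    """
    return [i for i, p in enumerate(pi) if rng.random() < p]


pi = [0.1, 0.2, 0.4, 0.4, 0.7, 0.7, 0.7, 0.8]  # illustrative probabilities
rng = random.Random(2018)                       # seeded only for repeatability
sample = poisson_sample(pi, rng)

# The sample size is random; its expectation is the sum of the probabilities.
print(sample, len(sample), sum(pi))
```

Because every element is handled independently, the realized sample size varies from draw to draw, which is the feature the later slides on fixed-size designs react to.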

  5. Poisson design • Consider a design in which each unit is selected into the sample independently and with probability $\pi_i$. We have the same unbiased estimator of a total, i.e. $\hat t_\pi = \sum_{i \in s} y_i / \pi_i$ • But now our $\pi_i$ vary; suppose they are proportional to the $y_i$

  6. Example: Consider the following population • $x_i$: 20 40 80 80 140 140 140 160 • $\pi_i$: 0.1 0.2 0.4 0.4 0.7 0.7 0.7 0.8

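  7. Poisson design To find the variance of the estimator above, remember that generally $\hat t_\pi = \sum_{i \in U} Z_i y_i / \pi_i$, where $Z_i = 1$ if $i \in s$ and $Z_i = 0$ otherwise, and $\mathrm{Cov}(Z_i, Z_j) = \pi_i (1 - \pi_i)$ when $i = j$ and $\mathrm{Cov}(Z_i, Z_j) = \pi_{ij} - \pi_i \pi_j$ when $i \neq j$. Thus $V(\hat t_\pi) = \sum_{i \in U} \sum_{j \in U} \mathrm{Cov}(Z_i, Z_j) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j}$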

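  8. Poisson design In the Poisson design we have $\mathrm{Cov}(Z_i, Z_j) = \pi_i (1 - \pi_i)$ when $i = j$, and when $i \neq j$ we have $\pi_{ij} = \pi_i \pi_j$, so that $\mathrm{Cov}(Z_i, Z_j) = 0$; hence $V(\hat t_\pi) = \sum_{i \in U} \pi_i (1 - \pi_i) (y_i / \pi_i)^2 = \sum_{i \in U} (1/\pi_i - 1) y_i^2$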

  9. Poisson design How shall we estimate $V(\hat t_\pi)$? In this case it is simpler, since the independence implies that we only need first-order inclusion probabilities, and we get $\hat V(\hat t_\pi) = \sum_{i \in s} (1 - \pi_i) (y_i / \pi_i)^2$
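As a numerical check of the Poisson-design results, the sketch below enumerates all 2^8 possible samples from the example population and compares the design expectation and variance of the estimated total with the closed-form expressions. It assumes the usual Horvitz-Thompson forms (estimate = sum over the sample of y_i / pi_i, variance = sum over the population of (1/pi_i - 1) y_i^2, variance estimate = sum over the sample of (1 - pi_i)(y_i / pi_i)^2) and, purely for illustration, takes y_i = x_i; neither the y-values nor the function names come from the lecture.

```python
from itertools import product
from math import isclose, prod

x = [20, 40, 80, 80, 140, 140, 140, 160]
pi = [0.1, 0.2, 0.4, 0.4, 0.7, 0.7, 0.7, 0.8]
y = x  # assumption for illustration: the study variable equals x


def ht_total(z):
    """Horvitz-Thompson estimate of the total for membership indicators z."""
    return sum(yi / p for yi, p, zi in zip(y, pi, z) if zi)


def var_estimate(z):
    """Variance estimate that needs only first-order inclusion probabilities."""
    return sum((1 - p) * (yi / p) ** 2 for yi, p, zi in zip(y, pi, z) if zi)


exp_t = exp_t2 = exp_v = 0.0
for z in product((0, 1), repeat=len(y)):
    # Independence: a sample's probability is a product over the elements.
    prob = prod(p if zi else 1 - p for p, zi in zip(pi, z))
    t_hat = ht_total(z)
    exp_t += prob * t_hat
    exp_t2 += prob * t_hat ** 2
    exp_v += prob * var_estimate(z)

design_var = exp_t2 - exp_t ** 2
formula_var = sum((1 / p - 1) * yi ** 2 for yi, p in zip(y, pi))

print(exp_t, sum(y))            # expectation of the estimator vs. true total
print(design_var, formula_var)  # enumerated variance vs. closed form
print(exp_v)                    # expectation of the variance estimator
```

With y proportional to the inclusion probabilities every sampled unit contributes the same amount to the estimate, so all of the variance comes from the random sample size.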

  10. Fixed-size pps designs • But why have a random sample size? • For an arbitrary sample size it is difficult to find a sampling scheme that: • Allows simple selection of the sample • Allows (at least approximately) unbiased estimation of design variances • Admits the use of Permanent Random Numbers (PRN)

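  11. Pareto sampling • For every element $i \in U$ compute target inclusion probabilities $\lambda_i = n x_i / \sum_{j \in U} x_j$ • Generate N independent uniform random numbers $U_i$ on (0,1) and compute ranking variables $Q_i = \frac{U_i / (1 - U_i)}{\lambda_i / (1 - \lambda_i)}$ • The units with the n smallest $Q_i$ then constitute the sample s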
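The three steps can be sketched in Python. The formulas did not survive in the slide text, so this sketch assumes the standard Pareto sampling definitions: target probabilities proportional to a size measure x and summing to n, and ranking variables equal to the odds of the uniform number divided by the odds of the target probability. The function name, size values and seed are hypothetical.

```python
import random


def pareto_sample(x, n, rng):
    """Draw a fixed-size sample of n indices by Pareto sampling (sketch)."""
    total = sum(x)
    # Step 1: target inclusion probabilities, proportional to size.
    lam = [n * xi / total for xi in x]
    if not all(0 < l < 1 for l in lam):
        raise ValueError("target probabilities must lie strictly between 0 and 1")
    # Step 2: one uniform number per unit, turned into a ranking variable.
    ranked = []
    for i, l in enumerate(lam):
        u = rng.random()
        q = (u / (1 - u)) / (l / (1 - l))
        ranked.append((q, i))
    # Step 3: the n smallest ranking variables form the sample.
    ranked.sort()
    return sorted(i for _, i in ranked[:n])


x = [20, 40, 80, 80, 140, 140, 140, 160]  # size measures from the example slide
rng = random.Random(5)                    # seeded only for repeatability
sample = pareto_sample(x, 4, rng)
print(sample)
```

Unlike the Poisson design the sample size is always exactly n, and reusing the same uniform numbers across surveys is what makes the scheme compatible with permanent random numbers.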

  12. Estimation in Pareto sampling If n/N or is not too close to 1

  13. Systematic Sampling Assume a full list of the population is available. Suppose that you wish to draw a sample of size n from a population of size N. You choose the sample by selecting a random integer between 1 and $a = N/n$ inclusive, and then every a-th observation thereafter. This is called a systematic sample. It is, as we shall see, a special case of a one-stage cluster sample where each possible sample is a cluster.

  14. Assume a population of 12 units, 1 2 3 4 5 6 7 8 9 10 11 12. To take a sample of size 3, choose a random number between 1 and 4 (a = 12/3 = 4). Draw that element and every fourth thereafter. The possible samples are {1 5 9}, {2 6 10}, {3 7 11}, {4 8 12}
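The example can be reproduced with a few lines of Python; the function name is hypothetical and the sketch assumes that N is a multiple of n.

```python
def systematic_sample(N, n, start):
    """Units selected when the random start is `start` (1-based labels)."""
    if N % n:
        raise ValueError("this sketch assumes N is a multiple of n")
    a = N // n  # sampling interval
    if not 1 <= start <= a:
        raise ValueError("start must be between 1 and a inclusive")
    return list(range(start, N + 1, a))


# Every possible start gives one of the a = 4 possible samples.
possible_samples = [systematic_sample(12, 3, r) for r in range(1, 5)]
print(possible_samples)
```

Only 4 of the 220 subsets of size 3 can ever be drawn, which is why the design behaves like a cluster sample of a single cluster.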

  15. Systematic Sampling If N is a multiple of a, every unit has probability $1/a = n/N$ of being selected. (Small problems arise when there are a few extra units.) The problem with a systematic sample chosen in this way is that the sample consists of a single cluster, and therefore it is impossible to compute the variance estimates using our usual method. (Many second-order inclusion probabilities are zero.)

  16. Systematic Sampling Advantages: Simple, especially with manual frames/registers. More accurate than SRS if the variability within the possible samples is high. Problems: Cyclic behaviour in the list can be harmful; unbiased estimation of sampling errors is impossible with only one starting point.

  17. Systematic sampling, an example

  18. Systematic sampling and Intracluster variation

  19. Systematic sampling and Intracluster variation BEST

  20. Systematic sampling and Intracluster variation Worst

  21. Systematic sampling Note that the variance of the estimator depends entirely on the variability between the ’clusters’ created by the systematic sampling procedure. If population is in random order (all N! permutations are equally likely): then systematic sampling is similar to SRS Systematic sampling can be very bad if y has periodic variation

  22. Systematic sampling • What about the case when N/n is not an integer? • N = na + c: when the starting value $r \le c$ we get n+1 units, otherwise we get n units in the sample. • Another method is the following: • Let $a = N/n$ and generate a random number $\xi$ from a unif(0,a) distribution. • The selected sample will then consist of those elements i for which $i - 1 < \xi + (j-1)a \le i$ for some $j = 1, \dots, n$
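Both approaches can be sketched in Python. The selection rule of the second method is missing from the slide text; the sketch assumes the usual fractional-interval rule, in which a = N/n, a start is drawn in (0, a], and element i is selected when one of the points start + (j-1)a, j = 1, ..., n, falls in the interval (i-1, i]. Function names and the numbers N = 14, n = 3 are hypothetical.

```python
from math import ceil


def systematic_integer_interval(N, a, r):
    """Integer interval a with start r; the sample size may vary with r."""
    if not 1 <= r <= a:
        raise ValueError("r must be between 1 and a inclusive")
    return list(range(r, N + 1, a))


def systematic_fractional_interval(N, n, start):
    """Fractional interval a = N/n (assumed rule); always returns n units."""
    a = N / n
    if not 0 < start <= a:
        raise ValueError("start must lie in (0, a]")
    # The point start + j*a lies in (i-1, i] exactly when i is its ceiling;
    # min() guards against rounding slightly past N.
    return [min(N, ceil(start + j * a)) for j in range(n)]


# N = 14 = 3*4 + 2, so a = 4 and c = 2: starts 1 and 2 give n + 1 = 4 units.
sizes = [len(systematic_integer_interval(14, 4, r)) for r in range(1, 5)]
print(sizes)

# With the fractional interval the sample size is fixed at n = 3.
print(systematic_fractional_interval(14, 3, 2.5))
```

In practice the start would be drawn at random; it is passed in as an argument here so that the behaviour is easy to inspect.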

  23. An identity The total sum of squares SSTO can be written in two parts

  24. Efficiency of systematic sampling Let be the variance of the HT-estimator in systematic sampling and let be the variance in SRS then. we see that if and Then systematic sampling is more efficient than SRS
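The comparison, and the sum-of-squares identity behind it, can be checked numerically on the 12-unit population from the earlier example. This is a sketch under stated assumptions: the y-value of each unit is taken equal to its label (a linear trend, chosen only for illustration), the total is estimated by a times the sample sum under systematic sampling, and the usual SRS variance formula N^2 (1 - n/N) S^2 / n is used.

```python
y = list(range(1, 13))  # assumption: y-value of unit k equals its label k
N, n = len(y), 3
a = N // n              # sampling interval, here 4
t = sum(y)
ybar = t / N

# The a possible systematic samples, viewed as clusters.
clusters = [y[r::a] for r in range(a)]

# Systematic sampling: each cluster is drawn with probability 1/a and the
# estimated total is a times the cluster sum.
v_sys = sum((a * sum(c) - t) ** 2 for c in clusters) / a

# Simple random sampling without replacement.
s2 = sum((v - ybar) ** 2 for v in y) / (N - 1)
v_srs = N ** 2 * (1 - n / N) * s2 / n

# Sum-of-squares decomposition: total = between clusters + within clusters.
ssto = sum((v - ybar) ** 2 for v in y)
ssb = sum(len(c) * (sum(c) / len(c) - ybar) ** 2 for c in clusters)
ssw = sum((v - sum(c) / len(c)) ** 2 for c in clusters for v in c)

print(v_sys, v_srs)
print(ssto, ssb, ssw)
```

For this ordered population the variance is 180 under systematic sampling against 468 under SRS: most of the total variation (128 of 143) sits within the possible samples, so the possible samples resemble each other and the estimator varies little between them.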

  25. Cluster sampling and multistage sampling • Sampling designs so far: Direct sampling of the units in a single stage of sampling • For economic and practical reasons it may be necessary to modify these sampling designs • There exists no population frame (register: list of all units in the population), and it is impossible or very costly to produce such a register. • The population units are scattered over a wide area, and a direct sample will also be widely scattered. In case of personal interviews, the travelling costs would be very high and it would not be possible to visit the whole sample

  26. Modified sampling can be done by • Selecting the sample indirectly in groups, called clusters: cluster sampling • Population is grouped into clusters • Sample is obtained by selecting a sample of clusters and observing all units within the clusters • Selecting the sample in several stages: multistage sampling

  27. In two-stage sampling: • Population is grouped into primary sampling units (PSU) • Stage 1: A sample of PSUs • Stage 2: For each PSU in the sample at stage 1, we take a sample of population units, now also called secondary sampling units (SSU) • Ex: PSUs are often geographical regions

  28. Examples • Cluster sampling. Want a sample of high school students in a certain area, to investigate smoking and alcohol use. If a list of high school classes is available, we can select a sample of high school classes and give the questionnaire to every student in the selected classes: cluster sampling with high school classes as the clusters • Two-stage cluster sampling. If a list of classes is not available, we can first select high schools, then classes and finally all students in the selected classes. Then we have a two-stage cluster sample. • PSU = high school • SSU = classes • Units = students

  29. Two-stage sampling • Basic justification: With homogeneous clusters and a given budget, it is inefficient to survey all units in the cluster; one can instead select more clusters • The population is partitioned into N primary sampling units (PSU) • Stage 1: Select a sample $s_I$ of PSUs • Stage 2: For each selected PSU i in $s_I$: Select a sample $s_i$ of units (secondary sampling units, SSU) • The cluster totals $t_i$ must be estimated from the sample

  30. General two-stage sampling plan:

  31. Suggested estimator for population total t : Unbiased estimator
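The estimator's formula is not preserved in the slide text. The sketch below assumes the standard two-stage form, in which each sampled PSU total is first estimated from its subsample and the estimated totals are then weighted by the inverse of the PSU inclusion probability, and it further assumes simple random sampling without replacement at both stages. The function name and data are hypothetical.

```python
def two_stage_total(N, psu_samples):
    """Estimate the population total from a two-stage sample (sketch).

    N is the number of PSUs in the population. psu_samples holds one
    (M_i, ys) pair per sampled PSU: M_i is the number of units in the PSU
    and ys are the y-values of the units subsampled from it. SRS without
    replacement is assumed at both stages.
    """
    n = len(psu_samples)
    total = 0.0
    for M_i, ys in psu_samples:
        m_i = len(ys)
        # Stage 2: estimate the PSU total t_i from its subsample.
        t_i_hat = M_i * sum(ys) / m_i
        # Stage 1: weight by the inverse PSU inclusion probability, N / n.
        total += t_i_hat * N / n
    return total


# Two of N = 4 PSUs sampled; 2 of 10 and 3 of 6 units subsampled.
estimate = two_stage_total(4, [(10, [1, 2]), (6, [4, 4, 4])])
print(estimate)
```

When every unit of a sampled PSU is observed (m_i = M_i), the stage 2 step returns the exact PSU total and the estimator reduces to that of one-stage cluster sampling.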

  32. The first component expresses the sampling uncertainty at stage 1, since we are selecting a sample of PSUs. It is the variance of the HT-estimator with the $t_i$ as observations • The second component is the stage 2 variance and tells us how well we are able to estimate each $t_i$ in the whole population • The second component is often negligible because of little variability within the clusters

  33. Unequal cluster sizes. PPS – SRS sampling • In social surveys: good reasons to have equal inclusion probabilities (self-weighting sample) for all units in the population (similar representation for all domains) • Stage 1: Select PSUs with probability proportional to size $M_i$ • Stage 2: SRS (or systematic sample) of SSUs • Chosen such that the sample is self-weighting: $m_i = m/n$, i.e. equal sample sizes in all selected PSUs
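Why equal subsample sizes make the sample self-weighting can be checked with exact fractions. The sketch assumes a fixed-size design at stage 1 in which PSU i has inclusion probability n M_i / M (M being the total number of units), followed by an SRS of m/n units within each selected PSU; the PSU sizes and sample sizes are hypothetical.

```python
from fractions import Fraction


def unit_inclusion_probs(sizes, n, m):
    """Overall inclusion probability of a unit in each PSU (sketch).

    sizes are the PSU sizes M_i, n is the number of PSUs sampled and m the
    total number of units sampled, so each selected PSU contributes m / n.
    """
    M = sum(sizes)
    per_psu = Fraction(m, n)
    probs = []
    for M_i in sizes:
        pi_psu = Fraction(n * M_i, M)      # stage 1: proportional to size
        pi_unit_given_psu = per_psu / M_i  # stage 2: SRS of m/n among M_i
        probs.append(pi_psu * pi_unit_given_psu)
    return probs


probs = unit_inclusion_probs([100, 200, 300, 400], n=2, m=20)
print(probs)
```

The PSU size cancels between the two stages, so every unit ends up with the same probability m / M regardless of which PSU it belongs to.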

  34. Two-Phase Sampling • Sampling followed by subsampling. • Double Sampling

  35. Two-Phase Sampling It is NOT to be mixed up with two-stage sampling (which is a special case). [Diagram of two-stage sampling: population of N clusters; sample of n clusters (PSUs) of size $M_i$; independent samples of SSUs of size $m_i$ from each PSU]

  36. Two-Phase Sampling Situations where two-phase sampling is useful: • When you want to use auxiliary information in your design or in your estimator but the auxiliary information is not known in advance. • Example: • ty= total timber volume that has been cut in a forest, which can be defined as the total timber volume for the population of truckloads. • Gain in precision by using • where x = weight and y = timber volume

  37. Two-Phase Sampling • When adjusting for nonresponse. • When sampling is done on successive occasions. • A population is sampled repeatedly on two or more occasions and the same parameter is of interest, e.g. the employment rate. • Parameters of level and parameters of change are of interest. • When sampling on two occasions, two independent samples are selected at occasion II: one from the sample included at occasion I and one from the part that was not in the sample at occasion I.

  38. Round-up: Survey strategy • Sampling designs: with or without replacement; fixed or variable sample size; equal or unequal inclusion prob. (uses auxiliary info.) • Estimation: Horvitz-Thompson or π estimator; ratio or (GREG) estimator (uses auxiliary info.) • Practical techniques: stratification and domains; two-phase, two-stage (clusters) • It is the combination of practical circumstances and goals that determines which mix of the above is the 'best' survey strategy.
