
Survey sampling


bona


  1. Survey sampling • Sampling & non-sampling error • Bias • Simple sampling methods • Sampling terminology • Cluster sampling • Design effect • Stratified sampling • Sampling weights

  2. Why sample? • To make an inference about a population • Studying the entire population is impractical or impossible

  3. Example of sampling • Estimate the proportion of adults, ages 18-65, in Port Elizabeth that have type 2 diabetes • Select a sample from which to estimate the proportion • Population: adults aged 18-65 living in Port Elizabeth • Inference: proportion with type 2 diabetes

  4. Probability sampling • Each individual has known (non-zero) probability of selection • Precision of estimates can be quantified

  5. Non-probability sampling • Cheaper, more convenient • Quality of estimates cannot be assessed • May not be representative of population

  6. Sampling error v. non-sampling error

  7. Sampling error • Random variability in sample estimates that arises out of the randomness of the sample selection process • Precision can be quantified (estimation of standard errors, confidence intervals)

  8. Non-sampling error • Estimation error that arises from sources other than random variation • non-response • undercoverage of survey • poorly-trained interviewers • non-truthful answers • non-probability sampling • This type of error is a bias

  9. What is bias? We want to estimate the mean weight of all women aged 15-44 living in Coopersville. Suppose there are 50,000 such women and the true mean weight is 61.7 kg. We select a sample of 200 such women and interview them, asking each woman what her weight is. The sample mean weight is 59.4 kg. Is our estimate biased?

  10. Bias Suppose we could repeat the survey many, many times. Then we compute the mean of all the sample means. Say the mean of the means = 62.9 kg. Bias = (mean of means) - (true mean) = 62.9 - 61.7 = 1.2 kg
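
The repeated-survey idea can be sketched as a small simulation. Everything here is an illustrative assumption rather than part of the slides: the population is synthetic, and every reported weight is assumed to be off by a fixed +1.2 kg so that the bias comes out near the figure quoted above.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic population: 50,000 true weights (kg), centred near 61.7 to echo the slide.
population = [rng.gauss(61.7, 10.0) for _ in range(50_000)]
true_mean = fmean(population)

REPORTING_ERROR = 1.2  # assumed: each woman's reported weight is off by +1.2 kg

def one_survey(n=200):
    """Draw a simple random sample of n women; return the mean reported weight."""
    sample = rng.sample(population, n)
    return fmean(w + REPORTING_ERROR for w in sample)

# "Repeat the survey many, many times" and average the sample means.
sample_means = [one_survey() for _ in range(2_000)]
mean_of_means = fmean(sample_means)
bias = mean_of_means - true_mean

print(f"true mean     = {true_mean:.2f} kg")
print(f"mean of means = {mean_of_means:.2f} kg")
print(f"bias          = {bias:.2f} kg")
```

Individual sample means scatter around the mean of means (that scatter is sampling error), while the gap between the mean of means and the true mean is the bias.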

  11. Unbiased estimation If . . . (mean of the means) = (true mean) then the bias is zero, and we say that the estimator is unbiased. The “mean of the means” is called the “expected value” of the estimator.

  12. Simple sampling methods • Task: Select a sample of n individuals or items from a population of N individuals or items • Common methods • simple random sampling • systematic sampling

  13. Simple sampling methods • Simple random sampling (SRS) • each item in population is equally likely to be selected • each combination of n items is equally likely to be selected • Systematic sampling (typical method) • randomly select a starting point • select every kth item thereafter
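
A minimal sketch of simple random sampling, assuming the items are numbered 1..N. The function name is hypothetical, and the sizes (N = 50,000, n = 200) simply echo the earlier weight example.

```python
import random

def simple_random_sample(N, n, rng=random):
    """Return n distinct item numbers from 1..N; every n-item subset is equally likely."""
    return sorted(rng.sample(range(1, N + 1), n))

# Seeded generator only so that the illustration is repeatable.
picked = simple_random_sample(N=50_000, n=200, rng=random.Random(1))
print(picked[:10])
```

`random.sample` draws without replacement, which is what gives each combination of n items the same chance of being the sample.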

  14. Systematic sampling example • Stack of 213 hospital admission forms; select a sample of 15 • 213/15 = 14.2, so select every 14th form • Starting point: random number between 1 and 14 (we choose 11) • First form selected is 11th from top • Second form selected is 25th from top (11 + 14 = 25) • Third form selected is 39th from top (11 + 2 x 14 = 39) • And so forth . . .
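
The quick method in this example can be sketched as follows. The function name is hypothetical, and the starting point is passed in explicitly (11, as chosen on the slide) rather than drawn at random.

```python
def systematic_sample(N, n, start):
    """Quick field method: take every k-th item from `start`, with k = floor(N / n)."""
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return list(range(start, N + 1, k))

forms = systematic_sample(N=213, n=15, start=11)
print(forms[:3], len(forms))
```

One side effect of rounding 14.2 down to 14: starting points 1 to 3 yield 16 forms rather than 15, because 15 x 14 = 210 is less than 213.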

  15. Systematic sampling, continued • What is the probability that the 146th form will be selected? The 195th? • Does this qualify as a simple random sample? Why or why not? • Is there any potential problem arising from the use of systematic sampling in this situation?
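
One way to explore these questions is to enumerate every possible systematic sample. This sketch assumes the starting point is drawn uniformly from 1 to 14 and that every 14th form is taken, as in the preceding example; the helper name is hypothetical.

```python
from fractions import Fraction
from math import comb

N, k = 213, 14

# One possible sample per starting point 1..k.
samples = [set(range(start, N + 1, k)) for start in range(1, k + 1)]

def inclusion_probability(form):
    """Share of the equally likely starting points whose sample contains `form`."""
    hits = sum(form in s for s in samples)
    return Fraction(hits, len(samples))

print(inclusion_probability(146), inclusion_probability(195))
print(len(samples), "possible systematic samples;",
      comb(N, 15), "possible 15-form subsets")
```

Each form belongs to exactly one of the 14 possible samples, so every form has the same selection probability, yet only 14 of the many possible combinations of forms can ever be drawn.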

  16. Example was typical quick method • In the preceding example, we selected every 14th form • Ideally, we would select every 14.2th form (see later example on 2-stage sample of nurses) • Example is a quick and easy method, commonly used in the field; it is a good approximation to the more rigorous procedure

  17. Systematic sampling: + and - • Advantages of systematic sampling • typically simpler to implement than SRS • can provide a more uniform coverage • Potential disadvantage of systematic sampling • can produce a bias if there is a systematic pattern in the sequence of items from which the sample is selected

  18. Role of simple sampling methods • These simple sampling methods are necessary components of more complex sampling methods: • cluster sampling • stratified sampling • We’ll discuss these more complex methods next (following some definitions)

  19. Definitions • Listing units (or enumeration units) • the lowest level sampled units (e.g., households or individuals) • PSUs (primary sampling units) • the first units sampled (e.g., states or regions) • Sampling probability • for any unit eligible to be sampled, the probability that the unit is selected in the sample

  20. More definitions • EPSEM sampling • “equal probability of selection method”, thus a method in which each listing unit has the same sampling probability • Sampling frame • the set of items from which sampling is done--often a list of items.

  21. More definitions • Undercoverage: the degree to which we fail to identify all eligible units in the population • incomplete lists • incomplete or incorrect eligibility information

  22. Still more definitions • Non-response: failure to interview sampled listing units (study subjects) • refusal • death • physician refusal • inability to locate subject • unavailability

  23. Still more definitions • Precision: the amount of random error in an estimate • often measured by the width or half-width of the confidence interval • standard error is another measure of precision • estimates with smaller standard error or narrower CI are said to be more precise

  24. CLUSTER SAMPLING: single stage

  25. Clusters • Subsets of the listing units in the population • Set of clusters must be mutually exclusive and collectively exhaustive • counties • townships • regions • institutions

  26. Example: Single-stage cluster sampling • There are 361 nurses working at the 31 hospitals and clinics in Region 4 • We wish to interview a sample of these nurses • select a simple random sample of 5 hospitals/clinics • interview all nurses employed at the 5 selected institutions

  27. Assessing the example • Hospitals/clinics are the PSUs • Nurses are the listing units • Sampling probability for each nurse is 5/31 • Thus, this is an EPSEM sample • Sampling frame is the list of 31 hospitals and clinics
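
A minimal sketch of the single-stage design, assuming the 31 hospitals/clinics are simply numbered 1 to 31 (the numbering and function name are hypothetical):

```python
import random
from fractions import Fraction

N_CLUSTERS, N_SELECTED = 31, 5

def select_clusters(rng):
    """Simple random sample of 5 cluster numbers (hospitals/clinics) from 1..31."""
    return sorted(rng.sample(range(1, N_CLUSTERS + 1), N_SELECTED))

# Every nurse in a selected cluster is interviewed, so a nurse's sampling
# probability equals the selection probability of her hospital/clinic.
nurse_probability = Fraction(N_SELECTED, N_CLUSTERS)

chosen = select_clusters(random.Random(4))  # seeded only for repeatability
print(chosen, nurse_probability)
```

Because the probability is 5/31 for every nurse regardless of the size of her institution, the design is EPSEM, although the number of nurses actually interviewed depends on which institutions are drawn.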

  28. CLUSTER SAMPLING: two stage

  29. Cluster sampling -- two stage • Select a sample of clusters, as in the single-stage method • From each selected cluster, select a subsample of listing units

  30. Cluster sampling -- two stage • It is always nice to do EPSEM sampling because such samples are self-weighting • don’t need sampling weights in analysis • A common EPSEM method for two-stage sampling is PPS (probability proportional to size)

  31. PPS sampling • The key to the method is that the sampling probabilities of clusters in the first stage are proportional to the “sizes” of the clusters • size = number of listing units in cluster • At stage 2, select the same number of listing units from each selected cluster

  32. Nurse example revisited: Two-stage sampling • We want to interview a sample of 36 nurses • We can afford to visit 9 different hospitals/clinics • Thus, we need to interview 36/9 = 4 nurses at each institution

  33. Nurse example revisited: Two-stage sampling • Stage 1: select a sample of 9 hospitals/clinics • Selection prob. proportional to “size” • Stage 2: select a sample of 4 nurses from each selected institution • At each stage, use one of the simple sampling methods

  34. Nurse example revisited: Two-stage sampling • PSUs are the hospitals/clinics • Listing units are the nurses • Sampling frames • Stage 1: List of 31 hospitals/clinics • Stage 2: Lists of nurses at each selected hospital/clinic

  35. Selecting 2-stage nurse sample • Sampling interval, I = 361/9 = 40.1 • Starting point, random number between 1 and 40; we choose R = 14 • First sampling number = R = 14 • 2nd sampling number = 14 + 1x40.1 = 54.1 • 3rd sampling number = 14 + 2x40.1 = 94.2 • We have selected institutions 2, 5, 9, . . .
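
The sampling numbers can be generated in a few lines. This sketch keeps the unrounded interval 361/9 and rounds only for display, so later values may differ in the last digit from hand calculations that use 40.1; the variable names are illustrative.

```python
N_NURSES, N_CLUSTERS = 361, 9

interval = N_NURSES / N_CLUSTERS   # 40.11..., shown as 40.1 on the slide
start = 14                         # R, the random start chosen on the slide

# One sampling number per institution to be selected: R, R + I, R + 2I, ...
sampling_numbers = [start + j * interval for j in range(N_CLUSTERS)]
print([round(x, 1) for x in sampling_numbers])
```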

  36. Two-stage nurse sample

  37. Applying the sampling numbers • For each sampling number, choose the first unit with cumulative “size” equal to or greater than the sampling number • Example: sampling number 54.1 • first unit with cumulative size ≥ 54.1 is unit 5 (cum. no. of nurses = 57) • so we select unit 5 for the sample
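
The lookup rule can be sketched with a cumulative-size table and a binary search. The table of institution sizes is not reproduced in this transcript, so the sizes used here are largely invented: only the 12 and 7 nurses at institutions 1 and 2 and the cumulative total of 57 at institution 5 come from the slides.

```python
from bisect import bisect_left
from itertools import accumulate

# Nurses per institution (first 9 institutions only). Values other than
# 12, 7 and the running total of 57 at institution 5 are hypothetical.
sizes = [12, 7, 15, 10, 13, 9, 11, 8, 14]
cumulative = list(accumulate(sizes))   # 12, 19, 34, 44, 57, 66, 77, 85, 99

def select_unit(sampling_number):
    """1-based number of the first unit whose cumulative size >= sampling_number."""
    return bisect_left(cumulative, sampling_number) + 1

print([select_unit(x) for x in (14, 54.1, 94.2)])
```

With these assumed sizes, the first three sampling numbers pick institutions 2, 5 and 9. Larger institutions span a wider stretch of the cumulative scale, which is what makes their selection probability proportional to size.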

  38. Optional challenge • What is the selection probability for institution 1? 12/40.1 = 0.299 • What is the selection probability for a nurse in institution 1? (12/40.1) x (4/12) = 0.0998 ≈ 36/361 • What is the selection probability for a nurse in institution 2? (7/40.1) x (4/7) = 0.0998 ≈ 36/361 • All nurses have the same selection probability.
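
The cancellation behind this result can be checked with exact fractions. The sketch assumes each institution has at least 4 nurses and no more than one sampling interval's worth (so the stage-1 probability does not exceed 1); the function name is hypothetical.

```python
from fractions import Fraction

N_NURSES, N_CLUSTERS, TAKE = 361, 9, 4
interval = Fraction(N_NURSES, N_CLUSTERS)   # exact 361/9 rather than the rounded 40.1

def nurse_probability(cluster_size):
    """P(institution selected) x P(nurse selected | institution selected)."""
    p_cluster = Fraction(cluster_size) / interval   # stage 1: proportional to size
    p_within = Fraction(TAKE, cluster_size)         # stage 2: 4 nurses out of the size
    return p_cluster * p_within

for size in (12, 7):
    p = nurse_probability(size)
    print(size, p, round(float(p), 4))
```

The cluster size cancels between the two stages, leaving 4/(361/9) = 36/361 (about 0.0997) for every nurse, which is why PPS sampling with a fixed take per cluster is EPSEM.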

  39. Why do cluster sampling instead of a simple sampling method? • Advantages • reduced logistical costs (e.g., travel) • list of all 361 nurses may not be available (reduces listing labor) • Disadvantages • estimates are less precise • analysis is more complicated (requires special software)

  40. Design effect • Relative increase in variance of an estimate due to the sampling design • “variance” = (standard error)² • Formula • s1 = standard error under simple random sampling • s2 = standard error under complex sampling design (e.g., cluster sampling) • design effect = (s2/s1)²
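
As a small worked illustration of the formula, with hypothetical standard errors that are not taken from any survey:

```python
def design_effect(se_complex, se_srs):
    """Ratio of variances: (SE under the complex design / SE under SRS) squared."""
    return (se_complex / se_srs) ** 2

# Hypothetical standard errors (kg) for the same estimate under the two designs.
deff = design_effect(se_complex=1.5, se_srs=1.0)
print(deff)
```

A value above 1 means the complex design gives a larger variance than a simple random sample of the same size; a value below 1 means it gives a smaller one.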

  41. Design effect for cluster sampling • For cluster sampling designs, the design effect is almost always > 1 • This means that estimates from a survey done with cluster sampling are less precise than corresponding estimates obtained from a survey having the same sample size done with simple random sampling

  42. Cluster sizes • Recommended “take” per cluster is 20-40 for multi-purpose surveys • Time and resource limitations will often dictate the maximum number of clusters you can include in the study • Including more clusters improves the precision of your estimates more than a corresponding increase in sample size within the clusters already in the sample

  43. STRATIFIED SAMPLING

  44. Strata • Subsets of the listing units in the population • Set of strata must be mutually exclusive and collectively exhaustive • Strata are often based on demographic variables • age • sex • race

  45. Stratified sampling • Sample from each stratum • Often, sampling probabilities vary across strata

  46. Stratified sampling • Advantages • guarantees coverage across strata • can over-sample some strata in order to obtain precise within-stratum estimates • typically, design effect < 1 • Disadvantages • with unequal sampling probabilities, sampling weights must be included in analysis • more complicated • requires special software
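
To illustrate why weights matter when sampling probabilities differ across strata, here is a minimal sketch. It assumes the common convention that a sampling weight is the inverse of the unit's sampling probability; the strata, probabilities and observations are all invented.

```python
# Hypothetical two-stratum sample; every number here is invented for illustration.
# Each record: (stratum, sampling probability in that stratum, observed 0/1 value)
sample = [
    ("younger", 0.10, 1), ("younger", 0.10, 0), ("younger", 0.10, 1), ("younger", 0.10, 1),
    ("older",   0.02, 0), ("older",   0.02, 0), ("older",   0.02, 1), ("older",   0.02, 0),
]

def weighted_mean(records):
    """Weight each observation by 1 / (its sampling probability)."""
    weights = [1.0 / p for _, p, _ in records]
    total = sum(w * y for w, (_, _, y) in zip(weights, records))
    return total / sum(weights)

unweighted = sum(y for _, _, y in sample) / len(sample)
print(unweighted, round(weighted_mean(sample), 3))
```

The over-sampled stratum dominates the unweighted proportion; weighting lets each sampled unit stand in for the number of population units it represents.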

  47. Example: sampling breast cancer cases for the Women’s CARE Study • Stratification variables • geographic site • race (2 races) • five-year age group • Over-sampled younger women • Over-sampled black women

  48. Example: Sampling households for a reproductive health survey in 11 refugee camps in Pakistan • Selected simple random sample of households from within each of the 11 camps • All households were selected with the same probability

  49. Refugee camp sampling

  50. The sampling operation • Must be carefully controlled • don’t leave to discretion in the field • use a carefully defined procedure • Document what you did • for reference during analysis • to defend your study
