
Unit 4: Sampling approaches



Presentation Transcript


  1. Unit 4: Sampling approaches

  2. After completing this unit you should be able to: • Outline the purpose of sampling • Understand key theoretical concepts in sampling • Understand the need for more complex sampling designs • Understand the main sampling issues and primary sampling options for BSS • Understand the criteria for choosing a sampling approach

  3. Why do we sample? • We sample when we desire to measure characteristics of a specified population (e.g., the proportion of the general population who have unsafe sex) but lack the time and resources to obtain information from all members of the population. • Concentrating survey time and resources on a sample may also result in better quality data than if resources were spread over the whole population.

  4. Key definitions • The target population is the population that is the ideal one for meeting a survey’s measurement objective. (For example, all commercial sex workers in a city.) • The survey population is the target population modified to take into account practical considerations (For example, all commercial sex workers in a city over the age of 15, excluding those who are home-based.)

  5. What do we want from our sample?

  6. 1. Unbiased estimates of our indicators for the survey population. This requires a random/probability sample. Use the class as an example. In summary: • A probability sample is one in which each person in the survey population has a known, non-zero probability of selection. • Statistical tests are based on the assumption that the sample is a probability sample. • A probability sample ensures that our sample is like, or can be weighted to be like, the population from which it was drawn, and the estimates of our indicators can be generalised to the larger population. • Probability sampling requires a sample frame, which is a list of ‘units’ from which a sample may be selected.

  7. A summary of probability and non-probability sampling

     Issue                                                                        | Probability sample | Non-probability sample
     Prone to selection bias                                                      | No                 | Yes
     Can generalise results to survey population                                  | Yes                | No
     Can estimate precision of survey estimates (i.e., use statistical techniques) | Yes                | No
     Results considered credible                                                  | Yes                | No
     Requires sample frame                                                        | Yes                | No
     Requires following fixed procedures that are sometimes costly or unfeasible  | Yes                | No
     Method replicable (important for measuring trends)                           | Yes                | No

  8. 2. Precise estimates of our indicators for the survey population This requires an adequate sample size. In summary: • There are many possible samples that could be selected from the population. Because of chance, each sample would produce a different estimate. • In real life we only select one sample from the population. If we use probability sampling, we can estimate how precisely the population measure is estimated by the sample estimate. • We can increase the precision of our estimate by ensuring an adequate sample size. Standard equations are available to calculate sample size.
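The link between sample size and precision can be sketched numerically. This is a minimal illustration, assuming a simple random sample and the usual normal approximation for a proportion; the function name `proportion_ci`, the 30% estimate and the sample sizes are hypothetical and do not come from the slides.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate confidence interval for a proportion estimated from a
    simple random sample of size n (normal approximation, z=1.96 for 95%)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

# Hypothetical indicator estimate of 30% at three sample sizes.
for n in (100, 400, 1600):
    low, high = proportion_ci(0.30, n)
    print(f"n={n}: {low:.3f} to {high:.3f}")
```

Under these assumptions, quadrupling the sample size halves the width of the interval, which is why precision gains become expensive as samples grow.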

  9. Problems with simple random sampling

  10. Problem 1: Simple random sampling can require the selection of a large number of random numbers. Solution: Use systematic sampling (i.e., sample people at regular intervals down the sample frame). Problem 2: Sample frames for an entire target population rarely exist and are often impractical to construct. Solution: Develop a sampling frame of larger units (clusters). Randomly select clusters, construct a sample frame of individuals in the selected clusters, and randomly sample individuals within those clusters.
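Systematic sampling, the solution to Problem 1, can be sketched as follows. This is one common variant (a random start followed by a fixed, possibly fractional, interval), not necessarily the procedure the authors had in mind; the function name `systematic_sample` and the frame of 100 numbered units are hypothetical.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Select n units at a regular interval down an ordered sample frame,
    using a single random number for the starting point."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    interval = len(frame) / n            # sampling interval (may be fractional)
    start = rng.random() * interval      # random start within the first interval
    last = len(frame) - 1
    # min() guards against a floating-point sum landing exactly on len(frame)
    return [frame[min(int(start + i * interval), last)] for i in range(n)]

frame = list(range(1, 101))              # hypothetical frame of 100 people
print(systematic_sample(frame, 10, random.Random(1)))
```

Only one random number is drawn regardless of the sample size, which is the practical advantage over simple random sampling.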

  11. Notes on cluster sampling 1. All members of the target population must be included in one of the clusters on the sample frame in order to have a chance of being selected. 2. If clusters are of unequal size, we need to take this into account to ensure that our sample is not biased by the fact that people in smaller clusters have a higher probability of being selected than those in larger clusters. We can do this by: • making the probability that a cluster is sampled dependent on its size • adjusting for cluster size during the analysis.

  12. Notes on cluster sampling, cont. 3. Cluster sampling results in less precise estimates of our indicators than a simple random sample of the same size, because respondents within a cluster may be similar to each other. We need to compensate for this by increasing the sample size.
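Note 2 above can be checked with a little arithmetic. The sketch below compares a person's overall chance of selection under two two-stage designs, each taking a fixed number of people per selected cluster: clusters chosen with equal probability, and clusters chosen with probability proportional to size (PPS). The function names and cluster sizes are hypothetical, and the PPS expression assumes no cluster is large enough to push its selection probability above 1.

```python
def equal_probability_design(cluster_size, total_clusters, n_clusters, n_per_cluster):
    """Clusters chosen with equal probability, fixed take per cluster."""
    p_cluster = n_clusters / total_clusters
    p_person_given_cluster = n_per_cluster / cluster_size
    return p_cluster * p_person_given_cluster

def pps_design(cluster_size, total_population, n_clusters, n_per_cluster):
    """Clusters chosen with probability proportional to size, fixed take per cluster."""
    p_cluster = n_clusters * cluster_size / total_population
    p_person_given_cluster = n_per_cluster / cluster_size
    return p_cluster * p_person_given_cluster

cluster_sizes = [100, 200, 300, 400]     # hypothetical cluster sizes
total_population = sum(cluster_sizes)
for size in cluster_sizes:
    equal = equal_probability_design(size, len(cluster_sizes), 2, 10)
    pps = pps_design(size, total_population, 2, 10)
    print(f"size={size}: equal-probability={equal:.4f}, PPS={pps:.4f}")
```

With equal-probability selection, people in the smallest cluster are four times as likely to be chosen as those in the largest; with PPS the cluster size cancels out and everyone has the same overall probability.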

  13. Problem 3: Populations can be spread over a wide area, making logistics difficult. Solution: Use cluster sampling, as it concentrates fieldwork in specific clusters. Problem 4: The population consists of distinct sub-groups that we are interested in. Solution: Make precise estimates for each sub-group (‘stratum’) by using stratified sampling (i.e., take a sample of adequate size from each stratum). If we want an estimate for the entire population, we can combine the estimates for the strata if we know the proportion of the population in each stratum.
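Combining stratum estimates into a population estimate is a weighted average, with each stratum weighted by its share of the population. The sketch below assumes the shares are known and sum to 1; the function name `combine_strata`, the stratum labels and all figures are hypothetical.

```python
def combine_strata(estimates, population_shares):
    """Weight each stratum's estimate by its share of the population."""
    if set(estimates) != set(population_shares):
        raise ValueError("estimates and shares must cover the same strata")
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(estimates[s] * population_shares[s] for s in estimates)

# Hypothetical condom-use estimates and population shares for two strata.
estimates = {"brothel-based": 0.60, "non-brothel-based": 0.40}
shares = {"brothel-based": 0.25, "non-brothel-based": 0.75}
print(round(combine_strata(estimates, shares), 2))  # 0.45
```

Note that the overall figure (45%) sits closer to the larger stratum's estimate, even if both strata contributed the same number of respondents to the sample.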

  14. Sampling issues in behavioural surveillance 1. Consistent sampling is required across survey rounds. If sampling changes between rounds, we cannot tell whether observed changes are real or a result of changes in methodology. 2. General population (household) samples can rarely be used to reach high-risk groups. Group members may not be found in households in sufficient numbers and may not want to talk in household settings. Instead, the locations where group members congregate can be defined as clusters.

  15. Examples of possible clusters for high-risk groups

     High-risk group               | Possible cluster
     Brothel-based sex workers     | Brothels
     Non-brothel-based sex workers | Streets, bars, hotels, guesthouses
     Men who have sex with men     | Cruising sites
     Intravenous drug users        | Shooting galleries, injecting sites
     Truckers                      | Loading/unloading/halting points
     Migrants                      | Households, workplaces

  16. Sampling issues for behavioural surveillance, cont. 3. Cluster sampling is difficult when clusters are not stable. • A measure of cluster size is needed for cluster sampling. It is difficult to estimate cluster size when we use locations like sex worker sites as clusters, because the people in each cluster are rarely fixed. • The risk behaviour in a cluster may also vary by time of day. This makes it difficult to select a sample that is representative of the entire target population using conventional cluster sampling.

  17. Sampling issues for behavioural surveillance, cont. 4. Members of high-risk groups may be difficult to identify and access. 5. Cluster sampling is impossible if group members do not congregate. Some groups do not congregate at all. In others, only some members of the population congregate and important sections of the group may be missed.

  18. Potential solutions to sampling challenges • Use different sampling strategies for different groups. • Use conventional sampling methods in unconventional ways. • Consider using experimental sampling techniques such as Respondent Driven Sampling (RDS).

  19. Sampling options for behavioural surveillance

  20. Conventional cluster sampling Appropriate for the general population, youth and a few high-risk groups, such as prisoners.

  21. Time location sampling • Use when high-risk groups congregate, but their clusters are not stable. • Allows locations to be included as clusters more than once (e.g., at different times of the day or on different days of the week). Clusters are defined by both location and time. • For example:
     Cluster 1 = Site 1, weekday afternoon
     Cluster 2 = Site 2, weekday evening
     Cluster 3 = Site 1, weekend
     Cluster 4 = Site 2, weekday afternoon
     Cluster 5 = Site 1, weekday evening
     Cluster 6 = Site 2, weekend

  22. Time location sampling, cont. • This means: • The fact that the cluster size is not fixed is not a problem, as we only need to know the number of individuals associated with the cluster during the sampled time interval. • The fact that the type of person in the location varies by time is not a problem, as the location is included at different times.
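A time-location sample frame of this kind can be sketched by pairing every site with every time slot and then drawing clusters at random. The site and slot labels follow the example on the previous slide; the function names and the equal-probability draw are assumptions made for illustration (a real survey might instead select clusters in proportion to expected attendance).

```python
import itertools
import random

def build_tls_frame(sites, time_slots):
    """Each cluster is a (site, time slot) pair, so a site can appear several times."""
    return list(itertools.product(sites, time_slots))

def select_tls_clusters(frame, n_clusters, seed=None):
    """Draw clusters without replacement, each with equal probability (an assumption)."""
    rng = random.Random(seed)
    return rng.sample(frame, n_clusters)

sites = ["Site 1", "Site 2"]
time_slots = ["weekday afternoon", "weekday evening", "weekend"]

frame = build_tls_frame(sites, time_slots)
for site, slot in select_tls_clusters(frame, 3, seed=0):
    print(site, "-", slot)
```

Two sites and three time slots give the six clusters listed in the example, and the same site can be drawn more than once at different times.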

  23. Respondent-Driven Sampling • Use when high-risk groups do not congregate • Steps: • Start with initial contacts or ‘seeds,’ who are surveyed and then become recruiters. • Each recruiter invites up to three people they know in the high-risk group to be interviewed. • The new recruits become the recruiters. • Five to six recruitment waves occur.

  24. Theory behind respondent-driven sampling • Given sufficiently long referral chains (five to six waves beyond the seeds you started with), the final sample will resemble the network from which we recruit. • By keeping track of the links between recruiters and recruits and the size of people’s networks, we can calculate the probability of selection and estimate how precisely the population measure is estimated by the sample estimate.
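The recruitment steps on the previous slide imply a ceiling on how fast an RDS sample can grow. The sketch below computes that ceiling under the assumption that every recruiter uses all of their invitations and nobody is invited twice, which real surveys will not achieve; the function name and the figure of five seeds are hypothetical.

```python
def max_recruits_by_wave(n_seeds, invitations_per_recruiter=3, waves=6):
    """Upper bound on the number of people interviewed in each wave.
    Wave 0 is the seeds; each later wave is at most invitations x previous wave."""
    sizes = [n_seeds]
    for _ in range(waves):
        sizes.append(sizes[-1] * invitations_per_recruiter)
    return sizes

# Hypothetical study starting with 5 seeds, shown for the first two waves.
print(max_recruits_by_wave(5, 3, 2))   # [5, 15, 45]
```

In practice many invitations go unused, so the purpose of such a calculation is only to show that a handful of seeds can, in principle, reach the target sample size within the five to six waves the method relies on.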

  25. Criteria for choosing a sampling approach for behavioural surveillance

     Criterion                                                                                                                            | Answer | Sampling approach
     1. Is the sub-population of interest the general population or youth?                                                                | Yes    | Cluster sampling
                                                                                                                                          | No     | Go to 2
     2. Does the sub-population congregate in identifiable and accessible locations in high proportions?                                  | No     | RDS
                                                                                                                                          | Yes    | Go to 3
     3. Is creating a list of group members associated with each site feasible?                                                           | No     | TLS or RDS
                                                                                                                                          | Yes    | Go to 4
     4. Are a high proportion of sub-population group members likely to be accessible at data collection sites on randomly chosen days/times? | No     | TLS or RDS
                                                                                                                                          | Yes    | Cluster sampling
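The criteria above form a short decision sequence, which can be written out as a function. This is only a restatement of the slide in code; the function and parameter names are hypothetical, and the answers still have to come from formative research on the group in question.

```python
def choose_sampling_approach(general_population_or_youth,
                             congregates_in_accessible_locations,
                             site_member_lists_feasible,
                             accessible_on_random_days_and_times):
    """Walk through the four criteria in order and return the suggested approach."""
    if general_population_or_youth:
        return "Cluster sampling"
    if not congregates_in_accessible_locations:
        return "RDS"
    if not site_member_lists_feasible:
        return "TLS or RDS"
    if not accessible_on_random_days_and_times:
        return "TLS or RDS"
    return "Cluster sampling"

# A group that congregates at known sites, but for which site lists cannot be built.
print(choose_sampling_approach(False, True, False, True))   # TLS or RDS
```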

  26. Sample size calculation The sample size can be based on the number of participants needed in each round (or year) to detect a change in the proportion of an indicator from one round to the next:

     $$ n = D \, \frac{\left[ Z_{1-\alpha} \sqrt{2P(1-P)} + Z_{1-\beta} \sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_2 - P_1)^2} $$

     Where:
     Z1-α = The z score for the desired confidence level
     Z1-β = The z score for the desired power
     P1 = The proportion of the sample reporting indicator in year 1
     P2 = The proportion of the sample reporting indicator in year 2
     P = (P1 + P2)/2

  27. Sample size calculation, cont. • D = design effect. The design effect can be thought of as a correction factor for how much a cluster sample differs from a simple random sample. The design effect accounts for the similarities people have when they are sampled within the same cluster. • The bigger the D, the larger the sample size needed.
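The slides do not say how D is obtained. One widely used approximation, attributed to Kish and offered here as an assumption rather than the authors' method, relates D to the number of people sampled per cluster and to the intracluster correlation, which measures how alike people in the same cluster are. The function name and the input values are hypothetical.

```python
def design_effect(people_per_cluster, intracluster_correlation):
    """Kish's approximation: D = 1 + (m - 1) * rho, for equal-sized cluster takes."""
    return 1 + (people_per_cluster - 1) * intracluster_correlation

# Hypothetical intracluster correlation of 0.05 with two different cluster takes.
print(round(design_effect(6, 0.05), 2))    # 1.25
print(round(design_effect(21, 0.05), 2))   # 2.0
```

Under this approximation, taking fewer people from each of a larger number of clusters lowers D, at the cost of more fieldwork sites.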

  28. Sample size calculation, cont. • P1 and P2. P1 and P2 are the measures of interest for which you wish to see a change between survey rounds. • The smaller the change you wish to detect, the larger the sample size you will need. • The closer P1 and P2 are to 50%, the larger the sample size you will need.

  29. Sample size calculation, cont. • Z1-α. The Z1-α score is a statistic that corresponds to the level of significance desired. • The smaller the significance level (i.e., higher confidence level), the larger the sample size you will need. • Z1-β. The Z1-β score is a statistic that corresponds to the power desired. • The higher the power, the larger the sample size you will need.

  30. Table 4.5. Pre-calculated sample size estimates

     Indicator level in wave 1 (P1) | Indicator level in wave 2 (P2) | Sample size needed each wave with a design effect of 1.25 | Sample size needed each wave with a design effect of 2.0
     .10 | .20 | 247 | 395
     .10 | .25 | 123 | 197
     .20 | .30 | 363 | 581
     .20 | .35 | 171 | 274
     .30 | .40 | 441 | 706
     .30 | .45 | 201 | 322
     .40 | .50 | 480 | 768
     .40 | .55 | 214 | 343
     .50 | .60 | 480 | 768
     .50 | .65 | 210 | 336
     .60 | .70 | 441 | 706
     .60 | .75 | 188 | 301
     .70 | .80 | 363 | 581
     .70 | .85 | 149 | 239
     .80 | .90 | 247 | 395
     .80 | .95 | 93 | 149

  31. Example of sample size calculation • Suppose you are planning a survey of sex workers using a two-stage cluster design. You wish to show that condom use will increase from 20% in the baseline survey (this year) to 30% or greater in the survey wave next year. How many sex workers do you need to include each year?

  32. Example of sample size calculation, cont. Solution:
     D = 2 (moderate)
     Z1-α = 1.96 (95% confidence level)
     Z1-β = 0.83 (80% power)
     P1 = .20 (condom use in year 1)
     P2 = .30 (condom use in year 2)
     P = (.20 + .30)/2 = .25

     $$ n = 2 \times \frac{\left[ 1.96 \sqrt{2 \times .25\,(1 - .25)} + 0.83 \sqrt{.20\,(1 - .20) + .30\,(1 - .30)} \right]^2}{(.30 - .20)^2} \approx 582 $$

     sex workers per survey wave
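The worked example can be reproduced in a few lines of code. The sketch below implements the formula from the sample size slide, with defaults taken from this example (D = 2, Z1-α = 1.96, Z1-β = 0.83 as rounded on the slide); the function name `sample_size_per_round` is hypothetical.

```python
import math

def sample_size_per_round(p1, p2, design_effect=2.0, z_alpha=1.96, z_beta=0.83):
    """Participants needed in each round to detect a change from p1 to p2."""
    p_bar = (p1 + p2) / 2
    bracket = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return design_effect * bracket ** 2 / (p2 - p1) ** 2

n = sample_size_per_round(0.20, 0.30)
print(math.ceil(n))   # 582, rounding up to a whole participant
```

Changing `design_effect`, `p1` or `p2` shows the sensitivities described on the earlier slides: a smaller difference between rounds, or a larger design effect, pushes the required sample size up quickly.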

  33. Small group discussion a. What sampling strategies have you had experience with? b. What difficulties and successes did you have with the strategy?

  34. Case study • For each of the following groups, decide what is the best sampling strategy. • Why is this the best strategy? • What are the strong and weak points of using this method for the group? a. Group 1: Youth b. Group 2: MSM
