
Sampling Methodology

Sampling Methodology. Intermediate Training in Quantitative Analysis Bangkok 19-23 November 2007. Some materials are modified from the presentation ‘Comprehensive Survey Design’, Bradley A. Woodruff, CDC. Topics to be covered in this presentation. Basic Introduction


Presentation Transcript


  1. Sampling Methodology • Intermediate Training in Quantitative Analysis, Bangkok, 19-23 November 2007 • Some materials are modified from the presentation ‘Comprehensive Survey Design’, Bradley A. Woodruff, CDC

  2. Topics to be covered in this presentation • Basic Introduction • Bias and error, accuracy and precision • Calculating sample size • Sampling Methodologies • Final Exercise

  3. Learning objectives By the end of this session, the participant should be able to: • Differentiate between precision and accuracy, bias and error • Calculate sample size • Understand different sampling methodologies

  4. Starting point • Define objectives of the survey: • Specific indicators to be measured (food sec, nutr) • Target groups (displaced hhs) • Population groups or geographic areas to be included/studied in survey • Must also determine the level(s) at which to survey (the unit of analysis) • Community • Household (most common for CFSVAs) • Children under 5 years of age

  5. Survey Starting point cont. • Must clearly define geographic area to be surveyed • Defines population to which results can be generalized • May be defined by: • Area in which a programme has been implemented or is planned • An easily defined political unit: district, province, country • Combination of units: rural areas in a province • Other units, such as livelihood zones, agro-ecological zones, etc.

  6. What is a cross-sectional survey? • A cross-sectional survey is a collection of data from a specific population at a single point in time • CFSVAs and EFSAs are typically cross-sectional surveys • Often referred to as a ‘snapshot in time’ • Sometimes referred to as a population survey • (FSMS is typically a longitudinal survey)

  7. What is sampling? Sampling is the process of selecting a number of subjects (a “sample population”) from all the subjects in a “target population” or “universe.” Source: Last. A Dictionary of Epidemiology

  8. Two sampling methods

  9. Why use probability sampling? • To estimate/measure certain outcomes (prevalence of child malnutrition, food insecurity, etc.) for a larger population by measuring only a sub-set of that population • Without probability sampling, a correct estimate for the larger population could only be attained by measuring the entire population • We will focus exclusively on probability sampling methods

  10. Bias and Error, Accuracy and Precision

  11. Bias and error • Bias: • Non-sampling bias • Sampling bias • Sampling error

  12. Non-sampling bias • Bias introduced into the survey that is not related to the sampling methodology or sampling scheme • Always present to some extent and cannot be measured • Examples: • Sampling frame is out of date (no accurate population numbers or household locations) • Non-response to certain modules of the questionnaire, for whatever reason • Measurement error: child ages and weights not recorded correctly

  13. Sampling bias Bias that is introduced by inadequate sampling methodologies • Almost impossible to measure • Examples: • Non representative sampling • Failure to weight

  14. Sampling error • Difference between the survey result and the population value due to random selection of the sample • Measurable and can be accounted for • Example: 15% GAM rate in the survey population but 10% GAM rate in the overall population (an error of 5 percentage points) • Sampling error is influenced by: • Sample size • Sampling scheme • The spread of the indicator we want to measure

  15. Sampling error Measures of sampling error: • Confidence limits • Standard error • Coefficient of variation • Probability values (P values) • Others Use these measures to: • Calculate sample size prior to sampling • Determine how sure we are of result after analysis
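
Two of the measures listed above, the standard error and the confidence limits, can be computed directly for a prevalence estimate. The sketch below assumes a simple random sample and uses the usual normal approximation; the function names and the example figures (15% prevalence from 900 children) are hypothetical and are not taken from the presentation.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a proportion p estimated from n sampled units.

    Assumes a simple random sample (no design effect).
    """
    return math.sqrt(p * (1 - p) / n)


def confidence_limits(p, n, z=1.96):
    """Lower and upper confidence limits (z = 1.96 gives 95%)."""
    se = proportion_standard_error(p, n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical survey result: 15% prevalence measured in 900 children.
se = proportion_standard_error(0.15, 900)
lower, upper = confidence_limits(0.15, 900)
print(f"SE = {se:.4f}, 95% confidence limits = {lower:.3f} to {upper:.3f}")
```

The same quantities are what sample size calculations work backwards from: choosing the sample size fixes how wide these limits will be.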

  16. Bias and error need to be understood within the context of two other terms… • Accuracy: The degree to which a measurement, or an estimate based on measurements, represents the true value of the attribute that is being measured. How close is the sample population estimate to the true population value? • Precision: Precision corresponds to the reduction of random error. How close are the sample population estimates if the survey is repeated? • A measurement can be precise (low random error) but still inaccurate (because of a systematic bias): give examples

  17. Accuracy: obtaining results close to truth • Driven by whether the instrument accurately measures what is intended, whether the population measured is representative of the true population, etc. (whether there is bias) [Figure: results of Survey 1, Survey 2, and Survey 3 compared with the real population value]

  18. Precision: obtaining similar results with repeated measurement Driven by sample size (error in the sample)

  19. How do bias and error relate to these terms? • Bias (both sampling and non sampling) affects accuracy • Sampling Error affects precision and precision can be controlled through sample size
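
The distinction can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true prevalence (10%), the sample sizes, the number of repeated surveys, and the fixed bias of 5 percentage points are all invented for illustration, and every function name is hypothetical.

```python
import random
import statistics


def run_survey(true_prevalence, sample_size, bias, rng):
    """Simulate one survey: each sampled unit is a 'case' with the true
    probability; a constant bias is then added to the estimate."""
    cases = sum(1 for _ in range(sample_size) if rng.random() < true_prevalence)
    return cases / sample_size + bias


def repeat_surveys(n_surveys, true_prevalence, sample_size, bias=0.0, seed=1):
    """Repeat the same survey many times and return all the estimates."""
    rng = random.Random(seed)
    return [run_survey(true_prevalence, sample_size, bias, rng)
            for _ in range(n_surveys)]


TRUE_PREVALENCE = 0.10
small = repeat_surveys(200, TRUE_PREVALENCE, sample_size=100)
large = repeat_surveys(200, TRUE_PREVALENCE, sample_size=1600)
biased = repeat_surveys(200, TRUE_PREVALENCE, sample_size=1600, bias=0.05)

for label, estimates in [("small sample", small),
                         ("large sample", large),
                         ("large sample, biased", biased)]:
    print(f"{label}: mean = {statistics.mean(estimates):.3f}, "
          f"spread (SD) = {statistics.stdev(estimates):.3f}")
```

The large sample narrows the spread of the estimates (better precision), but the biased series stays centred away from the true value however large the sample is (poor accuracy).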

  20. Explain survey estimates in terms of each of these terms

  21. And this??

  22. And???

  23. Finally, this???

  24. Calculating sample size

  25. Calculate sample size Sample size calculation determines the number of individuals that need to be interviewed in order to properly estimate information for a larger population Why calculate sample size? • Collecting data is expensive • Collecting data and specimens is inconvenient for subjects • Collecting data takes time.

  26. Calculate sample size To estimate sample size for single survey, need to know: • Estimate of the prevalence of the outcome (% food insecure hhs, % of wasted children, etc.) • Precision desired • Size of total population • Level of confidence (always use 95%)

  27. Calculate sample size To calculate sample size for an estimate of prevalence with 95% confidence limits: $N = \dfrac{1.96^2 \times P(1-P)}{d^2}$ • 1.96 = Z value for p = 0.05 or 95% confidence intervals • (1.64 = Z value for p = 0.10 or 90% confidence intervals) • P = Estimated prevalence • d = Desired precision (for example, 0.08 for ± 8%)
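
As a worked example, the formula on this slide can be written as a short function. This is a minimal sketch: the function name is hypothetical, the result is rounded up to a whole number of units, and no design effect or finite population correction is applied.

```python
import math


def sample_size_for_prevalence(p, d, z=1.96):
    """Sample size N = z^2 * P * (1 - P) / d^2, rounded up.

    p: estimated prevalence (0 to 1)
    d: desired precision (for example, 0.08 for +/- 8%)
    z: Z value (1.96 for 95% confidence)
    """
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Prevalence assumed to be 50%, desired precision +/- 8%.
n = sample_size_for_prevalence(p=0.5, d=0.08)
print(n)  # 151
```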

  28. Precision and sample size
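
The relationship named in this slide's title can be tabulated from the sample size formula. The sketch below holds the prevalence at an assumed 50% and varies only the desired precision; the chosen precision values and the function name are illustrative assumptions, not figures from the presentation.

```python
import math


def sample_size_for_prevalence(p, d, z=1.96):
    """N = z^2 * P * (1 - P) / d^2, rounded up to a whole unit."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


ASSUMED_PREVALENCE = 0.5
precisions = [0.10, 0.08, 0.05, 0.03]  # from loosest to tightest

sizes_by_precision = {d: sample_size_for_prevalence(ASSUMED_PREVALENCE, d)
                      for d in precisions}

for d, size in sizes_by_precision.items():
    print(f"precision +/- {d:.0%}: sample size {size}")
```

Because the precision is squared in the denominator, halving d roughly quadruples the required sample size.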

  29. Calculate sample size Where to get information to make assumption about prevalence? • Prior surveys • Qualitative estimates • Wild guesses • Err toward an assumed prevalence of 50% when calculating sample size.
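
The advice to err toward 50% follows from the term P(1−P) in the formula, which is largest at P = 0.5. The sketch below checks this numerically for an assumed precision of ± 5%; the precision value, the list of prevalences, and the function name are illustrative assumptions.

```python
import math


def sample_size_for_prevalence(p, d, z=1.96):
    """N = z^2 * P * (1 - P) / d^2, rounded up to a whole unit."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


ASSUMED_PRECISION = 0.05
prevalences = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

sizes_by_prevalence = {p: sample_size_for_prevalence(p, ASSUMED_PRECISION)
                       for p in prevalences}

for p, size in sizes_by_prevalence.items():
    print(f"assumed prevalence {p:.0%}: sample size {size}")

# The prevalence that demands the largest sample.
most_demanding = max(sizes_by_prevalence, key=sizes_by_prevalence.get)
print(f"largest sample needed at {most_demanding:.0%}")
```

Assuming 50% therefore gives a sample that is large enough whatever the true prevalence turns out to be.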

  30. Estimated prevalence and sample size

  31. What about sample size for a cluster survey? SRS: simple random sampling

  32. Design effect • Generally speaking, households in the same village are often similar to each other (there is an intra-cluster correlation). • Twenty households from two villages will not tell us as much about the entire population as twenty households all coming from different villages. • The higher the intra-cluster correlation, and the more households that come from the same cluster, the higher the design effect. • Example: by chance, we select two villages of predominantly fisherfolk.
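
The slide does not give a formula, but a commonly used approximation for equal-sized clusters is DEFF = 1 + (m − 1) × ρ, where m is the number of households per cluster and ρ is the intra-cluster correlation; the simple-random-sample size is then multiplied by DEFF. The sketch below applies that approximation. It is an assumption brought in for illustration, as are the function names and the example values (20 households per cluster, ρ = 0.05, a simple-random-sample size of 385).

```python
import math


def design_effect(households_per_cluster, intra_cluster_correlation):
    """Approximate design effect for equal-sized clusters:
    DEFF = 1 + (m - 1) * rho."""
    return 1 + (households_per_cluster - 1) * intra_cluster_correlation


def cluster_sample_size(srs_sample_size, deff):
    """Inflate a simple-random-sample size by the design effect."""
    return math.ceil(srs_sample_size * deff)


deff = design_effect(households_per_cluster=20, intra_cluster_correlation=0.05)
n_cluster = cluster_sample_size(srs_sample_size=385, deff=deff)
print(f"design effect = {deff:.2f}, cluster survey sample size = {n_cluster}")
```

Both drivers named on the slide appear in the formula: a higher correlation or more households per cluster raises the design effect, and with one household per cluster it falls back to 1.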

  33. It is good to be familiar with these formulas, but we have computers to help us with the calculations….

  34. Sample size calculators • Epi Info (www.cdc.gov) • ODAN stat calculator Excel worksheet

  35. Sampling methodologies • Simple random sample • Systematic random sample • Cluster sample • Stratified sample • Complex sampling designs

  36. Simple random sampling (SRS) • Most basic type of sampling • Statistical theory based on SRS • Calculate p values and confidence limits • Output from most statistical computer programs assumes an SRS • Selection of people is independent and random

  37. Advantages and disadvantages of SRS • Advantages: • No selection bias • Self-weighting • Disadvantages: • Requires knowledge of population • Costly to survey when population is spread out • Sampling frame may not be available or complete

  38. Steps for conducting a simple random sample • Create list of all the sampling units • Number each unit consecutively • Randomly select numbers between 1 and the total number of sampling units, using one of: • Random number table • Computer-generated random numbers (RAND in Excel), picking the units with the highest numbers • “Pick a number from a hat” • Birthday or serial number on paper money • Flip a coin, roll a die, pick a card, pull a straw
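
The steps above can be sketched with a computer-generated selection. This is an illustrative sketch rather than a prescribed procedure: the function name, the seed, and the sample size of 3 are assumptions, and Python's `random.sample` stands in for the random number table.

```python
import random


def simple_random_sample(sampling_units, sample_size, seed=None):
    """Select sample_size units without replacement; every unit has the
    same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(sampling_units, sample_size)


# Step 1: list of all sampling units (the sampling frame).
households = ["Smith", "Pfeiffer", "Anderson", "Timmer", "Huff",
              "Hunt", "Parvanta", "Grummer-Strawn", "Bobrow", "Cooper"]

# Steps 2 and 3: numbering is implicit in the list order; draw 3 at random.
selected = simple_random_sample(households, sample_size=3, seed=2007)
print(selected)
```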

  39. Simple random sampling • Numbered households: 1 Smith; 2 Pfeiffer; 3 Anderson; 4 Timmer; 5 Huff; 6 Hunt; 7 Parvanta; 8 Grummer-Strawn; 9 Bobrow; 0 Cooper • Random number table: 7648 2352 6959 1937 2554 6804 9098 4316 4318 2346 7276 1880 7136 9603 0163 3152 7000 2865 8357 4475 9804 0042 1106 7949 2932 9958 9582 2235 1140 1164 7841 1688 4097 8995 5030 1785 5420 0125 4953 1332 5540 6278 1584 4392 3258 1374 1617 7427

  40. Simple random sampling • Numbered households: 1 Smith; 2 Pfeiffer; 3 Anderson; 4 Timmer; 5 Huff; 6 Hunt; 7 Parvanta; 8 Grummer-Strawn; 9 Bobrow; 0 Cooper • Random number table: 7648 2352 6959 1937 2554 6804 9098 4316 4318 2346 7276 1880 7136 9603 0163 3152 7000 2865 8357 4475 9804 0042 1106 7949 2932 9958 9582 2235 1140 1164 7841 1688 4097 8995 5030 1785 5420 0125 4953 1332 5540 6278 1584 4392 3258 1374 1617 7427

  41. Systematic random sampling • Similar to simple random sampling • First person chosen randomly • Systematic selection of subsequent people • Statistics same as simple random sampling

  42. Steps for systematic random sample • List the sampling units • Divide the number of sampling units by the sample size to determine the sampling interval • Choose a random number between 1 and the sampling interval • This identifies the first selected sampling unit • Add the sampling interval to the random number to identify the second selected sampling unit • Continue to add the sampling interval until the end of the list

  43. Systematic sample example • Example: A survey was undertaken to assess household livelihoods in one community of 480 houses. Sample size calculations revealed that 40 households would need to be sampled systematically to be representative of the larger community. • Sampling interval = 480/40 = 12 • Random number between 1 and 12 was chosen (7) • First house sampled = 7 • Subsequent households sampled • 7 + 12 = 19 • 19 + 12 = 31 • 31 + 12 = 43 • Etc. • Danger: unknown, hidden patterns in the population could bias the sample
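
The example on this slide can be reproduced in a few lines. The sketch assumes the number of houses divides evenly by the sample size, as it does here (480/40); the function name and its parameters are hypothetical.

```python
import random


def systematic_sample(n_units, sample_size, start=None, seed=None):
    """Return the 1-based positions of the selected units.

    Assumes n_units is an exact multiple of sample_size.
    """
    interval = n_units // sample_size          # sampling interval
    if start is None:                          # random start in 1..interval
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


# 480 houses, 40 to be sampled, random start of 7 as in the example.
houses = systematic_sample(n_units=480, sample_size=40, start=7)
print(houses[:4], "...", houses[-1])  # [7, 19, 31, 43] ... 475
```

If `start` is omitted, the function draws the random start itself, which is how the procedure would normally be run.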

  44. What is required for both simple and systematic random sampling? Both require a complete list of all basic sampling units arranged in some order. Resources have to be adequate to sample throughout the target population

  45. What if there is no household listing? What if the area of the target population is too widespread for available resources?

  46. Cluster Sampling!!

  47. Cluster sampling Definition: Probability sampling in which sampling units at some point in the selection process are collections, or clusters, of population elements Source: Kalsbeek, Introduction to survey sampling

  48. Cluster sampling Objective: To choose smaller geographic areas in which simple or systematic random sampling can be done • Cluster samples, for our purposes, are multistage (usually 2 or 3 stages)
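
A two-stage version of this idea can be sketched as follows: first select clusters, then select households within each selected cluster. The village names, household labels, and counts are invented for illustration, and clusters are chosen here with equal probability purely to keep the sketch short. Surveys often select clusters with probability proportional to size instead, which this sketch does not do.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters.
    Stage 2: simple random sample of households within each selected cluster.

    clusters: dict mapping cluster name -> list of households.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical sampling universe: 10 villages of 40 households each.
villages = {
    f"village_{v}": [f"v{v}_hh{h}" for h in range(1, 41)]
    for v in range(1, 11)
}

sample = two_stage_cluster_sample(villages, n_clusters=3, n_per_cluster=10, seed=1)
for village, households in sample.items():
    print(village, households[:3], "...")
```

Only the selected villages need a household listing, which is what makes the approach workable when no complete list exists for the whole area.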

  49. Cluster sampling: illustration [Figure: sampling universe marking selected and non-selected households under simple random sampling (30 households)]
