Sampling

Sampling MICS3 Regional Workshop “Survey Design”

MICS Sample Design • MICS is a complex survey (multi-stage, stratified). • MICS is a worldwide program, so consistency and comparability are important issues. • We will discuss only a few of the highlights, including: • Sample size determination • Stratification and sample allocation • Number of Primary Sampling Units and cluster sizes • Use of an existing sample or a new sample • A few special topics

Sample Size for MICS • Sample size is the feature of MICS that most affects survey costs. We will discuss: • DETERMINANTS: factors and constraints • INDICATORS to use • FORMULA to calculate sample size

Determinants of Sample Size (Factors and Constraints) • Sample size (households) depends on many factors: • Expected size estimate of indicators • Expected size estimate of target population(s) • Average household size • Margin of error wanted • Level of confidence wanted • “Design effect” (increase in sampling error due to use of a cluster survey instead of a simple random sample) • Expected non-response rate • Number of clusters or PSUs • Cluster size (number of households per sample cluster) • Number of sub-national areas for separate estimates (domains) • Survey budget and implementing capability

MICS Recommendations on Sample Size Determinants
FACTOR: RECOMMENDATION
1. Expected size estimate of indicators: (next slide)
2. Expected size estimate of target population: 12-23 mos [3%]
3. Average household size: 6 persons
4. Relative margin of error wanted: 12% of coverage rate
5. Level of confidence wanted: 95 percent
6. Design effect in cluster surveys: 1.5
7. Expected non-response rate: 10 percent
8. Number of clusters or PSUs (minimum): [300-400]
9. Cluster size: [15-35]
10. Number of estimation “domains” wanted: [5 or fewer]
11. Survey budget: (country specific)
For items 2, 3, 6 and 7, use available country data (recent survey or census); if not available, use the value above.

Indicators for Sample Size Determination • The required sample size is different for each MICS indicator. • A key indicator must be chosen, since only one sample size can be used in MICS. • Recommendations for choosing the key indicator: • Choose from among the main indicators of interest in your country. • Choose the one that will yield the largest sample size. • Usually for a single-year age group, and • Usually DPT, measles, polio or tuberculosis immunization, or birth weight below 2.5 kg • Exceptions: Do not choose infant or maternal mortality rates as the key indicator. Do not choose an indicator for which a low value is desirable (such as malnutrition prevalence). Do not choose breast-feeding indicators for 4-month age groups.

Checklist for Target Group and Indicator • To decide on the target group and indicator needed to determine your sample size: • 1. Pick children 12-23 months old, the target population that comprises the smallest percentage of the total population (probably about 3 percent). • 2. For that target group, pick the lowest from among the following coverage rates: - DPT immunization level - Measles immunization level - Polio immunization level - Tuberculosis immunization level • 3. Do not pick an indicator for which a low value is desirable and whose level is already acceptably low.

Formula for Sample Size • The formula differs from the one used in MICS2000. • The MICS2005 formula uses a relative margin of error* instead of an absolute error of 5% (high coverage indicators) or 3% (low coverage indicators). • Less confusing • Does not depend on whether coverage is high or low * The relative margin of error is the tolerable difference between the estimated proportion and its true value, expressed as a percentage of that value, at a given confidence level. It determines the relative length of the confidence interval.

Formula • $n = \dfrac{4\,(r)\,(1-r)\,(\mathrm{deff})\,(1.1)}{(0.12\,r)^{2}\,(p)\,(\text{ave-size})}$ where • n is the required sample size, expressed as number of households, for the KEY indicator, • 4 is the factor to achieve the 95 percent level of confidence, • r is the anticipated prevalence (coverage) rate for the key indicator, • 1.1 is the factor to raise the sample size by 10 percent for potential non-response, • deff is the shortened symbol for design effect, • 0.12r is the margin of error to be tolerated, defined as 12 percent of r (12 percent thus represents the relative sampling error of r), • p is the proportion of the total population that the smallest group comprises, and • ave-size is the average household size. You may use the table on the next page instead of the formula if all conditions for that table are satisfied in your country.
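The formula above can be sketched in Python as follows. This is a minimal illustration, not an official MICS tool: the function name `mics_sample_size` and its parameter names are assumptions made for the example, and the defaults are the recommended values from the determinants slide (deff 1.5, 10 percent non-response, 12 percent relative margin of error).

```python
def mics_sample_size(r, p, ave_size, deff=1.5, nonresponse_factor=1.1,
                     relative_error=0.12, confidence_factor=4.0):
    """Required number of households for the key indicator.

    r                  anticipated coverage rate of the key indicator (0-1)
    p                  proportion of the population in the smallest target group
    ave_size           average household size
    deff               design effect
    nonresponse_factor inflation for non-response (1.1 = 10 percent)
    relative_error     relative margin of error (0.12 = 12 percent of r)
    confidence_factor  4 corresponds to the 95 percent confidence level
    """
    numerator = confidence_factor * r * (1 - r) * deff * nonresponse_factor
    denominator = (relative_error * r) ** 2 * p * ave_size
    return numerator / denominator


# Values from the workshop's first example: 30 percent coverage,
# 3 percent target group, 6 persons per household, default deff and non-response.
print(round(mics_sample_size(r=0.30, p=0.03, ave_size=6)))  # 5941
```

Rounding to the nearest household follows the slides' own examples; a stricter design might round up instead.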

Sample Size (Households) Calculation for Proportion Estimation Using Smallest Target Population

Example 1 • Target group: Children 12 to 23 months old • Percent of population: 3 percent • Key indicator: DPT immunization coverage • Prevalence (Coverage): 30 percent • Deff: No information (use the recommended 1.5) • Non-response: No information (use the recommended 10 percent) • Average household size: 6 • From the table: n = 5941

Checklist for Use of Sample Size Formula • The formula to determine your sample size is $n = \dfrac{4\,(r)\,(1-r)\,(f)\,(1.1)}{(0.12\,r)^{2}\,(p)\,(n_h)}$. Use it, rather than the table, if one or more of the following applies in your country: • p: the proportion of one-year-old children is other than 3% • nh: the average household size is less than 4.5 persons or greater than 6.5 • r: the coverage rate of your key indicator is under 20 or over 40 percent • f: the sample design effect for your key indicator is different from 1.5, according to accepted estimates from other surveys in your country • your anticipated non-response rate is more or less than 10 percent.
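The checklist can be expressed as a small function that returns the reasons, if any, for using the formula instead of the table. This is only a sketch of the conditions listed on the slide; the function name, argument names and message texts are assumptions made for the example.

```python
import math


def reasons_to_use_formula(p, ave_size, r, deff, nonresponse_rate):
    """Return the checklist conditions that rule out the ready-made table.

    An empty list means the table's assumptions hold and it may be used.
    All rates are expressed as proportions (0.03 means 3 percent).
    """
    reasons = []
    if not math.isclose(p, 0.03):
        reasons.append("proportion of one-year-olds is not 3%")
    if ave_size < 4.5 or ave_size > 6.5:
        reasons.append("average household size is outside 4.5-6.5")
    if r < 0.20 or r > 0.40:
        reasons.append("coverage rate is outside 20-40%")
    if not math.isclose(deff, 1.5):
        reasons.append("design effect is not 1.5")
    if not math.isclose(nonresponse_rate, 0.10):
        reasons.append("non-response rate is not 10%")
    return reasons


# Hypothetical country matching the table's assumptions: no reasons returned.
print(reasons_to_use_formula(p=0.03, ave_size=6, r=0.30, deff=1.5,
                             nonresponse_rate=0.10))
```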

Example 2 • Target group: Children 12 to 23 months old • Percent of population: 3.5 percent • Key indicator: DPT immunization coverage • Prevalence (Coverage): 25 percent • Deff: 1.6 • Non-response adjustment: 1.05 (response rate 95%) • Average household size: 6 • $n = \dfrac{4\,(0.25)\,(0.75)\,(1.6)\,(1.05)}{(0.12 \times 0.25)^{2}\,(0.035)\,(6)} = \dfrac{1.26}{0.000189} \approx 6667$.

Stratification & Sample Allocation • Stratification is the process of grouping similar PSUs into sub-groups (strata). • Effects: better precision, flexible design, coverage of small sub-populations (through oversampling). • How to stratify? (region) X (residence type) • Sample allocation: proportional, power allocation, equal size allocation (if budget is too tight). • Implicit stratification: sort the sampling frame according to certain characteristics such as region, urban-rural residence, sub-region, district, etc., then select a pps sample. • There is no unique rule for stratification; it depends on the country situation.
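The three allocation schemes named above can be illustrated with one function. The slide does not define power allocation; the sketch assumes the common form in which each stratum's share is proportional to its size raised to an exponent between 0 and 1. The function name, stratum names and sizes are hypothetical.

```python
def allocate(total_clusters, stratum_sizes, power=1.0):
    """Split a total number of sample clusters across strata.

    power = 1.0 gives proportional allocation,
    power = 0.0 gives equal allocation,
    0 < power < 1 gives a power allocation (assumed form: share ~ size**power).
    Results are fractional; rounding to whole clusters is left to the designer.
    """
    shares = {name: size ** power for name, size in stratum_sizes.items()}
    total_share = sum(shares.values())
    return {name: total_clusters * share / total_share
            for name, share in shares.items()}


# Hypothetical frame: household counts per stratum.
frame = {"urban": 100_000, "rural": 300_000}
for label, exponent in [("proportional", 1.0), ("power", 0.5), ("equal", 0.0)]:
    print(label, allocate(400, frame, power=exponent))
```

Lower exponents shift sample toward the smaller strata, which is the usual motivation for moving away from proportional allocation when separate stratum estimates are wanted.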

Number of PSUs and Cluster Size • Survey costs depend not only on the number of households but also on their distribution among Primary Sampling Units (PSUs). • In general, the more PSUs, the better the reliability but the greater the cost (usually travel costs). • We recommend 300 to 400 PSUs or more. • The number of PSUs also depends on cluster size. • For reliability, cluster size should be as small as practical. • Example: 8000 households selected in 400 PSUs of 20 households each is a much more reliable sample than 8000 households in 200 PSUs of 40 each, but it is more expensive.
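One way to see why smaller clusters are more reliable is the standard approximation deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intra-cluster correlation. This relation is not stated on the slides, and the value of rho used below is purely illustrative; real values vary by indicator and country.

```python
def design_effect(cluster_size, rho):
    """Approximate design effect for equal-sized clusters (Kish approximation)."""
    return 1 + (cluster_size - 1) * rho


def effective_sample_size(n_households, cluster_size, rho):
    """Size of a simple random sample with roughly the same precision."""
    return n_households / design_effect(cluster_size, rho)


RHO = 0.05  # assumed intra-cluster correlation, for illustration only

for n_psus, cluster_size in [(400, 20), (200, 40)]:
    n = n_psus * cluster_size
    print(n_psus, "PSUs of", cluster_size,
          "-> deff", round(design_effect(cluster_size, RHO), 2),
          "-> effective n", round(effective_sample_size(n, cluster_size, RHO)))
```

Both designs interview 8000 households, but under this assumption the design with 40 households per cluster carries less information per interview.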

MICS Sampling Option 1 • USE AN EXISTING SAMPLE • Piggy-back MICS onto DHS or another survey if timely and feasible. • Or, use the sample from a previous survey and re-interview households for MICS. • Or, use old survey sample EAs and construct a new listing of households to select for MICS. • The old sample must be probability-based and national in scope. • Possibilities: DHS, other national health survey, recent labour force or household expenditure surveys • Important: design parameters must be known (such as selection probability, stratification, etc.)

OPTION 1 - USE OF AN EXISTING SAMPLE, continued • Advantages of old sample • - cost savings • - maps available for interviewers • - design rigor • - simplicity • Limitations of old sample • - burden on respondents • - sample design may need modification • * sample size • * sub-national coverage • * number of PSUs or clusters • => Balance between loss and gain

MICS Sampling Option 2 • USE NEW SAMPLE WITH HOUSEHOLD LISTING OPERATION • Design new MICS sample based on prototype • Two stages with census as frame (see comprehensive discussion in Chapter 4 on frame construction and updating old frames) • Use of implicit stratification, systematic selection of census EAs at first stage with pps • Create standard segments (DHS approach) • List households in selected segments • Select households systematically from list • Interview only the selected households; no replacement is allowed
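The first-stage step (implicit stratification followed by systematic pps selection of EAs) can be sketched as below. This is an illustration under stated assumptions, not the MICS prototype itself: the frame layout, function name and EA identifiers are hypothetical, and the frame is assumed to be already sorted by the stratification variables.

```python
import random


def systematic_pps(frame, n_sample, seed=None):
    """Select n_sample EAs with probability proportional to size.

    frame: list of (ea_id, size) tuples, already sorted by the implicit
           stratification variables (region, urban-rural, district, ...).
    An EA larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    total_size = sum(size for _, size in frame)
    interval = total_size / n_sample
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n_sample)]

    selected = []
    cumulative = 0
    next_target = 0
    for ea_id, size in frame:
        cumulative += size
        # Select this EA for every target that falls inside its size range.
        while next_target < n_sample and targets[next_target] < cumulative:
            selected.append(ea_id)
            next_target += 1
    return selected


# Hypothetical frame of 10 EAs with census household counts.
frame = [(f"EA{i:02d}", size)
         for i, size in enumerate([120, 80, 200, 150, 90, 60, 300, 110, 70, 140])]
print(systematic_pps(frame, n_sample=4, seed=1))
```

Because the frame is sorted before selection, the fixed interval spreads the sample across the sorted groups, which is what gives implicit stratification its effect.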

OPTION 2 - NEW SAMPLE WITH HOUSEHOLD LISTING, continued • Advantages of option 2 • - simple design • - probability-based • - if possible self-weighting (national level) • Limitations of option 2 • - expense of listing households • - time necessary to list households • Example: a sample of 5000 households may require 25000 to 50000 households to be listed.

DHS Method - Option 2 • Create “standard” segments. • Divide census population in each EA by 500 to determine number of standard segments. • Map sketch segments in each EA. • Choose 1 segment at random. • List households in selected segment only (instead of entire EA). • Purpose is to reduce listing workload to a manageable size.
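The segment count and the resulting listing workload can be worked out as follows. The slide does not say how to round the quotient; the sketch assumes rounding to the nearest whole number with a minimum of one segment, and the function names are made up for the example.

```python
def standard_segments(ea_population, segment_size=500):
    """Number of standard segments in an EA (census population / 500).

    Assumption: round to the nearest whole number, with at least one segment.
    """
    return max(1, int(ea_population / segment_size + 0.5))


def expected_listing_population(ea_population, segment_size=500):
    """Average population to be listed when only one segment is listed."""
    return ea_population / standard_segments(ea_population, segment_size)


# Hypothetical EA with a census population of 2000 persons.
print(standard_segments(2000))            # 4
print(expected_listing_population(2000))  # 500.0
```

In this hypothetical EA, listing one segment means covering about a quarter of the area that a full EA listing would require.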

MICS Sampling Option 3 • USE NEW SAMPLE WITHOUT HOUSEHOLD LISTING OPERATION • (Modified Segment, or Cluster, Design) • Design new MICS sample based on prototype. • Two stages with census as frame • Use of implicit stratification, systematic selection of census EAs at first stage with pps • Pre-determine number of segments based on desired cluster size. • Map sketch segments in each EA. • Choose 1 segment at random. • Interview all households in selected segment

OPTION 3 - NEW SAMPLE WITHOUT HOUSEHOLD LISTING, continued • Illustration: • Suppose desired cluster size is 20 households. • Suppose first sample EA contains 112 census households (according to frame). • Divide 112 by 20 = 5.6 (round to 6). • Map sketch exactly 6 segments based on canvass of EA. • Select one segment at random. • Interview all households (no matter how many are currently in the selected segment).
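The illustration above can be reproduced in a few lines. This is a sketch only; the function names are assumptions, and segments are simply numbered 1 to k in whatever order they were sketched on the map.

```python
import random


def number_of_segments(census_households, desired_cluster_size):
    """Census households divided by desired cluster size, rounded to nearest."""
    return max(1, int(census_households / desired_cluster_size + 0.5))


def choose_segment(n_segments, rng=None):
    """Pick one segment at random; each has probability 1 / n_segments."""
    rng = rng or random.Random()
    return rng.randint(1, n_segments)


k = number_of_segments(112, 20)  # 112 / 20 = 5.6, rounded to 6
print(k)                         # 6
print(choose_segment(k))         # a segment number between 1 and 6
```

All households found in the chosen segment are interviewed, so the realised cluster size depends on how many households currently live there, not on the target of 20.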

OPTION 3 - NEW SAMPLE WITHOUT HOUSEHOLD LISTING, continued • Advantages of option 3 • avoids listing completely • probability-based • self-weighting (national level) • Limitations of option 3 • less reliable than option 2 (households are “clustered” together in compact segments) • segmentation itself can be time-consuming and complicated • difficult to control sample size

Special Topics • Sub-national estimates, domains • Water and sanitation estimates • Survey weighting, sampling errors • Other – sample frame construction, selection techniques • Country examples

Sub-national Estimates, Domains • Number of separate areas (domains) for which separate, equally reliable estimates are wanted affects sample size. • If, say, 5 regional estimates are wanted, then, theoretically, sample should be increased by factor of 5. • Must be careful therefore in producing separate estimates for domains. • Either limit number of domains to avoid large increase in sample size, • Or be prepared to accept domain estimates with much higher sampling errors than national.

Water and Sanitation Estimates • These are an important component of MICS. • Sampling errors will be high, however (extremely high in some cases). • The MICS sample is designed primarily for person variables rather than household variables such as water/sanitation. • Sample design effects for water and sanitation indicators will be much higher than for other indicators. • Consequently, sampling reliability is very low. • Estimates can nevertheless be useful for estimating trends in water/sanitation if previous surveys exist with which to make comparisons.

Survey Weighting and Sampling Errors • All analysis based on survey data must apply survey weights in order to prevent biased results. • Survey weighting is design-specific. Non-response must be taken into account. • Formulas for calculating weights depend on the exact sample design used in each country.
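As the slide notes, the weight formulas depend on the exact design. The sketch below assumes a simple two-stage design (EA, then household) and a non-response adjustment made within each cluster; the function and argument names are hypothetical, and a real survey would derive the probabilities from its own selection records.

```python
def household_weight(p_ea, p_household_within_ea,
                     households_selected, households_interviewed):
    """Sampling weight for households in one cluster.

    Design weight = inverse of the overall selection probability
    (first-stage probability times second-stage probability).
    The weight is then inflated by the inverse of the cluster's
    household response rate to compensate for non-response.
    """
    design_weight = 1.0 / (p_ea * p_household_within_ea)
    response_rate = households_interviewed / households_selected
    return design_weight / response_rate


# Hypothetical cluster: EA selected with probability 0.01, households with
# probability 0.1 within the EA, and 18 of 20 selected households interviewed.
print(household_weight(0.01, 0.1, 20, 18))
```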

Sampling Error Estimation • Calculation of sampling errors is necessary to evaluate the reliability of survey estimates • Should be done for 30-50 important indicators • Methodology is complex and design-specific • There are several options for sampling error calculations: • May use existing software (Clusters, WesVar, CenVar, PCCarp, etc.) • The latest version of SPSS is currently being evaluated to see whether its new sampling error routines are appropriate for MICS3 surveys • Routines in CSPro can be used • Or use a simple variance spreadsheet that will be available on the MICS website, www.childinfo.org

Sampling Error Estimation, continued • With spreadsheet, only necessary to enter: • Survey weights for each cluster • Unweighted indicator estimate for each cluster • Sampling error automatically calculated • Confidence limits, design effect automatically calculated
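A calculation of this kind, from cluster weights and cluster-level estimates alone, might look like the sketch below. It is not the MICS spreadsheet's method, which the slides do not describe: it assumes a between-cluster (ultimate cluster) linearization estimator for a weighted proportion, ignores stratification, treats each weight as the cluster's total weight, and uses plus or minus 2 standard errors for the 95 percent limits. The design effect is omitted because it also needs the number of sample cases.

```python
import math


def cluster_sampling_error(weights, estimates):
    """Weighted estimate, standard error and approximate 95% limits.

    weights:   total survey weight of each sample cluster
    estimates: unweighted indicator estimate (proportion) in each cluster
    """
    k = len(weights)
    total_weight = sum(weights)
    p = sum(w * e for w, e in zip(weights, estimates)) / total_weight

    # Linearized residual for each cluster: w_i * (e_i - p).
    residuals = [w * (e - p) for w, e in zip(weights, estimates)]
    variance = (k / (k - 1)) * sum(z * z for z in residuals) / total_weight ** 2
    se = math.sqrt(variance)
    return p, se, (p - 2 * se, p + 2 * se)


# Hypothetical data for four clusters.
estimate, se, limits = cluster_sampling_error(
    weights=[120.0, 95.0, 110.0, 130.0],
    estimates=[0.30, 0.45, 0.25, 0.40],
)
print(round(estimate, 3), round(se, 3), [round(x, 3) for x in limits])
```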

Other Topics • Other key information to be included in the MICS3 manual for the sampling statistician to review: • Sample frame construction • When new sample is used for MICS • Especially important if frame is old • Selection techniques • Details of systematic sampling • PPS sampling (probability proportionate to size) • Country examples from MICS2000 • Papua New Guinea, Lebanon, Angola
