Sampling for EHES: Principles and Guidelines
Johan Heldal & Susie Cooper, Statistics Norway

Overview
• Why this kind of sampling?
• Target population & sample size
• Sampling frames
• Probability sampling
• Two-stage sampling - PSUs
• Stratification
• Stage 1 sampling
• Sample sizes
• Sampling PSUs with PPS
• Stage 2 sampling
• A cost model
• Age-gender stratification
• Further aspects

Why?
• Goals for EHES:
  • To estimate the distribution of risk levels within national populations.
  • To compare risk levels among national populations.
  • To predict future levels of disease.
• This differs from the usual goal of epidemiologists, which is to establish risk factors and models for risk.

Ideal Target Population
• Core: All persons aged 25-64 years at a given date with permanent residence in a country.
• Can be extended by age to 18+.
• Should also include institutionalized persons.
• Sample size: At least 500 in each of the eight domains (M, W) × (25-34, 35-44, 45-54, 55-64):
  • Total ≥ 4000 persons.
  • For the pilot ≥ 200 persons.

Main Sampling Frame
• A list of persons or addresses from which to take a sample (register or census).
• Should cover the target population but may need "add-ons".
  • "Add-on": a list of institutions.
• A good list frame may be unavailable.
• Can use "map frames" (NHANES).
• Telephone directories may be complicated to use as a frame.

Probability sampling
• Sampling in scientific surveys is carried out as probability sampling (e.g. simple random sampling).
• Every sampling unit and every target unit has a defined probability of being selected.
• It must be possible to calculate this probability, at least for all units that are sampled.

Two-stage sampling - PSUs
• Primary Sampling Unit (PSU): An area that can be handled by one examination site.
  • Small enough that every person living there can easily travel to the site.
  • Or be easily visited.
• Can be created from small census tracts, municipalities, electoral districts, post code areas or … .
• Divide the country into disjoint PSUs.

Two-stage sampling
• Stratification: Group the PSUs into groups of "close" PSUs, called strata.
  • Use geography and other known information to group similar PSUs together.
• Stage 1: Take a probability sample of PSUs in each stratum.
• Stage 2: Then take a probability sample of persons, households or addresses in each sampled PSU.

Strata consist of PSUs
• PSU sizes:
  • N_i = number of persons, households or addresses in PSU no. i.
  • Can vary, but not too much within a stratum.
  • Recommended N_i ≥ 1000.
• Stratum size: N = N_1 + … + N_M, where M is the number of PSUs in the stratum.
• A sample of m ≥ 2 PSUs and n persons or addresses is taken from the stratum.

Stage 1 sampling
• Selection probability for PSU i: π_i = m N_i / N (PPS sampling, i.e. probability proportional to size).
• Each sampled PSU gets the same sample size p = n/m (persons or addresses).
• This gives every person in the same stratum the same probability of being selected: π_i · p / N_i = n/N.
• m and p can be calculated in a cost-variance optimal way in each stratum.
• The program EHESsampling takes care of the calculations and performs the sampling.
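
The stage 1 selection can be illustrated with a minimal Python sketch. It uses systematic PPS sampling, which is one common way to obtain π_i = m N_i / N; the slides do not say which PPS method EHESsampling uses, so this choice is an assumption. The function names and the example PSU sizes are hypothetical.

```python
import random

def inclusion_probabilities(sizes, m):
    """Return pi_i = m * N_i / N for every PSU in one stratum."""
    total = sum(sizes)
    return [m * size / total for size in sizes]

def pps_systematic(sizes, m, rng):
    """Select m PSU indices by systematic PPS sampling (an assumed method).

    Requires m * max(sizes) <= sum(sizes), so that no pi_i exceeds 1
    and no PSU can be selected twice.
    """
    total = sum(sizes)
    if m * max(sizes) > total:
        raise ValueError("a PSU is too large: pi_i would exceed 1")
    step = total / m
    start = rng.random() * step          # random start in [0, step)
    points = [start + k * step for k in range(m)]
    selected = []
    cumulative = 0.0
    next_point = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # A PSU is selected when a selection point falls in its size interval.
        while next_point < m and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    return selected

# Hypothetical stratum with five PSUs, m = 2 PSUs and n = 100 persons.
sizes = [1000, 1500, 2500, 3000, 2000]
m, n = 2, 100
p = n // m
pi = inclusion_probabilities(sizes, m)
# Overall probability for a person in PSU i: pi_i * p / N_i.
person_prob = [pi_i * p / size for pi_i, size in zip(pi, sizes)]
sampled_psus = pps_systematic(sizes, m, random.Random(2024))
```

In this sketch every entry of `person_prob` equals n/N up to floating-point rounding, which is the equal-probability property stated on the slide.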

Stage 2 sampling
• Sampling of persons or addresses within each of the PSUs sampled at stage 1.
• Simple random sampling of p = n/m persons or addresses in every sampled PSU.
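
A minimal Python sketch of stage 2, assuming the frame is available as a mapping from PSU identifier to a list of person or address identifiers. The function name `stage2_srs` and the data layout are hypothetical, not part of EHESsampling.

```python
import random

def stage2_srs(frame_by_psu, sampled_psus, p, rng):
    """Draw a simple random sample of p units, without replacement,
    in each PSU that was selected at stage 1."""
    sample = {}
    for psu in sampled_psus:
        units = frame_by_psu[psu]
        if p > len(units):
            raise ValueError(f"PSU {psu!r} has fewer than {p} units")
        sample[psu] = rng.sample(units, p)
    return sample

# Hypothetical frame: three PSUs with 1000 person ids each.
frame = {psu: [f"{psu}-{k}" for k in range(1000)] for psu in ("A", "B", "C")}
stage2 = stage2_srs(frame, ["A", "C"], p=50, rng=random.Random(7))
```

Within a sampled PSU of size N_i, each unit is then selected with probability p / N_i.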

A cost model
• C_1 = cost of establishing an extra PSU.
• C_2 = cost of inviting an extra person to the PSU.
• Total variable cost budget model: C = C_1 m + C_2 n.
• m and p = n/m can be calculated to minimize variance given the size of this budget.
• EHESsampling can do this.
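
The slides do not state the variance model behind the optimization, so the following Python sketch rests on a loudly stated assumption: the variance is taken to be proportional to (1 + (p − 1)ρ) / (m p), where ρ is an intraclass correlation within PSUs. Under that textbook model and the budget C = C_1 m + C_2 m p, the optimal cluster take is p = sqrt(C_1 (1 − ρ) / (C_2 ρ)). The function name `allocate` and all numbers are hypothetical; EHESsampling may compute this differently.

```python
import math

def allocate(budget, c1, c2, rho):
    """Return (m, p) under the ASSUMED design-effect variance model.

    budget : total variable cost C
    c1     : cost of establishing an extra PSU (C_1)
    c2     : cost of inviting an extra person (C_2)
    rho    : assumed intraclass correlation, 0 < rho < 1
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Continuous optimum of the per-PSU sample size, then rounded.
    p_opt = math.sqrt((c1 / c2) * (1 - rho) / rho)
    p = max(1, round(p_opt))
    # Largest number of PSUs that still fits within the budget.
    m = int(budget // (c1 + c2 * p))
    return m, p

# Hypothetical costs: C = 100000, C_1 = 2000, C_2 = 50, rho = 0.02.
m_opt, p_opt = allocate(100000, 2000, 50, 0.02)
total_cost = 2000 * m_opt + 50 * m_opt * p_opt
```

The result should still be checked against the requirement m ≥ 2 per stratum; a more expensive PSU (larger C_1) or a smaller ρ pushes the solution towards fewer PSUs with more persons in each.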

Age-gender stratification
• At stage 2: Sample separately for each of the eight domains (M, W) × four age groups.
• An option only if the main sampling frame consists of individual persons.
• Gives better control of the sample size within each age-gender domain.
• Not necessary if the sample size is very large.
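
A minimal Python sketch of sampling separately by age-gender domain within one PSU, assuming a person-level frame in which each record carries a sex code ("M" or "W") and an age. The record layout and the function names are hypothetical.

```python
import random
from collections import defaultdict

# The four core age groups from the target population definition.
AGE_BANDS = [(25, 34), (35, 44), (45, 54), (55, 64)]

def domain_of(sex, age):
    """Return the (sex, age band) domain, or None outside 25-64."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return (sex, f"{low}-{high}")
    return None

def sample_by_domain(persons, per_domain, rng):
    """Simple random sample of per_domain persons in each domain of a PSU."""
    groups = defaultdict(list)
    for person in persons:
        domain = domain_of(person["sex"], person["age"])
        if domain is not None:
            groups[domain].append(person)
    # If a domain has fewer persons than requested, take all of them.
    return {
        domain: rng.sample(members, min(per_domain, len(members)))
        for domain, members in groups.items()
    }

# Hypothetical PSU frame: three persons of each sex at every age 20-69.
persons = [
    {"id": f"{sex}{age}-{k}", "sex": sex, "age": age}
    for sex in "MW" for age in range(20, 70) for k in range(3)
]
by_domain = sample_by_domain(persons, per_domain=5, rng=random.Random(3))
```

Fixing the number drawn per domain is what gives the control over domain sample sizes; with a single simple random sample per PSU, those sizes would vary by chance.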

With address frames
• An address is either:
  1. A dwelling, or
  2. A house with many dwellings.
• Case 1 (dwelling): Invite all eligible persons in the dwelling, if not too many.
• Case 2 (house with many dwellings): Sample some dwellings at the address with a Kish grid (stage 3). Then do as in 1.

Time and place aspects
• A HES takes time (say, a year).
• Avoid confounding between time of year and geography.
• A randomized design for the order of visiting the PSUs is recommended.
• Simpler to handle if many teams work in parallel.
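
A minimal Python sketch of a randomized visiting order, assuming the simplest possible design: a uniformly random permutation of the sampled PSUs, dealt out to teams in turn. The slides do not specify the randomization scheme, so both functions and the PSU names are hypothetical.

```python
import random

def visiting_order(psus, rng):
    """Return the sampled PSUs in a uniformly random order."""
    order = list(psus)
    rng.shuffle(order)
    return order

def assign_to_teams(order, n_teams):
    """Deal the randomized order out to teams, round-robin."""
    return [order[team::n_teams] for team in range(n_teams)]

# Hypothetical sampled PSUs and two examination teams.
sampled = ["North-1", "North-2", "South-1", "South-2", "West-1"]
order = visiting_order(sampled, random.Random(11))
schedules = assign_to_teams(order, 2)
```

Because the order does not depend on region, no part of the country is systematically examined in one season.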