Cluster Sampling Module 3 Session 8
Purpose of the session
• To demonstrate how a cluster sample is selected in practice
• To demonstrate how parameters are estimated under cluster sampling
• We do this for clusters of the same size and for clusters of different sizes
• The practicalities of cluster sampling are also discussed
Introduction: Simple random sampling is not always appropriate!
Example
• Population of N = 324 households
• Households arranged into 36 “villages” of 9 households each
• Costly to travel between villages
• Cheap to travel between households within a village
Taking an SRS of n = 27 households is a “costly” strategy, because the sampled households may be spread over many different villages.
Cluster sampling
Example (cont.)
• Each village is a primary sampling unit (PSU)
• Each household in a village is a secondary sampling unit (SSU)
• Take a sample of villages
• Sample all households within the selected villages
This is one-stage cluster sampling.
Cluster sampling
Cluster sampling is useful when:
• The structure of the units is hierarchical (e.g. villages, and households within villages)
• A sampling frame does not exist at SSU level (it may only exist at PSU level)
• Cost matters: in the example, cluster sampling is cheaper than SRS for the same number of sampled households
Illustration: Estimation
• Cluster sampling: 3 villages out of 36 selected using SRS
• Income from sale of goods recorded for each household, and totalled for each village
Village totals in the sample: 230, 360, 180
Estimates:
• Mean village income is (230 + 360 + 180) / 3 = 256.7
• Total income for the area is 36 × 256.7 = 9240
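The arithmetic behind these two estimates can be checked with a short Python sketch. The variable names are illustrative only; the figures are the three village totals and the 36 villages from the slide.

```python
# Income totals for the 3 villages selected by SRS (from the illustration).
sampled_village_totals = [230, 360, 180]
n_villages_population = 36  # number of villages (PSUs) in the area

# Sample mean of the village totals.
mean_village_income = sum(sampled_village_totals) / len(sampled_village_totals)

# Scale the mean village total up to all 36 villages.
estimated_area_total = n_villages_population * mean_village_income

print(f"Mean village income: {mean_village_income:.1f}")
print(f"Estimated total income for area: {estimated_area_total:.0f}")
```

Because every village has the same number of households here, scaling the mean village total by the number of villages is all that is needed.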
In practice…
• Units in a cluster tend to be more similar to each other than to units in other clusters
• As a result, cluster sampling often leads to less precise estimates than an SRS of the same size (the opposite of stratification, which usually improves precision)
• Trade-off between convenience and precision: if cluster sampling is cheap to do, a larger sample could be taken to help improve precision
Selecting the PSUs
• In this first (unrealistic) example, the villages all have the same number of households, hence we select villages using simple random sampling
• In general the PSUs (villages) may not have the same number of SSUs (households). We might then want to select PSUs with probability proportional to size (PPS)
• PPS gives a large PSU a greater probability of occurring in the sample than a small PSU
PPS Sampling (with replacement)
Example: M = 8 villages (PSUs) of different sizes. We want to sample 3 of them (m = 3).
• Assume interest is still in income from sale of goods (recorded for households and totalled for each village)
• Larger villages are likely to have higher incomes, and smaller villages lower incomes
PPS sampling (cont.)
• 240 households (SSUs) in the population, arranged in the villages as follows:

PSU (e.g. village no.)        1     2     3     4     5     6     7     8
SSUs (e.g. no. of h’holds)   10    10    20    20    40    40    50    50

• Probability of a village being selected (p_i) is:

PSU     1     2     3     4     5     6     7     8
p_i   1/24  1/24  1/12  1/12   1/6   1/6  5/24  5/24
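The selection probabilities in the table are each village's household count divided by the 240 households in the population. A minimal Python sketch that reproduces them as exact fractions (the variable names are illustrative, not from the slides):

```python
from fractions import Fraction

# Number of households (SSUs) in each of the 8 villages (PSUs), from the table.
village_sizes = [10, 10, 20, 20, 40, 40, 50, 50]
total_households = sum(village_sizes)  # 240

# Under PPS, p_i = (households in village i) / (households in population).
selection_probs = [Fraction(size, total_households) for size in village_sizes]

for village, (size, p) in enumerate(zip(village_sizes, selection_probs), start=1):
    print(f"Village {village}: {size} households, p = {p}")
```

The probabilities sum to 1, which is a quick check that no village has been left out.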
PPS sampling (cont.)
Step 1: Calculate the cumulative sum of the SSUs

PSU     1     2     3     4     5     6     7     8
Sum    10    20    40    60   100   140   190   240

Step 2: Draw a number at random from 1, 2, …, 240
• This determines which village is selected: the village whose cumulative range contains the number, e.g. 48 would be in Village 4 (range 41 to 60), and 190 in Village 7 (range 141 to 190).
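Steps 1 and 2 can be sketched in Python as a lookup on the cumulative sums. The function name village_for_draw is hypothetical, chosen for this illustration; the lookup uses the standard library's bisect_left to find the first cumulative sum that is at least as large as the drawn number.

```python
from bisect import bisect_left
from itertools import accumulate

# Number of households (SSUs) in each of the 8 villages (PSUs).
village_sizes = [10, 10, 20, 20, 40, 40, 50, 50]

# Step 1: cumulative sums 10, 20, 40, 60, 100, 140, 190, 240.
cumulative_sizes = list(accumulate(village_sizes))


def village_for_draw(draw):
    """Return the 1-based village number whose cumulative range contains draw."""
    if not 1 <= draw <= cumulative_sizes[-1]:
        raise ValueError("draw must lie between 1 and the total number of households")
    # Index of the first cumulative sum >= draw, converted to a village number.
    return bisect_left(cumulative_sizes, draw) + 1


# Step 2: the two draws used as examples on the slide.
print(village_for_draw(48))   # Village 4
print(village_for_draw(190))  # Village 7
```

In a real selection the draw would come from a random number generator, for instance random.randint(1, 240) from the standard library, rather than being typed in.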
PPS sampling (cont.)
Step 3: Replace the number and repeat to select the other villages
• The three numbers may be 33, 174 and 137, giving Villages 3, 7 and 6
Step 4: Sample all households in the selected villages
The calculation of the estimated total income for the area then weights each sampled village according to its size.
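The slides do not spell out the weighted calculation. One standard estimator for PPS sampling with replacement is the Hansen-Hurwitz estimator, which averages y_i / p_i over the m draws; the Python sketch below assumes that this is the weighting intended. The village income totals in it are hypothetical values invented purely for illustration, since the slides give no income data for this example, and the function name is likewise an assumption.

```python
from fractions import Fraction

# Number of households (SSUs) in each of the 8 villages (PSUs), from the slides.
village_sizes = [10, 10, 20, 20, 40, 40, 50, 50]

# Villages selected by the draws 33, 174 and 137 (from the slides).
sampled_villages = [3, 7, 6]

# HYPOTHETICAL village income totals, made up for this sketch only.
hypothetical_village_totals = {3: 400, 7: 1000, 6: 840}


def hansen_hurwitz_total(sampled, totals, sizes):
    """Estimate the population total as the mean of y_i / p_i over the draws.

    Assumes PPS sampling with replacement, so a village drawn twice
    should appear twice in `sampled`.
    """
    population_size = sum(sizes)
    expanded = [
        totals[village] / Fraction(sizes[village - 1], population_size)
        for village in sampled
    ]
    return sum(expanded) / len(expanded)


estimated_area_total = hansen_hurwitz_total(
    sampled_villages, hypothetical_village_totals, village_sizes
)
print(f"Estimated total income for area: {float(estimated_area_total):.0f}")
```

Dividing each village total by its selection probability is what makes a small village, which was unlikely to be drawn, count for more per unit of income than a large one.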