Ch 4: Stratified Random Sampling (STS) • DEFN: A stratified random sample is obtained by separating the population units into non-overlapping groups, called strata, and then selecting a random sample from each stratum
Procedure • Divide sampling frame into mutually exclusive and exhaustive strata • Assign each SU to one and only one stratum • Select a random sample from each stratum: stratum 1, stratum 2, …, stratum H • (Diagram: sampling frame partitioned into strata h = 1, h = 2, …, h = H)
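The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `stratified_sample`, the toy frame, and the stratum labels "A" and "B" are all assumptions made for the example, and an SRS without replacement is drawn within each stratum.

```python
import random


def stratified_sample(frame, sizes, seed=0):
    """Draw an SRS without replacement from each stratum.

    frame: list of (su_id, stratum) pairs, one pair per sampling unit
    sizes: dict mapping stratum label -> stratum sample size n_h
    """
    rng = random.Random(seed)

    # Assign each SU to one and only one stratum.
    by_stratum = {}
    for su_id, stratum in frame:
        by_stratum.setdefault(stratum, []).append(su_id)

    # Select a random sample within each stratum; successive draws from
    # the generator use fresh random numbers for every stratum.
    return {h: rng.sample(by_stratum[h], n_h) for h, n_h in sizes.items()}


# Hypothetical frame: 100 SUs, the first 60 in stratum "A", the rest in "B".
frame = [(i, "A" if i < 60 else "B") for i in range(100)]
sample = stratified_sample(frame, {"A": 6, "B": 4})
```

The returned dictionary holds one list of sampled SU ids per stratum, so the stratum membership of every sampled unit is kept for estimation.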
Ag example • Divide 3078 counties into 4 strata corresponding to regions of the country • Northeast (h = 1) • North central (h = 2) • South (h = 3) • West (h = 4) • Select an SRS from each stratum • In this example, stratum sample size is proportional to stratum population size • Total sample size of 300 is 9.75% of 3078 • Each stratum sample size is 9.75% of stratum population
Procedure – 2 • Need to have a stratum value for each SU in the frame • Minimum set of variables in sampling frame: SU id, stratum assignment
Procedure – 3 • Each stratum sample is selected independently of others • New set of random numbers for each stratum • Independence is the basis for deriving properties of estimators • Design within a stratum • For Ch 4, we will assume an SRS is selected within each stratum • Can use any probability design within a stratum • Sample designs do not need to be the same across strata
Uses for STS • To improve representativeness of sample • In SRS, can get ANY combination of n elements in the sample • In SYS, we severely restricted the set to k possible samples • Can get “bad” samples • Less likely to get unbalanced samples if frame is sorted using a variable correlated with Y
Uses for STS – 2 • To improve representativeness of sample - 2 • In STS, we also exclude samples • Explicitly choose strata to restrict possible samples • Improve chance of getting representative samples if strata are used to encourage spread across variation in population
Uses for STS – 3 • To improve precision of estimates for population parameters • Achieved by creating strata so that • variation WITHIN stratum is small • variation AMONG strata is large • Uses same principle as “blocking” in experimental design • Improve precision of estimate for population parameter by obtaining precise estimates within each stratum
Uses for STS – 4 • To study specific subpopulations • Define strata to be subpopulations of interest • Examples • Male v. female • Racial/ethnic minorities • Geographic regions • Population density (rural v. urban) • College classification • Can establish sample size within each stratum to achieve desired precision level for estimates of subpopulations
Uses for STS – 5 • To assist in implementing operational aspects of survey • May wish to apply different sampling and data collection procedures for different groups • Agricultural surveys (sample designs) • Large farms in one stratum are selected using a list frame • Smaller farms belong to a second stratum, and are selected using an area sample • Survey of employers (data collection methods) • Large firms: use mail survey because information is too voluminous to get over the phone • Small firms: telephone survey
Estimation strategy • Objective: estimate population total • Obtain estimates for each stratum • Estimate stratum population total • Use SRS estimator for stratum total • Estimate variance of estimator in each stratum • Use SRS estimator for variance of estimated stratum total • Pool estimates across strata • Sum stratum total estimates and variance estimates across strata • Variance formula justified by independence of samples across strata
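The estimation strategy above can be illustrated with a short Python sketch. The stratum sizes and sample values are made up for the example, and the function names are hypothetical. Within each stratum the usual SRS formulas are assumed: estimated total N_h times the sample mean, and estimated variance N_h^2 (1 - n_h/N_h) s_h^2 / n_h.

```python
from statistics import mean, variance


def stratum_total(N_h, y):
    """SRS estimates for one stratum: (estimated total, estimated variance)."""
    n_h = len(y)
    t_hat = N_h * mean(y)
    # variance(y) is the sample variance s_h^2 (divisor n_h - 1);
    # (1 - n_h / N_h) is the finite population correction.
    v_hat = N_h**2 * (1 - n_h / N_h) * variance(y) / n_h
    return t_hat, v_hat


def stratified_total(strata):
    """Pool across strata: sum the stratum totals and the stratum variances.

    strata: list of (N_h, list of sampled y values) pairs
    """
    parts = [stratum_total(N_h, y) for N_h, y in strata]
    return sum(t for t, _ in parts), sum(v for _, v in parts)


# Made-up data: two strata with N_1 = 100 and N_2 = 50.
strata = [(100, [2.0, 4.0, 6.0, 8.0]), (50, [10.0, 12.0])]
t_hat, v_hat = stratified_total(strata)
se = v_hat ** 0.5  # standard error of the estimated total
```

Summing the variance estimates is valid only because the stratum samples are selected independently, so no covariance terms appear.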
Ag example – 5 • Estimated total farm acres in US
Ag example – 7 • Estimated variance for estimated total farm acres in US
Ag example – 8 • Compare with SRS estimates
Estimation strategy - 2 • Objective: estimate population mean • Divide estimated total by population size • OR equivalently, • Obtain estimates for each stratum • Estimate stratum mean with stratum sample mean • Pool estimates across strata • Use weighted average of stratum sample means with weights proportional to stratum sizes Nh
Ag example – 9 • Estimated mean farm acres / county
Ag example – 10 • Estimate variance of estimated mean farm acres / county
Notation • (Diagram: population partitioned into Stratum 1, …, Stratum H, indexed h = 1, h = 2, …, h = H) • Index for strata: h = 1, 2, …, H • Index set for stratum h: Uh = {1, 2, …, Nh } • Nh = number of OUs in stratum h in the population • Partition sample of size n across strata • nh = number of sample units from stratum h (fixed) • Sh = index set for sample belonging to stratum h
Notation – 2 • Population sizes • Nh= number of OUs in stratum h in the population • N = N1+ N2 + … + NH • Partition sample of size n across strata • nh = number of sample units from stratum h • n = n1+ n2 + … + nH • The stratum sample sizes are fixed • In domain estimation, they are random • For now, we will assume that the sampling unit (SU) is an observation unit (OU)
Notation – 3 • Response variable Yhj = characteristic of interest for OU j in stratum h • Population and stratum totals
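The formulas for this slide are not present in the text. Under the standard notation for stratified sampling, and writing y_{hj} for the value of Yhj, the totals would presumably be defined as:

```latex
t_h = \sum_{j=1}^{N_h} y_{hj} \quad \text{(total for stratum } h\text{)},
\qquad
t = \sum_{h=1}^{H} t_h = \sum_{h=1}^{H} \sum_{j=1}^{N_h} y_{hj} \quad \text{(population total)}
```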
Notation – 4 • Population and stratum means
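The formulas for this slide are not present in the text. A likely form, assuming t_h and t denote the stratum and population totals of the response y_{hj}, is:

```latex
\bar{y}_{hU} = \frac{t_h}{N_h} = \frac{1}{N_h} \sum_{j=1}^{N_h} y_{hj},
\qquad
\bar{y}_U = \frac{t}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\, \bar{y}_{hU}
```

The second expression shows that the population mean is a weighted average of the stratum means, with weights N_h / N.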
Notation – 5 • Population stratum variance
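The formula for this slide is not present in the text. Assuming the usual definition with divisor N_h - 1, and with \bar{y}_{hU} the population mean of stratum h, it would read:

```latex
S_h^2 = \frac{1}{N_h - 1} \sum_{j=1}^{N_h} \left( y_{hj} - \bar{y}_{hU} \right)^2
```

Here S_h^2 is the stratum variance and S_h its square root, not to be confused with the sample index set for stratum h, written \mathcal{S}_h in the LaTeX below.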
Notation – 6 • SRS estimators for stratum parameters
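The estimators for this slide are not present in the text. Assuming an SRS of n_h units with sample index set \mathcal{S}_h in stratum h, the standard SRS estimators would be:

```latex
\bar{y}_h = \frac{1}{n_h} \sum_{j \in \mathcal{S}_h} y_{hj},
\qquad
\hat{t}_h = N_h\, \bar{y}_h,
\qquad
s_h^2 = \frac{1}{n_h - 1} \sum_{j \in \mathcal{S}_h} \left( y_{hj} - \bar{y}_h \right)^2,
\qquad
\hat{V}(\hat{t}_h) = N_h^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h}
```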
STS estimators • For population total
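The formulas for this slide are not present in the text. Following the estimation strategy described earlier (sum the stratum estimates and their variances), and assuming \bar{y}_h and s_h^2 are the sample mean and sample variance in stratum h, they would be:

```latex
\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h\, \bar{y}_h,
\qquad
\hat{V}(\hat{t}_{str}) = \sum_{h=1}^{H} N_h^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h}
```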
STS estimators – 2 • For population mean
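The formulas for this slide are not present in the text. Dividing the estimated total by N, and assuming \bar{y}_h and s_h^2 are the sample mean and sample variance in stratum h, gives the presumable form:

```latex
\bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N}\, \bar{y}_h,
\qquad
\hat{V}(\bar{y}_{str}) = \sum_{h=1}^{H} \left( \frac{N_h}{N} \right)^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h}
```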
STS estimators – 3 • For population proportion
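The formulas for this slide are not present in the text. A proportion is the mean of a 0/1 response, so the estimator for the mean applies with \hat{p}_h, the sample proportion in stratum h, in place of the stratum sample mean. Under that assumption:

```latex
\hat{p}_{str} = \sum_{h=1}^{H} \frac{N_h}{N}\, \hat{p}_h,
\qquad
\hat{V}(\hat{p}_{str}) = \sum_{h=1}^{H} \left( \frac{N_h}{N} \right)^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{\hat{p}_h \left( 1 - \hat{p}_h \right)}{n_h - 1}
```

The divisor n_h - 1 arises because the sample variance of a 0/1 variable is s_h^2 = n_h \hat{p}_h (1 - \hat{p}_h) / (n_h - 1).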
Properties • STS estimators are unbiased • Each estimate of stratum population mean or total is unbiased (from SRS)
Properties – 2 • Inclusion probability for SU j in stratum h • Definition in words: probability that SU j in stratum h is included in the sample • Formula (SRS within stratum): πhj = nh / Nh
Properties – 3 • In general, for any stratification scheme, STS will provide a more precise estimate of the population parameters (mean, total, proportion) than SRS • For example • Confidence intervals • Same form (using zα/2) • Different CLT
Sampling weights • Note that • Sampling weight for SU j in stratum h • A sampling weight is a measure of the number of units in the population represented by SU j in stratum h
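The formulas for this slide are not present in the text. Assuming an SRS of n_h units with sample index set \mathcal{S}_h in each stratum, the identity being noted is presumably that the stratified total is a weighted sum of the sampled values:

```latex
\hat{t}_{str} = \sum_{h=1}^{H} N_h\, \bar{y}_h
= \sum_{h=1}^{H} \sum_{j \in \mathcal{S}_h} \frac{N_h}{n_h}\, y_{hj}
= \sum_{h=1}^{H} \sum_{j \in \mathcal{S}_h} w_{hj}\, y_{hj},
\qquad
w_{hj} = \frac{N_h}{n_h}
```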
Example • Note: weights for each OU within a stratum are the same
Example – 2 • Dataset from study
Sampling weights – 2 • For STS estimators presented in Ch 4, sampling weight is the inverse inclusion probability
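A small Python sketch shows how weights are used in practice. The data are made up, the function name `add_weights` is hypothetical, and an SRS within each stratum is assumed, so that the weight is N_h / n_h, the inverse of the inclusion probability n_h / N_h.

```python
def add_weights(strata):
    """Attach the sampling weight w_hj = N_h / n_h to every sampled unit.

    strata: list of (N_h, list of sampled y values) pairs
    Returns rows of (stratum index h, weight, y value).
    """
    rows = []
    for h, (N_h, y) in enumerate(strata, start=1):
        w = N_h / len(y)  # same weight for every unit in stratum h
        rows.extend((h, w, y_j) for y_j in y)
    return rows


# Made-up data: N_1 = 100 with n_1 = 4, N_2 = 60 with n_2 = 2.
strata = [(100, [2.0, 4.0, 6.0, 8.0]), (60, [10.0, 12.0])]
rows = add_weights(strata)

t_hat = sum(w * y for _, w, y in rows)  # weighted sum = estimated total
N_hat = sum(w for _, w, _ in rows)      # weights sum to the population size
y_bar_str = t_hat / N_hat               # estimated population mean
```

Storing one weight per record is convenient because totals and means can then be computed from the data file without looking up N_h and n_h again.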
Defining strata • Depends on purpose of stratification • Improved representativeness • Improved precision • Subpopulation estimates • Implementing operational aspects • If possible, use factors related to variation in characteristic of interest, Y • Geography, political boundaries, population density • Gender, ethnicity/race, ISU classification • Size or type of business • Remember • Stratum variable must be available for all OUs
Allocation strategies • Want to sample n units from the population • An allocation rule defines how n will be spread across the H strata and thus defines values for nh • Overview for estimating population parameters • Special cases of optimal allocation
Allocation strategies – 2 • Focus is on estimating parameter for entire population • We’ll look at subpopulations later • Factors affecting allocation rule • Number of OUs in stratum • Data collection costs within strata • Within-stratum variance
Proportional allocation • Stratum sample size allocated in proportion to population size within stratum • Allocation rule
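The allocation rule itself is not present in the text. Allocating in proportion to stratum population size means:

```latex
n_h = n\, \frac{N_h}{N}
\qquad \Longleftrightarrow \qquad
\frac{n_h}{N_h} = \frac{n}{N} \quad \text{for every stratum } h
```

In practice n_h is rounded to an integer, so the equality holds only approximately.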
Proportional allocation – 2 • Proportional allocation rule implies • Sampling fraction for stratum h is constant across strata • Inclusion probability is constant for all SUs in population • Sampling weight for each unit is constant
Proportional allocation – 3 • STS with proportional allocation leads to a self-weighting sample • What is a self-weighting sample? • If whj has the same value for every OU in the sample, a sample is said to be self-weighting • Since each weight is the same, each sample unit represents the same number of units in the population • For self-weighting samples, the estimator for the population mean reduces to the sample mean • Estimator for variance does NOT necessarily reduce to the SRS estimator for the variance of the sample mean
Proportional allocation – 4 • Check to see that an STS with proportional allocation generates a self-weighting sample • Is the sampling weight whj the same for each OU? • Is the estimator for the population mean equal to the sample mean? • What happens to the variance of the estimated mean?
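The check can be worked through as follows. This is a sketch assuming an SRS within each stratum and exact proportional allocation, n_h = n N_h / N, with \mathcal{S}_h the sample index set and s_h^2 the sample variance in stratum h:

```latex
w_{hj} = \frac{N_h}{n_h} = \frac{N}{n} \quad \text{(the same for every OU)}

\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N}\, \bar{y}_h
= \sum_{h=1}^{H} \frac{n_h}{n}\, \bar{y}_h
= \frac{1}{n} \sum_{h=1}^{H} \sum_{j \in \mathcal{S}_h} y_{hj} = \bar{y}

\hat{V}(\bar{y}_{str}) = \sum_{h=1}^{H} \left( \frac{N_h}{N} \right)^2 \left( 1 - \frac{n_h}{N_h} \right) \frac{s_h^2}{n_h}
= \left( 1 - \frac{n}{N} \right) \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N}\, s_h^2
```

The variance estimate uses a weighted average of the within-stratum variances s_h^2 in place of the overall sample variance s^2, which is why it does not in general equal the SRS variance estimate.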
Ag example – 12 • Even though we have used proportional allocation, rounding in setting sample sizes can lead to unequal (but approximately equal) weights
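The effect of rounding can be seen in a short Python sketch. The regional county counts below are assumed for illustration: they sum to 3078 but are not given in this text. Only N = 3078 and n = 300 come from the example.

```python
# Assumed stratum population sizes (number of counties per region).
N_h = {"Northeast": 220, "North central": 1054, "South": 1382, "West": 422}
n = 300
N = sum(N_h.values())  # 3078

# Proportional allocation, rounded to whole counties.
# Note: rounding does not guarantee that the n_h sum to n in general.
n_h = {h: round(n * N_h[h] / N) for h in N_h}

# Sampling weight per stratum: N_h / n_h.
weights = {h: N_h[h] / n_h[h] for h in N_h}

for h in N_h:
    print(f"{h:14s} N_h={N_h[h]:5d} n_h={n_h[h]:4d} weight={weights[h]:.3f}")
```

Without rounding every weight would equal N / n, about 10.26. With rounding the weights differ slightly from stratum to stratum.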
Neyman allocation • Suppose within-stratum variances vary across strata • Stratum sample size allocated in proportion to • Population size within stratum Nh • Population standard deviation within stratum Sh • Allocation rule
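The allocation rule itself is not present in the text. Allocating in proportion to the product of stratum size and stratum standard deviation gives:

```latex
n_h = n\, \frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l}
```

If all S_h are equal, the rule reduces to proportional allocation, n_h = n N_h / N.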
Optimal allocation • Suppose data collection costs ch vary across strata • Let C = total budget, c0 = fixed costs (office rental, field manager), ch = cost per SU in stratum h (interviewer time, travel cost) • Express budget constraint as C = c0 + Σh ch nh and determine nh
Optimal allocation – 2 • Assume general case: stratum population sizes, stratum variances, and stratum data collection costs vary across strata • Sample size is allocated to strata in proportion to • Stratum population size Nh • Stratum standard deviation Sh • Inverse square root of stratum data collection costs • Allocation rule
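The allocation rule itself is not present in the text. Allocating in proportion to N_h S_h / sqrt(c_h) gives, for a fixed total sample size n, n_h = n (N_h S_h / sqrt(c_h)) / Σ_l (N_l S_l / sqrt(c_l)). The Python sketch below implements that rule. The function name `allocate` and the numbers are assumptions for illustration, and the function returns unrounded sample sizes.

```python
from math import sqrt


def allocate(n, N_h, S_h=None, c_h=None):
    """Spread n across strata in proportion to N_h * S_h / sqrt(c_h).

    N_h: stratum population sizes
    S_h: stratum standard deviations (default: all equal)
    c_h: per-unit data collection costs (default: all equal)
    Returns unrounded stratum sample sizes that sum to n.
    """
    H = len(N_h)
    S_h = S_h or [1.0] * H
    c_h = c_h or [1.0] * H
    score = [N * S / sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    total = sum(score)
    return [n * s / total for s in score]


N_h = [400, 600]  # made-up stratum sizes

# Equal variances and costs: proportional allocation.
proportional = allocate(100, N_h)
# Equal costs, unequal standard deviations: Neyman allocation.
neyman = allocate(100, N_h, S_h=[30.0, 10.0])
# Unequal standard deviations and costs: general optimal allocation.
optimal = allocate(100, N_h, S_h=[30.0, 10.0], c_h=[4.0, 1.0])
```

The three calls show proportional and Neyman allocation as special cases of the same rule: a stratum gets a larger share when it is larger, more variable, or cheaper to sample.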