Post-stratification. Sometimes there is an obvious stratification variable, but the stratum assignment of each SU is not known before sampling, so the design cannot be stratified. Instead, take an SRS (for example) and use the known stratum totals, N_h, to improve estimation relative to SRS estimators.
Post-stratification • Sometimes there is an obvious stratification variable • Stratum assignment is not known for each SU before sampling, so the design cannot be stratified • Take an SRS (for example) instead • Stratum totals, N_h, are known and can be used to improve estimation relative to SRS estimators • Very common for household and population surveys • Census data provide the number of persons and households per area, by age, …
Food spending example • Objective: estimate the average amount spent on food per week in NC • Possible stratification variable: household composition • Family households might be expected to have higher food bills than non-family households • Sampling frame • List of all households in NC • No information on household composition • From U.S. census data, the distribution of household composition is known
Food spending example – 2 • 2000 Census data on household composition in NC
Post-stratification – 2 • Design phase • SRS of n OUs (could be another design) • Identify post-strata • Sample selection phase • Select the SRS of n OUs • The sample contains n_1, n_2, …, n_H units from the post-strata, BUT these counts cannot be determined at this point • Data collection phase • Include a question that gathers information on stratum assignment • Determine the post-stratum h to which each OU i belongs • The values of n_1, n_2, …, n_H can now be determined • Note that the n_h are random: they differ from sample to sample
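The point that the post-stratum sample sizes are random can be illustrated with a minimal sketch. The population below (600 "family" and 400 "nonfamily" households) and the helper name `poststratum_counts` are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
import random
from collections import Counter

def poststratum_counts(labels, n, seed):
    """Draw an SRS of size n without replacement and count units per post-stratum."""
    rng = random.Random(seed)
    sample = rng.sample(labels, n)  # simple random sample without replacement
    return Counter(sample)

# Hypothetical population of post-stratum labels (illustrative only).
population = ["family"] * 600 + ["nonfamily"] * 400

# Repeating the draw with different seeds shows how the counts n_h move around
# even though the design (SRS of n = 100) is the same each time.
for seed in range(3):
    counts = poststratum_counts(population, n=100, seed=seed)
    print(seed, dict(counts))
```

Each run always returns counts that sum to n, but the split between post-strata is only known after the sample has been drawn and the stratum question has been answered.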
Food spending example – 3 • Select SRS of n = 1000 households • Collect data on household composition • List each household member and relationship to respondent • Tabulate the number of sampled households in each composition category (n_h) • Use Census 2000 population counts of households in each composition category (N_h)
Post-stratification – 3 • Note that the sample composition across post-strata differs from the population composition • Compare the percentage distribution across post-strata for the population (column 3) with that for the sample (column 5) • Could improve estimates by “calibrating” to the post-stratum population totals; this is the basis for the post-stratification estimator • Another way to look at the sample composition is to compare the expected sample size for each post-stratum with the observed sample size obtained from the SRS • Expected sample size for post-stratum h under SRS: n N_h / N
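The comparison of expected and observed post-stratum sample sizes can be sketched as follows. Under SRS the expected count in post-stratum h is n N_h / N. The stratum totals and observed counts below are hypothetical placeholders, not the Census 2000 figures from the slides.

```python
def expected_sizes(n, stratum_totals):
    """Expected SRS sample size in each post-stratum: n * N_h / N."""
    N = sum(stratum_totals)
    return [n * N_h / N for N_h in stratum_totals]

# Hypothetical values for three post-strata (illustrative only).
N_h = [5000, 3000, 2000]   # population totals per post-stratum
n_h = [540, 280, 180]      # observed sample counts per post-stratum
n = sum(n_h)

for exp, obs in zip(expected_sizes(n, N_h), n_h):
    print(f"expected {exp:.0f}, observed {obs}, difference {obs - exp:+.0f}")
```

A post-stratum whose observed count exceeds its expected count is over-represented in the sample, and the post-stratified estimator down-weights it accordingly.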
Post-stratification – 4 • Estimating a population mean • Estimate each post-stratum mean as a domain mean, then pool the post-stratum estimates • Variance approximation (n_h > 30, n large)
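The formulas for this slide did not survive in the transcript. The block below is a reconstruction using the standard textbook form of the post-stratified mean under SRS and its usual first-order variance approximation; it is assumed, not confirmed, that the original slide showed the same expressions. Here \bar{y}_h and s_h^2 denote the sample mean and sample variance of the n_h sampled units that fall in post-stratum h.

```latex
\bar{y}_{\text{post}} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h ,
\qquad
\bar{y}_h = \frac{1}{n_h} \sum_{j \in S_h} y_{hj}

\hat{V}\!\left(\bar{y}_{\text{post}}\right) \approx
\left(1 - \frac{n}{N}\right) \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N}\, s_h^2
```

The approximation ignores the extra variability that comes from the n_h being random, which is why it is stated only for reasonably large n_h and large n.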
Food spending example – 6 • Food expenditures last week
Food spending example – 7 • Estimate population mean • Estimate SE of estimated mean
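The two steps on this slide can be sketched in code. The example assumes the standard post-stratified mean, the sum over h of (N_h / N) times the post-stratum sample mean, and the first-order variance approximation (1 - n/N)(1/n) times the sum over h of (N_h / N) s_h^2. The function names and all numbers are hypothetical; the actual food expenditure table is not available in the transcript.

```python
import math

def poststratified_mean(N_h, ybar_h):
    """Pool post-stratum sample means with population shares N_h / N."""
    N = sum(N_h)
    return sum(Nh / N * yb for Nh, yb in zip(N_h, ybar_h))

def poststratified_se(N_h, s2_h, n):
    """Approximate SE: sqrt((1 - n/N) / n * sum_h (N_h / N) * s_h^2)."""
    N = sum(N_h)
    fpc = 1 - n / N  # finite population correction
    var = fpc / n * sum(Nh / N * s2 for Nh, s2 in zip(N_h, s2_h))
    return math.sqrt(var)

# Hypothetical inputs for two post-strata (illustrative only).
N_h = [6000, 4000]       # known population totals
ybar_h = [120.0, 80.0]   # sample mean weekly food spending per post-stratum
s2_h = [900.0, 400.0]    # sample variances per post-stratum
n = 1000                 # total SRS size

print(f"mean = {poststratified_mean(N_h, ybar_h):.2f}")
print(f"SE   = {poststratified_se(N_h, s2_h, n):.2f}")
```

With these made-up inputs the estimated mean is about 104.00 with a standard error of about 0.79.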
Post-stratification – 5 • Formulas involve weighted averages of stratum sample means and variances • Mean estimator looks like stratification estimator • Variance estimator is not the stratification variance estimator • Estimating a population total? • Estimating a population proportion?
Post-stratification – 6 • Estimator for population total • Weight under post-stratified estimator • w_hj = N_h / n_h
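A minimal sketch of the weighting view: every sampled unit j in post-stratum h gets weight w_hj = N_h / n_h, and the estimated total is the weighted sum of the observations, which equals the sum over h of N_h times the post-stratum sample mean. The function names and the tiny data set are hypothetical.

```python
def poststratified_weights(N_h, n_h):
    """Weight shared by every sampled unit in post-stratum h: N_h / n_h."""
    return [Nh / nh for Nh, nh in zip(N_h, n_h)]

def poststratified_total(N_h, samples):
    """Weighted sum of observations; samples[h] lists the y values in post-stratum h."""
    weights = poststratified_weights(N_h, [len(s) for s in samples])
    return sum(w * y for w, s in zip(weights, samples) for y in s)

# Hypothetical data for two post-strata (illustrative only).
N_h = [60, 40]
samples = [[10, 20, 30], [5, 15]]

print(poststratified_weights(N_h, [len(s) for s in samples]))
print(poststratified_total(N_h, samples))
```

Dividing the estimated total by N gives back the post-stratified mean, so the two estimators use the same weights.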
Post-stratification and nonresponse • Respondents may be distributed disproportionately across post-strata because nonresponse rates differ by post-stratum • The same approach can improve estimation: weight each post-stratum respondent mean by N_h / N (the post-stratum share of the population) when averaging across post-strata
Implicit assumption • The sample post-stratum mean from responding units is an unbiased estimate of the population post-stratum mean • That is, within each post-stratum the distribution of Y among respondents is (approximately) the same as for the whole post-stratum population, and hence the same as for the nonrespondents • Often a poor assumption
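What happens when the assumption fails can be sketched with a small calculation. It assumes each post-stratum splits into a fixed responding part (share r_h, mean mu_R) and a nonresponding part (mean mu_NR), so the post-stratum mean is r_h mu_R + (1 - r_h) mu_NR and the respondent mean is off by (1 - r_h)(mu_R - mu_NR). The function name and all numbers are hypothetical.

```python
def poststratified_nonresponse_bias(W_h, resp_rate_h, mean_resp_h, mean_nonresp_h):
    """Bias of the post-stratified mean when respondent means stand in for post-stratum means.

    W_h are the population shares N_h / N. Within post-stratum h the respondent
    mean differs from the true mean by (1 - r_h) * (mu_R - mu_NR).
    """
    return sum(
        W * (1 - r) * (mu_r - mu_nr)
        for W, r, mu_r, mu_nr in zip(W_h, resp_rate_h, mean_resp_h, mean_nonresp_h)
    )

# Hypothetical values: respondents and nonrespondents agree in post-stratum 1
# but differ in post-stratum 2 (illustrative only).
W_h = [0.6, 0.4]
resp_rate_h = [0.8, 0.5]
mean_resp_h = [120.0, 80.0]
mean_nonresp_h = [120.0, 70.0]

print(poststratified_nonresponse_bias(W_h, resp_rate_h, mean_resp_h, mean_nonresp_h))
```

The bias is zero only when respondents and nonrespondents have the same mean within every post-stratum; post-stratification corrects for differing response rates across post-strata, not for differences within them.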