Redefining the Unit Nonresponse Adjustment Cells for the Survey of Residential Alterations and Repairs (SORAR) Laura T. Ozcoskun and Katherine Jenny Thompson Presented By Samson Adeshiyan
Outline • Background • The Problem • The Authors’ Recipe for a Solution • Some Empirical Results Interspersed
Survey of Residential Alterations and Repairs (SORAR) Background • Monthly data collection • Low unit response rates • Key item: Total Expenditures • Maintenance and Repairs • Improvements • Multi-stage sample of Housing Units (HUs) • Privately-owned vacant HUs (Vacant) • Rental and 5+ unit properties (Rental) • Modified Half-Sample Variance Estimator
The Problem (Motivation) • SORAR’s three-stage weighting procedure • Duplication control (field subsampling) • Unit nonresponse adjustment • Post-stratification adjustment • Suspected that the variables used to define the unit nonresponse weighting cells were not highly related to • Response propensity or • Cell means
Response Model • “Quasi-Randomization” (Oh & Scheuren 1983) • Covariate dependent, missing-at-random (MAR) response mechanism • Response propensity (p) is a random variable. • Minimum requirements for weighting cells: • Heterogeneous response propensities or • Heterogeneous cell means • Optimal adjustment cells satisfy both conditions.
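Under a quasi-randomization model, the usual weighting-class adjustment multiplies each respondent's weight by the inverse of the weighted response rate in its cell, which estimates 1/p for that cell. A minimal Python sketch of that general technique (not SORAR's production code), assuming each unit is a hypothetical `(cell, weight, responded)` tuple and that cell labels such as `"NE/Single"` are illustrative only:

```python
from collections import defaultdict


def nonresponse_adjustment_factors(units):
    """Return {cell: adjustment factor} for weighting-class nonresponse adjustment.

    units: iterable of (cell, weight, responded) tuples (hypothetical layout),
    where weight is the unit's weight before this stage and responded is a bool.
    The factor for a cell is (sum of all eligible weights) / (sum of respondent
    weights), i.e. the inverse of the weighted response rate.
    """
    eligible = defaultdict(float)
    responding = defaultdict(float)
    for cell, weight, responded in units:
        eligible[cell] += weight
        if responded:
            responding[cell] += weight
    factors = {}
    for cell, total in eligible.items():
        if responding[cell] == 0:
            # A cell with no respondents cannot be adjusted; it must be collapsed.
            raise ValueError(f"cell {cell!r} has no respondents")
        factors[cell] = total / responding[cell]
    return factors


units = [
    ("NE/Single", 10.0, True), ("NE/Single", 10.0, False),
    ("NE/Multi", 5.0, True), ("NE/Multi", 5.0, True), ("NE/Multi", 5.0, False),
]
print(nonresponse_adjustment_factors(units))  # {'NE/Single': 2.0, 'NE/Multi': 1.5}
```

Cells whose response propensities differ get different factors; if propensities were equal across cells, the adjustment would reduce to a single constant and the cells would add nothing.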
The Authors’ Recipe • Determine Eligible Sets of Classification Variables • Determine Uncollapsed Cells (Full Model) • Logistic Regression Analysis • Determine Collapsed Cells (Reduced Model) • General Linear Hypothesis Tests • Relative Efficiency Diagnostic (MSE Ratios) • Time Series Plots of Adjustment Factors
Step 1: Find Sets of Classification Variables for Cells • Respondent requirements per cell: • Actual cell size ≥ 5 (needed for logistic regression) • Effective “sample” (cell) size ≥ 5 • Categorical variables
Cell Sizes • Effective “Sample” (Cell) Size: $n^{*}_p = r_p / \mathrm{DEFF}_p$ • $r_p$ is the actual cell size (number of respondents) of cell p • $\mathrm{DEFF}_p$ is the design effect for item Y in cell p • $\mathrm{DEFF}_p < 1$ indicates an efficient design for item Y
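The cell-size screen can be written as two checks: the actual respondent count and the effective size (respondent count divided by the design effect) must both reach a minimum, taken here as 5 respondents per cell. A small sketch; the function names are hypothetical:

```python
def effective_cell_size(r_p: int, deff_p: float) -> float:
    """Effective 'sample' size of cell p: respondent count / design effect for item Y."""
    if deff_p <= 0:
        raise ValueError("design effect must be positive")
    return r_p / deff_p


def cell_is_eligible(r_p: int, deff_p: float, min_size: int = 5) -> bool:
    """Both the actual and the effective cell size must reach min_size."""
    return r_p >= min_size and effective_cell_size(r_p, deff_p) >= min_size


print(cell_is_eligible(12, 1.8))  # effective size about 6.7, passes both checks
print(cell_is_eligible(12, 3.0))  # effective size 4.0, fails the effective-size check
print(cell_is_eligible(4, 0.5))   # effective size 8.0, but only 4 actual respondents
```

Keeping the actual-size check separate matters: a design effect below 1 can push the effective size above the threshold even when too few respondents remain to fit the logistic regression.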
Candidate Cells (SORAR) • Candidate cell variables (categorical) • Region (currently used) • Metropolitan Statistical Area (MSA) status (currently used) • Tenure (Vacant/Rental) • Single-unit vs. Multi-unit • Candidate cross classifications • Region/MSA Status/Single or Multi-Unit • Region/Tenure/Single or Multi-Unit
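Each candidate cross-classification defines a grid of cells, and counting respondents in every grid cell (including empty ones) shows which classifications can meet the per-cell minimum. A sketch, assuming Region takes the four Census regions and that the level labels below are hypothetical stand-ins for SORAR's actual codes:

```python
from collections import Counter
from itertools import product

# Hypothetical level labels; Region is assumed to use the four Census regions.
LEVELS = {
    "region": ["Northeast", "Midwest", "South", "West"],
    "msa": ["Inside MSA", "Outside MSA"],
    "tenure": ["Vacant", "Rental"],
    "units": ["Single", "Multi"],
}

CANDIDATES = {
    "Region/MSA/Units": ("region", "msa", "units"),
    "Region/Tenure/Units": ("region", "tenure", "units"),
}


def candidate_cells(variables):
    """All cells of the cross-classification, in a fixed order."""
    return list(product(*(LEVELS[v] for v in variables)))


def respondents_per_cell(respondents, variables):
    """Respondent count for every cell, reporting empty cells as 0.

    respondents: list of dicts keyed by variable name.
    """
    counts = Counter(tuple(r[v] for v in variables) for r in respondents)
    return {cell: counts.get(cell, 0) for cell in candidate_cells(variables)}


for name, variables in CANDIDATES.items():
    print(name, len(candidate_cells(variables)), "cells")
```

Both candidate classifications yield 4 × 2 × 2 = 16 cells under these assumed levels, so the choice between them turns on how respondents spread across the cells, not on the number of cells.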
Step 2: Uncollapsed Cells (Full Model) • Response Propensity Modeling • Logistic Regression • Test statistics adapted for complex surveys (Roberts, Rao, and Kumar 1987) • Full and reduced (nested) models • Want all effects to be significant in the full model • Would like to reject the majority of nested models
Logistic Regression (SORAR) • 18 months of data • Separate full and reduced models for each month • Between-cell covariance approximations (correlation ρ) • ρ = 0 (anti-conservative) • ρ = -0.25 • ρ = -0.50 (conservative)
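Why a negative between-cell correlation is the conservative choice can be seen in the simplest case: a 1-df Wald test that two cells share the same logit response propensity. This sketch assumes the covariance between the two cell estimates is approximated as ρ√(v₁v₂) with a single assumed ρ; that reading of the approximation, the function name, and the input values are all hypothetical, and this is a plain Wald test rather than the Roberts, Rao, and Kumar adjustment itself:

```python
import math


def wald_two_cells(theta1, v1, theta2, v2, rho):
    """1-df Wald test of H0: theta1 == theta2.

    theta_i: estimated logit response propensity of cell i
    v_i:     its estimated variance
    rho:     assumed correlation between the two cell estimates
    Returns (W, p_value).
    """
    var_diff = v1 + v2 - 2.0 * rho * math.sqrt(v1 * v2)
    w = (theta1 - theta2) ** 2 / var_diff
    # Chi-square with 1 df: P(X > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p


for rho in (0.0, -0.25, -0.50):
    w, p = wald_two_cells(0.40, 0.02, 0.10, 0.03, rho)
    print(f"rho={rho:+.2f}  W={w:.2f}  p={p:.3f}")
```

A negative ρ inflates the variance of the difference, shrinking W and raising the p-value, so ρ = -0.50 rejects least often; ρ = 0 gives the largest statistic and the most rejections.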
Model 1: Region/MSA/Single or Multi-Unit • Very sensitive to correlation assumptions • Indicates necessity of including Single/Multi-Unit in weighting cells • Region and MSA less necessary given Single/Multi-Unit
Model 2: Region/Tenure/Single or Multi-Unit • Insensitive to correlation assumptions (changed from Model 1) • Indicates necessity of including Single/Multi-Unit in weighting cells (unchanged from Model 1) • Region and Tenure often necessary (changed from Model 1)
Step 3: Collapsed Cells (Reduced Model) • General Linear Hypothesis Tests • Relative Efficiency Diagnostic • Time Series Plots of Estimated Nonresponse Adjustment Factors
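General Linear Hypothesis Tests (2 × 2 table of cell parameters $\theta_{ij}$) • H0: $\theta_{11} = \theta_{21}$ and $\theta_{12} = \theta_{22}$ (collapse rows) • H0: $\theta_{11} = \theta_{12}$ and $\theta_{21} = \theta_{22}$ (collapse columns) • Not performed for SORAR (cell estimates too variable)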
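For a 2 × 2 table of cell parameters θ = (θ₁₁, θ₁₂, θ₂₁, θ₂₂), collapsing rows or columns is a pair of linear constraints Cθ = 0, tested with the Wald statistic W = (Cθ)ᵀ(CVCᵀ)⁻¹(Cθ) against a chi-square with 2 df. A pure-Python sketch; θ stands for whichever cell parameters are compared (response propensities or cell means), and the contrast names and inputs are hypothetical:

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular 2 x 2 matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


# Contrasts for theta = (theta11, theta12, theta21, theta22).
COLLAPSE_ROWS = [[1, 0, -1, 0], [0, 1, 0, -1]]  # theta11 = theta21, theta12 = theta22
COLLAPSE_COLS = [[1, -1, 0, 0], [0, 0, 1, -1]]  # theta11 = theta12, theta21 = theta22


def glh_test(theta, cov, contrast):
    """Wald test of H0: C theta = 0 with two constraints; returns (W, p_value)."""
    c_theta = matmul(contrast, [[t] for t in theta])                    # 2 x 1
    middle = inv2(matmul(matmul(contrast, cov), transpose(contrast)))  # (C V C')^-1
    w = matmul(matmul(transpose(c_theta), middle), c_theta)[0][0]
    # With 2 df the chi-square survival function is exactly exp(-w / 2).
    return w, math.exp(-w / 2.0)


theta = [0.5, 0.2, 0.5, 0.2]
cov = [[0.01 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(glh_test(theta, cov, COLLAPSE_ROWS))  # rows identical: W = 0
print(glh_test(theta, cov, COLLAPSE_COLS))  # columns differ: large W
```

The test only works when V is estimated stably; with highly variable cell estimates, CVCᵀ is noisy and the statistic is unreliable.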
Relative Efficiency Diagnostic: MSE Ratios • Modified from Eltinge and Yansaneh (1997) • Definitions • $\hat{Y}_F$: approximately model-unbiased estimate under the full model • $\hat{Y}_C$: model-biased estimate under a collapsed weighting procedure (under the model assumption) • Mean squared error ratio: $R = \widehat{\mathrm{MSE}}(\hat{Y}_C) / \widehat{\mathrm{MSE}}(\hat{Y}_F)$
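A plug-in version of the ratio treats the full-model estimate as unbiased (so its MSE is its variance) and uses the difference between the collapsed and full estimates as the estimated bias of the collapsed one. This exact estimator form is an assumption; the published diagnostic may differ in detail. The monthly values below are hypothetical, and the median across months summarizes the series:

```python
from statistics import median


def mse_ratio(y_full, var_full, y_collapsed, var_collapsed):
    """Plug-in ratio MSE(collapsed) / MSE(full) for one estimate.

    The full-model estimate is treated as approximately unbiased, so its MSE is
    estimated by its variance; the squared difference between the collapsed and
    full estimates serves as the estimated squared bias of the collapsed one.
    """
    bias = y_collapsed - y_full
    return (var_collapsed + bias ** 2) / var_full


# Hypothetical monthly (y_full, var_full, y_collapsed, var_collapsed) values.
months = [
    (100.0, 25.0, 100.0, 20.0),  # no estimated bias, smaller variance: ratio < 1
    (120.0, 30.0, 123.0, 28.0),
    (90.0, 20.0, 88.0, 19.0),
]
ratios = [mse_ratio(*m) for m in months]
print([round(r, 3) for r in ratios], median(ratios))
```

The first month shows how a ratio below 1 arises with real data: when the estimated bias is negligible and the collapsed variance is smaller, the collapsed procedure looks more efficient.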
SORAR MSE Ratios: Total Expenditures • Tenure dropped: median $R_H$ = 1.02 • HU category dropped: median $R_T$ = 0.93 • On average, $R_H$ is both greater than one and closer to one than $R_T$ • No compelling evidence for either collapsing • How can values be less than 1? • A consequence of using empirical data • Collapsed variances smaller than or equal to uncollapsed variances • Estimated bias often “negligible”
Time Series Plots of Adjustment Factors • Visual, less statistical • Fewer assumptions • Full procedure and collapsed procedure adjustment factors • Within region (SORAR) • Inverse of response propensities (SORAR)
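The series behind such plots can be computed month by month: each original cell's factor is its inverse weighted response rate, and the collapsed factor pools the cells' weights before inverting. A sketch with hypothetical monthly weighted totals; the function names are illustrative:

```python
def adjustment_factor(eligible_weight, respondent_weight):
    """Inverse of the weighted response rate for one cell in one month."""
    return eligible_weight / respondent_weight


def factor_series(cell):
    """cell: list of (eligible_weight, respondent_weight) pairs, one per month."""
    return [adjustment_factor(e, r) for e, r in cell]


def collapsed_series(*cells):
    """Pool the cells' weights month by month, then invert the pooled response rate."""
    return [
        adjustment_factor(sum(e for e, _ in month), sum(r for _, r in month))
        for month in zip(*cells)
    ]


# Hypothetical monthly weighted totals for two cells that would be collapsed.
cell_a = [(1000.0, 400.0), (1100.0, 500.0), (950.0, 380.0)]
cell_b = [(300.0, 250.0), (320.0, 240.0), (310.0, 260.0)]

for month, (fa, fb, fc) in enumerate(
    zip(factor_series(cell_a), factor_series(cell_b), collapsed_series(cell_a, cell_b)), 1
):
    print(f"month {month}: A={fa:.2f}  B={fb:.2f}  collapsed={fc:.2f}")
```

Because the pooled ratio is the mediant of the two cell ratios, the collapsed factor always lies between the original factors; when those differ greatly in scale, it can end up far from both.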
Candidate Cells: Region by Single/Multi for Vacant Properties • Original adjustment factors very different in scale • Collapsed factors are far from both original factors
Candidate Cells: Region by Single/Multi for Rental Properties • Original adjustment factors very different in scale • Collapsed factors are far from both original factors (cf. multi-unit factors)
Candidate Cells: Region by Tenure for Single-Unit Properties • Scale of original factors “similar” (compared to the previous slides) • Collapsed factors differ from the original factors for single units
Candidate Cells: Region by Tenure for Multi-Unit Properties • Scale of original factors similar • Collapsed factors similar to original factors
Final Recommendation (SORAR) • Full weighting cells • Region/Tenure/Single or Multi-Unit • Collapsed weighting cells • Region/Single or Multi-Unit • Region
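One way to operationalize a full/collapsed hierarchy is to use the finest cell definition in which every occupied cell meets the respondent minimum, falling back to the next level otherwise. This fallback rule, the minimum of 5, and the variable names are assumptions for illustration; the recommendation above does not specify when collapsing is triggered:

```python
from collections import Counter

# Hypothetical cell definitions, ordered from full to most collapsed.
HIERARCHY = [
    ("region", "tenure", "units"),  # full cells: Region/Tenure/Single or Multi-Unit
    ("region", "units"),            # collapsed: Region/Single or Multi-Unit
    ("region",),                    # collapsed: Region
]


def choose_cell_definition(units, min_respondents=5):
    """Return the finest cell definition in which every occupied cell has at
    least min_respondents respondents.

    units: list of dicts with the classification variables and a bool "responded".
    """
    for level in HIERARCHY:
        occupied = {tuple(u[v] for v in level) for u in units}
        respondents = Counter(tuple(u[v] for v in level) for u in units if u["responded"])
        if all(respondents[key] >= min_respondents for key in occupied):
            return level
    raise ValueError("even the most collapsed cells have too few respondents")


def unit(tenure, units_type, responded=True):
    """Hypothetical sample record; every record is placed in the South region."""
    return {"region": "South", "tenure": tenure, "units": units_type, "responded": responded}


sample = ([unit("Rental", "Single")] * 6 + [unit("Vacant", "Single")] * 2
          + [unit("Rental", "Multi")] * 6 + [unit("Vacant", "Multi")] * 4)
print(choose_cell_definition(sample))  # ('region', 'units')
```

Choosing one definition for the whole month keeps the cells a partition of the sample, so every unit receives exactly one adjustment factor.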
Conclusion • Started with a recipe • Model-development tools • Diagnostic tools • Modified the recipe for our survey • Considered and dropped some diagnostics based on the data • Ended up with a new main course • More statistically defensible unit nonresponse adjustment cells
Any Questions? • Laura Ozcoskun Laura.T.Ozcoskun@census.gov • Katherine Jenny Thompson Katherine.J.Thompson@census.gov