Selection and Non-Response Bias: From a Simulation Study to the ‘Real World’: A “Case Study” of Children, Families and Schools • Michelle Robinson • Department of Sociology • Interdisciplinary Training Program in Educational Sciences
Social Capital • test scores • school completion • time on homework • grades • absenteeism
Causal Ambiguity OR Social Capital Child Outcomes
The Intervention • Families and Schools Together (FAST) • research-based after-school program • universally recruited 1st grade families • 8 weeks of weekly meetings at schools • 2 years of monthly meetings • designed to strengthen bonds (social capital) between: parents and school staff; parents and other parents; parents and children
Research Design • Sites: Phoenix, San Antonio • Conditions: FAST, Control
Longitudinal Design • Cohort 1, 1st graders: pre-survey, intervention, post-surveys • Cohort 2, 1st graders: pre-survey, intervention, post-surveys • Cohort 1, 3rd graders: follow-up surveys, district records • Cohort 2, 3rd graders: follow-up surveys, district records • Wrap-up • Timeline axis (Academic Year): 2008-09, 2009-10, 2010-11, 2011-12, 2012-13
Selection Bias • Successful Randomization at the cluster level ≠ successful randomization within cluster. • No pretreatment differences in recruitment • Pretreatment differences on survey items • Comparison group more advantaged • Complicates our causal model • Unobserved heterogeneity • Biased estimates • Can weighting help correct this bias?
Non-Response Bias • Non-response is also an issue of selection and can contribute to bias • Makes comparisons difficult • Are we comparing apples and apples or apples and bananas? • MCAR, MAR and NMAR (missing completely at random, missing at random, not missing at random) • Are the respondents different from nonrespondents? • In ways that matter for our study?
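The three missingness mechanisms named on this slide can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the outcome model, the response probabilities, and the function name `simulate_missingness` are hypothetical and are not taken from the FAST data. It compares the respondent mean with the full-sample mean under each mechanism, and under MAR it also applies inverse response-probability weights.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_missingness(n=20000, seed=0):
    """Compare respondent means under MCAR, MAR and NMAR (hypothetical setup)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)                  # observed covariate
        y = 1.0 + 0.8 * x + rng.gauss(0, 1)  # outcome of interest
        data.append((x, y))
    true_mean = mean([y for _, y in data])

    # MCAR: everyone responds with the same probability.
    mcar = [y for _, y in data if rng.random() < 0.6]

    # MAR: response depends only on the observed covariate x.
    mar = []
    for x, y in data:
        p = 0.8 if x > 0 else 0.4
        if rng.random() < p:
            mar.append((y, 1.0 / p))         # keep the inverse-probability weight
    mar_naive = mean([y for y, _ in mar])
    mar_weighted = sum(y * w for y, w in mar) / sum(w for _, w in mar)

    # NMAR: response depends on the outcome itself.
    nmar = [y for _, y in data if rng.random() < (0.8 if y > 1.0 else 0.4)]

    return {
        "true": true_mean,
        "mcar": mean(mcar),
        "mar_naive": mar_naive,
        "mar_weighted": mar_weighted,
        "nmar": mean(nmar),
    }


results = simulate_missingness()
for name, value in results.items():
    print(f"{name:>12}: {value:.3f}")
```

In this setup the MCAR mean and the weighted MAR mean should land close to the full-sample mean, while the naive MAR mean and the NMAR mean drift upward. The NMAR case cannot be repaired with weights built from observed covariates alone.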
Ordinary Least Squares • Simple random sampling (SRS) assumes everyone has the same probability of selection (Normal Distribution) • Differences in the distribution are assumed to either be imposed or a result of random chance • BLUE: Best Linear Unbiased Estimator! • Assumptions 1-5 • In a correctly specified model, different distributions will yield (on average) the same estimates • Implications for survey design and sampling
Weighting • The purpose of sampling weights is to “…make the distribution of some set of variables in the data approximate the distribution of those variables in the population from which the sample was drawn” (p. 240)
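One simple way to build such weights is post-stratification: divide each group's population share by its sample share. The sketch below assumes two hypothetical groups and invented shares. The function name `poststratification_weights` is illustrative, and this is not necessarily how the study's weights were constructed.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight = population share / sample share for each unit's group."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


# Hypothetical sample that over-represents the "advantaged" group.
sample = ["advantaged"] * 70 + ["disadvantaged"] * 30
population = {"advantaged": 0.5, "disadvantaged": 0.5}

weights = poststratification_weights(sample, population)
weighted_share = (
    sum(w for w, g in zip(weights, sample) if g == "advantaged") / sum(weights)
)
print(f"unweighted share: {70 / 100:.2f}, weighted share: {weighted_share:.2f}")
```

After weighting, the advantaged group's share of the sample matches its assumed population share, which is the property the quoted definition describes.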
Expected vs. OLS vs. WOLS • When a correctly specified model is estimated, the parameters will not differ across Expected, OLS and WOLS • Reasons why parameters may differ • Incorrectly Specified Models • Omitted Variable Bias • Nonlinear Relationships • Pooled samples • the pooled estimate is a weighted average of the estimates for the two separate samples
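The claim that OLS and WOLS agree under correct specification, and diverge otherwise, can be checked with a toy simulation. The sketch assumes a hypothetical design in which the low-x stratum is undersampled and weighted up. The helper names, the two "truth" functions, and all numeric settings are invented for illustration.

```python
import random


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x (with intercept)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def draw_sample(truth, seed, n_low=1000, n_high=4000, noise_sd=0.3):
    """Undersample x in [0, 1); weights restore equal-sized population strata."""
    rng = random.Random(seed)
    x, y, w = [], [], []
    for n, lo, weight in ((n_low, 0.0, n_high / n_low), (n_high, 1.0, 1.0)):
        for _ in range(n):
            xi = lo + rng.random()
            x.append(xi)
            y.append(truth(xi) + rng.gauss(0, noise_sd))
            w.append(weight)
    return x, y, w


def ols_and_wols(truth, seed=0):
    x, y, w = draw_sample(truth, seed)
    ols = weighted_slope(x, y, [1.0] * len(x))  # unweighted fit
    wols = weighted_slope(x, y, w)              # weighted fit
    return ols, wols


# Correctly specified: the truth is linear and a line is fitted.
linear_ols, linear_wols = ols_and_wols(lambda x: 1.0 + 2.0 * x)
# Misspecified: the truth is quadratic but a line is still fitted.
curved_ols, curved_wols = ols_and_wols(lambda x: x ** 2)

print(f"linear truth: OLS={linear_ols:.3f}  WOLS={linear_wols:.3f}")
print(f"curved truth: OLS={curved_ols:.3f}  WOLS={curved_wols:.3f}")
```

With the linear truth both slopes should be close to each other. With the quadratic truth the fitted line depends on where the sample mass sits, so the weighted and unweighted slopes separate.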
Weighting and OLS • Incorrectly specified models’ parameters are sensitive to relative group size and weighting. • Parameter sensitivity can be used to diagnose model misspecification • Interaction between Weight and X should explain no additional variance • May be a proxy for interactions or omitted variables • Weights are a function of the data!
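The diagnostic described above can be sketched as an F-test: fit y on X, then add the weight and the weight-by-X interaction, and ask whether the added terms reduce the residual sum of squares. The code below is a minimal pure-Python version in the spirit of the DuMouchel and Duncan test, for a single regressor. The sampling design, helper names, and data are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def sse(design, y):
    """Residual sum of squares from an OLS fit of y on the design rows."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(design, y)
    )


def weight_interaction_f(x, y, w):
    """F statistic for adding w and w*x to a regression of y on x."""
    n = len(y)
    restricted = [[1.0, xi] for xi in x]
    full = [[1.0, xi, wi, wi * xi] for xi, wi in zip(x, w)]
    sse_r, sse_f = sse(restricted, y), sse(full, y)
    return ((sse_r - sse_f) / 2) / (sse_f / (n - 4))


def stratified_sample(truth, seed=0, n_low=400, n_high=1600, noise_sd=0.3):
    """Hypothetical design: the low-x stratum is undersampled and weighted up."""
    rng = random.Random(seed)
    x, y, w = [], [], []
    for n, lo, weight in ((n_low, 0.0, n_high / n_low), (n_high, 1.0, 1.0)):
        for _ in range(n):
            xi = lo + rng.random()
            x.append(xi)
            y.append(truth(xi) + rng.gauss(0, noise_sd))
            w.append(weight)
    return x, y, w


f_linear = weight_interaction_f(*stratified_sample(lambda x: 1.0 + 2.0 * x))
f_curved = weight_interaction_f(*stratified_sample(lambda x: x ** 2))
print(f"F (linear truth): {f_linear:.2f}   F (curved truth): {f_curved:.2f}")
```

A small F statistic suggests the weights add nothing and OLS is adequate. A large one points to misspecification, here the omitted curvature, rather than to a need for weights as such.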
Estimation Issues in WOLS • Standard errors • Sample Size • Assumes there are Wi individuals in the sample • Homoscedasticity • Errors are multiplied by a constant Wi • Bias can’t be predicted • White (1980) heteroscedasticity-consistent estimator
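The standard-error problem can be made concrete for a single regressor. The sketch below computes a WOLS slope with two standard errors: a naive one that treats each weight as a count of Wi individuals, and a White-style (HC0) sandwich estimate. The data-generating setup and the function name `wls_slope_with_ses` are assumptions for illustration only.

```python
import math
import random


def wls_slope_with_ses(x, y, w):
    """Weighted slope plus naive (frequency-weight) and White HC0 standard errors."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Naive: behaves as if the sample contained sum(w) independent individuals.
    sigma2 = sum(wi * e * e for wi, e in zip(w, resid)) / (sw - 2)
    naive_se = math.sqrt(sigma2 / sxx)

    # White (HC0) sandwich: squared weighted scores in the "meat".
    meat = sum((wi * (xi - xbar) * e) ** 2 for wi, xi, e in zip(w, x, resid))
    robust_se = math.sqrt(meat) / sxx
    return slope, naive_se, robust_se


# Hypothetical data: x in [0, 1) is undersampled and carries weight 4.
rng = random.Random(0)
x, y, w = [], [], []
for n, lo, weight in ((1000, 0.0, 4.0), (4000, 1.0, 1.0)):
    for _ in range(n):
        xi = lo + rng.random()
        x.append(xi)
        y.append(1.0 + 2.0 * xi + rng.gauss(0, 0.5))
        w.append(weight)

slope, naive_se, robust_se = wls_slope_with_ses(x, y, w)
print(f"slope={slope:.3f}  naive SE={naive_se:.4f}  robust SE={robust_se:.4f}")
```

Because the weights here sum to more than the number of observations, the naive standard error is too small. The sandwich estimate does not depend on the scale of the weights.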
Problem 1: Weights are a function of observed independent variables included in the model • Estimate Weighted Model • DuMouchel and Duncan (1983) • Look for differences between the OLS and WOLS parameter estimates • Goal is to find possible sources of misspecification • No difference, use OLS! • Difference, respecify model
Problem 2: Weights are a function of observed independent variables included in the model and the dependent variable • Estimate Weighted Model • DuMouchel and Duncan (1983) • Look for differences between the OLS and WOLS parameter estimates • Likely to be different in this case • Difference, respecify model and use OLS • If parameters still differ, use WOLS • Sample selection bias
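When selection depends on the dependent variable, unweighted OLS is biased even if the model is correctly specified, which is why WOLS remains the fallback in this case. The simulation below is a sketch under invented assumptions: a linear truth, selection probabilities of 0.9 and 0.1 on either side of an outcome threshold, and weights equal to the inverse selection probability.

```python
import random


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x (with intercept)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def select_on_outcome(n=20000, seed=0):
    """Keep units with a probability that depends on y (hypothetical design)."""
    rng = random.Random(seed)
    x, y, w = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        yi = 1.0 + 2.0 * xi + rng.gauss(0, 2)  # true slope is 2
        p = 0.9 if yi > 1.0 else 0.1           # selection depends on the outcome
        if rng.random() < p:
            x.append(xi)
            y.append(yi)
            w.append(1.0 / p)                  # inverse selection probability
    return x, y, w


x, y, w = select_on_outcome()
ols_slope = weighted_slope(x, y, [1.0] * len(x))
wols_slope = weighted_slope(x, y, w)
print(f"true slope=2.000  OLS={ols_slope:.3f}  WOLS={wols_slope:.3f}")
```

In this setup the unweighted slope is pulled toward zero while the weighted slope stays near the true value. The correction relies on the selection probabilities being known, which is rarely the case with survey non-response.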
Model Specification as a Golden Ticket • Model specification • Social phenomena are by nature complex! • “causal model”, i.e. the correctly specified model, is often unknown • May not be known a priori • Imperfect data • “true causal model” is often complex in ways that violate OLS assumptions • Autocorrelated errors (Assumptions 4 and 5) • Clustered data • Longitudinal