

  1. Myoung Ho Lee STATISTICAL METHODS FOR REDUCING BIAS IN WEB SURVEYS 13th September 2012

  2. Contents • Introduction • Web surveys • Methodology - Propensity Score Adjustment - Calibration (Rim weighting) • Case Study • Discussion and Conclusion

  3. Trends in Data Collection Paper and Pencil => Telephone => Computer => Internet (Web) • Internet penetration Introduction

  4. Pros and Cons of Web surveys • Pros - Low cost and speed - No interviewer effect - Visual, flexible and interactive - Respondents' convenience • Cons - Quality of sample estimates • Web surveys may be the solution. But there are problems! Introduction

  5. Previous Studies • Harris Interactive (2000 ~ ) • Lee (2004), Lee and Valliant (2009) • Huh and Cho (2009) • Bethlehem (2010), etc. • Lee and Valliant (2009): good performance in simulation • But most other results have been less encouraging - Malhotra and Krosnick (2007), Huh and Cho (2009) Introduction

  6. Volunteer Panel Web Survey Protocol (Lee, 2004): Under-coverage, Self-selection, Non-response • Challenge: Fix anticipated biases in web survey estimates that result from under-coverage, self-selection and non-response Web surveys

  7. Proposed Adjustment Procedure for Volunteer Panel Web surveys (Lee, 2004) Methodology

  8. Propensity Score Adjustment (PSA) • Original idea: comparison of two groups, treatment and control, in observational studies (Rosenbaum and Rubin, 1983) - by weighting using all auxiliary variables that are thought to account for the differences • In the context of web surveys, this technique aims to correct for differences between offline people and online people - by modeling the inclination of people to participate in the volunteer panel web survey Methodology

  9. “Webographic”: variables that overlap between the web and reference surveys - To capture the difference between online and offline populations (Schonlau et al., 2007) - For example, “Do you feel alone?”, “In the last month have you read a book?”, etc. (Harris Interactive) Methodology

  10. Propensity score: e(x_i) = P(z_i = 1 | x_i), the probability of belonging to the web sample given the covariates. It is assumed that the z_i are independent given a set of covariates (x_i) • ‘Strong ignorability assumption’: the response variable is conditionally independent of treatment assignment given the propensity score. Methodology

  11. Logistic regression model: log[ e(x_i) / (1 − e(x_i)) ] = β0 + β′x_i • Variable Selection • Include variables related not only to treatment assignment but also to the response, in order to satisfy the ‘strong ignorability assumption’ (Rosenbaum and Rubin, 1984; Brookhart et al., 2006) Methodology

  12. Variable Selection • In practice, stepwise selection has often been used to develop good predictive models for treatment assignment • Most previous web studies: use of all available covariates (5-30) • Huh and Cho (2009): chose 9 or 7 out of 123 covariates based on “subjective” views Methodology

  13. Variable Selection • Stepwise logistic regression using SIC (Schwarz Information Criterion) - suited to a large number of covariates with little theoretical guidance • LASSO (PROC GLMSELECT in SAS) - a good alternative to stepwise variable selection • Boosted tree (“gbm” in R) - determines a set of split conditions (see the sketch below) Methodology
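A minimal sketch of two of the selection routes above, using scikit-learn stand-ins rather than the tools named on the slide (SAS PROC GLMSELECT for LASSO, R's "gbm" for boosted trees); the data shapes, seed, and tuning values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 123))       # 123 overlapping covariates (illustrative)
z = rng.integers(0, 2, size=500)      # 1 = web sample, 0 = reference sample

# LASSO route: an L1 penalty shrinks weak predictors of sample
# membership to exactly zero, leaving a sparse covariate set.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, z)
kept = np.flatnonzero(lasso.coef_[0])
print("covariates kept by the L1 penalty:", kept)

# Boosted-tree route: the fitted trees' split conditions yield a
# variable-importance ranking from which a subset can be taken.
gbm = GradientBoostingClassifier(n_estimators=100, max_depth=3).fit(X, z)
top = np.argsort(gbm.feature_importances_)[::-1][:18]
print("top covariates by boosted-tree importance:", top)
```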

  14. Applying methods for PSA • Inverse propensity scores as weights - weights: w_i = 1 / ê(x_i) - then multiply them by the sampling weights (see the sketch below) • Subclassification (Stratification) - grouping homogeneous people into strata Methodology
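A minimal sketch of the inverse-weighting step, assuming ê(x_i) comes from a fitted propensity model such as the ones sketched earlier; all numbers are illustrative.

```python
import numpy as np

e_hat = np.array([0.8, 0.5, 0.2])   # estimated propensities of being online
d = np.array([1.0, 2.0, 1.5])       # base sampling weights

psa_w = d * (1.0 / e_hat)           # w_i = d_i / e_hat(x_i)
print(psa_w)                        # -> [1.25 4.   7.5 ]
```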

  15. Subclassification (Stratification) • Combine both the reference and web data into one sample • Estimate each unit's propensity score from the combined sample • Partition the units into C subclasses according to the ordered propensity scores, where each subclass has about the same number of units • Compute an adjustment factor and apply it to all units in the c-th subclass • Multiply the factor by the sampling weights to get the PSA weights (see the sketch below) Methodology
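A sketch of the five steps above, assuming C = 5 and a proportion-based adjustment factor f_c = (share of reference units in subclass c) / (share of web units in subclass c); the slide does not give the exact formula, so this form is a common choice stated here as an assumption.

```python
import numpy as np

def psa_subclass_weights(e_web, e_ref, d_web, C=5):
    """PSA-adjusted weights for web respondents via subclassification."""
    combined = np.concatenate([e_web, e_ref])
    # Quantile cut points so each subclass holds about the same number of units.
    cuts = np.quantile(combined, np.linspace(0, 1, C + 1)[1:-1])
    c_web = np.digitize(e_web, cuts)       # subclass index 0..C-1 per web unit
    c_ref = np.digitize(e_ref, cuts)
    w = np.asarray(d_web, dtype=float).copy()
    for c in range(C):
        share_ref = (c_ref == c).mean()    # share of reference units in class c
        share_web = (c_web == c).mean()    # share of web units in class c
        w[c_web == c] *= share_ref / share_web   # assumed adjustment factor f_c
    return w
```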

  16. Calibration (Rim weighting) • Matching sample and population characteristics only with respect to the marginal distributions of selected covariates • Little and Wu (1991) - iterative algorithm that alternately adjusts the weights according to each covariate's marginal distribution until convergence (see the sketch below) Methodology
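A rough sketch of the iterative algorithm described above (raking / iterative proportional fitting): each pass rescales the weights so one covariate's weighted category totals match the population margins, cycling through the covariates until the weights stop changing. Variable names and the convergence rule are illustrative assumptions.

```python
import numpy as np

def rake(w, covariates, margins, max_iter=50, tol=1e-8):
    """covariates: one integer-coded array per raking variable.
    margins: matching arrays of target population shares (each sums to 1)."""
    w = np.asarray(w, dtype=float).copy()
    for _ in range(max_iter):
        w_prev = w.copy()
        for g, target in zip(covariates, margins):
            total = w.sum()
            for k, share in enumerate(target):
                mask = g == k
                # Scale this category so its weighted share hits the target.
                w[mask] *= (share * total) / w[mask].sum()
        if np.max(np.abs(w - w_prev)) < tol:   # converged
            break
    return w
```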

  17. Case Study • Reference survey: “2009 Social Survey” by Statistics Korea - Culture & Leisure, Income & Consumption, etc. - All persons aged 15+ in 17,000 households - Sample size: 37,049 - Face-to-face mode - Post-stratification estimation - Assumed to be “True” Case Study

  18. Web survey • Recruiting volunteers from web sites (6,854 households) • Systematic sampling with unequal selection probabilities (inverse of rim weights using region, age, gender; see the sketch below) • Sample size: 1,500 households and 2,903 respondents • Overlapping covariates: 123 Case Study
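For concreteness, a sketch of systematic sampling with unequal selection probabilities, the scheme described above; the size measure (inverse rim weights) and the panel values below are illustrative assumptions, not the study's data.

```python
import numpy as np

def systematic_unequal(size_measure, n, seed=0):
    """Draw n units with probability proportional to size_measure."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(size_measure) / np.sum(size_measure) * n
    picks = rng.uniform(0, 1) + np.arange(n)   # one random start, skips of 1
    return np.searchsorted(cum, picks)         # indices of selected units

inv_rim_weights = np.random.default_rng(1).uniform(0.5, 2.0, size=6854)
households = systematic_unequal(inv_rim_weights, n=1500)
```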

  19. M1 = Stepwise(22), M2 = Stepwise(17), M3 = LASSO(12), M4 = Boosted tree(18) - the number of selected covariates in parentheses Case Study – Model Selection

  20. Assessment methods • 16 combinations: (Models 1, 2, 3 and 4) × (Inverse weighting / Subclassification) × (No Calibration / Rim weighting) • 12 response variables • Percentage of bias reduction Case Study
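The slide does not define the last measure; a common form, assumed here, is

percentage of bias reduction = 100 × (|bias before adjustment| − |bias after adjustment|) / |bias before adjustment|,

where bias is the deviation of a web-survey estimate from the reference-survey (“true”) value: 100% means the adjustment removed the bias entirely, and negative values mean it increased the bias.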

  21. Percentage of bias reduction [Chart: results for Models M1-M4 under Inverse weighting and Subclassification, for PSA alone and PSA with Calibration; the numeric values were not captured in the transcript]

  22. Why doesn't PSA work well alone? [Figure: propensity scores for each survey in 5 strata in Model 1] Discussion

  23. What are the possible solutions to fix poor PSA? • Setting a maximum value for the weights (see the sketch below) • Different subclassification algorithm - a formula for the variance of weights that depends on both the number of cases from each group within a stratum and the variability of propensity scores within the stratum • Matching PSA - match the limited number of treated-group members to a larger number of control-group members Discussion
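A minimal sketch of the first fix above: cap extreme PSA weights at a chosen maximum, then rescale so the weight total is preserved. The cap value is an illustrative assumption.

```python
import numpy as np

def cap_weights(w, cap=10.0):
    trimmed = np.minimum(w, cap)                # enforce the maximum weight
    return trimmed * (w.sum() / trimmed.sum())  # keep the weight total fixed
```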

  24. Violation of some assumptions - ‘Strong ignorability assumption’ - Missing at random (MAR) - Mode effects • Variable selection (What are webographic variables?) - Models significantly affect the performance of PSA - Perhaps expert knowledge, rather than a purely statistical approach, is needed - Further studies are needed Discussion

  25. Web surveys have attractive advantages • However, bias arises from self-selection, under-coverage and non-response • According to my case study results, => it seems difficult to apply PSA to the “real world” just yet • Further research on webographic variables and different PSA methods is needed Conclusion
