
Non-response and what to do about it



Presentation Transcript


  1. Non-response and what to do about it. Gillian Raab, Professor of Applied Statistics, Napier University. Scot Exec Course, Nov/Dec 04

  2. What do we mean by non-response? • Unit non-response • Item non-response • Start with the first of these • A unit non-respondent is someone selected for the survey whom we tried to reach but from whom we obtained no response • We may or may not know anything about them, or even whether they exist

  3. What is an acceptable response rate? • 99% • 90% • 80% • 70% • 50% • 40% • 30% • 20% It depends on who you are. It depends on why the response is poor. It depends on whether non-responders are like responders.

  4. An example • A postal survey on attitudes to racial discrimination got a 45% response rate. Two possible explanations: • (1) Half of the letters were lost by the post office, but most of the people who received one replied • (2) No letters were lost, but a qualitative study after the survey revealed that many people did not reply because they were hostile to immigrant groups

  5. Types of ‘missingness’ • In the first case the missing people might not be thought to differ from the others • Missing Completely At Random (MCAR) • In the second case the missing people would be likely to have quite different views • Missing Not At Random (MNAR)

  6. An intermediate position • Missing At Random (MAR) • Assumes that, within groups we can identify in the survey, the missing people are just like the ones who reply • The methods that survey researchers use all make this assumption • But you need good information about those who don’t respond

  7. Survey non-response is a world-wide problem • Example from the US: refusal rates in major US surveys (figure not reproduced)

  8. Atrostic et al., Journal of Official Statistics: non-contact rates (figure not reproduced)

  9. So doing something about it has become important • The most commonly used method for unit non-response is weighting • Non-response weights can be calculated • From data available on the sampling frame • From another source of data for the population • If it is the latter, it is often called POST-STRATIFICATION

  10. The case against weighting • 10 years ago non-response weighting was not the norm. Reasons: • Response rates were fairly high on government-sponsored surveys • Beyond age and sex, few control totals were available

  11. The case against • Many surveys had long histories: weighting would introduce discontinuities • Non-response weighting is subjective and cosmetic! • No two statisticians would create the same set of non-response weights. Unscientific. • Weighting makes analysis more complex and error-prone.

  12. The counter-arguments • Non-response rates have now grown (c. 1% per year) • The under-representation of certain groups is constant and clearly biasing • Trends in response rates undermine the argument against introducing discontinuities • ONS have concluded that all national statistics should be calculated on a consistent basis (same age-sex-region distribution)

  13. The implications • Most surveys now come with non-response weights • A new industry of calculating weights for old surveys • Not too comfortable a position (what if a new method is around the corner?)

  14. The view from the survey analyst • Approach kept as simple as possible: adjust (standardise) for age and sex; no major attempt to eliminate other biases • Usual approach is calibration weighting, where we (a) adjust to national age-sex totals, but (b) give all household members the same weight

  15. My view • It is not safe to assume MAR based only on age and sex weighting • Deprived areas tend to have lower response rates • Apparent trends in the results may really be trends in response rates • Much better area-level data is now available • But using it risks interrupting time series

  16. Another quote from a survey methodologist • Non-response weights are subjective. You don’t have to trust them. • Check that observed differences are not attributable to weighting • If you think the survey organisation has missed a trick then tell them!

  17. An example – Ayr and Arran Health Survey • Postal survey based on the CHI (Community Health Index) • Response rate about 50% • Can’t be sure of the response rate because ‘dead wood’ was not properly accounted for • Population data available for data zones by 5-year age and sex groups

  18. How to do it – simple case • Age/sex groups only • Make a table by age group and sex for the Census data and for the survey • Use groups of reasonable size (>50) • Calculate the ratio of sample numbers to population in each group (overall 1.5% or 0.015) • The inverse of this ratio becomes the grossing-up weight
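
The simple case on slide 18 can be sketched in a few lines of Python. The cell counts below are invented for illustration, and the function name `grossing_weights` and its `min_cell` parameter are assumptions of this sketch, not part of the original course material.

```python
# Hypothetical counts per age/sex cell: (Census population, survey respondents).
cells = {
    ("16-44", "M"): (40000, 480),
    ("16-44", "F"): (42000, 630),
    ("45+", "M"): (38000, 646),
    ("45+", "F"): (45000, 900),
}


def grossing_weights(cells, min_cell=50):
    """Return the grossing-up weight (population / respondents) for each cell."""
    weights = {}
    for cell, (population, respondents) in cells.items():
        # Slide 18 asks for groups of reasonable size (>50 respondents).
        if respondents <= min_cell:
            raise ValueError(f"cell {cell} is too small; merge it with a neighbour")
        sampling_ratio = respondents / population  # sample numbers / population
        weights[cell] = 1.0 / sampling_ratio       # inverse = grossing-up weight
    return weights


weights = grossing_weights(cells)
for cell, weight in weights.items():
    print(cell, round(weight, 1))
```

Each respondent in a cell carries that cell's weight, so the weighted respondent counts add back up to the population totals.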

  19. How to do it – more complicated • Want to adjust for data zone characteristics • SIMD04 has 6 dimensions: income, health, education, employment, housing and access • As soon as we break these down into groups the numbers will get too small to give stable results • So we use a modelling process

  20. Use a logistic regression model • $\operatorname{logit}(\text{response rate}) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$ • Data are responses / population • The Xs are age, sex and data zone characteristics and their interactions
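
A minimal sketch of fitting such a model to grouped data (responses out of population in each group) by Newton-Raphson, using only the Python standard library. The group counts, the covariate coding and the function names `solve`, `fit_logistic` and `predict` are illustrative assumptions, not the model or software used for the survey. The toy model is saturated (as many parameters as groups), so its fitted rates simply reproduce the observed ones; a real weighting model would have far fewer parameters than groups.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def predict(beta, x):
    """Fitted probability for covariates x (the intercept is added here)."""
    xs = [1.0] + list(x)
    return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xs))))


def fit_logistic(rows, iterations=25):
    """Fit logit(p) = b0 + b1*X1 + ... to rows of (x, responses, population)."""
    k = len(rows[0][0]) + 1
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y, n in rows:
            xs = [1.0] + list(x)
            p = predict(beta, x)
            for i in range(k):
                grad[i] += (y - n * p) * xs[i]
                for j in range(k):
                    hess[i][j] += n * p * (1.0 - p) * xs[i] * xs[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Hypothetical groups; covariates are (female, aged 45+, female AND aged 45+).
rows = [
    ((0, 0, 0), 480, 40000),
    ((1, 0, 0), 630, 42000),
    ((0, 1, 0), 646, 38000),
    ((1, 1, 1), 900, 45000),
]
beta = fit_logistic(rows)
for x, y, n in rows:
    print(x, "observed", round(y / n, 4), "fitted", round(predict(beta, x), 4))
```

In practice a statistics package would be used for the fitting; the point here is only that the data are counts of responses out of population, and the fitted values are probabilities per group.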

  21. The model developed included • Age groups, sex and their interactions, income score and its interaction with age, access deprivation • Fitted values for this model were plotted

  24. Getting the weights • The weight for each respondent is the inverse of their fitted probability from the model (the estimated probability of being selected and responding) • Usually rescale to get weights with a mean of 1.0 • Checking for weights that are too big is a good idea • Some people cap weights at a maximum value (10 or even 2.5) • I prefer to make sure that the weighting model does not give such extreme weights
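
The steps on slide 24 can be sketched as follows. The fitted probabilities are invented, and the function name `response_weights` and its `cap` parameter are assumptions made for illustration.

```python
def response_weights(probabilities, cap=None):
    """Turn fitted probabilities into weights rescaled to a mean of 1.0.

    cap: optional maximum applied after rescaling (hypothetical parameter).
    Capping pulls the mean slightly below 1.0.
    """
    raw = [1.0 / p for p in probabilities]      # inverse of fitted probability
    mean_raw = sum(raw) / len(raw)
    scaled = [w / mean_raw for w in raw]        # mean of 1.0
    if cap is not None:
        scaled = [min(w, cap) for w in scaled]  # truncate extreme weights
    return scaled


# Hypothetical fitted probabilities; the last one is unusually small.
fitted = [0.012, 0.015, 0.017, 0.020, 0.002]
weights = response_weights(fitted)
print([round(w, 2) for w in weights])
print("largest weight:", round(max(weights), 2))
```

A single very small fitted probability is enough to produce one dominant weight, which is why checking the largest weight before using the set is worthwhile.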

  25. Why such extreme weights here? • The CHI is not a perfect sampling frame • It has dead people on it and people who have moved away • We think that non-contacts were replaced • We did have some data on all addresses used

  27. Item non-response • Ignore cases with missing data • Becomes problematic in regression models • Use imputation to replace the missing values • Informed imputation • Hot deck imputation • Model-based imputation (can be multiple)

  28. Informed imputation • Mainly used for sub-items when a total is needed • E.g. income, housing costs • Often requires detailed examination of cases • E.g. finding benefit entitlement • Costs of a particular repair • Survey specific

  29. Hot deck imputation • Often used in census data • Can be used for both unit and item non-response • For unit non-response a missing case is replaced with another one that matches on whatever data are available • For item non-response another case is selected that is similar to the case with the missing item on the other things that are measured, and its value is used • Can get very messy and difficult, and lead to things like pregnant men
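
A minimal sketch of hot deck imputation for item non-response, assuming records held as dictionaries with `None` marking a missing item. The function name `hot_deck`, its parameters and the toy records are all hypothetical.

```python
import random


def hot_deck(records, item, match_on, seed=0):
    """Fill missing `item` values from a random donor that agrees on `match_on`."""
    rng = random.Random(seed)
    completed = []
    for rec in records:
        rec = dict(rec)  # copy, so the original records are left untouched
        if rec[item] is None:
            donors = [d for d in records
                      if d[item] is not None
                      and all(d[k] == rec[k] for k in match_on)]
            if donors:  # leave the gap if no donor matches
                rec[item] = rng.choice(donors)[item]
        completed.append(rec)
    return completed


records = [
    {"sex": "F", "age_group": "16-44", "pregnant": True},
    {"sex": "F", "age_group": "16-44", "pregnant": None},
    {"sex": "M", "age_group": "16-44", "pregnant": False},
    {"sex": "M", "age_group": "16-44", "pregnant": None},
]

completed = hot_deck(records, "pregnant", match_on=["sex", "age_group"])
for rec in completed:
    print(rec)
```

Matching on `age_group` alone would allow the man with the missing item to draw the woman as his donor, which is exactly how a "pregnant man" ends up in the data; the choice of matching variables matters.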

  30. Model-based imputation • Assumes some statistical model for the data • For example, a multivariate normal distribution • Start by replacing missing values by their means • Fit the model and then replace the missing values with a sample from their predictive distribution given the data • Do this repeatedly until the pattern stabilises • You then have a complete data set to work with

  31. It works surprisingly well • Even when the data are categories • Just analysing the completed data as if nothing had been imputed would overstate the precision • But there is an easy adjustment that can be made by running more than one imputation (usually 5) and adding in a bit for the variation between them
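
The adjustment described on slide 31 is usually done with Rubin's combining rules: average the estimates, average their variances, and add the between-imputation variance inflated by (1 + 1/m). A small sketch, with invented estimates from m = 5 imputed data sets and a hypothetical function name `pool`:

```python
import math
from statistics import mean, variance


def pool(estimates, variances):
    """Combine results from m imputed data sets using Rubin's rules.

    estimates: the estimate of interest from each completed data set
    variances: its squared standard error in each completed data set
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical results from 5 imputations.
estimates = [10.2, 9.8, 10.5, 10.1, 9.9]
variances = [0.25, 0.24, 0.26, 0.25, 0.25]

estimate, std_error = pool(estimates, variances)
print("pooled estimate:", round(estimate, 3))
print("pooled standard error:", round(std_error, 3))
```

The pooled standard error is larger than the one from any single completed data set, which is the "bit for the variation between them".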

  32. It is accessible • Theory and practice have been developed by Don Rubin and Joe Schafer • Implemented in several programs • Including SAS PROC MI • Once you have the multiple data sets they can be analysed with PROC MIANALYZE

  33. Summary • Unit non-response • Weighting • Hot deck imputation • Item non-response • Use available cases • Use imputation • Only time for a sketch of the latter
