Non-response and what to do about it Gillian Raab Professor of Applied Statistics Napier University Scot Exec Course Nov/Dec 04
What do we mean by non-response • Unit non-response • Item non-response • Start with the first of these • A unit non-respondent is someone we tried to include in the survey but obtained no response from • We may or may not know anything about them, or even whether they exist
What is an acceptable response rate? • 99% • 90% • 80% • 70% • 50% • 40% • 30% • 20% It depends on who you are. It depends on why the response is poor. It depends on whether non-responders are like responders.
An example • A postal survey on attitudes to racial discrimination got a 45% response rate • First possibility: half of the letters were lost by the post office, but most of the other recipients replied • Second possibility: no letters were lost, but a qualitative study after the survey revealed that many people in the sample did not reply because they were hostile to immigrant groups
Types of ‘missingness’ • In the first example missing people might not be thought to be different from others • Missing Completely at Random (MCAR) • In the second one the missing people would be likely to have quite different views • Missing Not at Random (MNAR)
An intermediate position • Missing At Random (MAR) • Assumes that within groups we can identify in the survey, the missing people are just like the ones who reply • The methods that survey researchers use all make this assumption • But you need good information about those who don’t respond
Survey non-response is a world-wide problem. Here, the US: refusal rates in major US surveys [figure not reproduced]
Atrostic et al., Journal of Official Statistics – non-contact rates [figure not reproduced]
So doing something about it has become important • The most commonly used method for unit non-response is weighting • Non-response weights can be calculated • From data available on the sampling frame • From another source of data for the population • If it is the latter it is often called POST-STRATIFICATION
The case against weighting • 10 years ago non-response weighting was not the norm. Reasons: • Response rates were pretty high on government-sponsored surveys • Beyond age and sex, there were not many control totals around
The case against • Many surveys had long histories: weighting would introduce discontinuities • Non-response weighting is subjective and cosmetic! • No two statisticians would create the same set of non-response weights. Unscientific. • Weighting makes analysis more complex and error-prone.
The counter-arguments • Non-response rates have now grown (c. 1%/year) • The under-representation of certain groups is constant and clearly biasing • Trends in response rates undermine the argument against introducing discontinuities • ONS have concluded that all national statistics should be calculated on a consistent basis (same age-sex-region distribution)
The implications • Most surveys now come with non-response weights • New industry of calculating weights for old surveys • Not too comfortable a position (what if a new method is around the corner?)
The view from the survey analyst • Approach kept as simple as possible. Adjust (standardise) for age-sex; no major attempt to eliminate other biases. • Usual approach = calibration weighting, which (a) adjusts to national age-sex totals but (b) gives all household members the same weight.
My view • It is not safe to assume MAR based only on age and sex weighting • People from deprived areas tend to give lower response rates • Apparent trends may be trends in response rates • Much better area level data is now available • But this risks interrupting time series
Another quote from a survey methodologist • Non-response weights are subjective. You don’t have to trust them. • Check that observed differences are not attributable to weighting • If you think the survey organisation has missed a trick then tell them!
An example – Ayr and Arran Health Survey • Postal survey based on the CHI (Community Health Index) • Response rate about 50% • Can’t be sure of response rate because ‘dead wood’ not properly accounted for • Population data available for data zones by 5-year age and sex groups
How to do it – simple case • Age/sex groups only • Make a table by age group and sex for the Census data and the survey • Reasonable size groups (>50) • Calculate ratio of sample numbers to population (overall 1.5% or 0.015) • Inverse of this becomes the grossing up weight
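The simple case can be sketched in a few lines of Python. This is only an illustration: the function name `grossing_up_weights`, the age bands and all counts are made up, not taken from the Ayr and Arran survey. It tabulates respondents by age/sex cell and divides the population count by the respondent count, which is the inverse of the sample-to-population ratio.

```python
from collections import Counter

def grossing_up_weights(population_counts, respondent_cells):
    """Return {cell: weight}, where weight = population / respondents in that cell."""
    sample_counts = Counter(respondent_cells)
    weights = {}
    for cell, pop in population_counts.items():
        n = sample_counts.get(cell, 0)
        if n == 0:
            # An empty cell cannot be weighted; it has to be merged with a neighbour.
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = pop / n
    return weights

# Hypothetical population table (e.g. from the Census) by age group and sex.
population = {("16-44", "M"): 4000, ("16-44", "F"): 4200,
              ("45+", "M"): 3000, ("45+", "F"): 3800}

# Hypothetical respondents, one (age group, sex) cell per person; every cell has > 50.
respondents = ([("16-44", "M")] * 80 + [("16-44", "F")] * 126
               + [("45+", "M")] * 120 + [("45+", "F")] * 152)

weights = grossing_up_weights(population, respondents)
grossed_total = sum(weights[cell] for cell in respondents)
```

Summing the weights over all respondents reproduces the population total, which is a quick check that the table has been built correctly.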
How to do it – more complicated • Want to adjust for data zone characteristics • SIMD04 has 6 dimensions • Income, health, education, employment, housing and access • As soon as we break these down into groups the numbers will get too small to give stable results • So we use a modelling process
Use a logistic regression model • $\operatorname{logit}(\text{response rate}) = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$ • Data are responses / population • Xs are age, sex and data zone characteristics and their interactions
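A minimal sketch of fitting such a model to grouped data (responders out of a population count per cell), using Newton-Raphson in plain Python. Everything here is an assumption for illustration: the function names, the two 0/1 covariates (older, female) and the counts are hypothetical, and the real model also used data zone scores and interactions. In practice a statistical package would be used instead.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_grouped_logit(rows, responders, population, iterations=25):
    """Newton-Raphson fit of logit(p) = b0 + b1*x1 + ... to grouped binomial counts."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y, n in zip(rows, responders, population):
            p = inv_logit(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(k):
                grad[i] += x[i] * (y - n * p)
                for j in range(k):
                    hess[i][j] += x[i] * x[j] * n * p * (1.0 - p)
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Hypothetical cells: [intercept, older (0/1), female (0/1)], main effects only.
rows = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
responders = [300, 420, 550, 640]
population = [1000, 1000, 1000, 1000]

beta = fit_grouped_logit(rows, responders, population)
fitted = [inv_logit(sum(b * xi for b, xi in zip(beta, x))) for x in rows]
```

Because the model has fewer parameters than cells, the fitted rates smooth the observed ones rather than reproducing them, which is what keeps the estimates stable when cells are small.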
Model developed included • Age groups, sex and their interactions, income score and its interaction with age, access deprivation • Fitted values for this model were plotted
Getting the weights • The weight for each respondent is the inverse of the fitted value from the model, i.e. of the estimated probability of being selected and responding • Usually rescale to get weights with a mean of 1.0 • Checking for weights that are too big is a good idea • Some people cap weights at a maximum value (10 or even 2.5) • I prefer to make sure that the weighting model does not give such extreme weights
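The steps on this slide can be sketched as follows. The fitted probabilities are invented for illustration, and the function names and the threshold of 2.5 (one of the caps mentioned above) are assumptions. The sketch only flags large weights rather than capping them, in line with the preference for fixing the weighting model instead.

```python
def response_weights(fitted_probabilities):
    """Inverse-probability weights, rescaled so that their mean is 1.0."""
    raw = [1.0 / p for p in fitted_probabilities]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]

def flag_extreme(weights, threshold):
    """Return the positions of respondents whose weight exceeds the threshold."""
    return [i for i, w in enumerate(weights) if w > threshold]

# Hypothetical fitted values, one per respondent; the last one is suspiciously low.
fitted = [0.60, 0.55, 0.50, 0.45, 0.40, 0.05]

weights = response_weights(fitted)
extreme = flag_extreme(weights, threshold=2.5)
```

A flagged respondent is a prompt to look again at the model or the sampling frame for that group, not necessarily a reason to truncate the weight.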
Why such extreme weights here? • The CHI is not a perfect sampling frame • It has dead people on it and people who have moved away • We think that non-contacts were replaced • We did have some data on all addresses used
Item non-response • Ignore cases with missing data • Becomes problematic in regression models • Use imputation to replace the missing values • Informed imputation • Hot deck imputation • Model based imputation (can be multiple)
Informed imputation • Mainly used for sub-items when a total is needed • E.g. income, housing costs • Often requires detailed examination of cases • E.g. finding benefit entitlement • Costs of a particular repair • Survey specific
Hot deck imputation • Often used in census data • Can be used for both unit and item non-response • For unit non-response a missing case is replaced with another one that matches on whatever data are available • For item non-response another case is selected that is similar to the case with the missing item on the other things that are measured • Can get very messy and difficult and lead to things like pregnant men
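A minimal sketch of hot deck imputation for a single item, assuming records are plain dictionaries and `None` marks a missing value. The function name, field names and data are hypothetical. Donors are drawn at random from cases that match on the chosen variables, and matching on sex is what stops the "pregnant men" problem in this toy example.

```python
import random

def hot_deck_impute(records, item, match_on, rng):
    """Fill missing `item` values by copying from a random donor in the same class."""
    donors = {}
    for rec in records:
        if rec[item] is not None:
            key = tuple(rec[k] for k in match_on)
            donors.setdefault(key, []).append(rec[item])
    completed = []
    for rec in records:
        new = dict(rec)  # copy, so the original records are left untouched
        if new[item] is None:
            key = tuple(new[k] for k in match_on)
            if key in donors:
                new[item] = rng.choice(donors[key])
            # With no matching donor the value stays missing rather than
            # being borrowed from an unlike case.
        completed.append(new)
    return completed

records = [
    {"sex": "F", "age": "16-44", "pregnant": True},
    {"sex": "F", "age": "16-44", "pregnant": False},
    {"sex": "F", "age": "16-44", "pregnant": None},
    {"sex": "M", "age": "16-44", "pregnant": False},
    {"sex": "M", "age": "16-44", "pregnant": None},
]

completed = hot_deck_impute(records, "pregnant", ["sex", "age"], random.Random(0))
```

Dropping `"sex"` from `match_on` would allow a male case to receive a female donor's value, which is how the inconsistencies mentioned on the slide arise.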
Model based imputation • Assumes some statistical model for the data • For example – a multivariate normal distribution • Start by replacing missing values by their means • Fit the model and then replace the missing values with a sample from their predictive distribution given the data • Do this repeatedly until the pattern stabilises • You then have a complete data set to work with
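The loop described on this slide can be illustrated with a deliberately simplified stand-in: one incomplete variable `y`, one complete variable `x`, and a simple linear regression in place of the multivariate normal model. All names and data are hypothetical. The sketch also plugs in point estimates of the regression parameters at each sweep instead of drawing them, so it understates uncertainty compared with a full implementation.

```python
import random
from statistics import mean

def impute_y_given_x(x, y, rng, sweeps=20):
    """Return a completed copy of y; None marks a missing value."""
    missing = [i for i, v in enumerate(y) if v is None]
    start = mean(v for v in y if v is not None)
    filled = [start if v is None else v for v in y]  # start from the observed mean
    n = len(x)
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    for _ in range(sweeps):
        # Fit y = a + b*x to the currently completed data.
        my = mean(filled)
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, filled)) / sxx
        a = my - b * mx
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, filled))
        sd = (rss / (n - 2)) ** 0.5
        # Replace each missing value with a draw from its predictive distribution.
        for i in missing:
            filled[i] = rng.gauss(a + b * x[i], sd)
    return filled

# Hypothetical data: y is missing for three of the eight cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, None, 6.2, 7.9, None, 12.1, 13.8, None]

# Different random seeds give different completed data sets.
completed_sets = [impute_y_given_x(x, y, random.Random(seed)) for seed in range(5)]
```

Observed values are never altered; only the missing positions change from one completed data set to the next.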
It works surprisingly well • Even when the data are categories • Just analysing the data as they are would give misleading precision • But there is an easy adjustment that can be made by running more than one imputation (usually 5) and adding in a bit for the variation between them.
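The adjustment is usually done with Rubin's combining rules: average the estimates from the imputed data sets, average their variances, and add the between-imputation variance inflated by (1 + 1/m). A short sketch with invented numbers (the function name `pool` and the data are assumptions for illustration):

```python
from statistics import mean, variance

def pool(estimates, variances):
    """Combine m completed-data estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # variation between imputations (divisor m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance

# Hypothetical results from m = 5 imputed data sets.
estimates = [10.0, 10.4, 9.8, 10.2, 9.6]
variances = [0.25, 0.25, 0.25, 0.25, 0.25]

pooled_estimate, total_variance = pool(estimates, variances)
```

With these numbers the between-imputation term raises the variance from 0.25 to about 0.37, which is the "bit for the variation between them".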
It is accessible • Theory and practice have been developed by Don Rubin and Joe Schafer • Implemented in several programs • Including SAS PROC MI • Once you have the multiple data sets they can be analysed with PROC MIANALYZE
Summary • Unit non-response • Weighting • Hot deck imputation • Item non-response • Use available cases • Use imputation • Only time for a sketch of the latter