Post-collection processing of data (continued). Survey Research and Design Spring 2006 Class #13 (Week 15). Today’s objectives. To answer questions you have To understand the design effect and how to handle it To understand the concept of weighting and learn how to calculate weights
Post-collection processing of data (continued) Survey Research and Design Spring 2006 Class #13 (Week 15)
Today’s objectives • To answer questions you have • To understand the design effect and how to handle it • To understand the concept of weighting and learn how to calculate weights • To begin group presentations Survey Research and Design (Umbach)
Post-collection processing of survey data • Several different steps • Coding • Data entry • Editing • Handling item-missing data • Weighting • Sampling variance estimation • Weighting is a very common post-collection process
Probability Sampling • Simple random • Stratified random • Proportional • Nonproportional • Systematic • Cluster
Accounting for complex sample design when conducting analyses • Standard errors or design effect resulting from clustering • Weighting • What are the implications for analysis and/or external validity?
Design effect • Term used to describe the effect of a complex sample design • Measure of the departure of the complex design from a simple random sample • Ignoring it causes misestimation (usually underestimation) of standard errors • Magnitude of the cluster effect can be assessed using the intraclass correlation coefficient (ICC)
Corrective strategies for DEFF • Use of software packages (e.g., AM, SUDAAN, WesVar, SPSS, SAS, Stata) • Most precise • Specify strata and cluster (primary sampling unit, PSU) • Adjust estimated standard errors by known DEFF • Alter alpha criteria (e.g., use a more conservative alpha level)
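The "adjust estimated standard errors by known DEFF" strategy can be sketched in a few lines. This is a minimal illustration, not the method of any package named above. It assumes the common approximation DEFF = 1 + (m - 1) * ICC for clusters of roughly equal size m, which is not stated on the slides. The function names and the input values (cluster size 20, ICC 0.05, SRS standard error 0.10, n = 1,000) are hypothetical.

```python
import math


def approx_deff(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect for roughly equal-sized clusters.

    Assumption (not from the slides): DEFF = 1 + (m - 1) * ICC.
    """
    return 1.0 + (avg_cluster_size - 1.0) * icc


def adjust_se(se_srs: float, deff: float) -> float:
    """Inflate a standard error computed under SRS assumptions by sqrt(DEFF)."""
    return se_srs * math.sqrt(deff)


# Hypothetical inputs, for illustration only.
deff = approx_deff(avg_cluster_size=20, icc=0.05)
se_adjusted = adjust_se(se_srs=0.10, deff=deff)
effective_n = 1000 / deff  # sample size "worth" after accounting for clustering

print(f"DEFF = {deff:.2f}, adjusted SE = {se_adjusted:.4f}, effective n = {effective_n:.0f}")
```

Because DEFF enters through its square root, even a small ICC noticeably widens the standard error when clusters are large.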
Weighting • Why weight? • For a variety of reasons, distributions in our sample may differ markedly from the population; e.g., more females than males • If females differ from males on our survey statistic, estimates will be biased • Hopefully weighting will help to reduce this bias • However, weighting also increases variances, so weighted estimates will be less precise • For your purposes, two weights to consider • Selection weight – takes into account oversampling of subgroups • Nonresponse weight – takes into account differential nonresponse across subgroups
Weighting • In order to weight, we need some information on all members of the sample • What information? • For selection weights, use whatever criteria were used for the oversampling • For nonresponse weights, we want variables that are good predictors of nonresponse • If you have a lot of variables, you can use data mining programs to find key predictor variables • Otherwise, rely on previous survey research studies: • Females, older, Whites, high GPA, and high SES are more likely to respond • Think about possible predictors when requesting your sampling frame
Selection weights • Suppose we design a survey project so that our sample will be 1,000 IU undergraduate students. • The race/ethnicity breakdown for IU undergrads (N=20,732) is • African American – 2.9% • American Indian/Alaskan Native – 0.3% • Asian/Pacific Islander – 3.3% • Hispanic – 2.3% • White – 88.1% • International – 3.1% • With 1,000 students, we would expect to have only 31 int’l students • But if we want to also analyze int’l students as a subgroup, we would need a larger sample. • So we could oversample int’l students, so that we end up with 400 in our sample.
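Selection weights • So now our sample is • 600 U.S. students and 400 int’l students • Int’l students are now 40% of our sample instead of 3.1%. • Weights are estimated as the reciprocal of the selection probability ps (sample size/population size), i.e., weight = 1/ps = population size/sample size • Taking 3.1% of 20,732 as about 643 int’l students and the remaining 20,089 as U.S. students: • U.S. students: 20,089/600 ≈ 33.48 • Int’l students: 643/400 ≈ 1.61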
Selection weights • How do these weights make a difference? Suppose mean satisfaction for int’l students is 3.5, but for U.S. students it is only 2.5 • With a sample of 969 U.S. students and 31 int’l students, mean satisfaction for the sample would be • (969*2.5 + 31*3.5)/(969 + 31) = 2.53 • With a sample of 600 U.S. students and 400 int’l students, mean satisfaction for the sample would be • (600*2.5 + 400*3.5)/(600 + 400) = 2.90 • Let’s recalculate the second mean using the selection weights: • (600*2.5*33.48 + 400*3.5*1.61)/(600*33.48 + 400*1.61) = 2.53 • In essence, the 600 U.S. respondents “count for more” when we use the selection weights
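The selection-weight arithmetic above can be reproduced with a short sketch. The population counts are an assumption derived from the slide figures: 3.1% of 20,732 is rounded to 643 int'l students, leaving 20,089 U.S. students. The variable names are illustrative only.

```python
# Population counts are assumptions derived from N = 20,732 and the 3.1% int'l share.
population = {"us": 20089, "intl": 643}
sample = {"us": 600, "intl": 400}             # oversampled design from the slides
mean_satisfaction = {"us": 2.5, "intl": 3.5}  # group means from the slides

# Selection weight = 1 / selection probability = population size / sample size.
weights = {g: population[g] / sample[g] for g in sample}

# Unweighted mean: every sampled student counts once.
unweighted_mean = (
    sum(sample[g] * mean_satisfaction[g] for g in sample) / sum(sample.values())
)

# Weighted mean: every sampled student counts as many times as their weight.
weighted_total = sum(sample[g] * weights[g] * mean_satisfaction[g] for g in sample)
weighted_n = sum(sample[g] * weights[g] for g in sample)
weighted_mean = weighted_total / weighted_n

print(f"weights: us={weights['us']:.2f}, intl={weights['intl']:.2f}")
print(f"unweighted mean={unweighted_mean:.2f}, weighted mean={weighted_mean:.2f}")
```

The weighted cell counts sum back to the population size (20,732), which is why the weighted mean matches the mean from the proportional sample.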
Nonresponse weights • Several different ways to calculate nonresponse weights (see Kalton article) • Most common is cell weighting • Sample is divided into subgroups based on data external to the survey • Weights are calculated based on the probability of response • Suppose we administer an SRS survey to 1,600 students with an overall response rate of 50%. • Because we have data on gender for the entire sample, we can calculate response rates for males and females: • Males: 800 in sample, 344 responded, response rate = 43% • Females: 800 in sample, 456 responded, response rate = 57%
Nonresponse weights • Nonresponse weights are the reciprocal of the probability of response: • Males: 1/.43 = 2.326 • Females: 1/.57 = 1.754 • Remember to double-check your weights by multiplying them by the cell size • Males: 2.326*344 = 800.14 • Females: 1.754*456 = 799.82 • If you have both selection weights and nonresponse weights, multiply them together to get one weight.
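Cell weighting for nonresponse follows the same pattern and can be sketched as below, using the gender example from the slides. The code uses unrounded response rates, so the check comes out to exactly 800 per cell rather than the slide's 800.14 and 799.82, which reflect rounding the rates to .43 and .57. The `combined_weight` helper is a hypothetical name for the final "multiply them together" step.

```python
sampled = {"male": 800, "female": 800}    # cell sizes in the original sample
responded = {"male": 344, "female": 456}  # respondents per cell

# Response rate per cell, and nonresponse weight as its reciprocal.
response_rate = {g: responded[g] / sampled[g] for g in sampled}
nr_weight = {g: 1.0 / response_rate[g] for g in sampled}

# Double-check: weight times respondents should recover the sampled cell size.
weighted_counts = {g: nr_weight[g] * responded[g] for g in sampled}


def combined_weight(selection_weight: float, nonresponse_weight: float) -> float:
    """Hypothetical helper: the final weight is the product of the two weights."""
    return selection_weight * nonresponse_weight


for g in sampled:
    print(f"{g}: rate={response_rate[g]:.2f}, weight={nr_weight[g]:.3f}, "
          f"weighted count={weighted_counts[g]:.1f}")
```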
A warning about weights • You should have noticed that weighting increases your sample size • In the previous example we had a final sample n of 800; using the nonresponse weights for males and females increases this sample size to 1,600 • Some software programs do not take this into account, and use n=1,600 for statistical tests instead of n=800. • Your weighted number of cases should equal your unweighted number of cases • To correct, you need to normalize your weights • Find the mean weight in your sample • Divide all weights by this mean • Weights should now sum to your unweighted n • These are also called relative weights
A warning about weights • In the nonresponse example, the mean of the weights across the 800 respondents is 2.0 (1,600/800) • So the normalized weights are • Males: 2.326/2.0 = 1.163 • Females: 1.754/2.0 = 0.877 • Now if we apply the weights to our cell n’s • Males: 1.163*344 = 400.1 • Females: 0.877*456 = 399.9 • These sum to 800, our unweighted sample size • The mean of these new weights should equal 1 (a way to double-check) • You should check each procedure to see if you should normalize!! • With SPSS, you generally need to normalize • With SAS, it depends; also, some procedures will normalize for you • See Heck and Thomas article
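The normalization steps (find the mean weight, divide every weight by it) can be checked with a small sketch. It builds one weight per respondent from the nonresponse example, using unrounded weights (800/344 and 800/456) rather than the rounded 2.326 and 1.754. The variable names are illustrative.

```python
# One raw nonresponse weight per respondent: 344 males, then 456 females.
raw_weights = [800 / 344] * 344 + [800 / 456] * 456

# Step 1: find the mean weight in the sample.
mean_weight = sum(raw_weights) / len(raw_weights)

# Step 2: divide all weights by this mean to get relative (normalized) weights.
normalized = [w / mean_weight for w in raw_weights]

# Checks from the slides: normalized weights sum to the unweighted n, with mean 1.
weighted_n = sum(normalized)
mean_normalized = weighted_n / len(normalized)

print(f"mean raw weight = {mean_weight:.3f}")
print(f"male = {normalized[0]:.3f}, female = {normalized[-1]:.3f}")
print(f"weighted n = {weighted_n:.1f}, mean normalized weight = {mean_normalized:.3f}")
```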
For next class… • Group projects due • Remaining group presentations • Course evaluations