SHARE Data Cleaning: General rules and procedures
Stephanie Stuck, MEA
Antwerp, February 6th/7th, 2008
General philosophy
• Respondents are experts on their own lives; in general we (still) take their answers very seriously.
• Only change data if you are sure they are wrong. If an answer seems implausible but you are not sure what to do, indicate this with a flag variable.
General rules
• Please use the data files with sampid for data cleaning (do not use the data version with sampid2).
• Always write programs to correct data (STATA do files or SPSS sps files). Please never change data directly (e.g. no changes in data editors).
Program files (do or sps) should always start with:
• Name of author and date of program
• Data version (date) and modules
• Short description of the program
• Sequence of programs (the order in which the programs have to be run)
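The header items listed above can be sketched as a comment block at the top of a STATA do file. This is only an illustrative template: the entries in angle brackets are placeholders to fill in, and the file name share_w2_dn.dta is a hypothetical example, not an actual SHARE file name.

```stata
* ------------------------------------------------------------
* Author:    <name of author>
* Date:      <date of program>
* Data:      <data version (date)>, module(s): <modules>
* Purpose:   <short description of what this program corrects>
* Sequence:  <run after: ...> / <run before: ...>
* ------------------------------------------------------------

* Load the data version named in the header
* (hypothetical file name, used for illustration only)
use "share_w2_dn.dta", clear
```

Keeping the data version and the run order in the header makes it possible to rerun the programs in the right order on the right files.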
In programs, always:
• Keep original variables ("varname_original")
  STATA: generate dn003_original = dn003_
  SPSS: compute dn003_original = dn003_.
• Do not change variables called "varname_original"
• but change the variables called "varname"
  STATA: replace dn003_ = 1919 if sampid == "1206211111100" & respid == 1
  SPSS: if (sampid = "1206211111100" & respid = 1) dn003_ = 1919.
In programs, always:
• Add flag variables to indicate changes ("varname_flag")
  STATA: generate dn003_flag = 0
  STATA: replace dn003_flag = 1 if dn003_original ~= dn003_
  SPSS: compute dn003_flag = 0.
  SPSS: if (dn003_original ~= dn003_) dn003_flag = 1.
• Please label flag variables
• "0" should always be used for "no changes/ok"
• Other values can be used as needed, e.g. "1: year of birth changed", "2: implausible"
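Put together, the rules on original variables, corrections, and flags might look like the following STATA sketch. It assumes that sampid is stored as a string and that dn003_ holds the year of birth. The file name, the label name dn003_flag_lbl, and the 1900 threshold used for the "implausible" flag are hypothetical choices made only for illustration.

```stata
* Hypothetical file name, for illustration only
use "share_w2_dn.dta", clear

* 1. Keep the original values; never edit this copy afterwards
generate dn003_original = dn003_

* 2. Correct the working variable for one identified respondent
replace dn003_ = 1919 if sampid == "1206211111100" & respid == 1

* 3. Flag every case where the value was changed (0 = no changes/ok)
generate dn003_flag = 0
replace dn003_flag = 1 if dn003_original != dn003_

* 4. Mark unchanged but implausible values instead of editing them
*    (the cut-off 1900 is an assumed example, not a SHARE rule)
replace dn003_flag = 2 if dn003_ < 1900 & dn003_flag == 0

* 5. Label the flag variable and its values
label variable dn003_flag "Flag: changes to dn003_"
label define dn003_flag_lbl 0 "no changes/ok" 1 "year of birth changed" 2 "implausible"
label values dn003_flag dn003_flag_lbl
```

Because the flag is derived by comparing dn003_ with dn003_original, it stays correct even if further replace lines are added in step 2.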
Always
• Save corrected data files under a new name:
  save "filename_corrected_1"
  save "filename_corrected_2"
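As a minimal sketch of this naming rule in STATA, each round of corrections reads the previous version and writes a new, numbered file. The file names below are hypothetical examples.

```stata
* First round of corrections: read the delivered file, save under a new name
use "share_w2_dn.dta", clear
* ... corrections of round 1 go here ...
save "share_w2_dn_corrected_1.dta"

* Second round: start from the first corrected version
use "share_w2_dn_corrected_1.dta", clear
* ... corrections of round 2 go here ...
save "share_w2_dn_corrected_2.dta"
```

Leaving out the replace option is deliberate here: save then stops with an error if a file of that name already exists, so an earlier version cannot be overwritten by accident.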
General procedures
• Country teams send program files to MEA
• MEA runs the files and creates new data versions
• MEA uploads the files to the new internal SHARE website
• New data versions will be named with a number at the end: share_w2_`module'_1
• Country teams download the files and can go on checking and cleaning data
Wave 1 data
• Please don't take wave 1 information for granted; it can be wrong, too
• Sometimes we will have to change wave 1 data, too
• CentERdata and MEA are currently preparing a version of the wave 1 data that includes:
  • respid for all eligibles (right now respid is only included for respondents)
  • flags for changes made while cleaning the wave 1 data
• We will have another release of wave 1 data together with the public release of wave 2
What I learned
• You need more "step by step" guidelines and clear instructions:
  • Where to start: priority list
  • What exactly to do: programs, examples
  • When to do it: schedule
Very next steps
• Send the programs you have written to MEA
• Send drop-offs and vignette forms (paper versions) to MEA, and check them for country-specific deviations
• The imputations group and MEA will send around a priority list and more instructions
• MEA and CentERdata prepare updated wave 1 and wave 2 files, including sampid, respid for all eligibles, and a new merging variable