SHARE IDs
Stephanie Stuck, MEA Frankfurt, December 6th
Data versions and ID-variables
sampid rules (old)
• Digits 1-2: country code, for example 11 for Austria, 23 for Belgium (French speaking)
• Digits 3-5: wave indicator; indicates the wave in which the household participated for the first time (042 for the wave 1 main survey and 062 for the wave 2 main survey)
• Digits 6-11: household ID
• Digits 12-13: longitudinal household split indicator; 00 by default. If a household splits, the indicator of the household formed by the respondent who moves out is set to that respondent's respid, e.g. if the ‘moving-out respondent’ has respid 01, the indicator is changed to 01
Examples:
  1104200010000: Austria, starting in wave 1 (longitudinal sample)
  1104200010001: Austria, starting in wave 1, split-off household in wave 2
  2306214010300: Belgium (French speaking), starting in wave 2 (refresher sample)
• One needs to combine sampid with the respondent ID (respid) to identify and merge cases at the respondent level
• This causes merging problems, especially for split households and for respondents who ‘move’ across waves
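A minimal sketch of how the old 13-digit sampid breaks down into its fixed-width parts, and why a respondent-level key needs respid as well. The names `parse_old_sampid`, `old_person_key` and `WAVE_CODES` are hypothetical; only the two wave codes given on the slide are included.

```python
from typing import NamedTuple

# Only the two wave codes named on the slide; other codes are not listed here.
WAVE_CODES = {"042": "wave 1 main survey", "062": "wave 2 main survey"}


class OldSampid(NamedTuple):
    country: str    # digits 1-2, e.g. "11" = Austria, "23" = Belgium (French)
    wave: str       # digits 3-5, wave in which the household first took part
    household: str  # digits 6-11, household ID
    split: str      # digits 12-13, "00" unless the household split


def parse_old_sampid(sampid: str) -> OldSampid:
    """Cut a 13-digit sampid into its four fixed-width parts."""
    if len(sampid) != 13 or not sampid.isdigit():
        raise ValueError(f"expected 13 digits, got {sampid!r}")
    return OldSampid(sampid[0:2], sampid[2:5], sampid[5:11], sampid[11:13])


def old_person_key(sampid: str, respid: int) -> tuple[str, str]:
    """Respondent-level key: sampid alone only identifies the household."""
    parse_old_sampid(sampid)  # raises on malformed input
    return (sampid, f"{respid:02d}")


parts = parse_old_sampid("1104200010001")
print(parts)                   # OldSampid(country='11', wave='042', household='000100', split='01')
print(WAVE_CODES[parts.wave])  # wave 1 main survey
```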
Therefore...
• We will change the system and
  • have unique person IDs that can be used to merge modules and waves
  • the person ID will not change across waves, even if a household splits
  • have string country codes instead of numeric ones
• We will divide sampid into different parts:
  • household ID (fixed part plus a split indicator if needed)
  • a new wave indicator variable ‘wi’, which indicates when a household first entered the sample
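To illustrate the merging problem, here is a sketch with made-up records for one respondent (respid 01) who moved out between wave 1 and wave 2, so that the old split indicator changed from 00 to 01. It assumes the respondent keeps respid 01; the `pid` value is a hypothetical stand-in for a stable person ID, and `inner_merge` is an illustrative helper, not part of any SHARE tooling.

```python
# Hypothetical records: same person, household split between waves.
wave1 = [{"sampid": "1104200010000", "respid": "01", "pid": "AT10010001", "wave": 1}]
wave2 = [{"sampid": "1104200010001", "respid": "01", "pid": "AT10010001", "wave": 2}]


def inner_merge(left, right, keys):
    """Inner join of two lists of dicts on the given key columns.

    Assumes the key combination is unique within `right`.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    matches = []
    for row in left:
        partner = index.get(tuple(row[k] for k in keys))
        if partner is not None:
            matches.append((row, partner))
    return matches


old_style = inner_merge(wave1, wave2, ["sampid", "respid"])  # old composite key
new_style = inner_merge(wave1, wave2, ["pid"])               # stable person ID
print(len(old_style), len(new_style))  # 0 1
```

The old composite key finds no match because the sampid changed with the split, while a person ID that never changes links both waves.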
Old and new country codes (first two digits of household IDs and person IDs)
New household identifier: hhidcom (internal) & hhid (public)
• Digits 1-2: country code in letters, e.g. AT for Austria, Bf for Belgium French speaking (internal)
• Digits 3-8: fixed household ID; this part will not change across waves, even if the household splits
• Digit 9: one letter added to the fixed household ID to identify whether it is an ‘additional’ household that resulted from a split:
  • A for all ‘original’ households (all households in wave 1, refresher households in wave 2)
  • B is used only if a household has split. A is then still used for the ‘first’ part of the household and B for the ‘splitting part’ (the one that is interviewed second, normally the one that moved out)
  • C is used for the very rare case in which an original wave 1 household splits into three parts, for example a household that consisted of three eligible sisters
Examples for the new household ID:
  AT100100A: Austria, ‘original’ household
  AT100100B: Austria, split-off household
  Bf140103A: Belgium French speaking household (internal)
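A minimal sketch of reading and building the new household ID, assuming it is always exactly two letters, six digits and one split letter (A, B or C) as listed above. The helper names (`parse_hhid`, `split_off`, `same_original_household`) are hypothetical.

```python
import re

# Assumed layout: 2 letters (country, e.g. "AT" or internal "Bf"),
# 6 digits (fixed household ID), 1 letter A/B/C (split part).
HHID_PATTERN = re.compile(r"([A-Za-z]{2})(\d{6})([ABC])")


def parse_hhid(hhid: str) -> tuple[str, str, str]:
    """Split a new-style household ID into (country, fixed_id, part)."""
    match = HHID_PATTERN.fullmatch(hhid)
    if match is None:
        raise ValueError(f"not a valid household ID: {hhid!r}")
    return match.group(1), match.group(2), match.group(3)


def split_off(hhid: str, part: str) -> str:
    """ID of a household split off from `hhid`; the fixed part is kept."""
    if part not in ("B", "C"):
        raise ValueError("split-off households use part B or C")
    country, fixed_id, _ = parse_hhid(hhid)
    return f"{country}{fixed_id}{part}"


def same_original_household(a: str, b: str) -> bool:
    """True if both IDs share country code and fixed household ID."""
    return parse_hhid(a)[:2] == parse_hhid(b)[:2]


print(parse_hhid("Bf140103A"))                           # ('Bf', '140103', 'A')
print(split_off("AT100100A", "B"))                        # AT100100B
print(same_original_household("AT100100A", "AT100100B"))  # True
```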
New person identifier: pidcom
• Digits 1-2: country code (CC) in letters, e.g. AT for Austria, Bf for Belgium French speaking
• Digits 3-8: fixed household ID; this part will not change across waves
• Digits 9-10: respondent ID, e.g. if respid is 1 it will be 01
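A sketch of how the 10-character pidcom could be assembled from a new-style household ID and a respid, assuming the respondent keeps the same respid after a household split. Because the split letter (digit 9 of the household ID) is not part of pidcom, the person ID is the same in the A and B households. `make_pidcom` is a hypothetical helper.

```python
def make_pidcom(hhid: str, respid: int) -> str:
    """Build a person ID from a new-style household ID and a respid.

    Keeps characters 1-8 of hhid (country code + fixed household ID) and
    appends respid zero-padded to two digits; the split letter is dropped.
    """
    if len(hhid) != 9:
        raise ValueError(f"expected a 9-character household ID, got {hhid!r}")
    if not 0 <= respid <= 99:
        raise ValueError("respid must fit in two digits")
    return f"{hhid[:8]}{respid:02d}"


print(make_pidcom("AT100100A", 1))  # AT10010001
print(make_pidcom("AT100100B", 1))  # AT10010001
```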
In addition:
• A dataset will be generated that shows to which households a respondent belonged during her or his ‘SHARE history’
• A compatibility file will be made for internal use to merge the old sampid/respid files with the new IDs
• We will have an additional person ID (uuid) to ensure uniqueness, but it will be used only in the background, for technical reasons
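The layout of the household-history dataset and of the compatibility file is not specified here; the sketch below simply assumes long-format records of (pidcom, wave, household ID) and a lookup from old (sampid, respid) pairs to pidcom. All records, values and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format records: (pidcom, wave, household ID).
rows = [
    ("AT10010001", 2, "AT100100B"),
    ("AT10010001", 1, "AT100100A"),
    ("AT10010002", 1, "AT100100A"),
    ("AT10010002", 2, "AT100100A"),
]


def household_history(records):
    """Map each person ID to its (wave, household ID) pairs, ordered by wave."""
    history = defaultdict(list)
    for pidcom, wave, hhid in records:
        history[pidcom].append((wave, hhid))
    return {pid: sorted(entries) for pid, entries in history.items()}


# Hypothetical compatibility lookup: old (sampid, respid) -> new person ID.
compat = {
    ("1104200010000", "01"): "AT10010001",
    ("1104200010001", "01"): "AT10010001",
}

history = household_history(rows)
print(history["AT10010001"])            # [(1, 'AT100100A'), (2, 'AT100100B')]
print(compat[("1104200010001", "01")])  # AT10010001
```

Both old keys of the respondent who moved out resolve to one person ID, which is what lets old files be merged with the new ones.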
Right now
• we still have to use the old system for data cleaning
• but we will soon have the pidcom to merge across waves
• mergeid will already be included in release 0
• as soon as the new system is available and checked, we will inform you how to proceed, probably at the next SHARE data cleaning meeting (February 6, Antwerp)