Working with EU-SILC: data files, variables and data management. Practical computing session I – Part 1. Heike Wirth, GESIS – Leibniz Institut für Sozialwissenschaften. DwB-Training Course on EU-SILC, February 13-15, 2013, Romanian Social Data Archive at the Department of Sociology
Working with EU-SILC: data files, variables and data management
Practical computing session I – Part 1
Heike Wirth
GESIS – Leibniz Institut für Sozialwissenschaften
DwB-Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology, University of Bucharest, Romania
Overview
• EU-SILC datasets
• EU-SILC variables
• Differences between data collected & anonymised User Database (UDB)
• Hands on
  • Transform CSV file into SPSS/Stata system file
  • Number of households/persons in the file
EU-SILC Data
• Four separate files
  • Household (= 1 observation per household)
    • Register data (D)
    • Household data (H)
  • Individuals (= 1 observation per person)
    • Register data (R)
    • Personal data (P)
• Since cross-sectional & longitudinal data are provided separately => 8 files
EU-SILC Data
For example:
• UDB_c10D_ver 2010-1 from 01-03-12.csv
• UDB_c10H_ver 2010-1 from 01-03-12.csv
• UDB_c10R_ver 2010-1 from 01-03-12.csv
• UDB_c10P_ver 2010-1 from 01-03-12.csv
How to read the file names:
• _c = cross-sectional; _l = longitudinal
• 10 = year of the survey = 2010
• D = Household Register File
• H = Household Data File
• R = Personal Register File
• P = Personal Data File
• 2010-1 = version of the data (e.g. 1st version of the 2010 data)
• csv = type of data (= comma-separated values)
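The naming convention above can be read mechanically. The following is a minimal sketch in Python, assuming that every file name follows exactly the pattern of the four examples; the function name `parse_udb_filename` and the returned keys are hypothetical, and reading the two-digit year as 20xx is an assumption.

```python
import re

# Pattern derived from the example names on the slide, e.g.
# "UDB_c10D_ver 2010-1 from 01-03-12.csv". Other releases may differ.
_FILENAME = re.compile(
    r"^UDB_(?P<design>[cl])(?P<year>\d{2})(?P<file>[DHRP])"
    r"_ver (?P<version>\d{4}-\d+) from (?P<released>\d{2}-\d{2}-\d{2})\.csv$"
)

DESIGNS = {"c": "cross-sectional", "l": "longitudinal"}
FILES = {
    "D": "Household Register File",
    "H": "Household Data File",
    "R": "Personal Register File",
    "P": "Personal Data File",
}


def parse_udb_filename(name):
    """Split a UDB file name into its parts (hypothetical helper)."""
    match = _FILENAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised UDB file name: {name!r}")
    return {
        "design": DESIGNS[match["design"]],
        # Assumption: two-digit survey years are 20xx.
        "year": 2000 + int(match["year"]),
        "file": FILES[match["file"]],
        "version": match["version"],
        # Kept as text; the slide does not state the date format.
        "released": match["released"],
    }


info = parse_udb_filename("UDB_c10D_ver 2010-1 from 01-03-12.csv")
print(info["design"], info["year"], info["file"], info["version"])
```

A name that does not fit the pattern raises a `ValueError` rather than being guessed at.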
EU-SILC Data
• Household Register File (D)
  • one record for every household, including information on sample units, household weights, etc.
  • e.g. UDB_c10D_ver 2010-2: N = 225 972 households
• Household Data File (H)
  • one record for every household, including household data
  • e.g. UDB_c10H_ver 2010-2: N = 225 972 households
• Personal Register File (R)
  • one record for every person currently living in the household or temporarily absent
  • e.g. UDB_c10R_ver 2010-2: N = 576 531 persons
• Personal Data File (P)
  • Reference population: members of the household aged 16 and over
  • e.g. UDB_c10P_ver 2010-2: N = 476 705 persons
Domains & Areas - Households Source: Guidelines_Doc65_2010.pdf, p.73
Domains & Areas - Persons Source: Guidelines_Doc65_2010.pdf, p.73
EU-SILC Variables
• Variable names in EU-SILC are composed of 3 parts:
  • 1st character refers to the dataset (D; H; R; P)
  • 2nd character refers to the domain
  • 3 digits represent a sequential number
  • e.g. PE040 = Highest ISCED level attained
• Most important piece of data documentation:
  • Guideline ‘Description of Target Variables’
  • refers to variables delivered by the NSIs to EUROSTAT
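The three-part naming rule can be sketched as a small Python helper. This is an illustration only: the names `split_variable_name` and `VariableName` are hypothetical, and the domain letter is returned as-is because its meaning has to be looked up in the guidelines.

```python
import re
from typing import NamedTuple


class VariableName(NamedTuple):
    file: str    # 1st character: D, H, R or P
    domain: str  # 2nd character: domain letter (see the guidelines)
    number: int  # 3 digits: sequential number


# One file letter, one domain letter, exactly three digits.
_VARIABLE = re.compile(r"^([DHRP])([A-Z])(\d{3})$")


def split_variable_name(name):
    """Split an EU-SILC variable name into its three parts."""
    match = _VARIABLE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not an EU-SILC style variable name: {name!r}")
    file_code, domain, digits = match.groups()
    return VariableName(file_code, domain, int(digits))


parts = split_variable_name("PE040")
print(parts.file, parts.domain, parts.number)
```

For `PE040` the helper reports file `P` (Personal Data File), domain `E` and sequential number 40.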
Additional important information
• Differences between data collected (as described in the guidelines) and the anonymised User Database (UDB)
  • All income variables are in € (EURO)
  • Variables removed
  • Top/bottom coding
  • Variables added
  • In addition: country-specific rules
Anonymised User Database – Variables added
• Names of added variables
  • 1st character refers to the file (D; H; R; P)
  • 2nd character is ‘X’
  • 3 digits represent a sequential number
• e.g.
  • HX040: Household size
  • HX060: Household type
  • HX080: Poverty indicator
  • (….)
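Since added variables are marked by an ‘X’ in second position, they can be picked out of a list of column names. A minimal Python sketch, assuming the column names have already been read from a file header; `added_variables` is a hypothetical helper and the column list is illustrative only.

```python
import re

# File letter, then 'X', then three digits (rule from the slide above).
_ADDED = re.compile(r"^[DHRP]X\d{3}$")


def added_variables(columns):
    """Return the names that follow the added-variable pattern."""
    return [name for name in columns if _ADDED.match(name)]


# Illustrative header, not a real file layout.
columns = ["HX040", "HX060", "HX080", "PE040", "RB210", "PL031"]
print(added_variables(columns))
```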
Hands on – Exercise 1
• Step 1: Open the 4 SPSS and/or Stata system files
• Step 2: Check the data
  • How many households are included in the data (H- & D-File)?
    • total
    • by country
  • How many persons are included in the data (P- & R-File)?
    • total (any differences between the P- & R-File?)
    • by country
  • There are 15 countries in the training files. Fill in the table (next slide)
  • What are the main differences across countries?
  • Are there differences in the % of unemployed depending on whether you use RB210 or PL031? If so, why?
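The exercise is meant for SPSS or Stata, but the counting step can also be sketched directly on the CSV files. The Python example below is a rough illustration under stated assumptions: the country column names `DB020` and `RB020` are not given on these slides and are assumed here, the helper `count_by_country` is hypothetical, and the rows are synthetic stand-ins rather than EU-SILC data.

```python
import csv
import io
from collections import Counter

# Synthetic stand-ins for a D-file and an R-file (NOT real data).
# Assumed columns: DB020/RB020 = country, DB030/RB030 = identifier.
D_FILE = """DB020,DB030
AT,1
AT,2
BE,1
"""

R_FILE = """RB020,RB030
AT,101
AT,102
AT,201
BE,101
BE,102
"""


def count_by_country(handle, country_column):
    """Count records per country in an open CSV file."""
    reader = csv.DictReader(handle)
    return Counter(row[country_column] for row in reader)


households = count_by_country(io.StringIO(D_FILE), "DB020")
persons = count_by_country(io.StringIO(R_FILE), "RB020")

print("households total:", sum(households.values()))
print("households by country:", dict(households))
print("persons total:", sum(persons.values()))
print("persons by country:", dict(persons))
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")`. Running the same count on the H- and P-files shows whether the totals match those of the D- and R-files.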