Working with EU-SILC using the hierarchical data structure, matching & aggregating data. Practical computing session I – Part 2. Heike Wirth, GESIS – Leibniz Institut für Sozialwissenschaften. DwB-Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology, University of Bucharest, Romania
Introduction
• EU-SILC data has a hierarchical structure
• more than one level of analysis is possible
• household & individual levels are represented by separate files
• data are stored in multiple data files
Working with this kind of data requires
• A decision on the appropriate unit of analysis for your research question, e.g.
  • research interest in households or persons?
  • % of households/persons/men/women/children who live in poverty?
  • % of households with only 1 person or % of persons who live alone?
• Knowledge of procedures for manipulating the data
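The difference between the last two questions can be made concrete with a small sketch. The household sizes below are invented toy values, and the shares are unweighted, which is a simplification.

```python
# Hypothetical toy data: household id -> number of household members.
household_sizes = {"h1": 1, "h2": 1, "h3": 4, "h4": 2}

# Household as unit of analysis: share of households with exactly one member.
n_households = len(household_sizes)
single_households = sum(1 for size in household_sizes.values() if size == 1)
share_single_households = single_households / n_households

# Person as unit of analysis: share of persons who live alone.
n_persons = sum(household_sizes.values())
share_persons_alone = single_households / n_persons

print(f"{share_single_households:.1%} of households have one member")
print(f"{share_persons_alone:.1%} of persons live alone")
```

With these toy values half of the households are single-person households, but only a quarter of the persons live alone, so the choice of unit changes the answer.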
Types of Matching
• One-to-one matching
  • Household Register to Household Data
  • Personal Register to Personal Data
• One-to-many matching
  • Household variables to Individual data
• Many-to-one matching (‘aggregation’)
  • e.g. adding information from the individual data to the household data
EU-SILC – Types of matching
[Diagram: the four files, Household Register File (D), Household Data File (H), Personal Register File (R) and Personal Data File (P), connected by arrows labelled 1:1, 1:n and n:1]
Linking EU-SILC files (cross-sectional)
• Key variables provide links between the related records
  • between household files
  • between individual files
  • between household and individual files
• Key variables (depending on the files) are
  • household id (DB030; HB030; RX030; PX030)
  • personal id (RB030; PB030)
• To be on the safe side, always use the key variables together with
  • ‘year of survey’ (DB010; HB010; RB010; PB010) &
  • ‘country’ (DB020; HB020; RB020; PB020)
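A minimal sketch of why the combined key is the safe choice. The rows are invented, and the toy data simply assumes that the same household id occurs in two countries; whether that happens in a given EU-SILC file is not stated here.

```python
# Hypothetical household register rows: (year, country, household id, DB100).
d_rows = [
    (2010, "AT", 1, 1),
    (2010, "RO", 1, 3),  # same household id as the AT row, different country
    (2010, "RO", 2, 2),
]

# Keying on the household id alone: a later row silently overwrites an earlier one.
by_id_only = {hh_id: db100 for (_, _, hh_id, db100) in d_rows}

# Keying on year + country + household id keeps every household distinct.
by_full_key = {
    (year, country, hh_id): db100 for (year, country, hh_id, db100) in d_rows
}

print(len(by_id_only))   # 2 keys for 3 households
print(len(by_full_key))  # 3
```

Under that assumption, a lookup by household id alone would hand the Romanian household's value to the Austrian one without any warning.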
Example 1: one-to-one
• Attach household register information (D-File) to the household data file (H-File)
• e.g. ‘Degree of urbanisation’ (DB100) is only included in the household register; it can be useful to have this information in the household data, too.
Example 2: one-to-many
• Attach household register information (D-File) to the personal data file (P-File)
• Attach ‘Degree of urbanisation’ (again) to the personal data file
Example 3: many-to-one
• e.g. number of persons in a household who are
  • unemployed
  • full-time employed
  • self-employed
• such information is not included in the data => own computation
Hands on – matching 1:1
• Attach ‘Degree of Urbanisation’ (DB100) to the household data file (H-File)
• Open the EU-SILC training dataset (D-File)
• Check the variables you are interested in
• Sort your data according to the key variables used for linkage
• Names of key variables in the files to be matched must be identical => create new key variables (ID010, ID020, ID_HH) such that
  DB010 = ID010
  DB020 = ID020
  DB030 = ID_HH
• Create a new file with only the key variables & the variable(s) you are interested in
• Name the new file DB100.sav
SPSS–Matching: one-to-one
**** Before you start ************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset – D-File.
GET FILE='data_path/udb_c10d_silc_course.sav'.
* check the variables you are interested in.
crosstabs DB020 by DB100.
SPSS–Matching: one-to-one
* Step 1 - Sort your data according to the key variables used for linkage.
sort cases by DB010 DB020 DB030.
* Step 2 - Names of key variables in the files to be matched must be identical.
rename variables (DB010 DB020 DB030 = ID010 ID020 ID_HH).
* create a new file with the key variables & the variable(s) you are interested in.
save outfile = 'mydata_path/DB100.sav'
  /keep ID010 ID020 ID_HH DB100.
SPSS–Matching: one-to-one
GET FILE='data_path/udb_c10H_silc_course.sav'.
sort cases by HB010 HB020 HB030.
* Key variables.
* either rename (like before) or, better, generate new variables.
STRING ID020 (A2).
compute ID010 = HB010.
compute ID020 = HB020.
compute ID_HH = HB030.
MATCH FILES FILE=*
  /FILE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.
* check whether it worked.
crosstabs HB020 by DB100.
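The logic of this one-to-one match can be sketched outside SPSS as follows. This is an illustration under stated assumptions, not a reproduction of MATCH FILES: the helper `match_one_to_one`, the variable `hh_size` and all values are hypothetical, and the sketch keeps only the records of the first file, which is a simplification.

```python
def match_one_to_one(left, right, keys):
    """Merge two lists of dicts that each hold at most one record per key."""
    def key_of(rec):
        return tuple(rec[k] for k in keys)

    # Index the second file by key and refuse duplicate keys.
    right_index = {}
    for rec in right:
        k = key_of(rec)
        if k in right_index:
            raise ValueError(f"duplicate key in right file: {k}")
        right_index[k] = rec

    seen = set()
    merged = []
    for rec in left:
        k = key_of(rec)
        if k in seen:
            raise ValueError(f"duplicate key in left file: {k}")
        seen.add(k)
        # Unmatched left records are kept, without the extra variables.
        merged.append({**rec, **right_index.get(k, {})})
    return merged


KEYS = ("ID010", "ID020", "ID_HH")

# Hypothetical household data (H-File) with an invented variable hh_size.
h_file = [
    {"ID010": 2010, "ID020": "AT", "ID_HH": 1, "hh_size": 2},
    {"ID010": 2010, "ID020": "RO", "ID_HH": 1, "hh_size": 4},
    {"ID010": 2010, "ID020": "RO", "ID_HH": 2, "hh_size": 1},
]

# Hypothetical lookup file with the keys and DB100 only.
d_lookup = [
    {"ID010": 2010, "ID020": "AT", "ID_HH": 1, "DB100": 1},
    {"ID010": 2010, "ID020": "RO", "ID_HH": 1, "DB100": 3},
]

merged = match_one_to_one(h_file, d_lookup, KEYS)
for rec in merged:
    print(rec)
```

The third household has no partner in the lookup file, so it keeps its own variables and gets no DB100. Checking for such records is the counterpart of the cross-tabulation used above to see whether the match worked.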
SPSS–Matching: One-to-many Match (1:n)
Example 2: Combining household and personal data
E.g. attach ‘Degree of Urbanisation’ (DB100) to the personal data.
GET FILE='data_path/udb_c10p_silc_course.sav'.
* Sort by the key variables used for linkage.
sort cases by PB010 PB020 PX030.
* PB020 is a string variable: create a new string variable ID020 or use the rename command.
STRING ID020 (A2).
compute ID010 = PB010.
compute ID020 = PB020.
compute ID_HH = PX030.
SPSS–Matching: One-to-many Match (1:n)
MATCH FILES FILE=*
  /TABLE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.
* Check whether it worked.
crosstabs PB020 by DB100.
save outfile = 'mydata_path/personal_data.sav'.
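A one-to-many match treats the household file as a lookup table: every person in a household receives the same household value, and the number of person records does not change. A minimal sketch, with invented personal ids and values:

```python
# Hypothetical lookup table: (year, country, household id) -> DB100.
db100_by_household = {
    (2010, "AT", 1): 1,
    (2010, "RO", 1): 3,
}

# Hypothetical personal data (P-File); PX030 is the household id of the person.
persons = [
    {"PB010": 2010, "PB020": "AT", "PX030": 1, "PB030": 101},
    {"PB010": 2010, "PB020": "AT", "PX030": 1, "PB030": 102},
    {"PB010": 2010, "PB020": "RO", "PX030": 1, "PB030": 101},
]

for person in persons:
    key = (person["PB010"], person["PB020"], person["PX030"])
    # None marks a person whose household is missing from the lookup table.
    person["DB100"] = db100_by_household.get(key)

for person in persons:
    print(person)
```

Both persons of the Austrian household end up with the same DB100, which is the behaviour the table match is meant to produce.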
Matching: many-to-one (n:1)
• Create new summary variables from the personal data (P-File)
  • number of persons living in the same household
  • number of unemployed persons living in a household
  • number of full-time employed persons living in a household
  • number of part-time employed persons living in a household
  • number of self-employed persons living in a household
  • sum of ‘pensions from individual private plans’ (PY080G)
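The summary variables listed above amount to grouping the person records by household and then counting or summing within each group. A minimal sketch, assuming a hypothetical `status` field with invented labels in place of the actual EU-SILC employment variable, and invented PY080G amounts:

```python
from collections import defaultdict

# Hypothetical personal data; "status" and its labels are placeholders.
persons = [
    {"PB010": 2010, "PB020": "AT", "PX030": 1, "status": "unemployed", "PY080G": 0.0},
    {"PB010": 2010, "PB020": "AT", "PX030": 1, "status": "full-time", "PY080G": 1200.0},
    {"PB010": 2010, "PB020": "RO", "PX030": 1, "status": "unemployed", "PY080G": 300.0},
]

# One summary record per household, keyed by year, country and household id.
summary = defaultdict(
    lambda: {"n_persons": 0, "n_unemployed": 0, "sum_PY080G": 0.0}
)

for person in persons:
    household = summary[(person["PB010"], person["PB020"], person["PX030"])]
    household["n_persons"] += 1
    if person["status"] == "unemployed":
        household["n_unemployed"] += 1
    household["sum_PY080G"] += person["PY080G"]

for key, values in summary.items():
    print(key, values)
```

The result has one record per household, so it can be attached to the household data with a one-to-one match on the same three key variables.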
*********************************************************.
* many-to-one (n:1).
* Personal Data.
* example 1.
* number of persons living in the same household.
* number of unemployed persons living in a household.
*********************************************************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset.
GET FILE='data_path/udb_c10p_silc_course.sav'.