
DwB-Training Course on EU-SILC, February 13-15, 2013

Working with EU-SILC using the hierarchical data structure, matching & aggregating data. Practical computing session I – Part 2. Heike Wirth, GESIS – Leibniz Institut für Sozialwissenschaften.


Presentation Transcript


  1. Working with EU-SILC using the hierarchical data structure, matching & aggregating data. Practical computing session I – Part 2. Heike Wirth, GESIS – Leibniz Institut für Sozialwissenschaften. DwB-Training Course on EU-SILC, February 13-15, 2013, Romanian Social Data Archive at the Department of Sociology, University of Bucharest, Romania

  2. Introduction
     • EU-SILC data have a hierarchical structure
     • more than one level of analysis is possible
     • household & individual levels are represented by separate files
     • data are stored in multiple data files

  3. Example of household level data

  4. Example of individual level data

  5. Working with this kind of data requires
     • A decision on the appropriate unit of analysis for your research question, e.g.
       • research interest in households or persons?
       • % of households/persons/men/women/children who live in poverty?
       • % of households with only 1 person or % of persons who live alone?
     • Knowledge of procedures for manipulating the data

  6. Types of Matching
     • One-to-one matching
       • Household Register to Household Data
       • Personal Register to Personal Data
     • One-to-many matching
       • Household variables to Individual data
     • Many-to-one matching (‘aggregation’)
       • e.g. adding information from the individual data to the household data

  7. EU-SILC – Types of matching
     [Diagram: the four files, Household-Register File (D), Household-Data File (H), Personal-Register File (R) and Personal-Data File (P), connected by links labelled 1:1, 1:n and n:1.]

  8. Linking EU-SILC files (cross-sectional)
     • Key variables provide links between the related records
       • between household files
       • between individual files
       • between household and individual files
     • Key variables (depending on the files) are
       • household id (DB030; HB030; RX030; PX030)
       • personal id (RB030; PB030)
     • To be on the safe side, always use the key variables together with
       • ‘year of survey’ (DB010; HB010; RB010; PB010) &
       • ‘country’ (DB020; HB020; RB020; PB020)
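Before matching, it can help to confirm that the key variables really identify each record uniquely. The following is a minimal sketch, not part of the original slides. It assumes the D-File of the training dataset and the same path as in the course exercises; the variable name n_dup is hypothetical.

```spss
* Assumption: same path and file name as in the course exercises.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
GET FILE='data_path/udb_c10d_silc_course.sav'.

* Count how many records share the same year, country and household id.
* n_dup is a hypothetical helper variable.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=DB010 DB020 DB030
  /n_dup = NU.

* If the keys are unique, every record has n_dup = 1.
FREQUENCIES VARIABLES=n_dup.
```

Any value of n_dup above 1 would suggest that the chosen combination of keys does not identify a household uniquely in that file.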

  9. Example 1: one-to-one
     • Attach household register information (D-File) to household data file (H-File)
     • e.g. ‘Degree of urbanisation’ (DB100) is included only in the household register; it can be useful to have this information in the household data, too.

  10. One-to-One Match, e.g. household information

  11. Result: Combined Household File

  12. Example 2: one-to-many
      • Attach household register information (D-File) to personal data file (P-File)
      • Attach ‘Degree of urbanisation’ (again) to the personal data file

  13. Attaching household data to personal data (1:n)

  14. Example 3: many-to-one
      • e.g. number of persons in a household who are
        • unemployed,
        • full-time employed,
        • self-employed?
      • such information is not included in the data => own computation

  15. Matching: many-to-one (summarizing information)

  16. Hands on – matching 1:1
      • Attach ‘Degree of Urbanisation’ (DB100) to household data file (H-File)
        • Open the EU-SILC training dataset (D-File).
        • Check the variables you are interested in.
        • Sort your data according to the key variables used for linkage.
        • Names of key variables in the files to be matched must be identical => create new key variables (ID010, ID020, ID_HH) such that DB010 = ID010, DB020 = ID020, DB030 = ID_HH
        • Create a new file with only the key variables & the variable(s) you are interested in
        • Name the new file DB100.sav

  17. SPSS–Matching: one-to-one
      **** Before you start ************.
      * specify the path where the EU-SILC training dataset is stored.
      FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
      * specify the path where you want to save your data.
      FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
      * open the EU-SILC training dataset - D-File.
      GET FILE='data_path/udb_c10d_silc_course.sav'.
      * check the variables you are interested in.
      crosstabs DB020 by DB100.

  18. SPSS–Matching: one-to-one
      * open the EU-SILC training dataset - D-File.
      GET FILE='data_path/udb_c10d_silc_course.sav'.
      * check the variables you are interested in.
      crosstabs DB020 by DB100.
      * Step 1 - Sort your data according to the key variables used for linkage.
      sort cases by DB010 DB020 DB030.
      * Step 2 - Names of key variables in the files to be matched must be identical.
      rename variables (DB010 DB020 DB030 = ID010 ID020 ID_HH).
      * create a new file with the key variables & the variable(s) you are interested in.
      save outfile = 'mydata_path/DB100.sav' /keep ID010 ID020 ID_HH DB100.

  19. SPSS–Matching: one-to-one
      GET FILE='data_path/udb_c10H_silc_course.sav'.
      sort cases by HB010 HB020 HB030.
      * Key variables.
      * either rename (like before) or, better, generate new variables.
      * HB020 is a string variable, so ID020 must be declared as a string first.
      STRING ID020 (A2).
      compute ID010 = HB010.
      compute ID020 = HB020.
      compute ID_HH = HB030.
      MATCH FILES FILE= *
        /file ='mydata_path/DB100.sav'
        /BY ID010 ID020 ID_HH.
      execute.
      * check whether it worked.
      crosstabs HB020 by DB100.
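A crosstab of country by DB100 shows whether values arrived, but not whether some households failed to find a partner record. One possible additional check, sketched here under the assumption that the file handles are defined and DB100.sav was saved as in the previous step, uses the /IN subcommand of MATCH FILES. The flag names in_H and in_D are hypothetical.

```spss
* Assumes the file handles data_path and mydata_path are already defined.
* Assumes DB100.sav was created and sorted as in the previous step.
GET FILE='data_path/udb_c10H_silc_course.sav'.
STRING ID020 (A2).
COMPUTE ID010 = HB010.
COMPUTE ID020 = HB020.
COMPUTE ID_HH = HB030.
SORT CASES BY ID010 ID020 ID_HH.

* in_H and in_D are hypothetical flags: 1 = the record came from that file.
MATCH FILES
  /FILE=* /IN=in_H
  /FILE='mydata_path/DB100.sav' /IN=in_D
  /BY ID010 ID020 ID_HH.
EXECUTE.

* Households found in only one of the two files have one flag equal to 0.
CROSSTABS in_H BY in_D.
```

If every household appears in both files, the table contains a single cell with both flags equal to 1.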

  20. SPSS–Matching: One-to-many Match (1:n)
      Example 2: Combining household and personal data, e.g. attaching ‘Degree of Urbanisation’ (DB100) to the personal data.
      GET FILE='data_path/udb_c10p_silc_course.sav'.
      * Sort by the key variables used for linkage.
      sort cases by PB010 PB020 PX030.
      * PB020 is a string variable: create a new string variable ID020 (or use the rename command).
      STRING ID020 (A2).
      compute ID010 = PB010.
      compute ID020 = PB020.
      compute ID_HH = PX030.

  21. SPSS–Matching: One-to-many Match (1:n)
      MATCH FILES FILE= *
        /table = 'mydata_path/DB100.sav'
        /BY ID010 ID020 ID_HH.
      execute.
      * Check whether it worked.
      crosstabs pb020 by db100.
      save outfile = 'mydata_path/personal_data.sav'.

  22. Matching: many-to-one (n : 1)
      • Create new summary variables from the personal data (P-File)
        • number of persons living in the same household
        • number of unemployed persons living in a household
        • number of full-time employed persons living in a household
        • number of part-time employed persons living in a household
        • number of self-employed persons living in a household
        • sum of ‘pensions from individual private plans’ (PY080G)

  23. *********************************************************.
      * many-to-one (n:1).
      * Personal Data.
      * example 1.
      * number of persons living in the same household.
      * number of unemployed persons living in a household.
      *********************************************************.
      * specify the path where the EU-SILC training dataset is stored.
      FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
      * specify the path where you want to save your data.
      FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
      * open the EU-SILC training dataset.
      GET FILE='data_path/udb_c10p_silc_course.sav'.
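The transcript ends before the aggregation step itself. One common way to build such household-level summaries in SPSS is the AGGREGATE command. The sketch below is an assumption about how the exercise could continue, not the author's code. The activity-status variable PL031 with code 5 for ‘unemployed’ is assumed from the general EU-SILC variable documentation and should be checked against the codebook of the training file. The names unemployed, n_persons, n_unemployed, sum_PY080G and the output file hh_summary.sav are hypothetical.

```spss
* Assumes the file handles data_path and mydata_path are already defined.
GET FILE='data_path/udb_c10p_silc_course.sav'.

* Flag unemployed persons. PL031 = 5 is an assumption - check the codebook.
COMPUTE unemployed = (PL031 = 5).

* Write one record per household (year, country, household id).
* NU counts the records in each household, SUM adds up values across persons.
AGGREGATE
  /OUTFILE='mydata_path/hh_summary.sav'
  /BREAK=PB010 PB020 PX030
  /n_persons = NU
  /n_unemployed = SUM(unemployed)
  /sum_PY080G = SUM(PY080G).
```

Note that n_persons here counts records in the P-File, which in EU-SILC usually covers only household members aged 16 and over; for the total household size, counting records in the personal register (R-File) would probably be the safer route. The resulting file can then be attached to the H-File with MATCH FILES after renaming the break variables to the common key names, as in the one-to-one example.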
