Part II – Introduction to SILC Data Structure and Documentation
DwB Training Course on EU-SILC Longitudinal data, Paris, 19-21 February 2014
Heike Wirth
Aims of this session
• Introduce the rotational design
• Explain the concept of the selected respondent
• Explain the organisation of the data
• Point out some reading: documents of priority
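Rotational design – illustration
[Figure: initial sample in 2006]
In most countries the longitudinal component follows a four-year rotational design: the sample is made up of four rotation groups, and each year one rotation group is dropped and replaced by a new one, so a household stays in the panel for at most four years.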
Rotational design – illustration, longitudinal
[Figure: rotation groups starting from 2006, e.g. the longitudinal data 2011]
Rotational design – empirical
The rotation group (DB075) is not equivalent to the number of years of participation.
Rotational design – empirical
[Table: tab DB075 HHYNR]
HHYNR (= number of the household-year) is not included in the data and must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
Rotational design – empirical
[Table: tab HHYNR YEAR]
HHYNR (= number of the household-year) is not included in the data and must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
Rotational design – empirical
[Table: tab HHYCOUNT HHYNR]
HHYCOUNT (= count of household-years) is not included in the data and must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
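Since HHYNR and HHYCOUNT must be created by the user, the following is a minimal Python sketch of one way to derive them. It assumes the D-file has already been read into a list of dicts holding the year of survey (DB010), country (DB020) and household ID (DB030) alongside the rotation group DB075; apart from DB075, these variable names are assumptions taken from the EU-SILC naming scheme and should be checked against Guidelines_Doc65_2011.pdf. HHYNR is read here as the running number of the years in which a household is observed (1 for its first year in the file) and HHYCOUNT as the total number of such years; the helper names and toy records are hypothetical.

```python
from collections import Counter, defaultdict

def add_household_year_vars(d_records):
    """Return copies of D-file records with HHYNR and HHYCOUNT added.

    A household is identified by (country, household ID) = (DB020, DB030);
    DB010 is the year of survey. Variable names are assumptions to be
    checked against the documentation.
    """
    # Collect the survey years in which each household appears.
    years_by_hh = defaultdict(set)
    for rec in d_records:
        years_by_hh[(rec["DB020"], rec["DB030"])].add(rec["DB010"])
    sorted_years = {hh: sorted(years) for hh, years in years_by_hh.items()}

    out = []
    for rec in d_records:
        years = sorted_years[(rec["DB020"], rec["DB030"])]
        new = dict(rec)
        new["HHYNR"] = years.index(rec["DB010"]) + 1  # 1 = first year observed
        new["HHYCOUNT"] = len(years)                  # number of years observed
        out.append(new)
    return out

def crosstab(records, row_var, col_var):
    """Two-way frequency counts, similar in spirit to Stata's `tab row_var col_var`."""
    return Counter((rec[row_var], rec[col_var]) for rec in records)

# Hypothetical toy D-file (not real EU-SILC data).
d_file = [
    {"DB010": 2010, "DB020": "AT", "DB030": 123400, "DB075": 1},
    {"DB010": 2011, "DB020": "AT", "DB030": 123400, "DB075": 1},
    {"DB010": 2011, "DB020": "AT", "DB030": 555500, "DB075": 2},
]
d_file = add_household_year_vars(d_file)
print(crosstab(d_file, "DB075", "HHYNR"))
```

Because HHYNR here counts observed years, a household that drops out for a year and returns would still be numbered consecutively; whether that matches the definition behind the tables above is an assumption.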
Example: PH030 – Limitation in activities because of health problems (register countries)
The persons without a value on PH030 are (mainly) not selected respondents (see the flag variable PH030_F).
Source: UDB_l11P_ver 2011-1 from 01-08-2013.dta
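One way to follow the hint to PH030_F is to tabulate the flag for the records where PH030 is missing. A minimal Python sketch, assuming the P-file has been read into a list of dicts in which missing values are stored as None; the function name is hypothetical, the flag codes in the toy records are made up, and the actual meaning of each code (including the one for persons who are not selected respondents) must be looked up in Guidelines_Doc65_2011.pdf.

```python
from collections import Counter

def tab_flag_for_missing(p_records, var="PH030"):
    """Count the values of the flag variable <var>_F among records where <var> is missing."""
    flag_var = var + "_F"
    return Counter(rec.get(flag_var) for rec in p_records if rec.get(var) is None)

# Hypothetical toy P-file records; the flag codes are placeholders, not official codes.
p_file = [
    {"PH030": 3, "PH030_F": 1},
    {"PH030": None, "PH030_F": 9},
    {"PH030": None, "PH030_F": 9},
    {"PH030": None, "PH030_F": 8},
]
print(tab_flag_for_missing(p_file))
```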
Organisation of the data
EU-SILC consists of four separate files for the cross-sectional data:
• Household Register file
• Household Data file
• Personal Register file
• Personal Data file
Organisation of the data
… and of four separate data files for the longitudinal data:
• Household Register file
• Household Data file
• Personal Register file
• Personal Data file
Household Files – longitudinal
• Household Register (D-File)
  • Includes every selected household (also those whose address could not be contacted or which could not be interviewed)
  • > 19 variables: household identifier, sampling design information, region
  • UDB_l11D_ver 2011-1 from 01-08-2013: N = 542 942 households
• Household Data (H-File)
  • Only households which have been contacted and have completed a household interview, and in which at least one household member has complete data in the personal data file
  • > 180 variables (incl. flag variables & imputation factors): basic data, social exclusion, income, housing
  • UDB_l11H_ver 2011-1 from 01-08-2013: N = 411 189 households
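Because the H-file only covers contacted and interviewed households while the D-file covers every selected household, the D-file records without an H-file counterpart can be found with an anti-join on the key variables. A minimal Python sketch, assuming both files are lists of dicts and that the keys are named DB010/DB020/DB030 (D-file) and HB010/HB020/HB030 (H-file) as in the EU-SILC naming scheme (an assumption to check against the documentation); the function name is hypothetical. If every H-file record has a D-file counterpart, the two 2011 files above differ by 542 942 - 411 189 = 131 753 household records.

```python
def d_records_without_h(d_records, h_records):
    """Return (year, country, household ID) keys present in the D-file but not in the H-file."""
    h_keys = {(r["HB010"], r["HB020"], r["HB030"]) for r in h_records}
    missing = []
    for r in d_records:
        key = (r["DB010"], r["DB020"], r["DB030"])
        if key not in h_keys:
            missing.append(key)  # no H-file record, e.g. not contacted or not interviewed
    return missing

# Hypothetical toy files.
d_file = [
    {"DB010": 2011, "DB020": "AT", "DB030": 123400},
    {"DB010": 2011, "DB020": "AT", "DB030": 555500},
]
h_file = [{"HB010": 2011, "HB020": "AT", "HB030": 123400}]
print(d_records_without_h(d_file, h_file))
```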
Personal Files – longitudinal
• Personal Register (R-File)
  • Every person currently living in the household or temporarily absent
  • Longitudinal file: also persons registered in the R-File of the previous year or living at least 3 months in the household during the income reference period
  • > 50 variables (incl. flag variables): basic information, e.g. relationship between household members
  • UDB_l11R_ver 2011-1 from 01-08-2013: N = 1,079,261 persons
• Personal Data (P-File)
  • Only the reference population (persons aged 16 and over) and only persons for whom the information could be completed by interview (personal/proxy) and/or register
  • > 190 variables (incl. flag variables & imputation factors): e.g. demographic, income, work and unemployment
  • UDB_l11P_ver 2011-1 from 01-08-2013: N = 879,720 persons
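Since the P-file only contains the reference population with completed information, every P-file person should also be found in the R-file, but not vice versa. A minimal Python sketch of this consistency check, assuming the person keys are named RB010/RB020/RB030 (R-file) and PB010/PB020/PB030 (P-file) following the EU-SILC naming scheme (to be checked in the documentation); the function name and toy records are hypothetical. For the 2011 files above, the R-file holds 1,079,261 - 879,720 = 199,541 more person records than the P-file.

```python
def compare_person_files(r_records, p_records):
    """Compare person keys (year, country, personal ID) between the R-file and the P-file."""
    r_keys = {(r["RB010"], r["RB020"], r["RB030"]) for r in r_records}
    p_keys = {(p["PB010"], p["PB020"], p["PB030"]) for p in p_records}
    return {
        "p_without_r": p_keys - r_keys,  # expected to be empty
        "r_without_p": r_keys - p_keys,  # e.g. persons under 16 or without completed information
    }

# Hypothetical toy files: a two-person household with one child under 16.
r_file = [
    {"RB010": 2011, "RB020": "AT", "RB030": 12340001},
    {"RB010": 2011, "RB020": "AT", "RB030": 12340002},
]
p_file = [{"PB010": 2011, "PB020": "AT", "PB030": 12340001}]
result = compare_person_files(r_file, p_file)
print(result)
```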
Depending on the research question: use of separate datasets (Household Register, Personal Register, Personal Data, Household Data)
… or a combination of different datasets (Household Register, Personal Register, Personal Data, Household Data)
Organisation of the data
[Figure: Household Register, Household Data, Personal Register and Personal Data, for the longitudinal and for the cross-sectional data]
Within both the cross-sectional and the longitudinal data, all four files are linkable with each other; cross-sectional and longitudinal data, however, are not linkable with each other.
Organisation of the data
[Figure: HH Register, HH Data, Personal Register and Personal Data in year t and t+1]
… and cross-sectional data are not linkable over time either (the household ID and related identification variables are randomized).
Organisation of the data … combine different datasets – key variables
In order to link (combine) the four files D, H, R and P with each other, every observation must have a unique link to the respective three other files. This link is achieved by the following four key variables:
(1) Year of Survey
(2) Country
(3) Household ID
(4) Personal ID
Organisation of the data … combine different datasets – key variables
[Figure: Household Register, Personal Register, Personal Data and Household Data with their key variables (Year of Survey, Country, Household ID, Personal ID)]
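The requirement that every observation has a unique link can be checked before any merge by looking for duplicated key combinations. A minimal Python sketch; the function name is hypothetical, and the key names in the example (HB010/HB020/HB030 for year of survey, country and household ID in the H-file) are assumptions following the EU-SILC naming scheme.

```python
from collections import Counter

def duplicate_keys(records, key_vars):
    """Return key combinations that occur more than once (an empty list means the key is unique)."""
    counts = Counter(tuple(rec[v] for v in key_vars) for rec in records)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical toy H-file with one accidental duplicate.
h_file = [
    {"HB010": 2011, "HB020": "AT", "HB030": 123400},
    {"HB010": 2011, "HB020": "AT", "HB030": 123400},
    {"HB010": 2011, "HB020": "BE", "HB030": 123400},
]
print(duplicate_keys(h_file, ["HB010", "HB020", "HB030"]))
```

The third toy record illustrates why the country belongs to the key: the same household number can occur in more than one country.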
Organisation of the data: Household ID – Personal ID
• Household ID
  • Cross-sectional (max. 6 digits) = hh number 1-999999
  • Longitudinal (max. 8 digits) = hh number 1-999999 + split number
  • Default split number = 00
• Personal ID
  • Cross-sectional = hh ID + personal number (max. 2 digits)
  • Longitudinal = hh number + default split number (00) + personal number
• In the longitudinal survey the Personal ID never changes, even if the person moves to a different household
• In the cross-sectional survey the Household ID and Personal ID may change from year to year
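Read as digit strings, the ID rules above amount to simple integer arithmetic: a longitudinal household ID is the household number followed by a two-digit split number, and a longitudinal personal ID is the household number, the default split number 00 and a two-digit personal number. A minimal Python sketch, assuming the IDs are stored as integers that follow exactly this structure; all function names are hypothetical, and the split number 01 in the example is only an illustration.

```python
def make_household_id(hh_number, split_number=0):
    """Longitudinal household ID = household number followed by a 2-digit split number."""
    return hh_number * 100 + split_number

def make_personal_id(hh_number, personal_number):
    """Longitudinal personal ID = household number + default split number 00 + 2-digit personal number."""
    return make_household_id(hh_number, 0) * 100 + personal_number

def split_household_id(hh_id):
    """Return (household number, split number)."""
    return divmod(hh_id, 100)

def split_personal_id(pers_id):
    """Return (household number, personal number); the split digits are always 00."""
    hh_id, personal_number = divmod(pers_id, 100)
    return hh_id // 100, personal_number

pid = make_personal_id(1234, 3)  # 12340003
print(pid, split_personal_id(pid), split_household_id(make_household_id(1234, 1)))
```

Because the personal ID keeps the original household number even after a move, the household a person currently lives in (e.g. a split-off household such as 123401) cannot be read off the personal ID; it has to come from a household ID variable in the data.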
Combining information from two separate files at a 1:1 level
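A 1:1 combination, e.g. of the Household Register (D) and the Household Data (H) for the same year, can be sketched as a left join that refuses duplicate keys, roughly the uniqueness check that Stata's `merge 1:1` enforces. A minimal Python sketch; the function name and the `_matched` indicator are hypothetical, the key names (DB010/DB020/DB030 and HB010/HB020/HB030) are assumed from the EU-SILC naming scheme, and unlike Stata's default, H-file records without a D-file match are dropped.

```python
def merge_one_to_one(left, right, left_keys, right_keys):
    """Left-join `right` onto `left`, requiring unique keys in both files."""
    index = {}
    for rec in right:
        key = tuple(rec[k] for k in right_keys)
        if key in index:
            raise ValueError(f"key not unique in right file: {key}")
        index[key] = rec

    merged, seen = [], set()
    for rec in left:
        key = tuple(rec[k] for k in left_keys)
        if key in seen:
            raise ValueError(f"key not unique in left file: {key}")
        seen.add(key)
        match = index.get(key)
        combined = dict(rec)
        if match is not None:
            combined.update(match)  # adds the right file's variables (incl. its key columns)
        combined["_matched"] = match is not None  # hypothetical indicator variable
        merged.append(combined)
    return merged

# Hypothetical toy files; the HH021 value is a placeholder.
d_file = [
    {"DB010": 2011, "DB020": "AT", "DB030": 123400},
    {"DB010": 2011, "DB020": "AT", "DB030": 555500},
]
h_file = [{"HB010": 2011, "HB020": "AT", "HB030": 123400, "HH021": 1}]
merged = merge_one_to_one(d_file, h_file, ["DB010", "DB020", "DB030"], ["HB010", "HB020", "HB030"])
print(merged)
```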
Use of separate sub-datasets
Create household-level variables from person-level data, e.g.
• number of current household members
• persons < 18 in household
• age of the youngest child in household
• number of unemployed hh members
• highest educational level in household
• …
Create new household-level summary variables from person-level information, e.g. household size, number of children, age of youngest child (< 18 years)
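The person-to-household aggregation described above can be sketched as a single pass over the person records, grouped by year, country and household ID. A minimal Python sketch, assuming the records have already been restricted to current household members (the longitudinal R-file also holds, e.g., persons registered in the previous year); the function name is hypothetical, and the key and age variable names used in the example (RB010, RB020, RB040, RX020) are assumptions based on the EU-SILC naming scheme that must be checked in Guidelines_Doc65_2011.pdf. Children are taken to be persons under 18, as in the list above.

```python
from collections import defaultdict

def household_summaries(person_records, key_vars, age_var, child_age_limit=18):
    """Aggregate person-level records to one summary dict per household key."""
    summaries = defaultdict(lambda: {"hh_size": 0, "n_children": 0, "youngest_child_age": None})
    for rec in person_records:
        s = summaries[tuple(rec[v] for v in key_vars)]
        s["hh_size"] += 1  # every record counts as one household member
        age = rec[age_var]
        if age is not None and age < child_age_limit:
            s["n_children"] += 1
            if s["youngest_child_age"] is None or age < s["youngest_child_age"]:
                s["youngest_child_age"] = age
    return dict(summaries)

# Hypothetical toy person records (current household members only).
r_file = [
    {"RB010": 2011, "RB020": "AT", "RB040": 123400, "RX020": 40},
    {"RB010": 2011, "RB020": "AT", "RB040": 123400, "RX020": 12},
    {"RB010": 2011, "RB020": "AT", "RB040": 123400, "RX020": 7},
    {"RB010": 2011, "RB020": "AT", "RB040": 555500, "RX020": 70},
]
summaries = household_summaries(r_file, ["RB010", "RB020", "RB040"], "RX020")
print(summaries)
```

The resulting household-level dict can then be merged 1:1 onto household records using the same key.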
Some reading – Documents of priority
• Guidelines_Doc65_2011.pdf
  • General technical information on sample design, weights, etc.
  • List of all variables included in the original EU-SILC database
  • Description of (cross-sectional and longitudinal) variables
• DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
  • List of variables removed from or added to the User Database (UDB)
  • Methods of anonymisation
• SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
• National and EU Quality reports
  • http://epp.eurostat.ec.europa.eu/portal/page/portal/income_social_inclusion_living_conditions/quality
Some reading – Documents of priority: Guidelines_Doc65_2011.pdf
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: flag variable HH020_F
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: flag variable HH021_F
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: cross-sectional data 2011
Source: UDB_c11H_ver 2011-2 from 01-08-13.dta
Some reading – Documents of priority: longitudinal data 2011, new (HH021) vs. old (HH020)
Source: UDB_l11H_ver 2011-1 from 01-08-2013.dta
Some reading – Documents of priority: example of a variable included in the cross-sectional and longitudinal data
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: example of a variable included in the cross-sectional data only
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: example of a variable included in the longitudinal data only
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: example of a selected respondent variable
Source: Guidelines_Doc65_2011.pdf
Some reading – Documents of priority: Differences between data collected and User Database (cross-sectional file)
Some reading – Documents of priority: Differences between data collected and User Database (longitudinal file)
Source: L2011 DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
Some reading – Documents of priority: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
Source: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls