The INPS Data Archive at fRDB - Universita’ Bocconi

Orietta Dessy (Universita’ Bocconi, fRDB & Dondena)
26th October 2006

Structure of the presentation
• INPS archives and our sample: differences with WHIP
• Possible matches & problems
• The first release: demographic and employees’ archives
• Variables’ description
• Next releases and access to data

INPS archives and our sample
• The Italian National Social Security Institute (Istituto Nazionale di Previdenza Sociale – INPS) collects workers’ contributions for a number of social security benefits: pensions, unemployment benefits, family bonuses, …
• Contributions are compulsory for firms and for all of their registered (regular) employees.

INPS archives and our sample: differences with WHIP
• Our sample: individuals born on 4 dates of the year (the 10th of 4 months), a sampling ratio of 1:90. The same as WHIP.
• WHIP is a pre-constructed panel: a unique individual identifier has been constructed by researchers at Laboratorio Revelli according to subjective criteria (+: ready to use for researchers; -: very rigid).
• Our data try to stay very close to the raw data: the cleaning procedure is meant to give researchers all the tools they need to construct their own panel easily (-: some work is still needed to construct the panel/matching; +: extremely flexible).
• Different conditions for accessing the data.
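
The birth-date sampling rule above can be sketched in a few lines of Python. The slides do not name the four months, so `SAMPLE_MONTHS` below is a purely hypothetical choice; only the day (the 10th) and the number of months (4) come from the text.

```python
from datetime import date, timedelta

# Hypothetical: the slides say "the 10th of 4 months" without naming the months.
SAMPLE_MONTHS = {3, 6, 9, 12}
SAMPLE_DAY = 10


def in_sample(birth: date) -> bool:
    """Return True if a person born on `birth` falls in the birth-date sample."""
    return birth.day == SAMPLE_DAY and birth.month in SAMPLE_MONTHS


# Count how many calendar days of a non-leap year are selected.
start = date(1971, 1, 1)
days = [start + timedelta(days=i) for i in range(365)]
selected = sum(in_sample(d) for d in days)
sampling_fraction = selected / len(days)
```

If births were spread evenly over the year, 4 dates out of 365 would select about 1 person in 91, which is in line with the 1:90 ratio quoted on the slide.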

INPS archives

Possible matches & problems
Many possibilities:
• Merge files within-archive
• Merge files between-archives
• Cross sections
• Panels

Matching problems
• Each PID identifies a ‘contributive position’, not an individual. A file can contain more than one observation per PID => before merging files, either choose one observation or compact all the observations for the same PID (or group of PIDs) into one (useful Stata command: duplicates).
• Same problem on the firms’ side: the economic concept of the firm needs to be reconstructed.
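
A minimal sketch of the "compact" option, written in Python rather than Stata. The field names (`pid`, `year`, `weeks`, `income`) and the rule of summing within PID and year are illustrative assumptions, not the archive's actual layout or a recommended procedure.

```python
from collections import defaultdict

# Hypothetical records: PID 1 has two contributive positions in the same year.
records = [
    {"pid": 1, "year": 1995, "weeks": 20, "income": 9000},
    {"pid": 1, "year": 1995, "weeks": 30, "income": 15000},
    {"pid": 2, "year": 1995, "weeks": 52, "income": 30000},
]


def compact(rows, keys=("pid", "year"), sum_fields=("weeks", "income")):
    """Collapse rows sharing the same key into one row by summing numeric fields."""
    totals = defaultdict(lambda: dict.fromkeys(sum_fields, 0))
    for row in rows:
        key = tuple(row[k] for k in keys)
        for field in sum_fields:
            totals[key][field] += row[field]
    # One output row per distinct key, so the result can be merged one-to-one.
    return [dict(zip(keys, key), **sums) for key, sums in sorted(totals.items())]


compacted = compact(records)
```

Summing is only one possibility. Keeping the longest or the best-paid position per PID is equally defensible, depending on the research question.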

The first release
• The demographic archive
• Employees’ archive 1985-2002
• User manual, with description of variables, codebook and year-by-year tables reporting the number of observations and the % missing.

Demographic archive
• Contains all the PIDs that appear in at least one of the archives.
• Adds demographic information on individuals to each single archive, whenever it is not already in the files.
• Number of observations: 945,576
• Variables: year of birth, sex, province/country of birth, year of death. Year-by-year residence (residenza) since 1997.

Problems encountered in cleaning the demographic archive
• Duplication of individuals: different PIDs can belong to the same person => problem solved in a non-probabilistic setting (the Stata routine is being generalised), using information on the old Fiscal Code and the old INPS code available at INPS.
• Possible improvements in a probabilistic setting, taking into account similarities and spelling errors in Name, Surname, Address (correcting mistakes requires an international vocabulary, very expensive, which INPS is buying).
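
As an illustration of non-probabilistic de-duplication, the sketch below groups PIDs that share an old Fiscal Code or an old INPS code into one person, using a small union-find structure. This is not the fRDB Stata routine; the field names (`old_fiscal_code`, `old_inps_code`) and the sample values are hypothetical.

```python
def link_pids(rows):
    """Group PIDs that share any identifier value; returns {pid: representative pid}."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest PID becomes the representative

    first_seen = {}  # (field, value) -> first PID carrying that identifier
    for row in rows:
        pid = row["pid"]
        parent.setdefault(pid, pid)
        for field in ("old_fiscal_code", "old_inps_code"):
            value = row.get(field)
            if not value:
                continue  # missing identifiers never link records
            key = (field, value)
            if key in first_seen:
                union(pid, first_seen[key])
            else:
                first_seen[key] = pid
    return {pid: find(pid) for pid in parent}


rows = [
    {"pid": 101, "old_fiscal_code": "AAA", "old_inps_code": None},
    {"pid": 102, "old_fiscal_code": "AAA", "old_inps_code": "X1"},
    {"pid": 103, "old_fiscal_code": None, "old_inps_code": "X1"},
    {"pid": 104, "old_fiscal_code": "BBB", "old_inps_code": None},
]
person_of = link_pids(rows)
```

Here PIDs 101, 102 and 103 end up with the same representative because they are chained through shared codes, while 104 stays on its own. Exact matching of this kind cannot catch misspelled identifiers, which is where a probabilistic approach would come in.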

Problems encountered in cleaning the demographic archive
• Estimated impact of further duplications: very low. Eventually, the demographic archive will be updated.
• Important note: the demographic archive is updated to 2005. Therefore, demographic information sometimes goes beyond the years covered by an archive.
• When demographic information is included in an archive, coherence checks have been carried out. Inconsistencies are negligible (always 0%).

Problems encountered in cleaning the demographic archive
• A PID in a certain archive might not be found in the demographic archive.
• There is no clear explanation for this; it probably depends on the fact that the demographic archive has been updated to 2005.
• Suggestion: keep these individuals if no additional demographic information is needed, otherwise just exclude them.
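
The suggestion above amounts to a left join with a match flag. A minimal Python sketch follows; the function name, the field names and the sample records are hypothetical.

```python
def attach_demographics(archive_rows, demographic_by_pid, require_demographics=False):
    """Left-join demographic fields onto archive rows, flagging unmatched PIDs."""
    merged = []
    for row in archive_rows:
        demo = demographic_by_pid.get(row["pid"])
        if demo is None and require_demographics:
            continue  # drop rows whose PID is missing from the demographic archive
        merged.append({**row, **(demo or {}), "demo_found": demo is not None})
    return merged


# Hypothetical data: PID 2 is absent from the demographic archive.
demographics = {1: {"birth_year": 1960, "sex": "F"}}
employees = [{"pid": 1, "year": 1995}, {"pid": 2, "year": 1995}]

kept_all = attach_demographics(employees, demographics)
matched_only = attach_demographics(employees, demographics, require_demographics=True)
```

Keeping the `demo_found` flag makes it easy to check afterwards how many observations were affected before deciding whether to exclude them.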

Employees’ archive

Employees’ archive
• Note: individuals are sampled, not firms. Firms in the sample are those encountered by individuals in their job history.
• Not many individuals for the same firm.

Employees’ archive: variables
• PID & FID
• Some qualitative variables on how data have been reported, but not reliable
• Employment: provlav, skill (white collar, blue collar, executives, CEOs, apprentices, and a few more), part-time/full-time, since 1998 also duration of contract (fixed term, permanent, seasonal).

Employees’ archive: variables
• Income: truncated to thousands when reported in Lire, strong reporting errors in 1998, since 2000 reported both in Lire and in Euro.
• Number of weeks paid, number of days paid, months paid in the year (not only the number, but also flags on the months).
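
A small Python sketch of how these variables might be handled. It assumes that Lire incomes are stored as whole thousands and that the month flags come as a 12-character string of 0/1 with January first; both encodings are assumptions for illustration, not the documented layout of the archive. The rate of 1936.27 Lire per Euro is the official fixed conversion rate.

```python
LIRE_PER_EURO = 1936.27  # official fixed Lira/Euro conversion rate


def lire_thousands_to_euro(income_thousands_lire):
    """Convert income stored in thousands of Lire to Euro (rounded to cents)."""
    return round(income_thousands_lire * 1000 / LIRE_PER_EURO, 2)


def months_paid(flags):
    """Decode a 12-character 0/1 string (January first) into paid month numbers."""
    if len(flags) != 12 or set(flags) - {"0", "1"}:
        raise ValueError("expected 12 characters, each '0' or '1'")
    return [month for month, flag in enumerate(flags, start=1) if flag == "1"]


income_euro = lire_thousands_to_euro(30000)  # 30 million Lire
paid = months_paid("111000000011")           # January-March, November, December
```

Because Lire incomes are truncated to thousands, the converted figure is best read as a lower bound on the true amount.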

Employees’ archive: variables
• Institutional variables: csc (convertible to NACE, conversion table provided) is the best variable for determining the sector of activity; code of contract (regional, provincial, firm-level).
• Job classification (Inquadramento): available, but with many missing values, and the categories do not directly match the usual classifications of national contracts.

Employees’ archive: variables
• Severance indemnity (Trattamento di fine rapporto, TFR): amount due to the employee from the end-of-service fund
• Coordinates for family-related allowances
• Up to 4 special incomes, for workers with particular job earnings: start and end of pay, compensation, number of weeks paid
• Reduced pay: illness, maternity, special lay-off pay fund (cig), others.

Next releases
• Job-histories archive
• Atypical work archive (parasubordinati)
• Pensions
• Others
• Firms’ archive

Access to data
• Free for all affiliates of fRDB-Universita’ Bocconi.
• Follow the instructions on the web.
• Password & email needed because the archive will be continuously updated.
• Possible to share routines and programs for constructing the most used panels.
