
European Conference on Quality 2008 in Official Statistics Session on Administrative data.

Presentation Transcript


  1. Quality Challenges in Processing Administrative Data to Produce Short-term Labour Cost Statistics. M. Carla Congia, Silvia Pacini, Donatella Tuzi (tuzi@istat.it), Istat, Italy. European Conference on Quality 2008 in Official Statistics, Session on Administrative data. Rome, 8–11 July 2008.

  2. Administrative data Session: Presentation Outline
     • The Italian Oros Survey
     • The peculiarities of the administrative source used
     • The quality strategy in a context of timely and extensive use of administrative data
     • Final remarks
     Q2008. Rome, 8-11 July 2008

  3. The Oros Survey
     Since 2003 the Italian NSI (Istat) has released quarterly indicators on gross wages and total labour cost (the Oros Survey) covering enterprises of all sizes in the private non-agricultural sector. Indices are released 70 days after the end of the reference quarter.
     In the past this information was collected monthly, and only for large firms, through the Survey on Large Enterprises (more than 500 employees). The Oros Survey was designed to fill this gap in Italian statistics by using administrative data (the social contribution declarations that employers file with the National Social Security Institute, INPS) for small and medium enterprises, integrated with the survey data on large enterprises (LES).
     Today the Oros Survey is an innovative Italian example of administrative data used extensively to produce timely business statistics.

  4. The Administrative Sources
     All Italian non-agricultural firms in the private sector with at least one employee (roughly 12 million employees and 1.3 million employers per year) must pay monthly social security contributions to INPS.
     • INPS administrative register (AR): contains structural information on each administrative unit (administrative id, fiscal code, name, legal form, dates of registration and cancellation, etc.). About 4 million records each quarter, transmitted at the end of the reference quarter.
     • Employers' monthly declaration (DM10 form): a highly detailed grid organized by administrative codes, with information on employment by type, paid days, wage bills, social contributions, credit terms and tax reliefs. Each DM10 spans several records (on average 8 records per unit). About 10 million records each month, transmitted 35 days after the end of the reference quarter.

  5. Peculiarities of the Administrative Source
     Unlike survey data, the use of an administrative source:
     • reduces the financial costs of direct collection and avoids further response burden on enterprises;
     • meets the growing demand for timely and detailed statistical information for multiple statistical purposes.
     However, data collection is beyond the NSI's control, so the NSI needs information about the quality of the administrative data it uses. Close relationships and coordination with the administrative institutions help reduce the risk of data quality problems arising from dependence on the data supplier. In this respect, the Oros Survey does not differ from other register-based statistics.

  6. Peculiarities of the Administrative Source (2)
     What makes the Oros Survey peculiar with respect to other register-based statistics is its release timeliness, which obliges Istat to acquire the data completely raw, without any prior checking or aggregation. This implies unusual statistical quality aspects:
     • processing a huge quantity of complex data in a very short time;
     • the lack of standardized metadata for translating administrative information;
     • continuous changes in administrative definitions and concepts.
     Acquiring raw information allows Istat to monitor most of the processing aspects, but considerable work is needed to guarantee a high standard of quality. A pervasive quality strategy has been implemented, covering the whole Oros production process.

  7. The Quality Strategy in the Oros Production Process
     [Flow diagram. Inputs: DM10 micro data, Metadata Database, Administrative Register (AR). Steps, in the order shown:]
     • Preliminary checks and retrieval of the statistical variables
     • Treatment of measurement errors (micro editing)
     • Treatment of non-response errors (imputation of temporary employment agencies)
     • The large firms: integration with survey data
     • Checks on macro data
     • Output: Oros Survey indicators

  8. The Administrative Register
     • The AR is used as a representation of the current population.
     • However:
       • it suffers from over-coverage (temporary suspensions and firm closures are under-recorded);
       • the economic activity code is drawn from the Italian Business Register (BR) (for 90% of the Oros active units);
       • considerable work is needed to outline the estimation frame (excluding units that do not belong to the Oros target population);
       • special attention is paid to the quality of the fiscal code as the leading matching variable.
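A minimal sketch of how an estimation frame might be cut from the AR, assuming a hypothetical record layout (`ARUnit`) and a hypothetical in-scope list of NACE Rev. 1.1 sections C to K. The fiscal-code check assumes the 11-digit numeric company code, whose last digit is a Luhn-style check digit; 16-character personal codes of sole proprietors are not handled and are simply routed to review. Note that a cancellation flag cannot catch the under-recorded closures mentioned above; that needs the longitudinal analysis described later.

```python
from dataclasses import dataclass

# Hypothetical in-scope activity sections (NACE Rev. 1.1 C to K), an assumption for illustration.
IN_SCOPE_SECTIONS = set("CDEFGHIJK")


@dataclass
class ARUnit:
    """Hypothetical AR record layout, not the actual INPS format."""
    admin_id: str
    fiscal_code: str
    nace_section: str  # activity code, e.g. as drawn from the Business Register
    cancelled: bool    # a cancellation date is recorded


def is_valid_company_fiscal_code(code: str) -> bool:
    """Check an 11-digit company fiscal code against its Luhn-style check digit."""
    if len(code) != 11 or not code.isdigit():
        return False
    total = 0
    for position, char in enumerate(code[:10], start=1):
        digit = int(char)
        if position % 2 == 0:  # digits in even positions are doubled
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10 == int(code[10])


def build_frame(units):
    """Split AR units into the estimation frame and a list needing manual review."""
    frame, review = [], []
    for unit in units:
        if unit.cancelled or unit.nace_section not in IN_SCOPE_SECTIONS:
            continue  # outside the target population
        if is_valid_company_fiscal_code(unit.fiscal_code):
            frame.append(unit)
        else:
            review.append(unit)  # unreliable matching key
    return frame, review
```

Routing invalid codes to review rather than dropping them keeps over-coverage and under-coverage decisions in the hands of the analyst.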

  9. Preliminary Checks and Retrieval of the Statistical Variables
     Meta-information on laws, regulations, contribution rates, codes and other technical aspects of Social Security is collected promptly and kept up to date in a standardized METADATA DATABASE built in-house. It is then necessary to carry out:
     • preliminary checks on the raw data and correction of errors in codes, record duplications and inconsistencies with current legislation;
     • translation of the administrative data into statistical variables, through complex additions and subtractions of a huge number of wage and contribution items identified by numerous administrative codes (currently more than 5,000);
     • estimation of some components for which no information is available in the administrative form (e.g. employers' injury insurance premiums and severance pay).
     In this step each DM10 is reorganized into a single record.
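The translation step can be pictured as a signed mapping from administrative codes to statistical variables, applied while collapsing the several records of each DM10 into one record per unit. The codes (`W01`, `C90` and so on), the two target variables and the record layout below are hypothetical placeholders; the real metadata database holds more than 5,000 codes and richer rules, including the estimated components listed above.

```python
from collections import defaultdict

# Hypothetical extract of the metadata database:
# administrative code -> (statistical variable, sign). Real codes and rules differ.
CODE_MAP = {
    "W01": ("gross_wages", +1),           # ordinary pay
    "W02": ("gross_wages", +1),           # supplementary earnings
    "R01": ("gross_wages", -1),           # item to be netted out
    "C01": ("social_contributions", +1),  # contributions due
    "C90": ("social_contributions", -1),  # contribution rebate
}


def dm10_to_unit_records(records):
    """Turn DM10 rows (unit_id, code, amount) into one record per unit.

    Rows whose code is missing from the metadata are returned separately,
    so that the metadata database can be updated.
    """
    units = defaultdict(lambda: defaultdict(float))
    unknown = []
    for unit_id, code, amount in records:
        if code not in CODE_MAP:
            unknown.append((unit_id, code))
            continue
        variable, sign = CODE_MAP[code]
        units[unit_id][variable] += sign * amount
    return {uid: dict(values) for uid, values in units.items()}, unknown
```

Keeping the code table as data rather than logic is what lets the frequent legislative changes be absorbed by updating metadata instead of rewriting the program.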

  10. Treatment of Measurement Errors
     Once the statistical data are available, a more traditional micro-editing procedure is set up... but, given the huge number of units, it is strongly based on selective criteria. A score function assigns to each of the 1.3 million units the probability that an error occurs in the target variables.
     Cut-off thresholds are fixed to select anomalous values, but their identification is strongly affected by the long tails in the distribution of the target variables:
     • very low per capita wages (e.g. units with only supplementary earnings);
     • negative per capita other labour costs (e.g. social contribution rebates).
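The slide does not give the form of the Oros score function. As a hedged illustration, the sketch below swaps in a common selective-editing score that measures potential impact rather than an error probability: the absolute gap between a unit's current value and an anticipated value (here assumed to be the previous month), divided by the aggregate of anticipated values. All weights equal 1 because the administrative source is a census. The input layout and the threshold are assumptions.

```python
def local_score(current: float, anticipated: float, scale: float) -> float:
    """Potential impact of one unit on the aggregate (all weights equal to 1)."""
    return abs(current - anticipated) / scale


def select_for_review(units, variables, threshold):
    """units: {unit_id: {variable: (current, anticipated)}}.

    Returns the ids whose global score exceeds the cut-off, highest first,
    together with all scores.
    """
    # Sum of absolute anticipated values, so that negative items (e.g. rebates)
    # cannot offset each other and shrink the scale towards zero.
    scales = {v: sum(abs(u[v][1]) for u in units.values()) for v in variables}
    scores = {
        uid: max(local_score(*vals[v], scales[v]) for v in variables)
        for uid, vals in units.items()
    }
    flagged = [uid for uid, s in scores.items() if s > threshold]
    return sorted(flagged, key=scores.get, reverse=True), scores
```

Taking the maximum over variables means a unit is reviewed if any one of its target variables could move the aggregate noticeably.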

  11. [Figure 1: Distribution of the per capita other labour costs (euro values) in the Oros manufacturing small and medium enterprises, July 2007. Histogram; x-axis: per capita other labour costs (euro, from -1,350 to 5,775); y-axis: % of units (0 to 15). Mean = 450, Median = 430, Max = 6,900, Min = -1,350.]
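Figure 1 shows why purely statistical cut-offs are risky: the left tail contains legitimate negative values. A minimal sketch, assuming interquartile-range fences (a swapped-in technique, not necessarily the one used for Oros) and a hypothetical `has_rebate` flag derived from the contribution codes, always flags the right tail but flags the left tail only when no rebate explains it. `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def iqr_fences(values, k=3.0):
    """Lower and upper fences at k interquartile ranges beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_per_capita(units, k=3.0):
    """units: dicts with 'id', 'cost', 'employees' and a hypothetical 'has_rebate' flag.

    Values above the upper fence are always flagged; values below the lower
    fence are flagged only when no contribution rebate explains them.
    """
    per_capita = {u["id"]: u["cost"] / u["employees"] for u in units}
    low, high = iqr_fences(list(per_capita.values()), k)
    flagged = []
    for u in units:
        value = per_capita[u["id"]]
        if value > high or (value < low and not u["has_rebate"]):
            flagged.append(u["id"])
    return flagged
```

Quartile-based fences are less distorted by the extreme values themselves than mean and standard deviation would be, which matters with tails as long as those in Figure 1.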

  12. Treatment of Measurement Errors (2)
     The edit and imputation rules are based on known functional relations among the analyzed variables and aim to assess and preserve, at unit record level, both cross-sectional and longitudinal consistency, using information from the closest months. The number of monthly edits is generally not high, but even a single oversight may have a significant effect.
     Example: quarterly changes of the Oros wage index in the Wholesale and retail trade sector (G). In the third quarter of 2007 the number of employees of one unit was affected by a measurement error: 73,000 part-time workers were reported, against an imputed value of 2. The error would have implied a quarterly change of 0.8% instead of 3%.
     This step is mainly interactive: given the nature of the data, experience has shown that automatic corrections are best avoided.
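A longitudinal edit of this kind can be sketched as a ratio check against the closest months. The neighbour window (two months on each side), the bound of 5 and the data layout are assumptions. In line with the interactive approach above, the function only proposes a candidate value to a reviewer and never overwrites the data.

```python
import statistics


def check_month(series, month, bound=5.0):
    """series: {month_index: value} for one unit and one variable (e.g. part-time employees).

    Returns a review proposal if the value for `month` is implausible with
    respect to the median of the closest available months, otherwise None.
    """
    if month not in series:
        return None
    neighbours = [series[m] for m in (month - 2, month - 1, month + 1, month + 2) if m in series]
    if not neighbours:
        return None
    reference = statistics.median(neighbours)
    value = series[month]
    if reference > 0 and not reference / bound <= value <= reference * bound:
        return {"month": month, "reported": value, "proposed": reference}
    return None
```

With the case reported on the slide, `check_month({7: 2, 8: 2, 9: 73_000}, 9)` flags September and proposes a value of 2 for the reviewer to confirm.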

  13. Treatment of Non-response Errors
     In the Oros Survey, non-responses are units that deliver the DM10 late. Nevertheless, 95-98% of the Oros population is already represented in the preliminary administrative data. Given the tested MAR (missing at random) nature of the missing units and their limited number in the preliminary data, they do not significantly affect the changes in Oros wages and other labour costs.
     Units belonging to Temporary Employment Agencies (TEA) are an exception, because of their strongly distinctive characteristics: about 100 units account for 3% of total employment in the private sector (20% in sector K, Real estate, renting and business activities). The absence of even a few of these units may significantly affect the changes in the per capita indicators.

  14. Treatment of Non-response Errors (2)
     • Singling out TEA unit non-responses is not an easy task:
       • the population under study is represented by the current AR, which suffers from over-coverage (a list of respondents is not available). It follows that each unit's active status must be predicted through a longitudinal analysis of its activity in the nearby quarters;
       • given the strongly dynamic nature of TEA, considerable work is needed to follow their frequent changes over time (e.g. mergers, split-ups) in order to separate real non-responses from inactive units.
     Imputation of missing data is deterministic and largely based on past information on non-respondents and panel information on current respondents.
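A deterministic imputation of this kind might look as follows. Both rules are assumptions for illustration: a unit that declared in each of the two previous quarters and has no recorded cancellation is presumed active, and its last value is carried forward by the growth observed on the panel of units responding in both the current and the previous quarter (ratio imputation). Mergers and split-ups, which the slide stresses, would need a unit-linkage step not shown here.

```python
def panel_growth(current, previous):
    """Growth of the variable among units responding in both quarters."""
    common = current.keys() & previous.keys()
    return sum(current[u] for u in common) / sum(previous[u] for u in common)


def impute_tea(current, previous, before_previous, cancelled):
    """Impute missing TEA units for quarter t.

    current, previous, before_previous: {unit_id: value} for quarters t, t-1, t-2
    (respondents only); cancelled: ids with a recorded cancellation.
    """
    growth = panel_growth(current, previous)
    imputed = {}
    for unit, last_value in previous.items():
        if unit in current or unit in cancelled:
            continue  # responded, or known to be closed
        if unit in before_previous:  # hypothetical rule: active in t-1 and t-2
            imputed[unit] = last_value * growth
    return imputed
```

Using the respondents' panel growth rather than a plain carry-forward lets the imputed values follow the current quarter's dynamics in the TEA sector.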

  15. Integration with Survey Data on Large Enterprises
     In the Oros estimates special attention is given to Large Enterprises (LE, firms with more than 500 employees). In the Italian non-agricultural sector, LE number about 1,000 units employing 2 million workers.
     • In the past, integrating survey data on LE was strongly motivated by the insufficient representation of these units in the preliminary administrative data.
     • Today the INPS source guarantees good coverage of these units but, as experience has shown, the statistical source provides higher-quality data thanks to:
       • recontacting enterprises in case of non-response or suspected measurement errors;
       • faster and more efficient management of the frequent legal changes these units undergo (e.g. mergers, split-ups, acquisitions).

  16. Integration with Survey Data on Large Enterprises (2)
     • Combining survey and administrative data involves specific quality aspects:
       • harmonisation of variables;
       • record matching: the fiscal code is the main linking variable, but ambiguities may arise from formal errors or from different updating times in the two sources (mergers, hive-offs and split-ups may be recorded in different periods).
     Considerable effort goes into avoiding omissions and duplications, using supplementary information (legal name, number of employees, etc.). About 12% of LES employment is manually reviewed and matched to the corresponding administrative firms.
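The matching logic can be sketched as a cascade: an exact fiscal-code match first, then a fallback on normalised legal name plus similar employment size, with anything ambiguous or already matched sent to manual review. The record layout, the name normalisation and the 20% employment tolerance are hypothetical choices.

```python
def normalise(name: str) -> str:
    """Upper-case a legal name and drop punctuation and repeated spaces."""
    for ch in ".,;":
        name = name.replace(ch, " ")
    return " ".join(name.upper().split())


def match_les(les_units, admin_units, tolerance=0.2):
    """Link LES firms to administrative firms.

    Each unit is a dict with 'id', 'fiscal_code', 'name', 'employees'.
    Returns ({les_id: admin_id}, [les_ids needing manual review]).
    """
    by_code = {a["fiscal_code"]: a for a in admin_units}
    by_name = {}
    for a in admin_units:
        by_name.setdefault(normalise(a["name"]), []).append(a)

    matches, review, used = {}, [], set()
    for s in les_units:
        candidate = by_code.get(s["fiscal_code"])
        if candidate is None:
            # Fallback: same legal name and similar employment size.
            similar = [a for a in by_name.get(normalise(s["name"]), [])
                       if abs(a["employees"] - s["employees"]) <= tolerance * s["employees"]]
            candidate = similar[0] if len(similar) == 1 else None
        if candidate is None or candidate["id"] in used:
            review.append(s["id"])  # possible omission or duplication
        else:
            matches[s["id"]] = candidate["id"]
            used.add(candidate["id"])
    return matches, review
```

The `used` set guards against two LES firms being linked to the same administrative firm, one of the duplication risks the slide mentions.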

  17. Checks on Macro Data
     Final checks on macro data are a key step in the quality strategy, aimed at identifying any residual errors that may affect the estimates. These checks are mainly based on:
     • analytic and graphical inspection of the time series at sub-population level: pre-defined statistical measures must fall within acceptance boundaries;
     • automatic detection of outliers with TERROR, an application of the TRAMO-SEATS software in which suspected errors are detected from regARIMA model estimates;
     • comparison with figures from other statistical sources (e.g. National Accounts, indices of wages according to collective agreements, etc.);
     • relationships between variables whose coherence must be guaranteed (e.g. the ratio of other labour costs to wages).
     If any error is detected, a drill-down to the micro data may be necessary.
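TERROR itself fits regARIMA models and is not reproduced here. The sketch below only illustrates the simpler rule-based checks in the list: an acceptance boundary on the quarter-on-quarter change of a sub-population wage series, and a coherence band for the ratio of other labour costs to wages. The 5% boundary and the 0.3 to 0.6 band are hypothetical values.

```python
def macro_checks(series, max_abs_change=0.05, ratio_bounds=(0.3, 0.6)):
    """series: [(quarter, wages, other_labour_costs), ...] for one sub-population, in time order.

    Returns a list of (quarter, check, value) alerts to drill down on.
    """
    alerts = []
    # Acceptance boundary on the quarter-on-quarter change in wages.
    for (_, w_prev, _), (quarter, w_curr, _) in zip(series, series[1:]):
        change = w_curr / w_prev - 1
        if abs(change) > max_abs_change:
            alerts.append((quarter, "wage change", round(change, 4)))
    # Coherence band for the ratio of other labour costs to wages.
    low, high = ratio_bounds
    for quarter, wages, other in series:
        ratio = other / wages
        if not low <= ratio <= high:
            alerts.append((quarter, "other labour costs / wages", round(ratio, 4)))
    return alerts
```

An alert here is only a trigger for the drill-down to micro data, not a correction.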

  18. Internal Oros Quality Reporting
     • The quarterly documentation and updating of the Oros production process is a fundamental task in the general quality strategy:
       • metadata are archived;
       • methodological information is documented;
       • imputed data are flagged (and pre-imputation data are archived);
       • quality indicators on the impact of imputation are calculated.
     Documenting the Oros process guarantees its reproducibility and repeatability.
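Two simple indicators of imputation impact can be computed directly from the flagged records and the archived pre-imputation values: the share of imputed records and the relative difference between the final total and the total of the raw values. The field names are hypothetical, and the actual Oros indicators may be defined differently.

```python
def imputation_impact(records):
    """records: dicts with 'raw' (None if the value was missing), 'final' and 'imputed' (bool).

    Returns the share of imputed records and the relative impact of imputation on the total.
    """
    n_imputed = sum(1 for r in records if r["imputed"])
    total_final = sum(r["final"] for r in records)
    total_raw = sum(r["raw"] for r in records if r["raw"] is not None)
    return {
        "imputation_rate": n_imputed / len(records),
        "relative_impact": (total_final - total_raw) / total_final,
    }
```

Because pre-imputation values are archived, the same indicators can be recomputed later for any past quarter.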

  19. Final Remarks
     • The Oros Survey was:
       • developed without any previous experience in the use of administrative data for the production of short-term official statistics;
       • gradually implemented, learning by doing.
     • High timeliness, frequent changes in Social Security laws and regulations, and highly detailed raw data raise relevant and unusual quality problems, managed through:
       • close relationships and coordination with the administrative institution;
       • a pervasive quality strategy along the whole production process;
       • highly skilled human resources to handle the wide-ranging, non-conventional processing aspects, which are subject to frequent modifications;
       • systematic documentation of the production steps.
     Less "standardizable" than a traditional survey quality strategy?

  20. References
     • Baldi C., Ceccato F., Cimino E., Congia M.C., Pacini S., Rapiti F., Tuzi D. (2004). Use of Administrative Data to produce Short Term Statistics on Employment, Wages and Labour Cost. Essays, n. 15/2004, Istat, Rome.
     • Caporello G., Maravall A. (2002). A tool for quality control of time series data. Program TERROR. Bank of Spain.
     • Eurostat (2003). Quality assessment of administrative data for statistical purposes. Doc. Eurostat/A4/Quality/03/item6. Available at: http://epp.eurostat.ec.europa.eu/pls/portal/docs/PAGE/PGP_DS_QUALITY/TAB47141301/DEFINITION_2.PDF
     • Istat, CBS, SFSO, Eurostat (2007). Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys. Available at: http://edimbus.istat.it/dokeos/document/document.php?openDir=%2FRPM_EDIMBUS
     Thank you for your attention. Donatella Tuzi, tuzi@istat.it
