1 / 16

Integrating administrative and survey data in the new Italian system for SBS: quality issues

2013 European Establishment Statistics Workshop. Integrating administrative and survey data in the new Italian system for SBS: quality issues. O. Luzi, F. Oropallo, A. Puggioni , M. Di Zio, R. Sanzo. Nurnberg , 9-11 September 2013. Outline. The new Italian system for SBS The sources

teo
Download Presentation

Integrating administrative and survey data in the new Italian system for SBS: quality issues

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 2013 European Establishment Statistics Workshop Integrating administrative and survey data in the new Italian system for SBS: quality issues O. Luzi, F. Oropallo, A. Puggioni, M. Di Zio, R. Sanzo Nurnberg, 9-11 September 2013

  2. Outline • The new Italian system for SBS • The sources • Quality issues • Future activities 2

  3. The new Italian system for SBS • New production process of a statistical database (frame) containing complete and coherent unit-level information to estimate key annual SBS (production value, turnover, intermediate costs, value added, wages, labor cost, profits) • based on the integrated use of administrative and fiscal data, as primary source of information,… • ….complemented by direct survey data (non covered sub-populations, or variables which are not directly available from external sources) • Release: October 2013 2

  4. The overall strategy 7

  5. The administrative and fiscal data sources • FS: Financial Statements from Chambers of Commerce (about 700,000 units). • SS: Sector Studies survey(includes each year about 3.5 million enterprises) • Unico: Tax data, from the Ministry of Economy and Finance, based on a unified model of tax declarations by legal form • Social Security data,from the National Security Institute (INPS), which includes firm level data and individual (employees) data on wages and labor cost. The statistical sources • The Business Register (BR): about 4.5 million enterprises. Central role as reference list of the SBS target population • The Business Surveys (BS) • SME - Sample survey on Small/Medium Enterprises (1-99 persons employed) • LE - Total survey on Large Enterprises (100 and more persons employed ) 5

  6. The micro-data matrix X* 7

  7. Sources’ coverage 6

  8. Assessing sources’ quality and usability (also based on recommendations from Essnetadmin data and BLUE-ETS) Coveragecompleteness of the source in terms of target population Completenessdegree to which the source includes information to estimate the target statistics Consistencyhow much admin definitions of variables/units are close to the SBS ones Accuracy: statistical adequacy of admin items for estimating target parameters Punctuality: relates to the delivery of source data along time Integrability: extent to which the source is capable of being integrated 8

  9. Assessing sources’ quality and usability: consistency and accuracy • Let X* be the target SBS and Ai the related information from the source i • For each enterprise, some of the X* variables may exist in more than one sources in different combinations, according to the dimension, the social security rules, the fiscal status etc. • The variables Ai in source imay coincide or may approximate the corresponding X* • Objective of the analyses • Establish a hierarchy between the sources • Identify the possible deterministic “corrections” from Ai to X* • Identify the appropriate multivariate models so that x*ij= f(aij…..) 8

  10. Assessing sources’ quality and usability • Assessingconsistency: closeness between the Aidefinitions in source iwith respect to the X* definitions • →variables harmonization • (ii) Assessingaccuracy: analysis (on overlapping admin and survey data) of Ai and X* distributions to assess possible “biases” • at micro-data and distribution level • at estimation level • and • selective editing • →further harmonization; data modeling 8

  11. Assessing accuracy at micro-data and distribution level Example: quality Indicators for the main SBS variables (survey vs SS) 8

  12. Assessing accuracy at estimation level Example: Value added: source and samplingeffects %diffEstimates (Yframe-YSME)/Yframe Source effect Samplingeffect 8

  13. Assessing accuracy: selective editing • Selective editing allows for the identification of influential units at a given domain level based on pre-defined influence criteria (score functions) • Identification of possible lacks • Domains characterized by the highest amount of influential data • lacks in the harmonization process • Legal/administrative constraints • enterprises’ behavior related to the specific source objectives • →systematic discrepancies in sources’ information • Other sources of discrepancies • Measurement errors • Data variability • →random discrepancies in sources’ information 8

  14. Assessing accuracy: identify “anomalies” in administrative data • Absolute and % number of units with influential discrepancies between SS and SME data which need correction, by source. k=0.05 8

  15. Current activities • Improve the selective editing approach • Improve multivariate robust data modeling for • Imputation of non covered units • Modelingof either non coveredor notconsistentsource’svariables • Further analyses of data and economic indicators to improve accuracy for specific SME sub-populations • Improving relationships with data providers for ensuring sources punctuality and quality • Process documentation for future standardization 8

  16. Expected benefits • Reduction of statistical burden and costs (medium term) • Data available at an extremely refined level of detail • Higher accuracy and coherence within and across statistical domains • Some drawbacks • Stability over time – data can be changed for internal decision of the owner administration • Initial costs for assessing sources’ quality and for ensuring sources’ usability 3

More Related