160 likes | 330 Views
2013 European Establishment Statistics Workshop. Integrating administrative and survey data in the new Italian system for SBS: quality issues. O. Luzi, F. Oropallo, A. Puggioni , M. Di Zio, R. Sanzo. Nurnberg , 9-11 September 2013. Outline. The new Italian system for SBS The sources
E N D
2013 European Establishment Statistics Workshop Integrating administrative and survey data in the new Italian system for SBS: quality issues O. Luzi, F. Oropallo, A. Puggioni, M. Di Zio, R. Sanzo Nurnberg, 9-11 September 2013
Outline • The new Italian system for SBS • The sources • Quality issues • Future activities 2
The new Italian system for SBS • New production process of a statistical database (frame) containing complete and coherent unit-level information to estimate key annual SBS (production value, turnover, intermediate costs, value added, wages, labor cost, profits) • based on the integrated use of administrative and fiscal data, as primary source of information,… • ….complemented by direct survey data (non covered sub-populations, or variables which are not directly available from external sources) • Release: October 2013 2
The administrative and fiscal data sources • FS: Financial Statements from Chambers of Commerce (about 700,000 units). • SS: Sector Studies survey(includes each year about 3.5 million enterprises) • Unico: Tax data, from the Ministry of Economy and Finance, based on a unified model of tax declarations by legal form • Social Security data,from the National Security Institute (INPS), which includes firm level data and individual (employees) data on wages and labor cost. The statistical sources • The Business Register (BR): about 4.5 million enterprises. Central role as reference list of the SBS target population • The Business Surveys (BS) • SME - Sample survey on Small/Medium Enterprises (1-99 persons employed) • LE - Total survey on Large Enterprises (100 and more persons employed ) 5
Assessing sources’ quality and usability (also based on recommendations from Essnetadmin data and BLUE-ETS) Coveragecompleteness of the source in terms of target population Completenessdegree to which the source includes information to estimate the target statistics Consistencyhow much admin definitions of variables/units are close to the SBS ones Accuracy: statistical adequacy of admin items for estimating target parameters Punctuality: relates to the delivery of source data along time Integrability: extent to which the source is capable of being integrated 8
Assessing sources’ quality and usability: consistency and accuracy • Let X* be the target SBS and Ai the related information from the source i • For each enterprise, some of the X* variables may exist in more than one sources in different combinations, according to the dimension, the social security rules, the fiscal status etc. • The variables Ai in source imay coincide or may approximate the corresponding X* • Objective of the analyses • Establish a hierarchy between the sources • Identify the possible deterministic “corrections” from Ai to X* • Identify the appropriate multivariate models so that x*ij= f(aij…..) 8
Assessing sources’ quality and usability • Assessingconsistency: closeness between the Aidefinitions in source iwith respect to the X* definitions • →variables harmonization • (ii) Assessingaccuracy: analysis (on overlapping admin and survey data) of Ai and X* distributions to assess possible “biases” • at micro-data and distribution level • at estimation level • and • selective editing • →further harmonization; data modeling 8
Assessing accuracy at micro-data and distribution level Example: quality Indicators for the main SBS variables (survey vs SS) 8
Assessing accuracy at estimation level Example: Value added: source and samplingeffects %diffEstimates (Yframe-YSME)/Yframe Source effect Samplingeffect 8
Assessing accuracy: selective editing • Selective editing allows for the identification of influential units at a given domain level based on pre-defined influence criteria (score functions) • Identification of possible lacks • Domains characterized by the highest amount of influential data • lacks in the harmonization process • Legal/administrative constraints • enterprises’ behavior related to the specific source objectives • →systematic discrepancies in sources’ information • Other sources of discrepancies • Measurement errors • Data variability • →random discrepancies in sources’ information 8
Assessing accuracy: identify “anomalies” in administrative data • Absolute and % number of units with influential discrepancies between SS and SME data which need correction, by source. k=0.05 8
Current activities • Improve the selective editing approach • Improve multivariate robust data modeling for • Imputation of non covered units • Modelingof either non coveredor notconsistentsource’svariables • Further analyses of data and economic indicators to improve accuracy for specific SME sub-populations • Improving relationships with data providers for ensuring sources punctuality and quality • Process documentation for future standardization 8
Expected benefits • Reduction of statistical burden and costs (medium term) • Data available at an extremely refined level of detail • Higher accuracy and coherence within and across statistical domains • Some drawbacks • Stability over time – data can be changed for internal decision of the owner administration • Initial costs for assessing sources’ quality and for ensuring sources’ usability 3