IMPUTING MISSING ADMINISTRATIVE DATA FOR SHORT-TERM ENTERPRISE STATISTICS
Pieter Vlag, Statistics Netherlands
Joint work with DESTATIS, Statistics Estonia, Statistics Finland, ISTAT, Statistics Lithuania, ONS
Outline of the presentation
• Scope of the project - use of admin data for STS
• Two situations:
  a. VAT fairly complete and representative - VAT representative
  b. VAT not complete and not representative - VAT not representative
• VAT representative
  a. imputing missing values
• Imputing missing values
  a. methods for imputations
  b. which units to impute
• Conclusions and implications for other projects
Imputing missing admin data for STS-estimates
Scope of the project
Final situation (after a year):
- all admin data are available to NSIs
- data cover the population
Monthly and quarterly estimates:
- part of the admin data is ‘missing’
[Figure: two population bars. Final situation: L.E. (survey) + admin data. Monthly and quarterly estimates: L.E. (survey) + admin data + missing.]
Assumption: if admin data are complete, they can be used for statistics.
Challenge: how to estimate for ‘missing’ admin data in the case of monthly and quarterly estimates.
Scope: turnover (VAT registration), wages + employees (“social security data”)
Additional value of ESSnet AdminData
• VAT = Value Added Tax
• The European Union value added tax (EU VAT) is a value added tax encompassing member states in the European Union VAT area. Joining it is compulsory for member states of the European Union.
• Each Member State's national VAT legislation must comply with the provisions of EU VAT law as set out in Directive 2006/112/EC.
TRANSLATION TO STATISTICS
• INPUT: available VAT information is quite similar across Europe
• OUTPUT: obligations are also similar across Europe (STS, SBS, ESR regulations)
• CONCLUSIONS ESSNET: the methodological challenges in the use of admin data are identical; solutions may differ, but only to a limited extent
Two situations
Situation A: L.E. (100 % sample) + VAT almost complete
  GENERAL SITUATION FOR Q; t+45 days
Situation B: L.E. (100 % sample) + VAT not available or very limited
  GENERAL SITUATION FOR M; t+30 days
SITUATION A or B FOR OTHER ESTIMATES (Q-flash; M-T+45/50d) DIFFERS PER COUNTRY
Methods Situation A: methodology
QUALITY STS-ESTIMATES: revision compared to final estimate (average bias, average error)
[Figure: final situation (100 % sample + admin data) compared with the STS situation (100 % sample + admin data + missing).]
SITUATION A: admin data coverage almost complete
  ESTIMATION ONLY BASED ON ADMIN DATA
  • established techniques
  • level estimates
  • imputation of missing data (with available VAT)
SITUATION B: admin data coverage incomplete
  ADMIN DATA = AUXILIARY INFORMATION
  • experimental methods; NOT DISCUSSED FURTHER
[Figure: blocks labelled L.E., sample, VAT and VAT T-x, with the label SME ESTIMATION.]
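The quality criterion on this slide, revision compared to the final estimate, can be sketched in a few lines. This is a minimal illustration only: the function name `revision_summary` is hypothetical, and reading "average bias" as the mean relative revision and "average error" as the mean absolute relative revision is an assumption, since the slide does not define them.

```python
def revision_summary(provisional, final):
    """Summarise revisions of provisional STS estimates against final ones.

    Assumed definitions (not given on the slide):
    - relative revision = (provisional - final) / final
    - average bias      = mean of the relative revisions (signed)
    - average error     = mean of the absolute relative revisions
    """
    if len(provisional) != len(final) or not final:
        raise ValueError("need two non-empty series of equal length")
    revisions = [(p - f) / f for p, f in zip(provisional, final)]
    n = len(revisions)
    return {
        "average_bias": sum(revisions) / n,
        "average_error": sum(abs(r) for r in revisions) / n,
    }


# Example with made-up index values for three periods.
summary = revision_summary([102.0, 98.0, 101.0], [100.0, 100.0, 100.0])
print(summary)
```

A bias close to zero combined with a noticeably larger error would indicate revisions that cancel out on average but are still sizeable in individual periods.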
Methods for imputations
• Analysed several production systems:
  - i.e. DE, F, “Nordic countries”, NL, I
• Imputation of “missing VAT” based on:
  - O_t/O_{t-1}, O_t/O_{t-12} of available VAT, or similar approaches
• Stratification levels for the calculation of stratum imputations differ:
  - from NACE 2-digit x 2 size classes
  - to NACE 4-digit x 9 size classes
• KEY QUESTION: do these different approaches lead to different output, since the methods are generally applied when coverage of L.E. survey + available VAT exceeds 90 % of the target variable?
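The ratio-based imputation described on this slide can be sketched as follows. It is an illustration under stated assumptions, not any country's production system: O is read as turnover, the field names (`stratum`, `prev`, `cur`) and the function name are hypothetical, and the stratum ratio is taken as the sum of current turnover over the sum of reference-period turnover among units that reported in both periods. Setting `prev` to the value of t-1 gives the O_t/O_{t-1} variant; setting it to t-12 gives the O_t/O_{t-12} variant.

```python
from collections import defaultdict


def impute_by_stratum_ratio(units):
    """Impute missing current turnover with a stratum-level growth ratio.

    Each unit is a dict with (hypothetical) keys:
      'stratum' : e.g. NACE class x size class
      'prev'    : turnover in the reference period (t-1 or t-12)
      'cur'     : turnover in the current period, or None if VAT is missing
    """
    # Sum current and reference turnover per stratum over reporting units.
    sums = defaultdict(lambda: [0.0, 0.0])
    for u in units:
        if u["cur"] is not None and u["prev"] is not None:
            sums[u["stratum"]][0] += u["cur"]
            sums[u["stratum"]][1] += u["prev"]

    result = []
    for u in units:
        cur = u["cur"]
        if cur is None and u["prev"] is not None:
            num, den = sums[u["stratum"]]
            if den > 0:  # no ratio available if nobody reported in the stratum
                cur = u["prev"] * num / den
        imputed = u["cur"] is None and cur is not None
        result.append({**u, "cur": cur, "imputed": imputed})
    return result


# Made-up example: two reporters grew by 10 %, one unit is missing.
example = [
    {"id": 1, "stratum": "47-small", "prev": 100.0, "cur": 110.0},
    {"id": 2, "stratum": "47-small", "prev": 200.0, "cur": 220.0},
    {"id": 3, "stratum": "47-small", "prev": 50.0, "cur": None},
]
for row in impute_by_stratum_ratio(example):
    print(row)
```

The choice of stratification level only changes how the `stratum` key is built: coarser strata give more stable ratios, finer strata give more specific ones, which is the trade-off behind the key question on the slide.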
Methods for imputations: testing of different methodologies (example Estonia)
Conclusion: imputation methods provide similar results if the population is fixed and VAT covers > 80 % of the population
Comparing imputations with realisations (approach Statistics Finland)
• Five imputation rules for the current period at micro-level:
  - Mean annual change
  - Geometric mean of monthly changes
  - Previous turnover
  - Mean turnover
  - Turnover of comparison month
• Imputation rules are automatically evaluated and compared by calculating maximum proportional forecast errors using data for the five latest months. The selection rules are:
  - An imputation rule with < 20 % maximum proportional forecast error and the same direction of change as in the last two months is automatically admissible;
  - The model with the smallest maximum error is considered best
• Main differences with other detected practices:
  - No assumption that available VAT = representative
  - Not all missing data imputed (in practice 20 - 50 %)
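The rule selection described on this slide can be sketched roughly as below. This is a simplified illustration, not Statistics Finland's implementation: only three of the five rules are included, their exact definitions (for instance the six-month window in `mean_turnover`) are assumptions, all function names are hypothetical, turnover is assumed to be strictly positive, and the "same direction of change" criterion is left out.

```python
from statistics import mean


# Candidate rules: each takes a unit's turnover history and returns a
# forecast for the next month. The definitions are illustrative assumptions.
def previous_turnover(history):
    return history[-1]


def mean_turnover(history, window=6):
    return mean(history[-window:])


def comparison_month(history):
    return history[-12]  # same month one year earlier


RULES = {
    "previous_turnover": previous_turnover,
    "mean_turnover": mean_turnover,
    "comparison_month": comparison_month,
}


def max_proportional_error(rule, series, n_eval=5):
    """Largest |forecast - actual| / actual over the latest n_eval months."""
    errors = []
    for k in range(n_eval, 0, -1):
        history, actual = series[:-k], series[-k]
        errors.append(abs(rule(history) - actual) / abs(actual))
    return max(errors)


def select_rule(series, threshold=0.20):
    """Return (best admissible rule name or None, all rule scores)."""
    scores = {name: max_proportional_error(rule, series)
              for name, rule in RULES.items()}
    admissible = {n: e for n, e in scores.items() if e < threshold}
    if not admissible:
        return None, scores  # unit is left unimputed
    return min(admissible, key=admissible.get), scores


# Made-up, perfectly seasonal monthly series of 24 observations.
series = [100 + 10 * (i % 12) for i in range(24)]
best, scores = select_rule(series)
print(best, scores)
if best is not None:
    print("imputed value for the missing month:", RULES[best](series))
```

When no rule passes the threshold the unit is simply not imputed, which is consistent with the slide's remark that not all missing data are imputed.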
Comparing imputations with realisations (more precise conclusions)
• Explanations:
  - Outlier effect on calculated O_t/O_{t-1} or O_t/O_{t-12} values
  - Late VAT-reporters are likely a selective group in countries with automatic fining systems for late VAT-reporting
• The impact of selectivity on output is generally negligible due to the high coverage of available data
Which units to impute
Impact on results: example Italy
[Figure: revision caused by uncertainty in the provisional population compared with revision caused by the imputation technique.]
Conclusion: the effect on revision caused by uncertainty about the units to be imputed is larger than that of the imputation technique itself
Conclusions
• When using admin data for STS, missing data are imputed
• The most widely used imputation rules are O_t/O_{t-1} or O_t/O_{t-12}
• Given the large coverage of available data, the exact imputation technique chosen has only a limited impact on the outcome, despite the indication that the main assumption of the techniques used, “available VAT = representative”, might not be 100 % correct
• More important than the imputation technique is the estimate for the provisional population