Editing and Imputing VAT Data for the Purpose of Producing Mixed-Source Turnover Estimates. Hannah Finselbach and Daniel Lewis, Office for National Statistics, UK.
Overview • Key principles • Types of error in VAT Turnover data • Methods for detecting suspicious VAT Turnover • Methods for correcting suspicious VAT Turnover • Conclusions
Key principles • Understand how data are returned and processed • Maintain original data supplied by tax office • Keep audit trail for changes to data • Make good use of historical and register data • Automate process of detecting and correcting errors, and allow future improvements to this process
Types of error in VAT Turnover data • Suspicious individual turnover values: unusually large or small turnover; evaluate detection and correction methods using past data from the survey to be used in mixed-source estimates • Unit errors: systematic errors, e.g. thousand-pound or million-pound (GBP) errors; corrected automatically • Suspicious quarterly reporting patterns: described in Hoogland (2011); mark as suspicious before correcting
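The automatic correction of unit errors can be illustrated with a minimal Python sketch. It assumes that a reference value exists for each business (for example the previous return or a register value) and that a return counts as a unit error when it differs from the reference by a factor close to 1,000 or 1,000,000. The function name `correct_unit_error`, the log-scale tolerance `tol` and the choice of reference are hypothetical and are not taken from the presentation.

```python
import math

def correct_unit_error(value, reference, factors=(1_000, 1_000_000), tol=0.2):
    """Return (corrected_value, was_corrected).

    Assumption: a unit error is suspected when log10(value / reference)
    lies within `tol` of log10(factor) or of -log10(factor).
    """
    if value <= 0 or reference <= 0:
        # Ratios are undefined here; leave the value for other checks.
        return value, False
    log_ratio = math.log10(value / reference)
    for factor in factors:
        log_factor = math.log10(factor)
        if abs(log_ratio - log_factor) <= tol:
            # Value looks `factor` times too large.
            return value / factor, True
        if abs(log_ratio + log_factor) <= tol:
            # Value looks `factor` times too small.
            return value * factor, True
    return value, False

# Hypothetical usage: keep the raw value and store the correction beside it,
# in line with the key principle of maintaining the original data.
raw = 1_250_000
cleaned, corrected = correct_unit_error(raw, reference=1_300)
```

Returning a flag alongside the corrected value keeps an audit trail, so the raw return is never overwritten.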
Methods for detecting suspicious VAT Turnover values • (1) Extreme values in current period distribution • (2) Extreme change in contribution to industry compared with previous period • (3) Hidiroglou-Berthelot method: transformed period-on-period ratio with influence measure • (4) Combine method 1 or 2 with an influence measure
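As an illustration of the Hidiroglou-Berthelot method, the sketch below follows the formulation as it is usually described: period-on-period ratios are centred on their median, scaled by a size (influence) term, and compared with quartile-based bounds. The parameter values `U`, `A` and `C` and the function name `hb_flags` are assumptions for illustration; the presentation does not state the settings used at ONS.

```python
import statistics

def hb_flags(previous, current, U=0.5, A=0.05, C=4.0):
    """Flag units whose HB effect falls outside the acceptance interval.

    Assumes all values are strictly positive. U controls the weight given
    to unit size, A guards against very small quartile distances, and C
    sets the width of the acceptance interval.
    """
    ratios = [c / p for p, c in zip(previous, current)]
    r_med = statistics.median(ratios)

    effects = []
    for p, c, r in zip(previous, current, ratios):
        # Symmetric transformation of the ratio around the median ratio.
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # Influence measure: larger businesses get larger effects.
        effects.append(s * max(p, c) ** U)

    q1, e_med, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [e < lower or e > upper for e in effects]

# Hypothetical data: the seventh business jumps from 102 to 5000.
previous = [100, 120, 95, 110, 105, 98, 102, 115]
current = [104, 118, 99, 112, 101, 100, 5000, 117]
flags = hb_flags(previous, current)
```

With so few units the quartile distances are small, so modest changes can also be flagged; in practice the parameters would need tuning on real data.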
Evaluating methods for detecting suspicious VAT Turnover • Set parameters so that each method fails the same number of businesses • Check mean size of total turnover and employment for failing businesses • Estimate “false hits” by comparison with short-term survey data (Monthly Business Survey, MBS): businesses that fail but whose values are similar to those collected by the survey
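The false-hit estimate can be sketched as follows. The sketch assumes that VAT and survey turnover have already been aligned to the same business and period, and that "similar" means a relative difference within a chosen tolerance. The function name `false_hit_rate` and the 10% tolerance are hypothetical.

```python
def false_hit_rate(vat, survey, flagged, tolerance=0.10):
    """Share of flagged businesses whose VAT value agrees with the survey.

    vat, survey: dicts mapping business id -> turnover for the same period.
    flagged: ids failed by the detection method.
    Returns None when no flagged business can be compared.
    """
    comparable = [u for u in flagged if u in vat and survey.get(u, 0) > 0]
    if not comparable:
        return None
    false_hits = [
        u for u in comparable
        if abs(vat[u] - survey[u]) / survey[u] <= tolerance
    ]
    return len(false_hits) / len(comparable)

# Hypothetical example: A is flagged but matches the survey (a false hit),
# B is flagged and differs strongly from the survey.
vat = {"A": 1000, "B": 5000, "C": 200}
survey = {"A": 1020, "B": 1200, "C": 210}
rate = false_hit_rate(vat, survey, flagged={"A", "B"})
```

Because each method is set to fail the same number of businesses, rates computed this way are directly comparable across methods.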
Results of evaluation for the MBS • Methods based on extremes in current distribution fail largest businesses • Lower estimated false hits for methods using previous period data • Best method: extreme change in contribution to industry compared to previous period
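The presentation does not define the contribution measure, so the following is one possible reading, offered only as a sketch: a business's contribution is its share of industry turnover, and a return is suspicious when that share changes by more than a threshold between periods. The function name `contribution_change_flags` and the threshold of 0.1 are assumptions.

```python
def contribution_change_flags(previous, current, threshold=0.1):
    """Flag units whose share of industry turnover changes sharply.

    previous, current: dicts mapping business id -> turnover for one
    industry. Only units present in both periods are compared.
    """
    units = [u for u in current if u in previous]
    total_prev = sum(previous[u] for u in units)
    total_cur = sum(current[u] for u in units)
    flags = {}
    for u in units:
        share_prev = previous[u] / total_prev
        share_cur = current[u] / total_cur
        flags[u] = abs(share_cur - share_prev) > threshold
    return flags

# Hypothetical industry of four businesses; D rises from 50 to 250.
previous = {"A": 500, "B": 300, "C": 150, "D": 50}
current = {"A": 510, "B": 305, "C": 148, "D": 250}
flags = contribution_change_flags(previous, current)
```

One large error also shifts the shares of every other business in the industry, so the threshold has to be set with that side effect in mind.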
Options for dealing with suspicious VAT data • Remove from data set • Mark as suspicious in data set • Change values (impute) manually • Change values (impute) automatically • Evaluate by randomly creating suspicious values in “clean” data and comparing imputed values to original values (repeated simulation) • Best method is ratio imputation – ratio of means using data from the previous period
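Ratio imputation with a ratio of means can be sketched as below. The suspicious value is replaced by the business's previous-period value multiplied by the ratio of mean current turnover to mean previous turnover among clean businesses. The sketch assumes the dictionaries passed in form one imputation class (for example one industry); the function name `ratio_of_means_impute` and the data are hypothetical.

```python
def ratio_of_means_impute(previous, current, suspicious):
    """Return a cleaned copy of `current` with suspicious values imputed.

    previous, current: dicts mapping business id -> turnover.
    suspicious: ids whose current value should be replaced.
    """
    clean = [u for u in current if u not in suspicious and u in previous]
    if not clean:
        raise ValueError("no clean businesses available to form the ratio")
    # The clean units are the same in both periods, so the ratio of means
    # equals the ratio of sums.
    ratio = sum(current[u] for u in clean) / sum(previous[u] for u in clean)

    cleaned = dict(current)  # the raw data are left untouched
    for u in suspicious:
        if u in previous:
            cleaned[u] = previous[u] * ratio
    return cleaned

# Hypothetical example: C's return of 9000 has been marked as suspicious.
previous = {"A": 100, "B": 200, "C": 300}
current = {"A": 110, "B": 220, "C": 9000}
cleaned = ratio_of_means_impute(previous, current, suspicious={"C"})
```

Here the clean businesses grew by 10% on average, so C's imputed value is its previous value of 300 scaled by the same growth.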
Implementation in ONS • Fine-tune thresholds for detection method by examining the distribution of period-on-period ratios for each VAT Turnover stagger and reporting pattern • Specify methods for IT system developers • Add variables and markers to Business Register: hold raw and cleaned VAT data; flag suspicious values of VAT Turnover • Further research on: impact of using cleaned VAT Turnover data in Business Register processes; required changes to selective editing for surveys producing mixed-source estimates
Conclusions • Important to clean VAT turnover data before use in short-term mixed-source estimates • Need to understand how data are returned and processed • Methods based on previous period returns work best for detection and correction of suspicious values • Unit errors and suspicious patterns should be identified and corrected • Further work required to successfully implement mixed-source estimates in ONS