Learn key principles, error types, detection and correction methods, and conclusions for improving VAT data accuracy in turnover estimates. Evaluate, implement, and optimize strategies for identifying and rectifying anomalies in VAT records.
Editing and Imputing VAT Data for the Purpose of Producing Mixed-Source Turnover Estimates
Hannah Finselbach and Daniel Lewis
Office for National Statistics, UK
Overview
• Key principles
• Types of error in VAT Turnover data
• Methods for detecting suspicious VAT Turnover
• Methods for correcting suspicious VAT Turnover
• Conclusions
Key principles
• Understand how data are returned and processed
• Maintain original data supplied by tax office
• Keep audit trail for changes to data
• Make good use of historical and register data
• Automate process of detecting and correcting errors, and allow future improvements to this process
Types of error in VAT Turnover data
• Suspicious individual turnover values
  • Unusually large or small turnover
  • Evaluate detection and correction methods using past data from survey to be used in mixed-source estimates
• Unit errors
  • Systematic errors, e.g. Thousand or Million Pound (GBP) errors
  • Automatic correction
• Suspicious quarterly reporting patterns
  • Described in Hoogland (2011)
  • Mark as suspicious before correcting
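The automatic correction of unit errors mentioned above could be sketched as follows. This is a minimal illustration, not the ONS implementation: the function name `correct_unit_error`, the 35% tolerance, and the use of the previous period's value as the reference are all assumptions made for the example.

```python
def correct_unit_error(current, previous, factors=(1000, 1_000_000), tolerance=0.35):
    """Return (corrected_value, multiplier_applied).

    Assumption: the previous period's value is reliable, so a
    period-on-period ratio close to 1000 or 1,000,000 (or the inverse)
    suggests the current value was reported in the wrong units.
    The multiplier is 1.0 when no unit error is detected.
    """
    if current <= 0 or previous <= 0:
        return current, 1.0  # ratio is undefined; leave the value unchanged
    ratio = current / previous
    for f in factors:
        # Current value looks about f times too large
        if f * (1 - tolerance) <= ratio <= f * (1 + tolerance):
            return current / f, 1 / f
        # Current value looks about f times too small
        if (1 - tolerance) / f <= ratio <= (1 + tolerance) / f:
            return current * f, float(f)
    return current, 1.0


corrected, multiplier = correct_unit_error(1_050_000, 1000)
print(corrected, multiplier)
```

Returning the multiplier alongside the corrected value lets the change be recorded in an audit trail while the original value is kept.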
Methods for detecting suspicious VAT Turnover values
1. Extreme values in current period distribution
2. Extreme change in contribution to industry compared with previous period
3. Hidiroglou-Berthelot method
  • Transformed period-on-period ratio with influence measure
4. Combine method 1 or 2 with an influence measure
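For reference, the Hidiroglou-Berthelot method can be sketched as below, following its standard textbook form: a symmetric transformation of the period-on-period ratio, scaled by a size (influence) term, with bounds built from the quartiles of the result. The parameter values U = 0.5, A = 0.05 and C = 4 and the function name `hb_flags` are illustrative assumptions; the slides do not state the values used.

```python
import statistics


def hb_flags(previous, current, U=0.5, A=0.05, C=4.0):
    """Flag units whose period-on-period change is extreme (HB method).

    previous, current: equal-length lists of strictly positive turnover values.
    Returns a list of booleans, True where the unit is flagged.
    """
    if any(p <= 0 for p in previous) or any(c <= 0 for c in current):
        raise ValueError("HB ratios need strictly positive values")

    ratios = [c / p for p, c in zip(previous, current)]
    r_med = statistics.median(ratios)

    effects = []
    for p, c, r in zip(previous, current, ratios):
        # Symmetric transformation of the ratio around the median ratio
        s = 1 - r_med / r if r < r_med else r / r_med - 1
        # Influence measure: larger units get larger effects
        effects.append(s * max(p, c) ** U)

    q1, e_med, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    # A guards against intervals that collapse when effects cluster at the median
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [e < lower or e > upper for e in effects]


previous = [100] * 10
current = [100, 102, 98, 101, 99, 103, 97, 100, 104, 1000]
print(hb_flags(previous, current))
```

In this toy data only the last unit, whose turnover rises tenfold, falls outside the bounds.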
Evaluating methods for detecting suspicious VAT Turnover
• Set parameters so that each method fails the same number of businesses
• Check mean size of total turnover and employment for failing businesses
• Estimate “false hits” by comparing with short-term survey data (Monthly Business Survey, MBS)
  • False hits are businesses that fail but whose values are similar to those collected by the survey
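The false-hit estimate described above could be computed along these lines. This is a sketch under assumptions: the VAT and survey values are taken to be already aligned to the same unit and period, "similar" is defined as within a 10% relative tolerance, and the name `estimate_false_hits` is hypothetical.

```python
def estimate_false_hits(flagged, vat, survey, tolerance=0.10):
    """Return the flagged units whose VAT value is close to the survey value.

    flagged: iterable of unit ids failed by a detection method
    vat, survey: dicts mapping unit id -> turnover for the same period
    Units without a positive survey value cannot be assessed and are skipped.
    """
    false_hits = []
    for unit in flagged:
        reference = survey.get(unit)
        if reference is None or reference <= 0:
            continue
        if abs(vat[unit] - reference) <= tolerance * reference:
            false_hits.append(unit)
    return false_hits


vat = {"A": 1000, "B": 5000, "C": 200}
survey = {"A": 1020, "B": 500}
print(estimate_false_hits(["A", "B", "C"], vat, survey))
```

Here unit A counts as a false hit because its VAT value agrees with the survey, B is a genuine discrepancy, and C has no survey value to compare against.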
Results of evaluation for the MBS
• Methods based on extremes in current distribution fail largest businesses
• Lower estimated false hits for methods using previous period data
• Best method: extreme change in contribution to industry compared to previous period
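One way to read "extreme change in contribution to industry" is as the change in a business's share of industry turnover between two periods. The sketch below uses that reading; the exact definition and threshold used in the evaluation are not given in the slides, so the function name `contribution_change_flags` and the threshold of 0.2 are assumptions.

```python
def contribution_change_flags(previous, current, threshold=0.2):
    """Flag units whose share of industry turnover changes by more than threshold.

    previous, current: dicts mapping unit id -> turnover within one industry.
    Units absent from the previous period are treated as having a zero share.
    """
    total_prev = sum(previous.values())
    total_cur = sum(current.values())
    flags = {}
    for unit, value in current.items():
        share_prev = previous.get(unit, 0.0) / total_prev
        share_cur = value / total_cur
        flags[unit] = abs(share_cur - share_prev) > threshold
    return flags


previous = {"A": 100, "B": 100, "C": 100, "D": 100}
current = {"A": 100, "B": 105, "C": 95, "D": 700}
print(contribution_change_flags(previous, current))
```

Note that one very large value also shifts the shares of every other unit in the industry, which is one reason thresholds need tuning on real data.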
Options for dealing with suspicious VAT data
• Remove from data set
• Mark as suspicious in data set
• Change values (impute) manually
• Change values (impute) automatically
  • Evaluate by randomly creating suspicious values in “clean” data and comparing imputed values to original values (repeated simulation)
  • Best method is ratio imputation – ratio of means using data from the previous period
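Ratio-of-means imputation and the repeated-simulation evaluation described above can be sketched as follows. The imputed value is the unit's previous-period value multiplied by the ratio of current to previous mean turnover among non-suspicious units. The function names, the single imputation class, and the mean absolute relative error used as the accuracy measure are assumptions for illustration.

```python
import random


def ratio_of_means_impute(previous, current, suspicious):
    """Impute current values for suspicious units from their previous values.

    previous, current: dicts mapping unit id -> turnover (one imputation class).
    suspicious: set of unit ids whose current value is not trusted.
    """
    clean = [u for u in current if u not in suspicious and u in previous]
    # Ratio of means equals ratio of sums, because both use the same units
    ratio = sum(current[u] for u in clean) / sum(previous[u] for u in clean)
    return {u: previous[u] * ratio for u in suspicious}


def simulate_imputation_error(previous, current, n_suspicious, n_reps, seed=0):
    """Mean absolute relative error when randomly chosen clean values are imputed."""
    rng = random.Random(seed)
    units = sorted(current)
    errors = []
    for _ in range(n_reps):
        chosen = set(rng.sample(units, n_suspicious))
        imputed = ratio_of_means_impute(previous, current, chosen)
        errors.extend(abs(imputed[u] - current[u]) / current[u] for u in chosen)
    return sum(errors) / len(errors)


previous = {"A": 100, "B": 200, "C": 300, "D": 400}
current = {"A": 110, "B": 220, "C": 330, "D": 9999}
print(ratio_of_means_impute(previous, current, {"D"}))
```

In the example the clean units grew by 10%, so unit D is imputed at about 440 in place of the suspicious 9999.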
Implementation in ONS
• Fine-tune thresholds for detection method
  • Examine distribution of period-on-period ratios for each VAT Turnover stagger and reporting pattern
• Specify methods for IT system developers
• Add variables and markers to Business Register
  • Hold raw and cleaned VAT data
  • Flag suspicious values of VAT Turnover
• Further research on:
  • Impact of using cleaned VAT Turnover data in Business Register processes
  • Required changes to selective editing for surveys producing mixed-source estimates
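Holding raw and cleaned values side by side with a suspicious marker and an audit trail might look like the record below. It is purely illustrative: the class `VatTurnoverRecord` and its field names are hypothetical and do not describe the actual Business Register variables.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VatTurnoverRecord:
    unit_id: str
    period: str
    raw_turnover: float                       # as supplied by the tax office; never overwritten
    cleaned_turnover: Optional[float] = None  # set only when a correction is made
    suspicious: bool = False
    audit_trail: List[str] = field(default_factory=list)

    def flag(self, reason: str) -> None:
        """Mark the value as suspicious and record why."""
        self.suspicious = True
        self.audit_trail.append(f"flagged: {reason}")

    def clean(self, value: float, method: str) -> None:
        """Store a corrected value alongside the raw one and log the method."""
        self.cleaned_turnover = value
        self.audit_trail.append(f"cleaned by {method}: {self.raw_turnover} -> {value}")

    @property
    def value_for_estimation(self) -> float:
        """Cleaned value if one exists, otherwise the raw value."""
        return self.raw_turnover if self.cleaned_turnover is None else self.cleaned_turnover


record = VatTurnoverRecord("U1", "2012Q1", 2_500_000.0)
record.flag("possible thousand pound unit error")
record.clean(2500.0, "unit error correction")
print(record.value_for_estimation, record.audit_trail)
```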
Conclusions
• Important to clean VAT turnover data before use in short-term mixed-source estimates
• Need to understand how data are returned and processed
• Methods based on previous period returns work best for detection and correction of suspicious values
• Unit errors and suspicious patterns should be identified and corrected
• Further work required to successfully implement mixed-source estimates in ONS