Methods for detecting errors in VAT Turnover data Phil Lewis Processing, Editing and Imputation branch Business Statistics Methods-Survey Methodology E-mail: philip.a.lewis@ons.gov.uk
Outline of talk • Detecting suspicious patterns • Methods for detecting unit errors • Consider 5 methods • Comparing methods • Results • Conclusion and recommendations
Detecting suspicious patterns • One of the problems with VAT Turnover data is that it is often not possible to re-contact businesses to get an idea of their true Turnover figure. • It is often possible to identify errors in VAT Turnover data by considering the pattern of reported Turnover over a period.
Hoogland (2010) • Zero Turnover in three quarters, positive Turnover in the other quarter • Zero Turnover in one quarter, positive Turnover in the other three quarters • Same Turnover in all four quarters • Same Turnover for three quarters, a different (positive) Turnover value in the other quarter • Negative Turnover in any of the quarters
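The pattern checks listed above can be sketched in a few lines of Python. This is only an illustration: the function name, the pattern labels and the exact-equality comparison are assumptions, not part of Hoogland (2010), and a record may match more than one pattern.

```python
from typing import List, Sequence


def suspicious_patterns(quarters: Sequence[float]) -> List[str]:
    """Return labels (hypothetical names) of the patterns matched by
    four consecutive quarterly Turnover values for one business."""
    if len(quarters) != 4:
        raise ValueError("expected exactly four quarterly values")

    flags = []
    zeros = sum(1 for q in quarters if q == 0)
    positives = sum(1 for q in quarters if q > 0)

    # Zero in three quarters, positive in the other quarter.
    if zeros == 3 and positives == 1:
        flags.append("zero_in_three_quarters")
    # Zero in one quarter, positive in the other three.
    if zeros == 1 and positives == 3:
        flags.append("zero_in_one_quarter")

    distinct = set(quarters)
    # Same Turnover in all four quarters.
    if len(distinct) == 1:
        flags.append("same_in_all_quarters")
    # Same Turnover in three quarters, a different positive value in the other.
    if len(distinct) == 2:
        for value in distinct:
            if quarters.count(value) == 1 and value > 0:
                flags.append("same_in_three_quarters")

    # Negative Turnover in any quarter.
    if any(q < 0 for q in quarters):
        flags.append("negative_in_any_quarter")
    return flags
```

For example, `suspicious_patterns([100, 100, 100, 250])` matches only the "same in three quarters" pattern.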
Methods for detecting unit errors in reported VAT Turnover If then assume the current VAT Turnover has been reported in thousands of pounds and multiply by 1000 to get a figure in pounds.
1 – Quartile distances in industry Turnover • Based on a method described in Hoogland and Van Haren (2007) to identify unusually large or small Turnover by locating extreme values in the distribution of VAT Turnover within a particular industry and size class.
Suspicious Turnover is identified as follows. If Turnover > Q3 + [C × (Q3 – Median)] or Turnover < Q1 – [C × (Median – Q1)] • C may be given different values for different industry and size classes.
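A minimal sketch of the quartile distance rule, applied to the Turnover values of a single industry and size class. The function name and the choice of quartile definition (`method="inclusive"`) are assumptions; the slides do not say how the quartiles are computed.

```python
import statistics
from typing import List, Sequence


def quartile_distance_flags(turnover: Sequence[float], c: float) -> List[bool]:
    """Flag each Turnover value in one industry and size class as
    suspicious (True) or not (False) using the quartile distance rule."""
    # Quartiles of the class distribution (quartile definition is an assumption).
    q1, median, q3 = statistics.quantiles(turnover, n=4, method="inclusive")
    upper = q3 + c * (q3 - median)
    lower = q1 - c * (median - q1)
    return [t > upper or t < lower for t in turnover]
```

With the values `[10, 12, 14, 16, 18, 20, 22, 24, 1000]` and C = 10, only the value 1000 lies outside the bounds and is flagged.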
2 – Period on period ratios • Method 2 comes from De Jong (2003) and involves calculating period on period ratios for each business based on the contribution that business’s Turnover makes to its class. • For each business calculate:
Then calculate Where is the value of Score in period t.
3 – Comparison with reporting history for the business • The method is described in slightly different forms in Hoogland and Van Haren (2007), Lorenz (2010) and Röstel (2010). • Note that this method only identifies suspiciously large Turnover.
If Turnover > £100 million and Turnover > 10 × mean Turnover for the business in the past 24 months, then treat as suspicious.
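The rule can be written as a short check. In this sketch the function name and arguments are hypothetical, the mean is assumed to be taken over the Turnover values reported in the past 24 months, and a business with no history is assumed not to be flagged.

```python
from statistics import mean
from typing import Sequence


def exceeds_history(
    current: float,
    history: Sequence[float],
    floor: float = 100_000_000,  # £100 million threshold from the rule
    multiple: float = 10,        # factor applied to the historical mean
) -> bool:
    """Return True if the current Turnover is suspiciously large
    compared with the business's own reporting history."""
    if not history:
        # Assumption: without a reporting history the rule cannot be applied.
        return False
    return current > floor and current > multiple * mean(history)
```

For example, a current Turnover of £150 million against a history averaging £10 million is flagged, while the same value against a history averaging £20 million is not.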
4 – Quartile differences combined with measure of influence • Refinement to method 1, inspired by Hoogland et al (2009). • Calculate the influence as the proportion of VAT Turnover the business contributes to the total VAT Turnover in the industry and size class. • Combine detection of suspicious values using quartile differences with the influence.
Identify unusual Turnover values using the quartile distances measure described in method 1. • Reminder method 1 Suspicious Turnover: Turnover > Q3 + [C × (Q3 – Median)] or Turnover < Q1 – [C × (Median – Q1)]
Then for each business calculate • This method effectively subsets businesses failing the quartile distance method, so that only the most influential are viewed as being suspicious.
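One way to sketch the idea of subsetting by influence is shown below. The combination rule used here (a minimum influence threshold, `min_influence`) is an assumption for illustration only, as are the function name and the quartile definition; the influence itself is the share of the class total described for this method.

```python
import statistics
from typing import List, Sequence


def influential_suspects(
    turnover: Sequence[float], c: float, min_influence: float
) -> List[bool]:
    """Flag businesses in one industry and size class that fail the
    quartile distance rule AND contribute at least `min_influence`
    (a hypothetical threshold) of the class's total Turnover."""
    q1, median, q3 = statistics.quantiles(turnover, n=4, method="inclusive")
    upper = q3 + c * (q3 - median)
    lower = q1 - c * (median - q1)
    total = sum(turnover)

    flags = []
    for t in turnover:
        fails_quartile = t > upper or t < lower
        # Influence: proportion of the class total contributed by this business.
        influence = t / total if total else 0.0
        flags.append(fails_quartile and influence >= min_influence)
    return flags
```

In a class with values `[10, 12, 14, 16, 18, 20, 22, 24, 60, 5000]`, C = 8 and a 5% threshold, both 60 and 5000 fail the quartile rule, but only 5000 is influential enough to be kept as suspicious.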
5 – Hidiroglou-Berthelot method • Compare to previous period’s value: • Form the ratio r = current VAT Turnover / previous VAT Turnover • Transform the ratio, where m is the median of the ratios r: • if r < m then t = (r – m) / r • otherwise t = (r – m) / m • Define E = t × max{current VAT Turnover, previous VAT Turnover}^V
Then calculate Suspicious businesses are then identified as follows: If or
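The whole procedure can be sketched as follows. The ratio, transformation and effect E follow the definitions given for this method; the quartile step follows the usual formulation in Hidiroglou and Berthelot (1986), which may differ in detail from the version used in the talk. The function name, the quartile definition and the default A = 0.05 are assumptions, and both periods are assumed to have positive Turnover.

```python
import statistics
from typing import List, Sequence


def hb_flags(
    current: Sequence[float],
    previous: Sequence[float],
    c: float,
    v: float = 1.0,
    a: float = 0.05,  # assumed guard against very small quartile distances
) -> List[bool]:
    """Flag suspicious businesses with a Hidiroglou-Berthelot style rule.
    Assumes strictly positive Turnover in both periods."""
    ratios = [cur / prev for cur, prev in zip(current, previous)]
    m = statistics.median(ratios)

    effects = []
    for r, cur, prev in zip(ratios, current, previous):
        # Symmetric transformation of the ratio around the median m.
        t = (r - m) / r if r < m else (r - m) / m
        # Size weighting: larger businesses get larger effects when v > 0.
        effects.append(t * max(cur, prev) ** v)

    e_q1, e_m, e_q3 = statistics.quantiles(effects, n=4, method="inclusive")
    d_q1 = max(e_m - e_q1, abs(a * e_m))
    d_q3 = max(e_q3 - e_m, abs(a * e_m))
    return [e < e_m - c * d_q1 or e > e_m + c * d_q3 for e in effects]
```

Setting `v` to 0 removes the size weighting, so the rule then depends only on the transformed ratios.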
A key difference between survey and administrative data is that with administrative data it is often not possible to re-contact the business and ask them to confirm any suspicious values. • Evaluation of detection methods is not straightforward and cannot usually be definitive.
Comparing methods • Diagnostics include the proportion of businesses identified as suspicious within each industry and size class and the average size (employment) and VAT Turnover of suspicious businesses compared with the rest of the class.
Results of testing detection methods with VAT data • If businesses with larger Turnover values are of more importance: method 4 (Quartile differences & influence) and method 5 (Hidiroglou-Berthelot) offer the flexibility to give higher weight to those businesses.
If good quality historic data are available: method 2 (Period on period ratios) and method 3 (Comparison with history) are likely to give good results.
Method 1 (Quartile differences) and the related method 4 (Quartile differences & influence) should be effective in identifying extreme values when only the current period data are available.
Conclusion and recommendations • Each of these methods uses parameters which can be fine-tuned to identify an appropriate number of suspicious businesses. • The effective values of these parameters are likely to differ between data sources. Therefore, rather than prescribe specific values, it is recommended that the parameters are set through analysis of the effect of the method on the VAT data under consideration.
Before applying any detection methods • Suspicious patterns. It is recommended that VAT data are checked for these patterns before implementing any other error detection method. • Unit errors: relatively easy to identify and correct. It is recommended that an automatic method is developed to detect and correct any unit errors in VAT Turnover data, before applying any other rules.
The final recommendation is that in developing methods for detecting errors in VAT Turnover data, it is always useful to understand the data source and the possible errors that may be found in it. • In many cases, it will be necessary to liaise with the data providers to get this information.
References: • De Jong, A. "Impect: Recent developments in harmonized processing and selective editing", Proceedings of UNECE Work Session on Statistical Data Editing, Madrid, October 2003: Web. • Hidiroglou, M. A. and Berthelot, J.-M. "Statistical Editing and Imputation for Periodic Business Surveys", Survey Methodology, June 1986, Vol. 12, No. 1, pp 73-83: Journal. • Hoogland, J. "Editing strategies for VAT data", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web. • Hoogland, J. and Van Haren, G. "Editing and integrating VAT and SBS data", Proceedings of the third International Conference on Establishment Surveys (ICES-III), Montreal, June 2007: CD ROM.
References: • Hoogland, J., Van Bemmel, K. and De Wolf, P-P. "Detection of potential influential errors in VAT turnover data used for short term statistics", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web. • Lorenz, R. "The integrated system of editing administrative data for STS in Germany", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web. • Seyb, A., Stewart, J., Chiang, G., Tinkler, I., Kupferman, L., Cox, V. and Allan, D. "Automated editing and imputation system for administrative financial data in New Zealand", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.
Extra information • For method 2, we used a threshold of 25 as a compromise between the monthly and quarterly data. • For method 3, we used the thresholds described in Hoogland and Van Haren (2007). • For method 5, the Hidiroglou-Berthelot rule, we used a value of V = 1 to give extra weight to businesses with larger Turnover, as this has been shown to work well with business data in the past. The value of C for this method was 250. • Method 1 used a value of C = 10 in the quartile method to give the same proportion of failures. • For method 4 we chose a value of C = 8 in the quartile method and then prioritised the businesses failing that method by VAT Turnover to give a similar proportion of failures as methods 1 and 5.