Using Predictive Analytics to help target Error and Fraud in Tax Credits


Presentation Transcript


  1. Using Predictive Analytics to help target Error and Fraud in Tax Credits 1st December 2011 International Conference on Taxation Analysis and Research Rob James, KAI Benefits and Credits, HMRC

  2. What I am going to cover 1 Some background to tax credits 2 The error and fraud challenge in tax credits 3 Predictive analytics 4 Building a model for targeting tax credits error and fraud 5 Some findings

  3. 1 Some background to tax credits 2 The error and fraud challenge in tax credits 3 Predictive analytics 4 Building a model for targeting tax credits error and fraud 5 Some findings

  4. Tax Credits – the basics • Annual system. • Elements based on different circumstances: • Children; • Work (16/30 hour thresholds); • Childcare; • Disability. • And household income. • Responsive to changes in circumstances or income.

  5. Tax Credits – the more complicated bits • Initial award based on previous year’s income, and current circumstances. • ‘Provisional’ payments made for year: • Customers required to report changes in their circumstances or income through the year. • Customers then ‘finalise’ their award in the following year and HMRC works out what their entitlement was: • Any under or overpayment calculated and recovered.

  6. Timeline for the 2010-11 tax credit award year [Timeline diagram: payments made from April 2010 to April 2011; award finalised by August 2011; enquiry window open until August 2012.] Over/underpayment = difference between reported final entitlement and payments made. Error/fraud = difference between reported final entitlement and true entitlement.
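To make the distinction concrete, here is a minimal sketch in Python with purely illustrative figures (none of these values come from the presentation):

    # Illustrative figures only, not real award values.
    payments_made = 4000          # provisional payments over the award year
    reported_entitlement = 3500   # entitlement based on what the customer reported
    true_entitlement = 3000       # entitlement based on actual circumstances

    # Recovered through normal finalisation:
    overpayment = payments_made - reported_entitlement     # 500
    # Only detectable through compliance work:
    error_fraud = reported_entitlement - true_entitlement  # 500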

  7. Tax Credits – some numbers • In 2009-10 for Child and Working tax credits HMRC: • Paid out £26.6 billion; • In 7.3 million awards; • At an average of £3,650 per award; • To an average of 6.3 million families. • £1.95 billion was paid out due to either error or fraud.

  8. 1 Some background to tax credits 2 The error and fraud challenge in tax credits 3 Predictive analytics 4 Building a model for targeting tax credits error and fraud 5 Some findings

  9. HMRC’s tax credit error and fraud challenge • In 2008 HMRC was set a DSO (Departmental Strategic Objective) target, specifically: • “Reduce the level of tax credit error and fraud to no more than 5% by March 2011”. • It was estimated that in 2010-11 HMRC would need to reduce the level of error and fraud in the system by £1.4 billion. • As part of this, HMRC outlined details of a new compliance strategy for tax credits to take effect from 2009-10.

  10. 1 Some background to tax credits 2 The error and fraud challenge in tax credits 3 Predictive analytics 4 Building a model for targeting tax credits error and fraud 5 Some findings

  11. Can an analytical perspective contribute to achieving this target?

  12. What is Predictive Analytics? • A set of tools that enable us to use historical data to predict future outcomes. • Creates a consistent risk ranking to enable the best allocation of resources: • Especially important during a period of declining resources. • Potential to increase compliance yields. • A number of different statistical methods are available.

  13. Common predictive methods: Decision Trees • Strengths: • Can cope with missing data; • Generate easy-to-understand rules; • Non-parametric and flexible. • Weaknesses: • Require large sample sizes; • Can become over-complex; • Tendency to over-fit the data.
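A minimal sketch of a decision-tree fit in Python, assuming scikit-learn (the presentation does not say which software HMRC used); the file and column names are invented for illustration:

    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor

    # Hypothetical training data: one row per sampled award, where
    # 'error_value' is the error/fraud found by a compliance enquiry.
    df = pd.read_csv("efap_sample.csv")
    X = df.drop(columns="error_value")
    y = df["error_value"]

    # A shallow tree with a minimum leaf size guards against the
    # over-fitting weakness noted above.
    tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=200)
    tree.fit(X, y)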

  14. Common predictive methods: Regression • Strengths: • Provides a unique risk score for each case; • Makes full use of the available data, so works with smaller sample sizes. • Weaknesses: • Need to impute missing data; • Need to understand the underlying relationship; • Can be time-consuming.
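A corresponding sketch, reusing the X and y from the decision-tree example above; the imputation step reflects the missing-data weakness just noted:

    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    # Regression cannot handle missing values directly, so impute first.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        LinearRegression(),
    )
    model.fit(X, y)
    risk_scores = model.predict(X)  # a unique risk score per case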

  15. Common predictive methods: Neural Networks • Strengths: • Good at predicting outcomes of complex problems; • Can cope with non-linearity in the data. • Weaknesses: • Need careful data preparation and variable selection; • Take longer to train; • The model generated can be difficult to understand.
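And a neural-network sketch in the same vein, again reusing X and y; the scaling and imputation steps are the careful data preparation the slide refers to, and the layer size is arbitrary:

    from sklearn.impute import SimpleImputer
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    nn = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),  # put all inputs on a comparable scale
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    )
    nn.fit(X, y)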

  16. 1 Some background to tax credits 2 The error and fraud challenge in tax credits 3 Predictive analytics 4 Building a model for targeting tax credits error and fraud 5 Some findings

  17. What evidence can we base a predictive model on? Data collected, via EFAP (the Error and Fraud Analytical Programme), to measure levels of error and fraud: • Carried out annually; • Stratified random sample of the population; • 3,000–5,000 awards per year; • Full compliance enquiry carried out; • Error recorded post-finalisation. Data collected from normal compliance interventions: • Carried out on an ongoing basis; • Cases selected by varying pre-defined criteria; • 100,000s of awards per year; • Varying levels of intervention, from full enquiry to less formal; • Error recorded pre- and post-finalisation. Both approaches are valid, but we want to target the error and fraud that is not being picked up by the processes currently in place – so the EFAP data is the better basis.

  18. How we chose a predictive method • The best method depends on the circumstances; a number of criteria were considered: • Data quality: • a reasonably sized sample, representative of the overall population; • a large number of variables available; and • some missing values. • Time available: • quite limited. • Prior knowledge: • some ideas around causes of error. • How this information is going to be used by the business: • to select cases for which specific interventions will be designed. Therefore a decision-tree approach appeared the most appropriate.

  19. Building the model – data used • Used the last three years’ worth of EFAP data, covering 2007-08 to 2009-10: • Roughly 11,000 cases across the three years. • Matched in characteristics from other HMRC data sources: • e.g. Child Benefit and Self Assessment. • Also matched in external data sources: • e.g. ACORN.
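A hypothetical sketch of that matching step in Python; the file names and join keys ('award_id', 'postcode') are invented, since the presentation does not describe the actual linkage:

    import pandas as pd

    efap = pd.read_csv("efap_2007_to_2010.csv")       # ~11,000 enquiry outcomes
    child_benefit = pd.read_csv("child_benefit.csv")  # internal HMRC source
    acorn = pd.read_csv("acorn.csv")                  # external geodemographic source

    # Left joins keep every EFAP case even where a match is missing.
    matched = (
        efap.merge(child_benefit, on="award_id", how="left")
            .merge(acorn, on="postcode", how="left")
    )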

  20. Building the model – groups targeted • Chose to model each risk category separately: • Children; Childcare; Work & Hours; Income; Undeclared Partner; and Disability. • Each risk category covers a slightly different population at risk; in 2009-10: • Childcare risk existed in 7% of the tax credits population; • Children risk existed in 90% of the tax credits population. • In general there is only a small amount of crossover between the risk categories.

  21. Risk categories – some numbers [Chart: size of the risk categories, 2009-10.]

  22. Building the model – outcome targeted • Two outcomes that we could model: • The average value of error/fraud for a case; • The likelihood that a case is non-compliant. • We want to maximise yield, so we look at the first of these (see the sketch below): • More important to reduce the amount of error/fraud than the number of cases with error/fraud. • But the strike rate is still important: • We don’t want to unduly add to burdens on the compliant population.
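A sketch of the two candidate targets, reusing df and X from the earlier examples; deriving the non-compliance flag from the error value is an assumption for illustration:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # Outcome 1: average value of error/fraud per case (the outcome modelled here).
    y_error_value = df["error_value"]
    value_tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=200)
    value_tree.fit(X, y_error_value)

    # Outcome 2: likelihood that a case is non-compliant.
    y_noncompliant = (df["error_value"] > 0).astype(int)
    strike_tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=200)
    strike_tree.fit(X, y_noncompliant)

    # Rank cases by predicted value, but check the predicted strike rate so
    # compliant customers are not unduly burdened.
    predicted_value = value_tree.predict(X)
    predicted_strike = strike_tree.predict_proba(X)[:, 1]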

  23. Building the model – models built • Want to model the latest situation where possible: • Characteristics associated with error/fraud may change over time. • Used the combined 2007-08 to 2009-10 data: • Aim to find consistent groups across the three years. • Used 2009-10 data on its own: • Aim to find groups unique to 2009-10.

  24. 1 Some background to tax credits 2 The error and fraud challenge in tax credits 3 Predictive analytics 4 Building a model for targeting tax credits error and fraud 5 Some findings

  25. Children risk category – decision tree output [Diagram: the fitted decision tree for the Children risk category.]

  26. Children risk category – final group population estimates • Groups in red: average error a lot higher than in the overall population. • Groups in blue: average error a bit higher than in the overall population. • Groups in grey: average error less than in the overall population.

  27. Children risk category – improvement in targeting [Chart: improvement in risk targeting using this predictive approach.]

  28. Children risk category – the final product • The final product is a set of rules that determine our high-risk groups to target. • For the Children risk category we have the rules relating to nodes 23 and 24: • Split 4 of Characteristic A; • Split 3 of Characteristic A and Split 1 of Characteristic C. • Rules passed to compliance. • Compliance may later subset the group depending on how many cases they need for an intervention.
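A sketch of how such rules can be read off a fitted tree, using the 'tree' object from the earlier example; the real node numbers and characteristics are anonymised in the presentation, so this is purely illustrative:

    from sklearn.tree import export_text

    # Render the fitted tree as plain if/else rules that can be
    # handed to a compliance team.
    rules = export_text(tree, feature_names=list(X.columns))
    print(rules)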

  29. Limitations of approach • Only modelling the error and fraud left in the tax credit system at the end of the year: • Supplementing rather than replacing current processes. • Data is at least 18 months old when the analysis is completed: • Areas of risk that we have highlighted may have already been targeted; • The behaviour of the population may have changed. • The yield modelled relates to a full compliance enquiry: • The interventions applied may be less formal. • Findings from one point in the tax credit award year are applied to another.

  30. Has this approach been effective in practice? [Chart: rate of error and fraud over time.]

  31. Has this approach been effective in practice? • There has been no formal evaluation yet of how much this approach has contributed to the reductions seen in error and fraud. • We have seen significant levels of yield from some of the interventions designed around these groups. • Further work will be needed to establish the quantitative impact of this approach.

  32. The End • Any questions?
