Local Patterns in Plastic Card Fraud Detection
Niall Adams
Department of Mathematics, Imperial College London
n.adams@imperial.ac.uk
Local Patterns and Parallel Universes. Dagstuhl, May 2007
Joint work with David Hand, Richard Bolton, Dave Weston, Piotr Juszczak
Overview:
• Local patterns
• Fraud detection
• Local pattern ideas in fraud detection
• Occasional links with parallel universes (a new concept for me!)
We think of a local pattern as an anomalous configuration of data points (the pattern) with limited extent in the data space (local, cf. local and general anaesthetic). A pattern typically involves a small number of points compared to the size of the data set. This requires:
• A definition of anomaly: anomalous compared to an assumed background model
• A definition of local: distance or similarity metrics appropriate to, or constructed for, the data
In this framework, applications often generate too many patterns, that is, many local anomalies. The next stage is typically to identify the important (in some sense) patterns. One approach is to determine the magnitude of the anomaly using statistical reasoning, i.e. to assess the significance of the pattern. This encompasses many possibilities, including parametric procedures and Bayesian approaches. Note that certain resampling approaches (e.g. the bootstrap) are likely to disrupt patterns, and may therefore be inappropriate as usually deployed.
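As one illustration of a parametric procedure, the sketch below scores a local pattern by how surprising its point count is under a background model. The Poisson background and the numbers used are assumptions made for this example only; they are not taken from the talk.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    `expected` is the count the assumed background model predicts for
    the local region; `observed` is the count actually found there.
    """
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below

# Hypothetical local region: background predicts 2 points, 9 are seen.
p_value = poisson_upper_tail(observed=9, expected=2.0)
print(p_value)
```

A small tail probability suggests the configuration is hard to explain by the background model alone. When many candidate patterns are scored this way, some allowance for multiple testing would be needed before ranking them.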
Plastic Card Transaction Fraud Detection
Plastic cards provide many useful services. However, they are prone to fraudulent abuse (attack?). This is a very big problem, e.g. UK, 2006: 142 million cards in issue, total card fraud losses £428 million. Fraud tactics change: since the introduction of chip and PIN authentication in the UK, the profile of modes of fraud attack has changed. The currently popular attack is “card holder not present”, primarily via the internet. The objective is to detect fraudulent transactions as rapidly as possible.
For fun: one example of a modified ATM. Moral: check the machine, shield your PIN!
We have large data sets from a number of UK banks. A data set consists of a sequence of transactions for each account. Transaction records are complicated, with up to 77 fields, including:
• time and date
• transaction outlet (ATM, POS)
• transaction type
• amount
• many card reader and card response codes
The fraud transaction detection problem is challenging for many reasons, including:
• large data sets
• imbalanced classes (fraud: 0.1%)
• complicated data structure; irregularly spaced time series (alignment)
• adaptive behaviour of fraudsters
• need for timely detection
• importance of avoiding false positives
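The interaction between class imbalance and false positives in the list above can be made concrete with a short calculation. The 0.1% fraud rate comes from the slide; the detection rate and false positive rate below are hypothetical values chosen only to show the effect.

```python
def alert_precision(fraud_rate, detection_rate, false_positive_rate):
    """Fraction of raised alerts that are genuinely fraudulent."""
    true_alerts = fraud_rate * detection_rate
    false_alerts = (1.0 - fraud_rate) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)

# Fraud rate of 0.1% as on the slide; the other two rates are assumed.
precision = alert_precision(
    fraud_rate=0.001,
    detection_rate=0.99,
    false_positive_rate=0.01,
)
print(round(precision, 3))
```

Under these assumed rates, only about one alert in eleven concerns a real fraud, even though the detector looks accurate on each class separately.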
Perhaps the most natural initial response to the transaction fraud detection problem is to treat it as a two-class classification problem (legitimate versus fraud). In this case, extensive processing of the whole data set is required for feature selection and data alignment. Moreover, this approach can only capture fraud tactics embodied in the training data.
Instead, we have considered local pattern approaches. First, consider that each account is different; one aspect of ‘local’ is to consider that each account is a separate entity, with its own characteristic legitimate behaviour. Examination of data suggests that in some cases, one account’s fraud behaviour appears the same as another’s legitimate behaviour. We consider two local pattern approaches:
• outlier detection
• peer group analysis
Outlier Detection
Treat each account separately. For (hopefully) useful features, construct a model of legitimate behaviour based on a sequence of transactions that are assumed to be legitimate. We map transactions to continuous features; the model then estimates the density of legitimate transactions. New transactions that are unlikely under this model are regarded as anomalous. Various approaches to model construction: nearest neighbour, kernel density estimate, SVM approach.
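A minimal sketch of the kernel density variant for a single account follows. The one-dimensional feature (log of transaction amount), the bandwidth and the data are all assumptions for illustration; the talk does not specify the features or estimator settings actually used.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a density function fitted to 1-D points with a Gaussian kernel."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

# Hypothetical history for one account: log10 of past transaction
# amounts, all assumed to be legitimate.
history = [1.2, 1.3, 1.1, 1.4, 1.25, 1.35, 1.15]
density = gaussian_kde(history, bandwidth=0.1)

typical = density(1.3)   # close to past behaviour
unusual = density(3.0)   # far from anything seen on this account
print(typical, unusual)
```

Because the model is built per account, the same transaction can score as typical on one account and highly unusual on another.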
Example
Flag an anomaly by comparing the density of a new transaction with a threshold. Set the threshold by reasoning about the size of the population and the number of fraud enquiries that is financially possible.
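One way to turn an enquiry budget into a density threshold is sketched below. The function names, the scores and the budget are hypothetical; the rule shown (take the k-th lowest density as the cut-off) is one simple reading of the idea, not necessarily the procedure used in this work.

```python
def threshold_for_budget(densities, max_alerts):
    """Density cut-off such that the max_alerts lowest scores are flagged.

    Ties at the cut-off value can push the number of alerts above the
    budget; this sketch does not handle that case.
    """
    if max_alerts <= 0:
        return float("-inf")
    ranked = sorted(densities)
    k = min(max_alerts, len(ranked))
    return ranked[k - 1]

def flag(density, threshold):
    """A transaction is flagged when its density is at or below the cut-off."""
    return density <= threshold

# Hypothetical model densities for ten recent transactions, with a
# budget of two fraud enquiries.
scores = [0.90, 0.02, 0.75, 0.60, 0.001, 0.88, 0.40, 0.95, 0.10, 0.70]
cutoff = threshold_for_budget(scores, max_alerts=2)
alerts = [s for s in scores if flag(s, cutoff)]
print(cutoff, alerts)
```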
This approach, provided the density estimator is well constructed, appears to perform at least as well as supervised approaches without using any information about fraud. (Note also the importance of the performance assessment measure.) Moreover, this approach has the potential to:
• operate asynchronously and continuously
• detect new modes of fraud
Peer Group Analysis
Based on the intuition that groups of accounts (peer groups) will exhibit characteristically similar behaviour. Features are derived from account transaction sequences. If we can define, or construct, such peer groups, we can regard departures from the group behaviour as anomalies, and proceed as usual. We have constructed peer groups on the basis of similarity, but other ideas are possible (e.g. prediction).
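The similarity-based construction can be sketched as follows, assuming each account is summarised by a small feature vector from its history and a single feature for the current period. The feature choice, the Euclidean distance, the group size and the data are all hypothetical.

```python
import math
import statistics

def peer_group(target, history, k):
    """The k accounts whose historical features are closest to the target's."""
    others = [a for a in history if a != target]
    others.sort(key=lambda a: math.dist(history[a], history[target]))
    return others[:k]

def peer_deviation(target, peers, current):
    """Standardised distance of the target's current value from its peers'."""
    values = [current[a] for a in peers]
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return abs(current[target] - centre) / spread

# Hypothetical historical summaries: (mean weekly spend, weekly count).
history = {
    "A": (20.0, 3.0),
    "B": (22.0, 3.0),
    "C": (19.0, 4.0),
    "D": (21.0, 2.0),
    "E": (300.0, 40.0),
}
# Hypothetical spend in the current week.
current = {"A": 400.0, "B": 25.0, "C": 18.0, "D": 21.0, "E": 310.0}

peers = peer_group("A", history, k=3)
score = peer_deviation("A", peers, current)
print(peers, score)
```

Account A's current spend would be unremarkable for account E, but it departs sharply from the accounts that have historically behaved like A, so it is flagged relative to its own peer group.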
Ignoring the details of construction, we have found that there is merit in considering peer groups significantly smaller than the whole population. The notable feature is that the greatest improvement over treating the whole population as a peer group is associated with small(ish) peer groups.
Future work on fraud
Learning alert thresholds from the population; staggered learning, with peer group analysis (PGA) as a precursor to outlier detection.
Patterns?
Fraud is one application. Others keep appearing. It is more likely that threshold selection will be key than background model building and anomaly detection. Extension to harder domains: audio, video…