Re-ranking Subgroups for Fraud Detection Stefan Rüping Fraunhofer IAIS Dagstuhl Seminar „Parallel Universes and Local Patterns“
Overview • Fraud Detection • A prototypical application case of local patterns? • Subgroup Discovery • A prototypical algorithm for local patterns • Re-ranking Subgroups • Making it work in practice… • Summary
Application Case – Project iWebCare • Development of an e-government web service platform for fraud detection in healthcare • Challenges: • Identification of novel fraud patterns • Monitoring of fraud patterns • Autonomous mining, i.e. the user works without the assistance of a data miner
Fraud Detection • Assumption: there exist typical ways of committing fraud, which make up small but significant fraud patterns • Problem setting • In theory, a supervised learning problem: map cases to a fraud label • In practice, fraud labels are impossible to collect • Alternative approaches • Analyze a proxy label • Money spent, prescriptions issued, … • Find interesting patterns in the data • Interestingness is subjective to the domain expert • ⇒ Prototypical application case for local patterns
Subgroup Analysis • Task • Given examples $(x_i, y_i)_{i=1 \dots n} \subset X \times \{0,1\}$ and $k \in \mathbb{N}$ • Find the k subgroups of X with the highest statistical deviation in the probability of y • Subgroup S described by a propositional formula • $x^{(1)} = A \wedge x^{(2)} = B \Rightarrow P(y=1) = 0.9$ • Quality measure: $q(S) = g^a \, |p - p_0|$, where • $g = \#\{ i \mid x_i \in S \} / n$, and • $p = \#\{ i \mid x_i \in S, y_i = 1 \} / \#\{ i \mid x_i \in S \}$ • Algorithm: Explora (Kloesgen, 1996) • Full depth-first search with effective pruning • Several heuristic / randomized algorithms • Extension to numeric y possible
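The quality measure above can be sketched in a few lines of Python. This is a minimal illustration, not the Explora implementation: the function name `subgroup_quality`, the toy data and the predicate `in_s` are hypothetical, and p0 is assumed to be the overall share of y = 1 in the data, which the slide does not define.

```python
def subgroup_quality(xs, ys, covers, a=0.5):
    """Return q(S) = g^a * |p - p0| for binary labels ys.

    covers(x) decides whether example x belongs to subgroup S.
    Assumption: p0 is the share of y = 1 over all examples.
    """
    n = len(xs)
    covered = [y for x, y in zip(xs, ys) if covers(x)]
    if not covered:
        return 0.0  # an empty subgroup has no measurable deviation
    g = len(covered) / n              # relative size of the subgroup
    p = sum(covered) / len(covered)   # share of y = 1 inside the subgroup
    p0 = sum(ys) / n                  # share of y = 1 overall (assumed)
    return g ** a * abs(p - p0)


# Hypothetical toy data: four cases with two nominal attributes.
xs = [
    {"x1": "A", "x2": "B"},
    {"x1": "A", "x2": "B"},
    {"x1": "A", "x2": "C"},
    {"x1": "D", "x2": "B"},
]
ys = [1, 1, 0, 0]


def in_s(x):
    # Subgroup description: x1 = A and x2 = B
    return x["x1"] == "A" and x["x2"] == "B"


q = subgroup_quality(xs, ys, in_s, a=1.0)
print(q)
```

Here the subgroup covers half of the cases (g = 0.5) and all of its cases are positive (p = 1.0, against p0 = 0.5), so q = 0.25 for a = 1. A smaller exponent a gives small subgroups more weight.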
Subgroup Analysis II • Easily extensible to weighted examples: Given examples $(x_i, y_i)_{i=1 \dots n} \subset X \times \{0,1\}$ and weights $w_i \ge 0$ • Let $q(S) = g^a \, |p - p_0|$, where • $g = \sum_{i:\, x_i \in S} w_i$, and • $p = \sum_{i:\, x_i \in S} y_i w_i / g$ • Obviously identical to the standard case when $w_i = 1/n$
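The weighted variant can be illustrated the same way. The sketch below assumes that the weights sum to 1 (as the formula for g suggests) and that p0 is the weighted overall share of y = 1; the slide states neither, and the function name and data are hypothetical.

```python
def weighted_quality(ys, weights, member, a=0.5):
    """Return q(S) = g^a * |p - p0| with example weights.

    member[i] is True when example i is covered by subgroup S.
    Assumptions: weights sum to 1; p0 is the weighted share of y = 1.
    """
    g = sum(w for w, m in zip(weights, member) if m)
    if g == 0:
        return 0.0  # no weight inside the subgroup
    p = sum(y * w for y, w, m in zip(ys, weights, member) if m) / g
    p0 = sum(y * w for y, w in zip(ys, weights)) / sum(weights)
    return g ** a * abs(p - p0)


ys = [1, 1, 0, 0]
member = [True, True, False, False]

# Uniform weights 1/n reproduce the unweighted measure.
q_uniform = weighted_quality(ys, [0.25] * 4, member, a=1.0)

# Up-weighting the covered examples changes both g and p0.
q_skewed = weighted_quality(ys, [0.4, 0.4, 0.1, 0.1], member, a=1.0)
print(q_uniform, q_skewed)
```

With uniform weights the result matches the unweighted value of 0.25. With the skewed weights the subgroup grows to g = 0.8, but p0 also rises to 0.8, so the quality drops to about 0.16.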
Problem Setting • Starting point • A fraud label does not exist • The domain expert can name an attribute whose distribution is somehow related to fraud • Subgroups induced by this attribute are not necessarily the most interesting ones • The expert can hardly define what is interesting to them • The expert can easily give pairwise comparisons of subgroups: more-interesting-than • Assumption: the interestingness of a subgroup to the expert is determined by • The form of the subgroup, i.e. the parameter a • The attributes used to define the subgroup • The examples covered by the subgroup
Re-Ranking of Subgroups • Approach: Given • A list of subgroups $(S_i)_{i=1 \dots n}$ • The expert’s comparisons of subgroups from this list, i.e. a set P of pairs (i,j) meaning that subgroup i is more interesting than subgroup j • Find a subgroup quality measure q’ that correlates better with the expert’s assessment of interestingness • Select the most interesting and the k most irrelevant subgroups from P • Represent subgroup S as $(g, |p - p_0|, attD, attS)$, where • p, g are defined as usual • attD is a binary vector of size dim(X) with $attD(i) = 1 \Leftrightarrow$ attribute i is used in the definition of subgroup S • attS is a vector of the sizes of the intersections of S with the most interesting / irrelevant subgroups
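The representation of a subgroup can be made concrete with a small sketch. It assumes that a subgroup is given as a set of covered example indices plus the set of attribute indices in its description, and that attS holds raw intersection counts; whether the slides use counts or normalized sizes is not stated. The name `represent` and the data are hypothetical.

```python
def represent(cover, used_attrs, labels, n_attrs, reference_covers):
    """Return (g, |p - p0|, attD, attS) for one subgroup.

    cover            -- set of indices of the examples covered by S
    used_attrs       -- set of attribute indices used to describe S
    reference_covers -- covers of the selected most interesting /
                        most irrelevant subgroups
    Assumption: attS stores raw intersection counts.
    """
    n = len(labels)
    g = len(cover) / n
    p = sum(labels[i] for i in cover) / len(cover)
    p0 = sum(labels) / n
    att_d = [1 if j in used_attrs else 0 for j in range(n_attrs)]
    att_s = [len(cover & ref) for ref in reference_covers]
    return g, abs(p - p0), att_d, att_s


labels = [1, 1, 0, 0, 1, 0]
rep = represent(
    cover={0, 1, 4},
    used_attrs={0, 2},
    labels=labels,
    n_attrs=3,
    reference_covers=[{0, 1}, {2, 3}, {4, 5}],
)
print(rep)
```

The subgroup in this toy example uses attributes 0 and 2 and shares two, zero and one examples with the three reference subgroups, which gives attD = [1, 0, 1] and attS = [2, 0, 1].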
Algorithm • Reminder: S is represented by $(g, |p - p_0|, attD, attS)$ • Assume $q'(S) = g^a \, |p - p_0| \, \prod_{j=1}^{d} \exp(attD(j)\, w(j)) \, \prod_{i} \exp(attS(i)\, w(d+i))$ with $d = \dim(X)$ • Usual quality function plus additional factors for attributes and covered examples • Such that $\log q'(S) = (a, 1, w_d, w_s) \cdot (\log g, \log |p - p_0|, attD, attS)$ • ⇒ Can use a modified version of the ranking SVM to find $(a, w_d, w_s)$ that maximize the correlation of q’ with the interestingness information given by the user
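The step from the product form of q’ to the linear form of log q’ can be checked numerically. The sketch below uses hypothetical values for the subgroup representation and the weights; it requires g > 0 and |p - p0| > 0 so that the logarithms exist.

```python
import math


def q_prime(g, dev, att_d, att_s, a, w_d, w_s):
    """Product form: g^a * dev * prod exp(attD*w_d) * prod exp(attS*w_s)."""
    factor_d = math.prod(math.exp(x * w) for x, w in zip(att_d, w_d))
    factor_s = math.prod(math.exp(x * w) for x, w in zip(att_s, w_s))
    return g ** a * dev * factor_d * factor_s


def log_q_prime(g, dev, att_d, att_s, a, w_d, w_s):
    """Linear form: (a, 1, w_d, w_s) . (log g, log dev, attD, attS)."""
    weights = [a, 1.0, *w_d, *w_s]
    features = [math.log(g), math.log(dev), *att_d, *att_s]
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical subgroup and weights; dev stands for |p - p0|.
args = dict(
    g=0.2, dev=0.3,
    att_d=[1, 0, 1], att_s=[2, 0],
    a=0.5, w_d=[0.4, -0.1, 0.2], w_s=[0.05, 0.3],
)
direct = q_prime(**args)
via_log = log_q_prime(**args)
print(math.log(direct), via_log)
```

Both routes give the same value up to floating-point error. The coefficient of log |p - p0| is fixed at 1, which is why a standard ranking SVM needs a modification before it can be used to fit the remaining weights.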
Ranking SVM • Standard Variant: • Subgroup-ranking Variant:
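As a reference point, the standard soft-margin ranking SVM over a set P of preference pairs is commonly written as in the first display below, with φ(S) a feature vector of subgroup S and C a trade-off constant. The second display is only an assumption about how the subgroup-ranking variant might look: it keeps the coefficient of log |p - p0| fixed at 1 by moving that term out of the learned weight vector. The symbols φ, z, c and C are notation introduced for this sketch.

```latex
% Standard ranking SVM over preference pairs (i,j) in P
\begin{aligned}
\min_{w,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{(i,j)\in P} \xi_{ij} \\
\text{s.t.}\quad & w \cdot \bigl(\phi(S_i) - \phi(S_j)\bigr) \ge 1 - \xi_{ij},
\qquad \xi_{ij} \ge 0 \quad \forall (i,j) \in P
\end{aligned}

% Assumed subgroup-ranking variant:
% w = (a, w_d, w_s), z(S) = (\log g, attD, attS), c(S) = \log|p - p_0|
\begin{aligned}
\min_{w,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{(i,j)\in P} \xi_{ij} \\
\text{s.t.}\quad & w \cdot \bigl(z(S_i) - z(S_j)\bigr) + c(S_i) - c(S_j) \ge 1 - \xi_{ij},
\qquad \xi_{ij} \ge 0 \quad \forall (i,j) \in P
\end{aligned}
```

In the assumed variant the left-hand side of the constraint equals log q’(S_i) - log q’(S_j), so each satisfied constraint means q’ ranks subgroup i above subgroup j with a margin.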
Iterative Algorithm • Find subgroups w.r.t. the proxy attribute • Ask the user for interestingness information, encoded as pairs P • Use the ranking SVM to find $(a, w_d, w_s)$ that maximize the correlation of q’ with P • Re-start the subgroup search with the new a and weights $w_s$ • Preliminary results: significant increase in correlation between ranked subgroups and interestingness (measured in ranking w.r.t. unknown, true label)
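The weight-fitting step of the loop above can be sketched as follows. This is not the author's solver: plain subgradient descent on a pairwise hinge loss stands in for a ranking SVM, the coefficient of log |p - p0| is held at 1 by treating that term as a fixed offset, and all names, data and hyperparameters are hypothetical.

```python
def fit_ranking_weights(features, offsets, pairs, c=10.0, lr=0.05, epochs=200):
    """Fit w = (a, w_d, w_s) from preference pairs.

    features[i] -- free features of subgroup i: (log g, attD..., attS...)
    offsets[i]  -- log|p - p0| of subgroup i, coefficient fixed at 1
    pairs       -- (i, j) means subgroup i is more interesting than j
    Minimizes 0.5*||w||^2 + c * sum of hinge losses by subgradient descent.
    """
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5*||w||^2
        for i, j in pairs:
            diff = [fi - fj for fi, fj in zip(features[i], features[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            margin += offsets[i] - offsets[j]
            if margin < 1.0:  # pair violates the margin: add hinge subgradient
                grad = [gk - c * dk for gk, dk in zip(grad, diff)]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def score(w, feat, offset):
    """log q'(S) under the fitted weights."""
    return sum(wk * fk for wk, fk in zip(w, feat)) + offset


# Three subgroups with features (log g, attD(0)); only S0 uses attribute 0.
features = [[-1.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
offsets = [-2.0, -1.0, -3.0]   # the proxy-based deviation prefers S1
pairs = [(0, 1), (1, 2)]       # the expert prefers S0 over S1 over S2

w = fit_ranking_weights(features, offsets, pairs)
ranking = sorted(range(3), key=lambda i: -score(w, features[i], offsets[i]))
print(w, ranking)
```

In this toy case the proxy-based order is S1, S0, S2. After fitting, the weight on attribute 0 becomes large enough to lift S0 above S1, so the re-ranked order follows the expert's pairs. In the full loop, the fitted a and weights would then parameterize the next subgroup search.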
Summary • Fraud detection is a typical application case for local pattern detection • Statistical validity of patterns only takes you so far… • Optimization of degree of interest directly targets local patterns to user requirements