Action Rules Discovery /Lecture I/

by Zbigniew W. Ras, UNC-Charlotte, USA

Interestingness measure
E = [Cond1 => Cond2]
Presumptive Objective Rule: two conditions occur together, with some confidence.
Data Mining Task: For a given dataset D, interestingness measure $I_D$ and threshold c, find association E such that $I_D(E) > c$.
The Knowledge Engineer defines c.

Interestingness Function
Two types of interestingness measure [Silberschatz and Tuzhilin, 1995]: subjective and objective.
Subjective measures: user-driven and domain-dependent. They include unexpectedness [Silberschatz and Tuzhilin, 1995], novelty, and actionability [Piatetsky-Shapiro & Matheus, 1994].
Objective measures: data-driven and domain-independent. They evaluate rules based on statistics and structures of patterns, e.g., support, confidence, etc.

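Objective Interestingness
Basic measures for a rule $\varphi \Rightarrow \psi$, where card[·] counts the objects satisfying a condition and n is the number of objects in the dataset:
Domain: $card[\varphi]$
Support or Strength: $card[\varphi \wedge \psi]$
Confidence or Certainty Factor: $card[\varphi \wedge \psi] / card[\varphi]$
Coverage Factor: $card[\varphi \wedge \psi] / card[\psi]$
Leverage: $card[\varphi \wedge \psi]/n - [card[\varphi]/n] \cdot [card[\psi]/n]$
Lift: $n \cdot card[\varphi \wedge \psi] / [card[\varphi] \cdot card[\psi]]$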
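A minimal Python sketch of the basic measures, using their standard definitions on a toy dataset. The function name `rule_measures`, the attribute names and the eight rows are illustrative assumptions, not part of the lecture.

```python
def rule_measures(rows, antecedent, consequent):
    """Objective measures for the rule antecedent => consequent over `rows`."""
    n = len(rows)
    n_a = sum(1 for row in rows if antecedent(row))    # card[antecedent]
    n_c = sum(1 for row in rows if consequent(row))    # card[consequent]
    n_ac = sum(1 for row in rows if antecedent(row) and consequent(row))
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each hold for some object")
    return {
        "domain": n_a,
        "support": n_ac,
        "confidence": n_ac / n_a,
        "coverage": n_ac / n_c,
        "leverage": n_ac / n - (n_a / n) * (n_c / n),
        "lift": n * n_ac / (n_a * n_c),
    }


# Hypothetical toy data: 8 objects with attribute a and decision d.
rows = (
    [{"a": 1, "d": "H"}] * 3
    + [{"a": 1, "d": "L"}]
    + [{"a": 0, "d": "H"}]
    + [{"a": 0, "d": "L"}] * 3
)
measures = rule_measures(rows, lambda r: r["a"] == 1, lambda r: r["d"] == "H")
print(measures)
```

On this toy data the rule (a = 1) => (d = H) has lift above 1, meaning the two conditions co-occur more often than they would if they were independent.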

Subjective Interestingness
• A rule is interesting if it is:
  • unexpected, if it contradicts the user's belief about the domain and therefore surprises the user
  • novel, if to some extent it contributes to new knowledge
  • actionable, if the user can take an action to his/her advantage based on this rule
Unexpectedness [Suzuki, 1997] /does not depend on domain knowledge/:
If $r = [A \Rightarrow B_1]$ has a high confidence and $r_1 = [A * C \Rightarrow B_2]$ has a high confidence, then $r_1$ is unexpected.
[Padmanabhan & Tuzhilin] $A \Rightarrow B$ is unexpected with respect to the belief $\alpha \Rightarrow \beta$ on the dataset D if the following conditions hold:
• $B \wedge \beta \models False$ [B and $\beta$ logically contradict each other]
• $A \wedge \alpha$ holds on a large subset of D
• $A * \alpha \Rightarrow B$ holds, which means $A * \alpha \Rightarrow \neg\beta$

Actionable rules
• Action rules suggest a way to re-classify objects (for instance, customers) to a desired state.
• Action rules can be constructed from classification rules.
• To discover action rules, the set of conditions (attributes) must be partitioned into stable and flexible attributes.
• For example, date of birth is a stable attribute, and the interest rate on a customer account is a flexible attribute (dependent on the bank).
The notion of action rules was proposed by [Ras & Wieczorkowska, PKDD'00]. Slowinski et al. [JETAI, 2004] introduced a similar notion called intervention.

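Action Rules
Decision table: any information system of the form $S = (U, A_{Fl} \cup A_{St} \cup \{d\})$, where
• $d \notin A_{Fl} \cup A_{St}$ is a distinguished attribute called the decision,
• $A_{St}$ are the stable attributes and $A_{Fl} \cup \{d\}$ the flexible ones.
Action rule [Ras & Wieczorkowska]:
$[t(A_{St}) \wedge (b_1, v_1 \to w_1) \wedge (b_2, v_2 \to w_2) \wedge \dots \wedge (b_p, v_p \to w_p)](x) \Rightarrow [(d, k_1 \to k_2)](x)$, where $(\forall i)[(1 \leq i \leq p) \to (b_i \in A_{Fl})]$
E-Action rule [Ras & Tsay]:
$[t(A_{St}) \wedge (b_1, \to w_1) \wedge (b_2, v_2 \to w_2) \wedge \dots \wedge (b_p, \to w_p)](x) \Rightarrow [(d, k_1 \to k_2)](x)$, where $(\forall i)[(1 \leq i \leq p) \to (b_i \in A_{Fl})]$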
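One possible way to hold such a rule in code is sketched below. The class `ActionRule`, its field names, and the sample rule and objects are hypothetical choices made for illustration. A missing left value, as in an E-action rule term, is modelled as `None`, meaning "any current value".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ActionRule:
    stable: Dict[str, object]                            # t(A_St): required stable values
    changes: Dict[str, Tuple[Optional[object], object]]  # b_i -> (v_i, w_i); v_i None = any
    decision: Tuple[str, object, object]                 # (d, k1, k2)

    def matches(self, obj: Dict[str, object]) -> bool:
        """True if obj is in class k1 and satisfies the left-hand side of the rule."""
        d, k1, _ = self.decision
        if obj.get(d) != k1:
            return False
        if any(obj.get(a) != v for a, v in self.stable.items()):
            return False
        return all(v is None or obj.get(b) == v for b, (v, _) in self.changes.items())

    def suggested_changes(self, obj: Dict[str, object]) -> Dict[str, object]:
        """Flexible attribute values that would have to change for obj."""
        return {b: w for b, (_, w) in self.changes.items() if obj.get(b) != w}


# Illustrative rule: (a = 2) and (b, 2 -> 1) => (d, L -> H)
rule = ActionRule(stable={"a": 2}, changes={"b": (2, 1)}, decision=("d", "L", "H"))
customer = {"a": 2, "b": 2, "c": 0, "d": "L"}
other = {"a": 1, "b": 2, "c": 0, "d": "L"}

print(rule.matches(customer), rule.suggested_changes(customer))
print(rule.matches(other))
```

Keeping stable conditions and flexible changes in separate fields mirrors the partition of attributes into $A_{St}$ and $A_{Fl}$: only the second field describes something the user can act on.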

Action Rules Discovery (Tsay & Ras)
Stable attributes: {a, c}. Flexible attribute: b. Decision attribute: d.
[Table: set of rules R with supporting objects]
[Figure: (d, L)-tree, branching on the values of the stable attributes a and c]
[Figure: (d, H)-tree, branching on the values of the stable attributes a and c]
Action rules obtained from the pair (T3, T1):
$(a = 2) \wedge (b, 2 \to 1) \Rightarrow (d, L \to H)$
$(a = 2) \wedge (b, 3 \to 1) \Rightarrow (d, L \to H)$

Application domain: Customer Attrition
Facts:
• On average, most US corporations lose half of their customers every five years (Rombel, 2001).
• The longer a customer stays with the organization, the more profitable he or she becomes (Pauline, 2000; Hanseman, 2004).
• The cost of attracting new customers is five to ten times higher than the cost of retaining existing ones.
• About 14% to 17% of accounts are closed for reasons that can be controlled, like price or service (Lunt, 1993).
Action:
• Reducing the outflow of customers by 5% can double a typical company's profit (Rombel, 2001).

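Action Rules Discovery
Decision table $S = (U, A_{Fl} \cup A_{St} \cup \{d\})$.
Assumption: $\{a_1, a_2, \dots, a_p\} \subseteq A_{St}$, $\{b_1, b_2, \dots, b_q\} \subseteq A_{Fl}$, $a_{i,1} \in Dom(a_i)$, $b_{i,1} \in Dom(b_i)$.
Rule: $r = [a_{1,1} \wedge a_{2,1} \wedge \dots \wedge a_{p,1}] \wedge [b_{1,1} \wedge b_{2,1} \wedge \dots \wedge b_{q,1}] \to d_1$, where the first bracket is the stable part and the second is the flexible part.
Question: Do we have to consider pairs of classification rules in order to construct action rules?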

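Action Rules Discovery
Decision table $S = (U, A_{Fl} \cup A_{St} \cup \{d\})$.
Assumption: $\{a_1, a_2, \dots, a_p\} \subseteq A_{St}$, $\{b_1, b_2, \dots, b_q\} \subseteq A_{Fl}$, $a_{i,1} \in Dom(a_i)$, $b_{i,1} \in Dom(b_i)$.
Rule: $r = [a_{1,1} \wedge a_{2,1} \wedge \dots \wedge a_{p,1}] \wedge [b_{1,1} \wedge b_{2,1} \wedge \dots \wedge b_{q,1}] \to d_1$, where the first bracket is the stable part and the second is the flexible part.
Action rule $r[d_2 \to d_1]$ associated with r and re-classification task $(d, d_2 \to d_1)$:
$[a_{1,1} \wedge a_{2,1} \wedge \dots \wedge a_{p,1}] \wedge [(b_1, \to b_{1,1}) \wedge (b_2, \to b_{2,1}) \wedge \dots \wedge (b_q, \to b_{q,1})] \Rightarrow (d, d_2 \to d_1)$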

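Action Rules Discovery
Action rule $r[d_2 \to d_1]$:
$[a_{1,1} \wedge a_{2,1} \wedge \dots \wedge a_{p,1}] \wedge [(b_1, \to b_{1,1}) \wedge (b_2, \to b_{2,1}) \wedge \dots \wedge (b_q, \to b_{q,1})] \Rightarrow (d, d_2 \to d_1)$
Support:
$Sup(r[d_2 \to d_1]) = \{x \in U : (a_1(x) = a_{1,1}) \wedge (a_2(x) = a_{2,1}) \wedge \dots \wedge (a_p(x) = a_{p,1}) \wedge (d(x) = d_2)\}$
/$d_2$-objects which potentially can be reclassified by $r[d_2 \to d_1]$ to $d_1$/
$Sup(R[d_2 \to d_1]) = \bigcup \{Sup(r[d_2 \to d_1]) : r \in R\}$, where R is the set of classification rules extracted from S.
/$d_2$-objects which potentially can be reclassified to $d_1$ by an action rule built from some $r \in R$/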
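The support definition above can be sketched in Python as follows. The decision table, the object names x1 to x4, and the function names are hypothetical. A rule is represented only by its stable part, since that is all the definition uses.

```python
def support(table, stable_part, d_attr, d_from):
    """Objects in class d_from that match the stable part of the rule."""
    return {
        x
        for x, row in table.items()
        if row[d_attr] == d_from
        and all(row[a] == v for a, v in stable_part.items())
    }


def support_of_rule_set(table, stable_parts, d_attr, d_from):
    """Union of the supports of the action rules built from every rule in R."""
    result = set()
    for stable_part in stable_parts:
        result |= support(table, stable_part, d_attr, d_from)
    return result


# Hypothetical decision table: stable attributes a, c; flexible b; decision d.
table = {
    "x1": {"a": 2, "c": 0, "b": 2, "d": "L"},
    "x2": {"a": 2, "c": 1, "b": 3, "d": "L"},
    "x3": {"a": 1, "c": 0, "b": 1, "d": "L"},
    "x4": {"a": 2, "c": 0, "b": 1, "d": "H"},
}
sup_r = support(table, {"a": 2}, "d", "L")
sup_R = support_of_rule_set(table, [{"a": 2}, {"c": 0}], "d", "L")
print(sorted(sup_r), sorted(sup_R))
```

Object x4 matches both stable parts but is already in class H, so it is never counted as a candidate for re-classification from L.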

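Action Rules Discovery
Action rule $r[d_2 \to d_1]$:
$[a_{1,1} \wedge a_{2,1} \wedge \dots \wedge a_{p,1}] \wedge [(b_1, b'_{1,1} \to b_{1,1}) \wedge (b_2, b'_{2,1} \to b_{2,1}) \wedge \dots \wedge (b_q, \to b_{q,1})] \Rightarrow (d, d_2 \to d_1)$
Support:
$Sup(r[d_2 \to d_1]) = \{x \in U : (b_1(x) = b'_{1,1}) \wedge (b_2(x) = b'_{2,1}) \wedge (a_1(x) = a_{1,1}) \wedge (a_2(x) = a_{2,1}) \wedge \dots \wedge (a_p(x) = a_{p,1}) \wedge (d(x) = d_2)\}$
/$d_2$-objects which potentially can be reclassified by $r[d_2 \to d_1]$ to $d_1$/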

Action Rules Discovery
Let $U_{d_2} = \{x \in U : d(x) = d_2\}$.
Then $B_{d_2 \to d_1} = U_{d_2} - Sup(R[d_2 \to d_1])$ is the set of $d_2$-objects in S which are $d_1$-resistant.
Let $Sup(R[\to d_1]) = \bigcup \{Sup(R[d_2 \to d_1]) : d_2 \neq d_1\}$.
Then $B_{\to d_1} = U - Sup(R[\to d_1])$ is the set of objects in S which are $d_1$-resistant (cannot be re-classified to class $d_1$).
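A short sketch of the two resistant sets, computed from supports that are assumed to be already known. The universe, the class labels and the support sets below are made-up values, and the function names are illustrative.

```python
def resistant_from(decisions, d_from, sup_R):
    """B_{d2 -> d1}: objects of class d_from not covered by Sup(R[d2 -> d1])."""
    u_d2 = {x for x, d in decisions.items() if d == d_from}
    return u_d2 - sup_R


def resistant_to(decisions, sups_by_source):
    """B_{-> d1}: U minus the union of Sup(R[d2 -> d1]) over all d2 != d1."""
    covered = set().union(*sups_by_source.values())
    return set(decisions) - covered


# Hypothetical universe with decision values, target class d1 = "H".
decisions = {"x1": "L", "x2": "L", "x3": "L", "x4": "H", "x5": "M"}
sups_to_H = {"L": {"x1", "x2"}, "M": set()}   # assumed supports Sup(R[d2 -> H])

b_L_to_H = resistant_from(decisions, "L", sups_to_H["L"])
b_to_H = resistant_to(decisions, sups_to_H)
print(sorted(b_L_to_H), sorted(b_to_H))
```

Following the formula as written, $B_{\to d_1}$ is taken relative to the whole universe U, so objects already in class H (here x4) also appear in the result.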

Action Rules Discovery
Action rules $r[d_2 \to d_1]$, $r'[d_2 \to d_3]$ are p-equivalent if $r/b_i = r'/b_i$ always holds when $r/b_i$, $r'/b_i$ are both defined, for every $b_i \in A_{St} \cup A_{Fl}$.
Let $x \in Sup(r[d_2 \to d_1])$. We say that x positively supports $r[d_2 \to d_1]$ if there is no action rule $r'[d_2 \to d_3]$ extracted from S, $d_3 \neq d_1$, which is p-equivalent to $r[d_2 \to d_1]$ and $x \in Sup(r'[d_2 \to d_3])$.
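The p-equivalence test can be sketched as a comparison over shared attributes. This assumes that r/b denotes the value the rule prescribes for attribute b and that a rule is stored as a dictionary from attribute name to that value; both the representation and the sample rules are illustrative.

```python
def p_equivalent(r, r_prime):
    """True if the two rules agree on every attribute on which both are defined."""
    shared = r.keys() & r_prime.keys()
    return all(r[b] == r_prime[b] for b in shared)


# Hypothetical left-hand sides: attribute -> prescribed value.
r = {"a": 2, "b": 1}
r_same_on_shared = {"a": 2, "c": 0}   # shares only a, with the same value
r_conflict = {"a": 1, "b": 1}         # disagrees on a

print(p_equivalent(r, r_same_on_shared), p_equivalent(r, r_conflict))
```

Two rules with no attribute in common are p-equivalent under this reading, because the condition holds vacuously.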

Action Rules Discovery
Let $Sup^+(r[d_2 \to d_1]) = \{x \in Sup(r[d_2 \to d_1]) : x \text{ positively supports } r[d_2 \to d_1]\}$.
Confidence:
$Conf(r[d_2 \to d_1]) = \{card[Sup^+(r[d_2 \to d_1])] / card[Sup(r[d_2 \to d_1])]\} \cdot Conf(r)$
$Conf(r[\to d_1]) = \{card[Sup^+(r[\to d_1])] / card[Sup(r[\to d_1])]\} \cdot Conf(r)$
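A small sketch of the confidence computation, assuming the supports of the competing p-equivalent action rules (those leading to a class other than $d_1$) have already been collected. The function name, the object identifiers and the numbers are hypothetical.

```python
import math


def action_rule_confidence(sup, competing_sups, conf_r):
    """Conf of an action rule: share of positively supporting objects times Conf(r)."""
    if not sup:
        raise ValueError("the action rule has empty support")
    # x positively supports the rule if no competing p-equivalent rule also covers x.
    sup_plus = {x for x in sup if not any(x in s for s in competing_sups)}
    return len(sup_plus) / len(sup) * conf_r


sup = {"x1", "x2", "x3", "x4"}     # Sup(r[d2 -> d1]), assumed
competing = [{"x4", "x9"}]         # supports of p-equivalent rules r'[d2 -> d3], assumed
conf = action_rule_confidence(sup, competing, conf_r=0.5)
print(conf)
assert math.isclose(conf, 0.375)
```

Only x4 is also covered by a competing rule, so three of the four supporting objects count as positive support.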

Cost of Action Rule [Tzacheva & Ras]
Assumption: $S = (X, A, V)$ is an information system, $Y \subseteq X$. Attribute $b \in A$ is flexible in S and $b_1, b_2 \in V_b$.
By $\rho_S(Y, b_1, b_2)$ we mean a number from $(0, +\infty]$ which describes the average predicted cost of the approved action associated with a possible re-classification of qualifying objects in Y from class $b_1$ to $b_2$.
Object $x \in Y$ qualifies for re-classification from $b_1$ to $b_2$ if $b(x) = b_1$.
$\rho_S(Y, b_1, b_2) = +\infty$ if no action is approved that is required for a possible re-classification of qualifying objects in Y from class $b_1$ to $b_2$.
If Y is uniquely defined, we often write $\rho_S(b_1, b_2)$ instead of $\rho_S(Y, b_1, b_2)$.

Cost of Action Rule
Action rule r: $[(b_1, v_1 \to w_1) \wedge (b_2, v_2 \to w_2) \wedge \dots \wedge (b_p, v_p \to w_p)](x) \Rightarrow (d, k_1 \to k_2)(x)$
The cost of r in S: $cost_S(r) = \sum \{\rho_S(v_i, w_i) : 1 \leq i \leq p\}$
Action rule r is feasible in S if $cost_S(r) < \rho_S(k_1, k_2)$.
For any feasible action rule r, the cost of the conditional part of r is lower than the cost of its decision part.
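The cost and feasibility definitions can be sketched as below. The cost table `rho`, keyed by (attribute, old value, new value), and all its numbers are made-up assumptions. A missing entry is treated as infinite cost, matching the convention that no approved action means a cost of plus infinity.

```python
import math


def rule_cost(changes, rho):
    """Sum of the costs of all atomic changes on the left-hand side of the rule."""
    return sum(rho.get(term, math.inf) for term in changes)


def is_feasible(changes, decision, rho):
    """A rule is feasible if its left-hand side costs less than its decision change."""
    return rule_cost(changes, rho) < rho.get(decision, math.inf)


# Hypothetical cost table: (attribute, from, to) -> average predicted cost.
rho = {
    ("b1", "v1", "w1"): 3.0,
    ("b2", "v2", "w2"): 4.0,
    ("d", "k1", "k2"): 10.0,
}
changes = [("b1", "v1", "w1"), ("b2", "v2", "w2")]
decision = ("d", "k1", "k2")

print(rule_cost(changes, rho), is_feasible(changes, decision, rho))
```

A rule containing a change with no approved action has infinite cost and is therefore never feasible under this sketch.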

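Cost of Action Rule
Assumption: the cost of r is too high!
$r = [(b_1, v_1 \to w_1) \wedge \dots \wedge (b_j, v_j \to w_j) \wedge \dots \wedge (b_p, v_p \to w_p)](x) \Rightarrow (d, k_1 \to k_2)(x)$
$r_1 = [(b_{j1}, v_{j1} \to w_{j1}) \wedge (b_{j2}, v_{j2} \to w_{j2}) \wedge \dots \wedge (b_{jq}, v_{jq} \to w_{jq})](x) \Rightarrow (b_j, v_j \to w_j)(x)$
Then we can compose r with $r_1$ and thereby replace the term $(b_j, v_j \to w_j)$ with the terms from the left-hand side of $r_1$:
$[(b_1, v_1 \to w_1) \wedge \dots \wedge [(b_{j1}, v_{j1} \to w_{j1}) \wedge (b_{j2}, v_{j2} \to w_{j2}) \wedge \dots \wedge (b_{jq}, v_{jq} \to w_{jq})] \wedge \dots \wedge (b_p, v_p \to w_p)](x) \Rightarrow (d, k_1 \to k_2)(x)$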
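The composition step can be sketched as a substitution on lists of atomic changes. The tuple representation (attribute, old value, new value), the function name `compose` and the sample terms are illustrative assumptions.

```python
def compose(r_lhs, r1_lhs, r1_rhs):
    """Replace the term r1_rhs in r's left-hand side by the left-hand side of r1."""
    if r1_rhs not in r_lhs:
        raise ValueError("the right-hand side of r1 does not occur in r")
    composed = []
    for term in r_lhs:
        replacement = r1_lhs if term == r1_rhs else [term]
        for t in replacement:
            if t not in composed:      # avoid listing the same change twice
                composed.append(t)
    return composed


# Hypothetical rules: r1 explains how to achieve the change (b2, v2 -> w2).
r_lhs = [("b1", "v1", "w1"), ("b2", "v2", "w2")]
r1_lhs = [("b3", "v3", "w3"), ("b4", "v4", "w4")]
r1_rhs = ("b2", "v2", "w2")

new_lhs = compose(r_lhs, r1_lhs, r1_rhs)
print(new_lhs)
```

The composed rule is cheaper than r only when the left-hand side of $r_1$ costs less than the term it replaces, so in practice the substitution would be checked against the cost function before being accepted.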

Class movability-index
$F_S$: decision attribute ranking, a positive integer associated with each decision value /objects of higher decision attribute ranking are seen as objects more preferably movable between decision classes than objects of lower rank/.
$N_j^+ = \{i \in N_j : F_S(d_j) - F_S(d_i) \geq 0\}$.
Class movability-index assigned to $N_j$: $ind(N_j) = \sum \{F_S(d_j) - F_S(d_i) : i \in N_j^+\}$

Class movability-index
Let $P_j(i) = Sup^+(r[d_j \to d_i])$ /$P_j(i)$: all objects in U which can be reclassified from the decision class $d_j$ to the decision class $d_i$/.
$P_j(N) = \bigcap \{P_j(i) : i \in N, i \neq j\}$, for any $N \subseteq \{1, 2, \dots, k\}$, where $\{d_1, d_2, \dots, d_k\}$ are all decision classes.
Class movability-index (m-index) assigned to a $d_j$-object x:
$ind_S(x) = \max\{ind(N_j) : N_j \subseteq \{1, 2, \dots, k\} \wedge x \in P_j(N_j)\}$

Questions? Thank You
