(Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi) Association Rule Mining
Terminology • Transaction • Item • Itemset
Association Rules Let U be a set of items and let X, Y ⊆ U, with X ∩ Y = ∅. An association rule is an expression of the form X → Y, whose meaning is: if the elements of X occur in some context, then so do the elements of Y.
Quality Measures Let T be the set of all transactions. The following statistical quantities are relevant to association rule mining:
• support(X) = |{t ∈ T : X ⊆ t}| / |T| — the percentage of all transactions containing itemset X
• support(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |T| — the percentage of all transactions containing both itemsets X and Y
• confidence(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |{t ∈ T : X ⊆ t}| — the percentage of transactions containing itemset X that also contain itemset Y, i.e., how good itemset X is at predicting itemset Y
Learning Associations The purpose of association rule learning is to find “interesting” rules, i.e., rules that meet the following two user-defined conditions:
• support(X → Y) ≥ MinSupport
• confidence(X → Y) ≥ MinConfidence
Itemsets
• Frequent itemset: an itemset whose support is at least MinSupport (denoted Lk, where k is the size of the itemset) — i.e., a high percentage of transactions contain the full itemset
• Candidate itemset: a potentially frequent itemset (denoted Ck, where k is the size of the itemset)
Basic Idea Generate all frequent itemsets satisfying the condition on minimum support Build all possible rules from these itemsets and check them against the condition on minimum confidence All the rules above the minimum confidence threshold are returned for further evaluation
AprioriAll (I) • L1 ← ∅ • For each item Ij ∈ I • count({Ij}) ← |{Ti : Ij ∈ Ti}| (the number of transactions containing item Ij) • If count({Ij}) ≥ MinSupport × m • L1 ← L1 ∪ {({Ij}, count({Ij}))} (if the count is big enough, add the item and its count to the list L1) • k ← 2 • While Lk-1 ≠ ∅ • Lk ← ∅ • For each (l1, count(l1)) ∈ Lk-1 • For each (l2, count(l2)) ∈ Lk-1 • If (l1 = {j1, …, jk-2, x} ∧ l2 = {j1, …, jk-2, y} ∧ x ≠ y) • l ← {j1, …, jk-2, x, y} • count(l) ← |{Ti : l ⊆ Ti}| • If count(l) ≥ MinSupport × m • Lk ← Lk ∪ {(l, count(l))} • k ← k + 1 • Return L1 ∪ L2 ∪ … ∪ Lk-1
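The pseudocode above can be rendered as a minimal, unoptimized Python sketch. It follows the same structure — count single items, then repeatedly join (k−1)-itemsets and re-count — but represents itemsets as frozensets rather than sorted lists; the sample transactions are an assumption for illustration.

```python
def apriori(transactions, min_support):
    """Sketch of the AprioriAll frequent-itemset loop above.
    Returns a dict mapping each frequent itemset (frozenset) to its count."""
    m = len(transactions)
    min_count = min_support * m

    # L1: count single items, keep those meeting the support threshold
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of (k-1)-itemsets that differ in one item
        prev = list(frequent)
        candidates = {prev[i] | prev[j]
                      for i in range(len(prev))
                      for j in range(i + 1, len(prev))
                      if len(prev[i] | prev[j]) == k}
        # Count each candidate against all transactions; keep the frequent ones
        frequent = {}
        for cand in candidates:
            count = sum(cand <= t for t in transactions)
            if count >= min_count:
                frequent[cand] = count
        result.update(frequent)
        k += 1
    return result

# Hypothetical transaction database (assumed for illustration)
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
L = apriori(transactions, min_support=0.6)
```

With MinSupport = 0.6 (count ≥ 3 of 5), all four single items survive, and {bread, milk} is the only frequent 2-itemset.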
Rule Generation • Look at the set {a,d,e} • It has six candidate association rules: • {a} → {d,e}, confidence: support({a,d,e}) / support({a}) = 0.571 • {d,e} → {a}, confidence: support({a,d,e}) / support({d,e}) = 1.000 • {d} → {a,e}, confidence: support({a,d,e}) / support({d}) = 0.667 • {a,e} → {d}, confidence: support({a,d,e}) / support({a,e}) = 0.667 • {e} → {a,d}, confidence: support({a,d,e}) / support({e}) = 0.571 • {a,d} → {e}, confidence: support({a,d,e}) / support({a,d}) = 0.800
Rule Generation • Look at the set {a,d,e}. Let MinConfidence = 0.800 • Of its six candidate association rules: • {d,e} → {a}, confidence: support({a,d,e}) / support({d,e}) = 1.000 • {a,e} → {d}, confidence: support({a,d,e}) / support({a,e}) = 0.667 • {a,d} → {e}, confidence: support({a,d,e}) / support({a,d}) = 0.800 • {d} → {a,e}, confidence: support({a,d,e}) / support({d}) = 0.667 • Selected rules: • {d,e} → {a} and {a,d} → {e}
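The rule-generation step above can be sketched as follows. The support counts are a hypothetical assignment chosen to reproduce the confidences on the slide (e.g. 4/7 ≈ 0.571, 4/5 = 0.800); the original transaction database is not shown in the slides.

```python
from itertools import combinations

def rules_from_itemset(itemset, support_counts, min_confidence):
    """Enumerate every X -> Y split of a frequent itemset, as on the slide
    above, keeping rules whose confidence meets min_confidence."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            # confidence = support(full itemset) / support(left-hand side)
            conf = support_counts[itemset] / support_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, itemset - lhs, conf))
    return rules

# Hypothetical support counts, assumed so the ratios match the slide
support_counts = {
    frozenset("a"): 7, frozenset("d"): 6, frozenset("e"): 7,
    frozenset("ad"): 5, frozenset("ae"): 6, frozenset("de"): 4,
    frozenset("ade"): 4,
}
selected = rules_from_itemset("ade", support_counts, min_confidence=0.8)
# selected holds the two rules from the slide: {a,d} -> {e} and {d,e} -> {a}
```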
Summary Apriori is a rather simple algorithm that discovers useful and interesting patterns It is widely used It has been extended to create collaborative filtering algorithms to provide recommendations
References
• R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int. Conf. on Very Large Data Bases (VLDB), 1994
• R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules between Sets of Items in Large Databases,” Proc. 1993 ACM SIGMOD Int. Conf. on Management of Data, 1993
• P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Pearson Education Inc., 2006, Chapter 6