A Classical Apriori Algorithm for Mining Association Rules
What is an Association Rule?
• Given a set of transactions {t1, t2, ..., tn}, where a transaction ti is a set of items {Xi1, …, Xim}
• An association rule is an expression: A ==> B
• where A and B are sets of items, and A ∩ B = ∅
• Meaning: transactions which contain A also contain B
Two Thresholds
• Measurement of rule strength in a relational transaction database
• A ==> B [support, confidence]
• support(A ∪ B) = (number of transactions containing A ∪ B) / (total number of transactions)
• confidence(A ==> B) = support(A ∪ B) / support(A)
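To make the two measures concrete, here is a minimal Python sketch; the five-transaction dataset and the function names are invented purely for illustration:

# A small made-up transaction database (each transaction is a set of items).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "diapers"},
    {"bread", "milk"},
    {"beer", "diapers", "milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    # support(A ∪ B) / support(A): of the transactions containing A,
    # the fraction that also contain B.
    return support(set(a) | set(b), transactions) / support(a, transactions)

print(support({"bread", "butter"}, transactions))               # 0.4
print(confidence({"bread", "butter"}, {"milk"}, transactions))  # 0.5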
Strong Rules
• We are interested in strong associations, i.e., support ≥ min_sup & confidence ≥ min_conf
• Examples:
• bread & butter ==> milk [support=5%, confidence=60%]
• beer ==> diapers [support=10%, confidence=80%]
Mining Association Rules
• Mining association rules from a large dataset of items can improve the quality of business decisions
• For a supermarket with a large collection of items, typical business decisions include:
• what to put on sale
• how to design coupons
• how to place merchandise on shelves to maximize profit, etc.
Mining Association Rules (2)
• There are two main steps in mining association rules:
• 1. Find all combinations of items that have transaction support above minimum support (frequent itemsets)
• 2. Generate association rules from the frequent itemsets
• Most existing algorithms focus on the first step, because it requires a great deal of computation, memory, and I/O, and has a significant impact on the overall performance
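Since the second step is rarely spelled out, here is a minimal Python sketch of rule generation, assuming a hypothetical `supports` map from each frequent itemset to its support (by the Apriori property, every subset of a frequent itemset is also in the map):

from itertools import combinations

def generate_rules(supports, min_conf):
    # For each frequent itemset, try every non-empty proper subset A as the
    # antecedent; confidence(A ==> B) = support(A ∪ B) / support(A).
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for a in map(frozenset, combinations(itemset, r)):
                conf = sup / supports[a]
                if conf >= min_conf:
                    rules.append((set(a), set(itemset - a), sup, conf))
    return rules

# Hypothetical supports, echoing the beer/diapers example:
supports = {
    frozenset({"beer"}): 0.12,
    frozenset({"diapers"}): 0.15,
    frozenset({"beer", "diapers"}): 0.10,
}
for a, b, sup, conf in generate_rules(supports, min_conf=0.6):
    print(a, "==>", b, f"[support={sup:.0%}, confidence={conf:.0%}]")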
The Classical Mining Algorithm Apriori (Agrawal, et al. '94)
• At the first iteration, scan all the transactions and count the number of occurrences of each item. This derives the frequent 1-itemsets, L1
• At the k-th iteration, the candidate set Ck is formed from the k-itemsets whose every (k-1)-item subset is in Lk-1
• Scan the database and count the number of occurrences of each candidate k-itemset
• In total, it needs x database scans for x levels
[Figure: an itemset lattice with levels 1, 2, 3, …, k, (k+1), …, x; Apriori moves through the lattice one level at a time]
The Algorithm Apriori
1. L1 = {frequent 1-itemsets}
2. for (k = 2; Lk-1 ≠ ∅; k++) {
3.   Ck = apriori_gen(Lk-1);
4.   for all transactions t in D do
5.     for all candidates c in Ck contained in t do
6.       c.count++;
7.   Lk = {c in Ck | c.count ≥ minimum support}
8. }
9. Result = ∪k Lk
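A compact Python rendering of this pseudocode might look as follows. It is a sketch only: the representation (frozensets, absolute support counts) is our choice, and apriori_gen here uses a simplified union-based join rather than the prefix join detailed on the next two slides.

from itertools import combinations

def apriori(transactions, minsup_count):
    # Level-wise search; `transactions` is a list of sets and `minsup_count`
    # is the minimum support expressed as an absolute transaction count.
    counts = {}
    for t in transactions:                      # line 1: first full scan
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= minsup_count}
    result = dict(L)                            # accumulates ∪k Lk (line 9)
    k = 2
    while L:                                    # line 2: while L_{k-1} is non-empty
        C = apriori_gen(set(L), k)              # line 3: join + prune
        counts = {c: 0 for c in C}
        for t in transactions:                  # lines 4-6: one scan per level
            for c in C:
                if c <= t:                      # candidate c is contained in t
                    counts[c] += 1
        L = {s: c for s, c in counts.items() if c >= minsup_count}  # line 7
        result.update(L)
        k += 1
    return result

def apriori_gen(L_prev, k):
    # Simplified candidate generation: union any two frequent (k-1)-itemsets
    # that yield a k-itemset, then apply the prune step (every (k-1)-subset
    # of a surviving candidate must itself be frequent).
    candidates = {p | q for p in L_prev for q in L_prev if len(p | q) == k}
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}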
The Algorithm Apriori_gen (join step)
Pre: all itemsets in Lk-1
Post: candidate itemsets in Ck

insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
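Assuming items within each itemset are kept in lexicographic order, as the SQL-style formulation implies, a literal Python rendering of the join step could look like this (the function name and sorted-tuple representation are ours):

def join(L_prev, k):
    # Join step: combine pairs p, q from L_{k-1} (each a sorted tuple of
    # k-1 items) that agree on their first k-2 items, mirroring
    #   p.item1 = q.item1, ..., p.item_{k-2} = q.item_{k-2},
    #   p.item_{k-1} < q.item_{k-1}
    candidates = set()
    for p in L_prev:
        for q in L_prev:
            if p[:k - 2] == q[:k - 2] and p[k - 2] < q[k - 2]:
                candidates.add(p + (q[k - 2],))
    return candidates

# e.g. joining L3 = {ABC, ABD, ACD, BCD} at k = 4:
L3 = {("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"), ("B", "C", "D")}
print(join(L3, 4))   # {('A', 'B', 'C', 'D')}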
The prune step
Pre: itemsets in Ck and Lk-1
Post: itemsets c in Ck that have some (k-1)-subset not in Lk-1 are deleted

forall itemsets c ∈ Ck do
  forall (k-1)-subsets s of c do
    if (s ∉ Lk-1) then delete c from Ck
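And a matching sketch of the prune step, reusing the sorted-tuple representation from the join sketch:

from itertools import combinations

def prune(candidates, L_prev, k):
    # Prune step: delete any candidate that has a (k-1)-subset
    # not present in L_{k-1}.
    return {c for c in candidates
            if all(s in L_prev for s in combinations(c, k - 1))}

# Continuing the join example: ABCD survives because all four of its
# 3-subsets (ABC, ABD, ACD, BCD) are in L3.
L3 = {("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"), ("B", "C", "D")}
print(prune({("A", "B", "C", "D")}, L3, 4))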
An Example
Input Dataset:
Tid  Items
 1   A B C
 2   B C E
 3   A B C E F
 4   A B C D
 5   A B C E
 6   A B C E F
 7   B C D E F
 8   A B C
 9   A C D E
10   B C E F
minsup = 20% (an itemset must occur in at least 2 of the 10 transactions)
L1 = {A, B, C, D, E, F}
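The first scan can be reproduced with a few lines of Python (the variable names are ours for illustration):

from collections import Counter

# The ten transactions above, as Python sets.
transactions = [
    set("ABC"), set("BCE"), set("ABCEF"), set("ABCD"), set("ABCE"),
    set("ABCEF"), set("BCDEF"), set("ABC"), set("ACDE"), set("BCEF"),
]
minsup_count = 2   # 20% of 10 transactions

item_counts = Counter(item for t in transactions for item in t)
L1 = sorted(item for item, c in item_counts.items() if c >= minsup_count)
print(L1)   # ['A', 'B', 'C', 'D', 'E', 'F']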
An Example (2)
C2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF}
After counting C2 = {AB(6), AC(7), AD(2), AE(4), AF(2), BC(9), BD(2), BE(6), BF(4), CD(3), CE(7), CF(4), DE(2), DF(1), EF(4)}
L2 = {AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, EF}  (DF is dropped: its count of 1 is below minsup)
An Example (3)
C3 = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF}
After pruning C3 = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, AEF, BCD, BCE, BCF, BDE, BEF, CDE, CEF}  (ADF, BDF, CDF are pruned because DF is not in L2)
After counting C3 = {ABC(6), ABD(1), ABE(3), ABF(2), ACD(2), ACE(4), ACF(2), ADE(1), AEF(2), BCD(2), BCE(6), BCF(4), BDE(1), BEF(4), CDE(2), CEF(4)}
L3 = {ABC, ABE, ABF, ACD, ACE, ACF, AEF, BCD, BCE, BCF, BEF, CDE, CEF}
An Example (4)
C4 = {ABCE, ABCF, ABEF, ACDE, ACDF, ACEF, BCDE, BCDF, BCEF}
After pruning C4 = {ABCE, ABCF, ABEF, ACEF, BCEF}  (ACDE, ACDF, BCDE, BCDF each contain a 3-subset not in L3)
After counting C4 = {ABCE(3), ABCF(2), ABEF(2), ACEF(2), BCEF(4)}
L4 = {ABCE, ABCF, ABEF, ACEF, BCEF}
An Example (5)
C5 = {ABCEF}
After counting C5 = {ABCEF(2)}
L5 = {ABCEF}
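As a sanity check, running the apriori sketch from the pseudocode slide on the transactions list defined with the input dataset reproduces every level of this example:

frequent = apriori(transactions, minsup_count=2)
print(max(len(s) for s in frequent))    # 5 -- the search stops at level 5
print(frequent[frozenset("ABCEF")])     # 2 -- matching L5 = {ABCEF(2)}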
Assignment 1
• Work:
• Write a program that implements the Apriori algorithm to generate the frequent itemsets at each level of the itemset lattice
• Data sets:
• can be downloaded from "angsila/~nuansri/310214"
• run with the following minimum support values:
• xt10.data ==> minsup = 20%, 15%, and 10%
• tr2000.data ==> minsup = 10%, 8%, and 5%
Assignment 1 (2)
• Due:
• Monday, 15 September 2003 (B.E. 2546)
• Demonstrate the program, with its accompanying documentation, in room SD417
• Note:
• For the same data set, the frequent itemsets at every level of the itemset lattice must be identical, no matter which program produced them or which data structures the program uses
• Every student can therefore check the correctness of the number and values of their frequent itemsets against classmates working on the same data sets