This lecture provides an introduction to association rule mining, including the basics, motivation, and real-life examples. It explains the Apriori algorithm and FP-Growth algorithm for mining frequent itemsets and generating association rules.
CSC-480 Data Mining Lecture 03 – Association Rule Mining Muhammad Tariq Siddique https://sites.google.com/site/mtsiddiquecs/dm
The Basics • Which items are frequently purchased together by customers?
The Basics • Motivation: Business transaction records • Discovery of interesting correlation relationships that help business decision-making processes (catalog design, cross-marketing, customer shopping behavior analysis, …)
The Basics • How should software (SW), hardware (HW), and accessories be placed?
The Basics Data Mining 2013 – Mining Frequent Patterns, Association, and Correlations
The Basics - Frequent Itemsets
• Transaction dataset
• Itemset
• Occurrence frequency
• Frequent itemset
The Basics - Association Rules
• If the frequency of an itemset I satisfies the min_support count, then I is a frequent itemset
• If a rule satisfies both the min_support and min_confidence thresholds, it is said to be strong
• The problem of mining association rules reduces to mining frequent itemsets
• Association rule mining becomes a two-step process:
  – Find all frequent itemsets, i.e., those with frequency ≥ a predetermined min_support count (the most costly step)
  – Generate strong association rules from the frequent itemsets, i.e., rules that satisfy min_support and min_confidence
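As a minimal sketch of the two thresholds, the following Python computes the support count of an itemset and the confidence of a rule, then checks whether the rule is strong. The transactions, the threshold values, and the function names `support_count` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the lecture.

```python
# Toy market-basket data (hypothetical, not from the lecture).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support_count(A and B) / support_count(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    return support_count(both, transactions) / support_count(a, transactions)


min_support_count = 2   # assumed absolute threshold
min_confidence = 0.6    # assumed relative threshold

count = support_count({"milk", "diapers", "beer"}, transactions)
conf = confidence({"milk", "diapers"}, {"beer"}, transactions)
is_strong = count >= min_support_count and conf >= min_confidence
print(count, round(conf, 2), is_strong)
```

Here the rule {milk, diapers} ⇒ {beer} is counted as strong only because both thresholds are met; raising either threshold in the sketch changes the outcome.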
Agenda
Mining Frequent Itemsets: Apriori Algorithm
• Finds frequent itemsets by exploiting prior knowledge of frequent itemset properties
• Level-wise search, where k-itemsets are used to explore (k+1)-itemsets
• Goes as follows:
  – Find the frequent 1-itemsets, L1
  – Use L1 to find the frequent 2-itemsets, L2
  – … until no more frequent k-itemsets can be found
• Finding each Lk requires a full dataset scan
• To improve efficiency, use the Apriori property:
  – “All nonempty subsets of a frequent itemset must also be frequent”
  – If a set cannot pass a test, all of its supersets will fail the same test as well
  – If $P(I) <$ min_support, then $P(I \cup A) <$ min_support for any item A added to I
Mining Frequent Itemsets: Apriori Algorithm
Transactional data example: N = 9, min_supp count = 2
• Scan dataset for count of each candidate (C1)
• Compare candidate support with min_supp (L1)
Mining Frequent Itemsets: Apriori Algorithm
• Generate C2 candidates from L1 by joining L1 ⋈ L1 (C2)
• Scan dataset for count of each candidate (C2 with counts)
• Compare candidate support with min_supp (L2)
Mining Frequent Itemsets: Apriori Algorithm
• Generate C3 candidates from L2 by joining L2 ⋈ L2
• Two joining (lexicographically ordered) k-itemsets must share their first k-1 items: {I1, I2} is not joined with {I2, I4}
• C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}
• Prune (Apriori property): drop candidates for which not all subsets are frequent
• Scan dataset for count of each candidate (C3)
• Compare candidate support with min_supp (L3)
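The join and prune steps can be sketched in Python as follows. The L2 table did not survive in this transcript, so the `L2` below is an assumption: it is the set of six 2-itemsets whose join produces exactly the C3 listed above. The function names `join` and `prune` are hypothetical.

```python
from itertools import combinations


def join(frequent):
    """Join a list of sorted k-itemsets with itself: two itemsets are merged
    only if they share their first k-1 items."""
    items = sorted(frequent)
    candidates = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if a[:-1] == b[:-1]:
                candidates.append(a + (b[-1],))
    return candidates


def prune(candidates, frequent):
    """Apriori property: keep a candidate only if every subset obtained by
    removing one item is itself frequent."""
    known = set(frequent)
    return [c for c in candidates
            if all(s in known for s in combinations(c, len(c) - 1))]


# Assumed L2 (not reproduced in the transcript), as sorted tuples.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

C3_joined = join(L2)
C3_pruned = prune(C3_joined, L2)
print(C3_joined)
print(C3_pruned)
```

Under this assumption, the join never pairs ("I1", "I2") with ("I2", "I4") because their first items differ, and pruning removes every candidate that contains an infrequent pair such as ("I3", "I5").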
Mining Frequent Itemsets: Apriori Algorithm
• Prune: not all subsets are frequent
• C4 = ∅, so the algorithm terminates
The Apriori Algorithm: Exercise
• Database TDB, Sup_min = 2
The Apriori Algorithm: Exercise (Solution)
• Database TDB, Sup_min = 2
• 1st scan: C1, then L1
• 2nd scan: C2, then L2
• 3rd scan: C3, then L3
• C4 = ∅, so the algorithm terminates
Apriori Algorithm
• Generate $C_k$ using $L_{k-1}$ to find $L_k$
  – Join
  – Prune
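Putting the level-wise loop together, here is a minimal Python sketch of the whole procedure: count single items, then repeatedly join, prune, and scan until no candidates remain. The four-transaction database and the function name `apriori` are hypothetical, and the code is an illustration under those assumptions rather than the lecture's own pseudocode.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {sorted item tuple: support count} for all frequent itemsets."""
    # First scan: count single items and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_support_count}
    frequent = dict(level)

    while level:
        prev = sorted(level)
        # Join: merge two k-itemsets that share their first k-1 items.
        candidates = [a + (b[-1],)
                      for i, a in enumerate(prev)
                      for b in prev[i + 1:]
                      if a[:-1] == b[:-1]]
        # Prune: drop candidates that have an infrequent k-subset.
        candidates = [c for c in candidates
                      if all(s in level for s in combinations(c, len(c) - 1))]
        # One dataset scan per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if set(c) <= t)
                  for c in candidates}
        level = {s: n for s, n in counts.items() if n >= min_support_count}
        frequent.update(level)
    return frequent


# Hypothetical toy database (not the lecture's table).
toy_db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(toy_db, min_support_count=2)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

The loop stops as soon as a level yields no frequent itemsets, which corresponds to the empty candidate set that ends the worked example.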
Mining Frequent Itemsets: Generating Association Rules from Frequent Itemsets
• Example threshold: min_confidence = 70%
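Rule generation from one frequent itemset can be sketched as below: every split of the itemset into a nonempty antecedent and consequent is tested against min_confidence. The support counts in `support` are hypothetical values made up for illustration, and `generate_rules` is an assumed helper name.

```python
from itertools import combinations


def generate_rules(itemset, support, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule A => B with
    A and B nonempty, A plus B equal to `itemset`, and confidence >= threshold."""
    itemset = tuple(sorted(itemset))
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(itemset, size):
            consequent = tuple(i for i in itemset if i not in antecedent)
            # confidence(A => B) = support_count(itemset) / support_count(A)
            conf = support[itemset] / support[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf))
    return rules


# Hypothetical support counts for a frequent 3-itemset and its subsets.
support = {
    ("B",): 3, ("C",): 3, ("E",): 3,
    ("B", "C"): 2, ("B", "E"): 3, ("C", "E"): 2,
    ("B", "C", "E"): 2,
}

rules = generate_rules(("B", "C", "E"), support, min_confidence=0.70)
for antecedent, consequent, conf in rules:
    print(antecedent, "=>", consequent, f"{conf:.0%}")
```

Because every subset of a frequent itemset is itself frequent, all the counts needed for the confidence values are already available from the first mining step, so no further dataset scan is required.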
Mining Frequent Itemsets: FP-Growth
• Avoids costly candidate generation
• Divide-and-conquer strategy:
  – Compress the database representing frequent items into a frequent pattern tree (FP-tree): 2 passes over the dataset
  – Divide the compressed database (FP-tree) into conditional databases, then mine each for frequent itemsets: traverse through the FP-tree
Mining Frequent Itemsets: FP-Growth
Transactional data example: N = 9, min_supp count = 2
• Scan dataset for count of each candidate (C1)
• Compare candidate support with min_supp
• Sort the frequent items in descending order of support count (L1 - Reordered)
Mining Frequent Itemsets: FP-Growth – FP-tree Construction
• Start from a tree whose root is null { }; the items of each transaction follow the L1 - Reordered order
• T100: insert the path I2:1, I1:1, I5:1
• T200: the shared prefix I2 becomes I2:2 and I4:1 is added
• T300: the shared prefix I2 becomes I2:3 and I3:1 is added
• Order of items is kept throughout path construction, with common prefixes shared whenever applicable
• Final tree nodes: I2:7, I1:4, I1:2, I3:2, I3:2, I3:2, I4:1, I4:1, I5:1, I5:1
• For tree traversal: trace the node-link path for each node entry and you get that item’s support count
Mining Frequent Itemsets: FP-Growth – Frequent Patterns Mining
• Bottom-up algorithm: start from the leaves and go up to the root
• I5, for example, has two paths to the root
• {I3, I5}: frequency < min_support count threshold
Mining Frequent Itemsets: FP-Growth – Conditional FP-tree Construction
• For I5: eliminate transactions not including I5, then eliminate I5 itself
• Resulting tree nodes under null { }: I2:2, I1:2, I3:1
Mining Frequent Itemsets: FP-Growth
• Conditional pattern base: the paths to which the item is a suffix
• Conditional FP-tree: the prefix paths to the item after eliminating infrequent items
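The "eliminate transactions not including the item, then eliminate the item" recipe can be sketched in Python. Note the swapped-in shortcut: instead of following node-links through an FP-tree, this sketch works directly on the ordered frequent-item lists, which gives the same prefix paths. The data are the ordered transactions from the Construct FP-Tree exercise in this lecture (min_support = 3); the function names are hypothetical.

```python
from collections import Counter

# Ordered frequent items per transaction, from the FP-tree exercise.
ordered_transactions = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]


def conditional_pattern_base(item, ordered_transactions):
    """Prefix paths that end in `item`, mapped to how often each occurs."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:                           # skip transactions without item
            prefix = tuple(t[:t.index(item)])   # drop item and what follows it
            if prefix:
                base[prefix] += 1
    return base


def conditional_frequent_items(base, min_support_count):
    """Items that stay frequent inside the conditional pattern base."""
    counts = Counter()
    for prefix, n in base.items():
        for i in prefix:
            counts[i] += n
    return {i: n for i, n in counts.items() if n >= min_support_count}


base_p = conditional_pattern_base("p", ordered_transactions)
base_m = conditional_pattern_base("m", ordered_transactions)
print(dict(base_p), conditional_frequent_items(base_p, 3))
print(dict(base_m), conditional_frequent_items(base_m, 3))
```

Items that fall below the threshold inside a conditional pattern base are dropped before the conditional FP-tree is built, which is why the tree for an item is usually much smaller than the full tree.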
Mining Frequent Itemsets: FP-Growth – Conditional FP-tree Construction
• For I4: eliminate transactions not including I4, then eliminate I4 itself; tree nodes under null { }: I2:2, I1:1
• For I3: eliminate transactions not including I3, then eliminate I3 itself; tree nodes under null { }: I2:4, I1:2, I1:2
Construct FP-Tree Exercise
min_support = 3

TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o, w}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}
Construct FP-Tree Exercise (Solution)
min_support = 3

TID   Items bought                (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o, w}          {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

• Scan DB once, find the frequent 1-itemsets (single-item patterns)
• Sort frequent items in frequency-descending order to get the F-list
• Scan DB again, construct the FP-tree

F-list = f-c-a-b-m-p

Header table (item, frequency, head of node-links)
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
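The two-scan construction in the solution can be sketched in Python as follows, using the exercise's five transactions. The class `Node`, the functions `build_fp_tree` and `show`, and the `tie_order` parameter are hypothetical; `tie_order` is an assumption needed only to break ties between equally frequent items in the same way as the F-list given in the solution.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child Node


def build_fp_tree(transactions, min_support_count, tie_order=""):
    # Scan 1: count items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: n for i, n in counts.items() if n >= min_support_count}
    # F-list: descending support; ties follow tie_order, then alphabet.
    tie_rank = {item: r for r, item in enumerate(tie_order)}
    f_list = sorted(
        frequent,
        key=lambda i: (-frequent[i], tie_rank.get(i, len(tie_rank)), i))
    position = {item: n for n, item in enumerate(f_list)}

    # Scan 2: insert each transaction's frequent items in F-list order,
    # sharing common prefixes and incrementing counts along the path.
    root = Node(None)
    header = {item: [] for item in f_list}  # item -> its nodes (node-links)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in position), key=position.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, f_list


def show(node, depth=0):
    """Indented 'item:count' lines for the subtree below `node`."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(show(child, depth + 1))
    return lines


transactions = [
    {"f", "a", "c", "d", "g", "i", "m", "p"},
    {"a", "b", "c", "f", "l", "m", "o"},
    {"b", "f", "h", "j", "o", "w"},
    {"b", "c", "k", "s", "p"},
    {"a", "f", "c", "e", "l", "p", "m", "n"},
]
root, header, f_list = build_fp_tree(transactions, 3, tie_order="fcabmp")
lines = show(root)
print("-".join(f_list))
print("\n".join(lines))
```

Summing the counts of the nodes linked from one header entry gives that item's support count, which is a quick way to check the tree against the header table.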
Agenda
Pattern Evaluation Methods
• Not all association rules are interesting
• buys(X, “computer games”) ⇒ buys(X, “videos”) [support = 40%, confidence = 66%]
• P(“videos”) is already 75% > 66%
• The two items are negatively associated: buying one decreases the likelihood of buying the other
• We need to measure the “real strength” of a rule
• Correlation analysis
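Pattern Evaluation Methods
• A and B are independent if $P(A \cup B) = P(A)\,P(B)$, where $P(A \cup B)$ is the probability that a transaction contains both A and B
• Otherwise, A and B are dependent and correlated
• Lift: $\mathrm{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)\,P(B)}$
• If $\mathrm{lift}(A, B) < 1$, the occurrence of A is negatively correlated with the occurrence of B
• If $\mathrm{lift}(A, B) > 1$, A is positively correlated with B: A’s occurrence “lifts” the occurrence of B
• χ² was already discussed in a previous lecture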
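As a small worked check, the lift of the computer games and videos rule can be computed from the figures quoted for it earlier in this section (support 40%, confidence 66%, P(videos) = 75%). The sketch assumes the standard definition of lift, with P(games) derived as support divided by confidence; the function name `lift` and the variable names are hypothetical.

```python
def lift(p_both, p_a, p_b):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_both / (p_a * p_b)


support_both = 0.40            # P(games and videos), the rule's support
conf_games_to_videos = 0.66    # the rule's confidence
p_videos = 0.75                # P(videos)

# confidence = P(both) / P(games), so P(games) = P(both) / confidence.
p_games = support_both / conf_games_to_videos

value = lift(support_both, p_games, p_videos)
print(round(value, 2))
```

The result is below 1, which matches the observation that the two purchases are negatively associated even though the rule passes typical support and confidence thresholds. Equivalently, lift here is just confidence divided by P(videos).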