
Association Mining



Presentation Transcript


  1. Association Mining Data Mining Spring 2012

  2. Transactional Database • Transaction – a row in the database • e.g.: {Eggs, Cheese, Milk}

  3. Items and Itemsets • Item = Milk, Cheese, Bread, etc. • Itemset = {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk} • An itemset does not have to occur in the dataset • An itemset can contain anywhere from 1 to n items, where n is the number of distinct items

  4. The Support Measure
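The support measure can be written in symbols as follows, where D is the transaction database and T ranges over its transactions. This matches the "#occurrences / #rows in db" formula used in the lattice example; the notation itself is ours, not the slide's.

```latex
\operatorname{support}(X) = \frac{\lvert \{\, T \in D : X \subseteq T \,\} \rvert}{\lvert D \rvert}
```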

  5. Support Examples • Support({Eggs}) = 3/5 = 60% • Support({Eggs, Milk}) = 2/5 = 40%

  6. Minimum Support • Minsup – the minimum support threshold (user defined) for an itemset to be considered frequent • Frequent itemset – an itemset whose support in the database is greater than or equal to minsup • Support(X) ≥ minsup: frequent • Support(X) < minsup: infrequent

  7. Minimum Support Examples Minimum support = 50% • Support({Eggs}) = 3/5 = 60%, pass • Support({Eggs, Milk}) = 2/5 = 40%, fail
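The support and minimum-support checks above can be sketched in Python. The slides' transaction table is not in the transcript, so `DB` below is a hypothetical five-transaction database chosen only to reproduce the example numbers; `support` and `is_frequent` are illustrative names.

```python
# Hypothetical database: the slides' actual table is not in the transcript.
DB = [
    {"Eggs", "Cheese", "Milk"},
    {"Eggs", "Milk", "Cheese", "Bacon"},
    {"Eggs", "Bread"},
    {"Milk", "Bread"},
    {"Cheese", "Bacon"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if set(itemset) <= t) / len(db)


def is_frequent(itemset, db, minsup):
    """An itemset is frequent when its support is at least minsup."""
    return support(itemset, db) >= minsup


print(support({"Eggs"}, DB))                   # 0.6
print(support({"Eggs", "Milk"}, DB))           # 0.4
print(is_frequent({"Eggs"}, DB, 0.5))          # True  (pass)
print(is_frequent({"Eggs", "Milk"}, DB, 0.5))  # False (fail)
```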

  8. Association Rules

  9. Confidence Example 1 {Eggs} => {Bread} • Confidence = sup({Eggs, Bread}) / sup({Eggs}) • Confidence = (1/5)/(3/5) ≈ 33%

  10. Confidence Example 2 {Milk} => {Eggs, Cheese} • Confidence = sup({Milk, Eggs, Cheese}) / sup({Milk}) • Confidence = (2/5)/(3/5) ≈ 67%

  11. Strong Association Rules • Minimum confidence (minconf) – a user-defined lower bound on confidence • Strong association rule – a rule X => Y whose confidence is greater than or equal to minconf; this is a potentially interesting rule for the user • Conf(X => Y) ≥ minconf: strong • Conf(X => Y) < minconf: uninteresting

  12. Minimum Confidence Example Minconf = 50% • {Eggs} => {Bread}: Confidence = (1/5)/(3/5) ≈ 33%, fail • {Milk} => {Eggs, Cheese}: Confidence = (2/5)/(3/5) ≈ 67%, pass
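Confidence and the minconf test can be sketched the same way. `DB` is again a hypothetical database chosen to reproduce the slide numbers, and `confidence` and `is_strong` are illustrative names.

```python
# Hypothetical database reproducing the slides' numbers (the real table is not shown).
DB = [
    {"Eggs", "Cheese", "Milk"},
    {"Eggs", "Milk", "Cheese", "Bacon"},
    {"Eggs", "Bread"},
    {"Milk", "Bread"},
    {"Cheese", "Bacon"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if set(itemset) <= t) / len(db)


def confidence(x, y, db):
    """conf(X => Y) = sup(X ∪ Y) / sup(X)."""
    return support(set(x) | set(y), db) / support(x, db)


def is_strong(x, y, db, minconf):
    """A rule is strong when its confidence is at least minconf."""
    return confidence(x, y, db) >= minconf


print(round(confidence({"Eggs"}, {"Bread"}, DB), 3))           # 0.333
print(round(confidence({"Milk"}, {"Eggs", "Cheese"}, DB), 3))  # 0.667
print(is_strong({"Eggs"}, {"Bread"}, DB, 0.5))                 # False (fail)
print(is_strong({"Milk"}, {"Eggs", "Cheese"}, DB, 0.5))        # True  (pass)
```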

  13. Association Mining Association mining finds the strong rules in a dataset by building them from its frequent itemsets. It can be divided into two major subtasks: 1. Finding frequent itemsets 2. Rule generation

  14. Transactional Database Revisited • Some algorithms map items to letters or numbers • Numbers are more compact • Numbers make comparisons easier
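Mapping items to numbers can be sketched as below. The function name `encode` is hypothetical, and assigning ids in alphabetical order is an assumption; any fixed order works, as long as every transaction is then sorted by id.

```python
def encode(transactions):
    """Map every distinct item to an integer id and rewrite each transaction
    as a sorted tuple of ids. Ids follow alphabetical item order (an
    assumption; any fixed order works)."""
    items = sorted({item for t in transactions for item in t})
    ids = {item: i for i, item in enumerate(items, start=1)}
    encoded = [tuple(sorted(ids[item] for item in t)) for t in transactions]
    return ids, encoded


ids, encoded = encode([{"Eggs", "Cheese", "Milk"}, {"Bread", "Milk"}])
print(ids)      # {'Bread': 1, 'Cheese': 2, 'Eggs': 3, 'Milk': 4}
print(encoded)  # [(2, 3, 4), (1, 4)]
```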

  15. Basic Set Logic • Subset – an itemset X is a subset of an itemset Y (X ⊆ Y) if every item of X is also in Y • Superset – Y is then a superset of X (Y ⊇ X) • Example: X = {1,2}, Y = {1,2,3,5}, so X ⊆ Y and Y ⊇ X
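Python's built-in sets express these relations directly; the snippet uses the slide's example sets.

```python
X = {1, 2}
Y = {1, 2, 3, 5}

print(X <= Y)         # True: X is a subset of Y
print(Y >= X)         # True: Y is a superset of X
print(X < Y)          # True: X is a proper subset (X != Y)
print(X.issubset(Y))  # True: method form of X <= Y
print(Y <= X)         # False
```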

  16. Apriori • Arranges the itemsets of the database into a temporary lattice structure to find associations • Apriori principle – 1. Any itemset in the lattice with support < minsup only has supersets with support < minsup. 2. Every subset of a frequent itemset is also frequent. • Prunes non-frequent itemsets from the lattice using minsup • Reduces the number of comparisons • Reduces the number of candidate itemsets

  17. Monotonicity • Monotone (upward closed) – a measure f is monotone if X ⊆ Y implies that f(X) cannot exceed f(Y). • Anti-monotone (downward closed) – f is anti-monotone if X ⊆ Y implies that f(Y) cannot exceed f(X). • Support is anti-monotone: a superset can never appear in more transactions than its subsets. Apriori uses this property to prune the lattice structure.
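A brute-force check of the anti-monotone property on a small, hypothetical integer-coded database is sketched below; it compares every pair X ⊆ Y. The check always succeeds, because any transaction that contains Y also contains X. The names `DB` and `support_is_anti_monotone` are illustrative.

```python
from itertools import combinations

# Hypothetical integer-coded transactions, for illustration only.
DB = [frozenset(t) for t in ({1, 2, 3}, {1, 2}, {2, 3, 4}, {1, 3}, {2, 4})]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def support_is_anti_monotone(db):
    """Check support(Y) <= support(X) for every pair of itemsets X ⊆ Y."""
    items = sorted(set().union(*db))
    itemsets = [frozenset(c) for k in range(1, len(items) + 1)
                for c in combinations(items, k)]
    return all(support(y, db) <= support(x, db)
               for x in itemsets for y in itemsets if x <= y)


print(support_is_anti_monotone(DB))  # True
```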

  18. Itemset Lattice

  19. Lattice Pruning
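The itemset lattice and its pruning can be sketched as follows: the lattice over n items has 2^n − 1 non-empty itemsets, and once an itemset is known to be infrequent, the Apriori principle lets every superset of it be removed without counting. The function names and the choice of {4} as the infrequent itemset are hypothetical.

```python
from itertools import combinations


def lattice_levels(items):
    """Itemset lattice over items (empty set omitted), grouped by itemset size."""
    items = sorted(items)
    return [[frozenset(c) for c in combinations(items, k)]
            for k in range(1, len(items) + 1)]


def prune(levels, infrequent):
    """Remove every itemset that contains a known-infrequent itemset.
    By the Apriori principle, such supersets cannot be frequent."""
    return [[s for s in level if not any(bad <= s for bad in infrequent)]
            for level in levels]


levels = lattice_levels({1, 2, 3, 4})
print([len(level) for level in levels])   # [4, 6, 4, 1], i.e. 2**4 - 1 = 15 itemsets
pruned = prune(levels, [frozenset({4})])  # suppose {4} turned out infrequent
print([len(level) for level in pruned])   # [3, 3, 1, 0]
```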

  20. Lattice Example Count the occurrences of each 1-itemset in the database and compute its support, where Support = #occurrences / #rows in the db. Prune any itemset with support below minsup = 30%.

  21. Lattice Example Count the occurrences of each 2-itemset in the database and compute its support. Prune any itemset with support below minsup = 30%.

  22. Lattice Example Count the occurrences of the last remaining 3-itemset in the database and compute its support. Prune it if its support is below minsup = 30%.

  23. Example - Results Frequent itemsets: {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}

  24. Apriori Algorithm

  25. Frequent Itemset Generation • Minsup = 70% • Generate all 1-itemsets • Calculate the support for each itemset • Determine whether or not the itemsets are frequent

  26. Frequent Itemset Generation Generate candidate 2-itemsets from the frequent 1-itemsets, minsup = 70%: {1} ∪ {3} = {1,3}, {1} ∪ {5} = {1,5}, {3} ∪ {5} = {3,5}

  27. Frequent Itemset Generation Generate candidate 3-itemsets from the frequent 2-itemsets, minsup = 70%: {1,3} ∪ {1,5} = {1,3,5}

  28. Frequent Itemset Results All frequent itemsets generated are output: {1}, {3}, {5}, {1,3}, {1,5}, {3,5}, {1,3,5}
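The level-wise generation in slides 25–28 can be sketched as below. The database is hypothetical, chosen so that minsup = 70% yields the same frequent itemsets as slide 28. The join step simply unions pairs of frequent (k−1)-itemsets and then applies the Apriori pruning check; this is a simplification of the usual prefix-based join that produces the same frequent itemsets. `apriori` is an illustrative name.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise frequent itemset mining; returns {frozenset: support}."""
    db = [frozenset(t) for t in transactions]
    n = len(db)

    def support(itemset):
        return sum(1 for t in db if itemset <= t) / n

    # Level 1: every single item that meets minsup.
    frequent, level = {}, []
    for item in sorted(set().union(*db)):
        s = frozenset([item])
        sup = support(s)
        if sup >= minsup:
            frequent[s] = sup
            level.append(s)

    k = 2
    while level:
        prev = set(level)
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune (Apriori principle): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        # Count: keep the candidates that meet minsup.
        level = []
        for c in candidates:
            sup = support(c)
            if sup >= minsup:
                frequent[c] = sup
                level.append(c)
        k += 1
    return frequent


# Hypothetical database, not the one shown on the slides.
DB = [{1, 3, 5}, {1, 3, 5}, {1, 2, 3, 5}, {1, 3, 4, 5}, {2, 4}]
freq = apriori(DB, 0.7)
print(sorted(tuple(sorted(s)) for s in freq))
# [(1,), (1, 3), (1, 3, 5), (1, 5), (3,), (3, 5), (5,)]
```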

  29. Apriori Rule Mining

  30. Apriori Rule Mining Rule combinations: 1. From the 2-itemset {1,2}: {1}=>{2}, {2}=>{1} 2. From the 3-itemset {1,2,3}: {1}=>{2,3}, {2,3}=>{1}, {1,2}=>{3}, {3}=>{1,2}, {1,3}=>{2}, {2}=>{1,3}
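Enumerating the rule combinations can be sketched as follows: every non-empty proper subset X of a k-itemset gives one rule X => (itemset − X), so a k-itemset yields 2^k − 2 candidate rules, matching the 2 and 6 rules listed above. `candidate_rules` is an illustrative name.

```python
from itertools import combinations


def candidate_rules(itemset):
    """All rules X => Y with X, Y non-empty, disjoint, and X ∪ Y = itemset.
    A k-itemset yields 2**k - 2 such rules."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


print(candidate_rules({1, 2}))          # [((1,), (2,)), ((2,), (1,))]
print(len(candidate_rules({1, 2, 3})))  # 6
```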

  31. Strong Rule Generation • I = {{1}, {3}, {5}} • Rules = X => Y • Minconf = 80%

  32. Strong Rule Generation • I = {{1}, {3}, {5}} • Rules = X => Y • Minconf = 80%

  33. Strong Rules Results All strong rules generated are output: {1}=>{5}, {3}=>{5}, {2}=>{3,5}, {2,3}=>{5}, {2,5}=>{3}
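The two subtasks of association mining can be combined into one small pipeline: find the frequent itemsets, split each into candidate rules, and keep the rules whose confidence is at least minconf. For brevity, frequent itemsets are found here by brute force over every itemset rather than by Apriori; the database, the thresholds (minsup = 40%, minconf = 80%), and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical database, not the one shown on the slides.
DB = [frozenset(t) for t in (
    {"Eggs", "Cheese", "Milk"},
    {"Eggs", "Milk", "Cheese", "Bacon"},
    {"Eggs", "Bread"},
    {"Milk", "Bread"},
    {"Cheese", "Bacon"},
)]


def support(itemset, db):
    return sum(1 for t in db if itemset <= t) / len(db)


def frequent_itemsets(db, minsup):
    """Brute force: test every itemset (only sensible for tiny databases)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, db)
            if sup >= minsup:
                result[s] = sup
    return result


def strong_rules(db, minsup, minconf):
    """Return (X, Y, confidence) for every rule X => Y with conf >= minconf."""
    freq = frequent_itemsets(db, minsup)
    rules = []
    for itemset, sup in freq.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                conf = sup / freq[x]  # x is frequent: it is a subset of a frequent itemset
                if conf >= minconf:
                    rules.append((x, itemset - x, conf))
    return rules


for x, y, conf in strong_rules(DB, 0.4, 0.8):
    print(sorted(x), "=>", sorted(y), f"conf={conf:.0%}")
```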

  34. Other Frequent Itemsets Closed Frequent Itemset – a frequent itemset X that has no immediate superset with the same support count as X. Maximal Frequent Itemset – a frequent itemset none of whose immediate supersets is frequent.

  35. Itemset Relationships Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets
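Closed and maximal itemsets can be picked out of a table of frequent itemsets and support counts, as sketched below; the result also illustrates the nesting above. The counts in `FREQ` are hypothetical, and the table must list every frequent itemset for the chosen threshold. `closed_and_maximal` is an illustrative name.

```python
def closed_and_maximal(freq):
    """Split frequent itemsets into closed and maximal ones.

    freq maps every frequent itemset (a frozenset) to its support count. Only
    frequent supersets need checking: a superset with the same count as a
    frequent itemset would itself be frequent, so it would be in freq.
    """
    closed, maximal = set(), set()
    for x, count in freq.items():
        # Immediate supersets: exactly one item larger than x.
        supersets = [y for y in freq if len(y) == len(x) + 1 and x < y]
        if all(freq[y] != count for y in supersets):
            closed.add(x)
        if not supersets:
            maximal.add(x)
    return closed, maximal


fs = frozenset
# Hypothetical frequent itemsets with support counts (minimum count 2).
FREQ = {
    fs({"Eggs"}): 3, fs({"Cheese"}): 3, fs({"Milk"}): 3,
    fs({"Bread"}): 2, fs({"Bacon"}): 2,
    fs({"Eggs", "Cheese"}): 2, fs({"Eggs", "Milk"}): 2,
    fs({"Cheese", "Milk"}): 2, fs({"Cheese", "Bacon"}): 2,
    fs({"Eggs", "Cheese", "Milk"}): 2,
}

closed, maximal = closed_and_maximal(FREQ)
print(sorted(sorted(s) for s in maximal))
# [['Bacon', 'Cheese'], ['Bread'], ['Cheese', 'Eggs', 'Milk']]
print(maximal <= closed <= set(FREQ))  # True
```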

  36. Targeted Association Mining

  37. Targeted Association Mining * Users may only be interested in specific results * Potential to get smaller, faster, and more focused results * Examples: 1. A user wants to know only how often bread and garlic cloves occur together. 2. A user wants to know which items occur with toilet paper.

  38. Itemset Trees * Itemset Tree: - A data structure that helps users query for a specific itemset and its support. * Items within a transaction are mapped to integer values and ordered so that each transaction is in lexical order, e.g. {Bread, Onion, Garlic} = {1, 2, 3} * Why use numbers? - They make the tree more compact - Numbers are easy to order

  39. Itemset Trees An Itemset Tree T contains: * A root pair (I, f(I)), where I is an itemset and f(I) is its count. * A (possibly empty) set {T1, T2, . . . , Tk}, each element of which is an itemset tree. * If Ij is in the root, then it will also be in the root's children * If Ij is not in the root, then it might be in the root's children if: first_item(I) < first_item(Ij) and last_item(I) < last_item(Ij)

  40. Building an Itemset Tree Let ci be a node in the itemset tree. Let I be a transaction from the dataset. Loop: Case 1: ci = I - increment the count of ci. Case 2: ci is a child of I (I is a prefix of ci) - make I the parent node of ci. Case 3: ci and I share a common lexical overlap (a common prefix), e.g. {1,2,4} vs. {1,2,6} - make a node for the overlap - make I and ci its children. Case 4: ci is a parent of I (ci is a prefix of I) - loop to check ci's children - make I a child of ci if none of the cases applies to them. Note: {2,6} and {1,2,6} do not have a lexical overlap.
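The insertion cases above can be sketched in Python. This is a minimal sketch under stated assumptions: transactions are already integer-coded, a node's "parent" is a node whose itemset is a prefix of it, the root holds the empty itemset with the total transaction count, and each node's count covers every transaction stored in its subtree. `Node`, `insert`, `build`, and `show`, and the six sample transactions, are hypothetical rather than taken from slides 41–47.

```python
class Node:
    """Itemset tree node: a sorted tuple of integer items plus a count."""

    def __init__(self, itemset, count):
        self.itemset = itemset
        self.count = count
        self.children = []


def is_prefix(a, b):
    """True if tuple a is a prefix of tuple b (the "parent" relation)."""
    return b[:len(a)] == a


def common_prefix(a, b):
    """Longest shared leading run of items (the "lexical overlap")."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]


def insert(node, s):
    """Insert sorted transaction s below node; node.itemset is a prefix of s."""
    node.count += 1
    if node.itemset == s:
        return
    for i, child in enumerate(node.children):
        if child.itemset == s:                # Case 1: same itemset
            child.count += 1
            return
        if is_prefix(s, child.itemset):       # Case 2: s becomes child's parent
            parent = Node(s, child.count + 1)
            parent.children.append(child)
            node.children[i] = parent
            return
        if is_prefix(child.itemset, s):       # Case 4: descend into child
            insert(child, s)
            return
        overlap = common_prefix(child.itemset, s)
        if len(overlap) > len(node.itemset):  # Case 3: lexical overlap
            shared = Node(overlap, child.count + 1)
            shared.children = [child, Node(s, 1)]
            node.children[i] = shared
            return
    node.children.append(Node(s, 1))          # no related child: new leaf


def build(transactions):
    root = Node((), 0)  # the root holds the empty itemset and the total count
    for t in transactions:
        insert(root, tuple(sorted(set(t))))
    return root


def show(node, depth=0):
    print(f"{'  ' * depth}{node.itemset}: {node.count}")
    for child in node.children:
        show(child, depth + 1)


root = build([(1, 4), (2, 5), (1, 2, 3, 5), (1, 2, 4), (2, 5), (2, 4)])
show(root)
# (): 6
#   (1,): 3
#     (1, 4): 1
#     (1, 2): 2
#       (1, 2, 3, 5): 1
#       (1, 2, 4): 1
#   (2,): 3
#     (2, 5): 2
#     (2, 4): 1
```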

  41. Itemset Trees - Creation

  42. Itemset Trees - Creation Child node.

  43. Itemset Trees - Creation Child node.

  44. Itemset Trees - Creation Child node.

  45. Itemset Trees - Creation Lexical overlap

  46. Itemset Trees - Creation Parent node.

  47. Itemset Trees - Creation Child node.

  48. Itemset Trees – Querying Let I be an itemset, let ci be a node in the tree, and let totalSup be the total count for I in the tree. For each child ci such that first_item(ci) ≤ first_item(I): Case 1: If I is contained in ci - add the count of ci to totalSup. Case 2: If I is not contained in ci and last_item(ci) < last_item(I) - proceed down the tree to the children of ci.
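The querying procedure can be sketched on a small hand-built tree. The tree and its transactions, (1,4), (2,5), (1,2,3,5), (1,2,4), (2,5), (2,4), are hypothetical, and `count_itemset` is an illustrative name. The first-item test uses `<=`, so a node whose first item equals the query's first item is still examined. When I is contained in ci, ci's count already includes all of its descendants, so the search stops there.

```python
class Node:
    def __init__(self, itemset, count, children=()):
        self.itemset = itemset          # sorted tuple of integer items
        self.count = count              # transactions stored in this subtree
        self.children = list(children)


def count_itemset(node, s):
    """Total count of transactions containing the sorted, non-empty itemset s."""
    total = 0
    for child in node.children:
        if child.itemset[0] <= s[0]:          # otherwise s[0] cannot occur below
            if set(s) <= set(child.itemset):  # Case 1: s contained in child
                total += child.count
            elif child.itemset[-1] < s[-1]:   # Case 2: descendants may contain s
                total += count_itemset(child, s)
    return total


# Hand-built tree for the hypothetical transactions
# (1,4), (2,5), (1,2,3,5), (1,2,4), (2,5), (2,4).
root = Node((), 6, [
    Node((1,), 3, [
        Node((1, 4), 1),
        Node((1, 2), 2, [Node((1, 2, 3, 5), 1), Node((1, 2, 4), 1)]),
    ]),
    Node((2,), 3, [Node((2, 5), 2), Node((2, 4), 1)]),
])

print(count_itemset(root, (2,)))               # 5
print(count_itemset(root, (1, 2)))             # 2
print(count_itemset(root, (2,)) / root.count)  # support of {2}, i.e. 5/6
```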

  49. Example 1

  50. Itemset Trees - Querying • Querying Example 1: • Query: {2} • totalSup = 0
