
Scalable Algorithms for Association Mining

Scalable Algorithms for Association Mining. Mohammed J. Zaki, IEEE Transactions on Knowledge and Data Engineering, 2000. Presenter: 吳建良.


Presentation Transcript


  1. Scalable Algorithms for Association Mining. Mohammed J. Zaki, IEEE Transactions on Knowledge and Data Engineering, 2000. Presenter: 吳建良

  2. Abstract
     • Frequent itemset
     • Vertical tid-list database format
     • Lattice-theoretic approach
        • Prefix-based and maximal-clique-based partition
     • Pattern search strategy
        • Bottom-up, top-down and hybrid search
     • Requires only a few database scans

  3. Symbol Definition

  4. Example

  5. Itemset Enumeration: Lattice-theoretic approach
     • Partial order
        • Reflexive, antisymmetric, transitive
        • Partially ordered set (poset)
     • Lattice
        • A poset
        • Any two elements have a unique join and meet
        • join = least upper bound(a, b)
        • meet = greatest lower bound(a, b)
     • Atom
        • An element that immediately succeeds the least element
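For the power set of an item set ordered by set inclusion, the join of two itemsets is their union and the meet is their intersection, and the atoms are the single-item sets. The sketch below illustrates this on a hypothetical three-item set; the function names `powerset`, `join`, and `meet` are chosen here for illustration only.

```python
from itertools import combinations

def powerset(items):
    """Return every subset of `items` as a frozenset (the elements of P(I))."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def join(a, b):
    """Least upper bound under set inclusion: the union."""
    return a | b

def meet(a, b):
    """Greatest lower bound under set inclusion: the intersection."""
    return a & b

# Hypothetical item set I = {A, C, D}, used only for illustration.
lattice = powerset("ACD")
# Atoms sit immediately above the least element (the empty set).
atoms = [x for x in lattice if len(x) == 1]

print(sorted(join(frozenset("AC"), frozenset("CD"))))  # ['A', 'C', 'D']
print(sorted(meet(frozenset("AC"), frozenset("CD"))))  # ['C']
```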

  6. Power set lattice P(I)
     • Gray circle: frequent itemset
     • Black circle: maximal frequent itemset

  7. Lemma
     • Lemma 1:
        • All subsets of a frequent itemset are frequent
        • All supersets of an infrequent itemset are infrequent
     • Lemma 2:
        • The maximal frequent itemsets uniquely determine all frequent itemsets
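Lemma 2 follows from Lemma 1: every frequent itemset is a subset of some maximal frequent itemset, so enumerating the non-empty subsets of the maximal ones recovers the full collection. A minimal sketch, assuming two hypothetical maximal itemsets ACTW and CDW (the helper name `frequent_from_maximal` is made up for this example):

```python
from itertools import combinations

def frequent_from_maximal(maximal):
    """Collect every non-empty subset of each maximal frequent itemset."""
    frequent = set()
    for m in maximal:
        items = sorted(m)
        for r in range(1, len(items) + 1):
            for sub in combinations(items, r):
                frequent.add(frozenset(sub))
    return frequent

# Hypothetical maximal frequent itemsets, for illustration only.
maximal = [frozenset("ACTW"), frozenset("CDW")]
all_frequent = frequent_from_maximal(maximal)

print(len(all_frequent))                  # 19
print(frozenset("AD") in all_frequent)    # False: AD is under neither maximal set
```

Note that this recovers which itemsets are frequent, not their support counts.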

  8. Support Counting
     • L(X): the tid-list of X, i.e. the ids of the transactions that contain X
     • Support of a k-itemset
        • Intersect the tid-lists of any two of its (k-1)-subsets
     • Example
        • L(CD) = L(C) ∩ L(D)
        • L(CDW) = L(CD) ∩ L(CW)
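The intersections in the example above can be sketched with Python sets. The tid-lists below are a toy vertical database assumed for illustration; they are not taken from the slides.

```python
# Toy vertical database: item -> set of transaction ids (assumed values).
tidlists = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}

# 2-itemsets: intersect the tid-lists of the two items.
L_CD = tidlists["C"] & tidlists["D"]
L_CW = tidlists["C"] & tidlists["W"]

# 3-itemset: intersect the tid-lists of two of its 2-subsets.
L_CDW = L_CD & L_CW

# The support of an itemset is the length of its tid-list.
support_CDW = len(L_CDW)

print(sorted(L_CD))    # [2, 4, 5, 6]
print(sorted(L_CDW))   # [2, 4, 5]
print(support_CDW)     # 3
```

Because each tid-list already encodes which transactions contain the itemset, no pass over the original database is needed once the item tid-lists are in memory.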

  9. Example

  10. Lattice Decomposition: Prefix-Based Classes
     • Equivalence relation
        • A binary relation ≡ that is reflexive, symmetric and transitive
        • Partitions the set P into disjoint subsets called equivalence classes
     • An equivalence relation θk on the lattice P(I): X ≡ Y under θk if and only if p(X, k) = p(Y, k), where p(X, k) = X[1:k] is the k-length prefix of X
        • θk: prefix-based equivalence relation
     • Lemma:
        • Each equivalence class [X]θk induced by the equivalence relation θk is a sublattice of P(I)
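Grouping itemsets by their k-length prefix can be sketched as follows. Itemsets are written as strings of single-character items kept in sorted order, and the names `prefix` and `prefix_classes` are assumptions made for this example.

```python
from collections import defaultdict

def prefix(itemset, k):
    """p(X, k): the first k items of X, with items kept in sorted order."""
    return "".join(sorted(itemset))[:k]

def prefix_classes(itemsets, k):
    """Partition itemsets of length >= k into classes that share a k-prefix."""
    classes = defaultdict(list)
    for x in itemsets:
        if len(x) >= k:
            classes[prefix(x, k)].append("".join(sorted(x)))
    return dict(classes)

# Hypothetical itemsets, for illustration only.
itemsets = ["AC", "AD", "AW", "CD", "CW", "ACD", "ACW"]

by_one = prefix_classes(itemsets, 1)   # classes such as [A] and [C]
by_two = prefix_classes(by_one["A"], 2)  # split [A] further, e.g. into [AC]

print(by_one["C"])    # ['CD', 'CW']
print(by_two["AC"])   # ['AC', 'ACD', 'ACW']
```

Each class can then be processed independently, which is what keeps the working set of tid-lists small.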

  11. Example of Equivalence Class
     • P(I) induced by θ1
     • [A]θ1 induced by θ2

  12. Search for Frequent Itemsets • Bottom-up Search Algorithm:
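A minimal sketch of a bottom-up search over one prefix class, in the style commonly associated with Eclat: each atom is intersected with the atoms that follow it, frequent results form a new class, and the procedure recurses. The function name, the string representation of itemsets, and the toy tid-lists are assumptions for illustration, not a transcription of the paper's pseudocode.

```python
def bottom_up(atoms, min_sup, frequent):
    """Recursively extend a class of (itemset, tid-set) atoms.

    `frequent` is filled with itemset -> support for every frequent
    itemset found above the atoms.
    """
    for i, (x_i, tids_i) in enumerate(atoms):
        next_class = []
        for x_j, tids_j in atoms[i + 1:]:
            candidate = "".join(sorted(set(x_i) | set(x_j)))
            tids = tids_i & tids_j          # support via tid-list intersection
            if len(tids) >= min_sup:
                frequent[candidate] = len(tids)
                next_class.append((candidate, tids))
        if next_class:
            bottom_up(next_class, min_sup, frequent)

# Assumed atoms of the class [A] with their tid-lists (toy data).
class_A = [
    ("AC", {1, 3, 4, 5}),
    ("AT", {1, 3, 5}),
    ("AW", {1, 3, 4, 5}),
]

frequent = {}
bottom_up(class_A, min_sup=3, frequent=frequent)
print(frequent)  # {'ACT': 3, 'ACW': 4, 'ACTW': 3, 'ATW': 3}
```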

  13. Search for Frequent Itemsets cont. • Example for [A]θ1

  14. Search for Frequent Itemsets cont. • Top-down Search Algorithm:
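A minimal sketch of a top-down search over one class, written here as a level-wise variant: start from the union of all atoms, and whenever a candidate is infrequent, descend to the candidates obtained by dropping one atom. Candidates already covered by a known maximal itemset are skipped. The function name and toy tid-lists are assumptions; the infrequent atom AD is included only so that the search has to descend.

```python
def top_down(atom_tids, min_sup):
    """Return the maximal frequent itemsets of a class, largest first.

    `atom_tids` maps each atom (a string of items) to its tid-set.
    """
    maximal = []
    level = {frozenset(atom_tids)}           # top element: all atoms together
    while level:
        next_level = set()
        for cand in level:
            if any(cand <= m for m in maximal):
                continue                      # covered by a maximal itemset
            tids = set.intersection(*(atom_tids[a] for a in cand))
            if len(tids) >= min_sup:
                maximal.append(cand)
            elif len(cand) > 1:
                # Infrequent: try every candidate with one atom removed.
                next_level.update(cand - {a} for a in cand)
        level = next_level
    # Turn each set of atoms into a single itemset string.
    return sorted("".join(sorted(set("".join(m)))) for m in maximal)

# Assumed atoms of the class [A] with toy tid-lists.
atom_tids = {
    "AC": {1, 3, 4, 5},
    "AD": {4, 5},
    "AT": {1, 3, 5},
    "AW": {1, 3, 4, 5},
}

result = top_down(atom_tids, min_sup=3)
print(result)  # ['ACTW']
```

Processing candidates level by level means a larger frequent itemset is always recorded before any of its subsets is considered, so nothing non-maximal is reported.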

  15. Search for Frequent Itemsets cont.
     • Example for [A]θ1
        • Gray circle: infrequent itemset
        • Black circle: maximal frequent itemset
        • White circle: minimal infrequent itemset

  16. Search for Frequent Itemsets cont. • Hybrid Search Algorithm:

  17. Search for Frequent Itemsets cont. • Example for [A]θ1, assuming that AD and ADW are frequent

  18. Generating Smaller Classes: Maximal Clique Approach
     • Pseudoequivalence relation
        • A binary relation ≡ that is reflexive and symmetric
        • Partitions the set P into possibly overlapping subsets called pseudoequivalence classes
     • k-association graph Gk = (V, E)
        • Vertex set
        • Edge set

  19. Maximal Clique Approach cont.
     • Clique
        • A complete subgraph of a graph
        • Mk: the set of maximal cliques in Gk
     • A pseudoequivalence relation φk on the lattice P(I)
        • φk: maximal-clique-based pseudoequivalence relation
     • Bottom-up search: reduces the number of intersections
     • Top-down search: leads to a smaller maximum element size
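A minimal sketch of the clique step for the simplest case, assuming the vertices are the frequent items and an edge joins two items whenever the pair is frequent. That vertex and edge definition, the toy tid-lists, and the helper names are assumptions for illustration. Maximal cliques are enumerated with the basic Bron–Kerbosch recursion, which is a standard technique swapped in here rather than a method confirmed by the slides.

```python
from itertools import combinations

# Toy vertical database: item -> set of transaction ids (assumed values).
tidlists = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}
MIN_SUP = 3

# Build the association graph: connect two items if the pair is frequent.
adj = {item: set() for item in tidlists}
for a, b in combinations(sorted(tidlists), 2):
    if len(tidlists[a] & tidlists[b]) >= MIN_SUP:
        adj[a].add(b)
        adj[b].add(a)

def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored vertices
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted("".join(sorted(c)) for c in cliques)

cliques = maximal_cliques(adj)
print(cliques)  # ['ACTW', 'CDW']
```

Each maximal clique bounds the itemsets that can be frequent together, so the resulting classes may overlap (here both contain C and W) but are smaller than the plain prefix-based ones.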

  20. Example

  21. Experiment • Algorithms compared

  22. Experimental Result

  23. Experimental Result cont.

  24. Experimental Result cont.
