Scalable Algorithms for Association Mining. Mohammed J. Zaki, IEEE Transactions on Knowledge and Data Engineering, 2000. Presenter: 吳建良.
Abstract
• Frequent itemsets
• Vertical tid-list database format
• Lattice-theoretic approach
• Prefix-based and maximal-clique-based partitioning
• Pattern search strategy: bottom-up, top-down, and hybrid search
• Requires only a few database scans
Itemset Enumeration: Lattice-theoretic approach
• Partial order
  • Reflexive, antisymmetric, transitive
  • A set with a partial order is a partially ordered set (poset)
• Lattice
  • A poset in which any two elements have a unique join and a unique meet
  • join(a, b) = least upper bound of a and b
  • meet(a, b) = greatest lower bound of a and b
• Atom
  • An element that immediately succeeds the least element
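For the power set lattice P(I) ordered by set inclusion, the join of two itemsets is their union and the meet is their intersection, and the atoms are the single items. A minimal sketch of these operations follows; the function names and the item set `I` are illustrative assumptions, not taken from the slides.

```python
def join(a, b):
    """Least upper bound in the power set lattice: the union."""
    return a | b


def meet(a, b):
    """Greatest lower bound in the power set lattice: the intersection."""
    return a & b


def atoms(items):
    """Atoms of P(I): the single-item sets just above the empty set."""
    return [frozenset([item]) for item in sorted(items)]


# Hypothetical item set, used only for illustration.
I = frozenset("ACDTW")
ac, cd = frozenset("AC"), frozenset("CD")
upper = join(ac, cd)   # frozenset({'A', 'C', 'D'})
lower = meet(ac, cd)   # frozenset({'C'})
single_items = atoms(I)
```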
Power set lattice P(I) [Figure: gray circles mark frequent itemsets; black circles mark maximal frequent itemsets]
Lemma
• Lemma 1:
  • All subsets of a frequent itemset are frequent
  • All supersets of an infrequent itemset are infrequent
• Lemma 2:
  • The maximal frequent itemsets uniquely determine all frequent itemsets
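Lemma 2 follows from Lemma 1: every frequent itemset is a subset of some maximal frequent itemset, so enumerating the non-empty subsets of the maximal ones recovers all frequent itemsets. The sketch below illustrates this; the function name and the two maximal itemsets are assumptions chosen for the example.

```python
from itertools import combinations


def frequent_from_maximal(maximal_itemsets):
    """Return every non-empty subset of the given maximal frequent itemsets."""
    frequent = set()
    for itemset in maximal_itemsets:
        items = sorted(itemset)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                frequent.add(frozenset(subset))
    return frequent


# Hypothetical maximal frequent itemsets, chosen only for illustration.
maximal = [frozenset("ACTW"), frozenset("CDW")]
all_frequent = frequent_from_maximal(maximal)
```

With these two maximal itemsets the result contains 19 itemsets: 15 subsets of ACTW, 7 subsets of CDW, minus the 3 they share (C, W, CW).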
Support Counting
• L(X): the tid-list of item X, i.e., the ids of the transactions that contain X
• Support of a k-itemset
  • Intersect the tid-lists of any two of its (k-1)-subsets; the size of the result is the support
• Example
  • L(CD) = L(C) ∩ L(D)
  • L(CDW) = L(CD) ∩ L(CW)
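The tid-list intersections in the example can be sketched as follows. The six-transaction database and the helper name `to_vertical` are illustrative assumptions, not data taken from the slides.

```python
def to_vertical(transactions):
    """Convert {tid: items} into the vertical format {item: set of tids}."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


# Hypothetical horizontal database, used only for illustration.
transactions = {
    1: "ACTW", 2: "CDW", 3: "ACTW",
    4: "ACDW", 5: "ACDTW", 6: "CDT",
}
L = to_vertical(transactions)

# L(CD) = L(C) ∩ L(D)
L_CD = L["C"] & L["D"]
# L(CW) = L(C) ∩ L(W)
L_CW = L["C"] & L["W"]
# L(CDW) = L(CD) ∩ L(CW); its size is the support of CDW
L_CDW = L_CD & L_CW
support_CDW = len(L_CDW)
```

Because each intersection works on tid-lists that are already in memory, counting a candidate's support needs no further pass over the original transactions.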
Lattice Decomposition: Prefix-Based Classes
• Equivalence relation
  • A binary relation ≡ that is reflexive, symmetric, and transitive
  • Partitions the set P into disjoint subsets called equivalence classes
• An equivalence relation θk on the lattice P(I): X ≡θk Y ⇔ p(X, k) = p(Y, k), where p(X, k) = X[1:k] is the length-k prefix of X
  • θk: prefix-based equivalence relation
• Lemma:
  • Each equivalence class [X]θk induced by the equivalence relation θk is a sublattice of P(I)
Example of Equivalence Class [Figure: equivalence classes of P(I) induced by θ1, and equivalence classes of [A]θ1 induced by θ2]
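Grouping itemsets by their length-k prefix can be sketched as below. The function name, the sample itemsets, and the choice to skip itemsets shorter than k are assumptions made for illustration.

```python
from collections import defaultdict


def prefix_classes(itemsets, k):
    """Group itemsets by their length-k prefix (items in sorted order)."""
    classes = defaultdict(list)
    for itemset in itemsets:
        ordered = tuple(sorted(itemset))
        if len(ordered) >= k:  # assumption: shorter itemsets are skipped
            classes[ordered[:k]].append(ordered)
    return dict(classes)


# Hypothetical itemsets, used only for illustration.
itemsets = ["A", "AC", "AD", "ACT", "C", "CD", "CDW"]
by_theta1 = prefix_classes(itemsets, 1)
# Applying theta_2 inside the class [A] splits it further.
by_theta2_in_A = prefix_classes(by_theta1[("A",)], 2)
```

Here `by_theta1` has the two classes [A] and [C], and `by_theta2_in_A` splits [A] into [AC] and [AD], mirroring the recursive decomposition of a class into smaller classes.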
Search for Frequent Itemsets • Bottom-up Search Algorithm:
Search for Frequent Itemsets cont. • Example for [A]θ1
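The slide's pseudocode is not preserved here, so the following is only a sketch of one common way to run a bottom-up search within a class: intersect the tid-lists of every pair of atoms, keep the frequent results as the atoms of a new class, and recurse. The function name, the tid-lists for [A]θ1, and the minimum support of 3 are illustrative assumptions.

```python
def bottom_up(atoms, min_support, found):
    """Recursively combine atoms of one class; record frequent itemsets.

    atoms: list of (itemset tuple, tid set) pairs sharing a common prefix.
    found: dict that receives {itemset tuple: support}.
    """
    for i, (items_i, tids_i) in enumerate(atoms):
        next_class = []
        for items_j, tids_j in atoms[i + 1:]:
            candidate = tuple(sorted(set(items_i) | set(items_j)))
            tids = tids_i & tids_j  # support counting by intersection
            if len(tids) >= min_support:
                found[candidate] = len(tids)
                next_class.append((candidate, tids))
        if next_class:
            bottom_up(next_class, min_support, found)


# Hypothetical frequent atoms of the class [A], used only for illustration.
class_A = [
    (("A", "C"), {1, 3, 4, 5}),
    (("A", "T"), {1, 3, 5}),
    (("A", "W"), {1, 3, 4, 5}),
]
found = {}
bottom_up(class_A, min_support=3, found=found)
```

On this data the search finds ACT, ACW, ATW, and ACTW, each by a single intersection of two tid-lists from the level below.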
Search for Frequent Itemsets cont. • Top-down Search Algorithm:
Search for Frequent Itemsets cont. • Example for [A]θ1 [Figure: gray circles mark infrequent itemsets; black circles mark maximal frequent itemsets; white circles mark minimal infrequent itemsets]
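The top-down pseudocode is likewise not preserved, so this is a hedged, level-wise sketch of the idea: start from the maximum element of the class (the union of all atoms); if it is frequent it is maximal, otherwise examine the elements built from one fewer atom. The function name, the tid-lists, and the minimum support of 3 are assumptions made for illustration.

```python
def top_down(atoms, min_support):
    """Return the maximal frequent itemsets of one class.

    atoms: dict mapping each atom (a frozenset of items) to its tid set.
    """
    maximal = []
    level = {frozenset(atoms)}  # start with the set of all atoms
    while level:
        next_level = set()
        for chosen in level:
            itemset = frozenset().union(*chosen)
            # Skip anything already covered by a maximal frequent itemset.
            if any(itemset <= m for m in maximal):
                continue
            tids = set.intersection(*(atoms[a] for a in chosen))
            if len(tids) >= min_support:
                maximal.append(itemset)
            elif len(chosen) > 1:
                # Infrequent: try every element with one atom removed.
                next_level.update(chosen - {a} for a in chosen)
        level = next_level
    return maximal


# Hypothetical atoms of the class [A], used only for illustration.
class_A = {
    frozenset("AC"): {1, 3, 4, 5},
    frozenset("AD"): {4, 5},
    frozenset("AT"): {1, 3, 5},
    frozenset("AW"): {1, 3, 4, 5},
}
maximal_in_A = top_down(class_A, min_support=3)
```

Processing one level at a time ensures that larger elements are tested before their subsets, so a frequent itemset is reported only if no frequent superset in the class exists.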
Search for Frequent Itemsets cont. • Hybrid Search Algorithm:
Search for Frequent Itemsets cont. • Example for [A]θ1, assume that AD and ADW are frequent
Generating Smaller Classes: Maximal Clique Approach
• Pseudoequivalence relation
  • A binary relation ≡ that is reflexive and symmetric
  • Partitions the set P into possibly overlapping subsets called pseudoequivalence classes
• k-association graph Gk = (V, E)
  • Vertex set V
  • Edge set E
Maximal Clique Approach cont.
• Clique
  • A complete subgraph of a graph
  • Mk: the set of maximal cliques in Gk
• A pseudoequivalence relation φk on the lattice P(I)
  • φk: maximal-clique-based pseudoequivalence relation
• Bottom-up search: reduces the number of intersections
• Top-down search: leads to a smaller maximum element size
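The slides do not say how the maximal cliques are enumerated, and the vertex and edge definitions are not preserved. The sketch below therefore makes two labeled assumptions: vertices are frequent items and edges are frequent 2-itemsets, and the cliques are listed with the basic Bron-Kerbosch procedure, a standard technique swapped in for illustration. The edge list is hypothetical.

```python
def maximal_cliques(vertices, edges):
    """List all maximal cliques with the basic Bron-Kerbosch procedure."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(clique))  # cannot be extended further
            return
        for v in list(candidates):
            expand(clique | {v},
                   candidates & neighbors[v],
                   excluded & neighbors[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(vertices), set())
    return cliques


# Assumption: vertices are frequent items, edges are frequent 2-itemsets.
# The edge list below is hypothetical, used only for illustration.
items = "ACDTW"
frequent_pairs = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
cliques = maximal_cliques(items, [tuple(pair) for pair in frequent_pairs])
```

On this graph the maximal cliques are {A, C, T, W} and {C, D, W}. They overlap in C and W, which illustrates why the resulting classes are pseudoequivalence classes rather than a disjoint partition.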
Experiment • Algorithms compared