LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets (Linear time Closed itemset Miner)

Takeaki Uno (National Institute of Informatics), Tatsuya Asai, Hiroaki Arimura, Yuzo Uchida (Kyushu University). FIMI 2003, 19 Nov 2003.

Presentation Transcript


  1. LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets (Linear time Closed itemset Miner). Takeaki Uno (National Institute of Informatics); Tatsuya Asai, Hiroaki Arimura, Yuzo Uchida (Kyushu University). FIMI 2003, 19 Nov 2003.

  2. Motivation
     - We want to solve difficult problems in a short time.
     - Techniques named on the slide: exact enumeration of closed item sets; generation of all/maximal item sets from closed item sets; sparse/dense handling (occurrence deliver / diffsets); database reduction; removal of infrequent items.
     - [Figure: data sets arranged from small supports to large supports. Region labels: "few solutions for small support", "many solutions for even large support", "#closed sets = #freq. sets", "#closed sets << #freq. sets". Data sets shown: IBMdatas, BMS-POS, retail, BMS-web1,2, kosarak, chess, connect, pumsb*, accidents, mushroom, pumsb.]

  3. Outline of Our Research
     - Exact enumeration of closed item sets (no sophisticated pruning, no post-processing, and no memory for the closed item sets already obtained).
     - Enumerate all/maximal frequent item sets using closed item sets.
     - Algorithms for updating occurrences and checking maximality in dense and sparse cases, and their adaptive hybrid.
     - Save additional memory use (right-first sweep; adjacency matrix only for large transactions).

  4. Exact Enumeration of Closed Item Sets
     - Introduce an acyclic parent-child relationship on frequent closed sets (it induces a tree-shaped traversal route whose root is the closure of φ).
     - Traverse the route in a depth-first manner (find a child, and go to it).
     - Exact enumeration: time linear in the number of closed sets.
     - Any child is found by taking a closure (in short time).
     - No need to store the item sets already obtained (small memory).
     - Can enumerate all closed item sets (even without a minimum support).

  5. Definition of Parent
     - Closure of an item set = the maximal item set with the same occurrences.
     - For a closed item set X: parent of X = closure of X∩{1,…,i}, where i is the maximum index such that X ≠ closure of X∩{1,…,i}.
     - The parent of X is a subset of X and differs from X, so the relationship is acyclic.
     - X' is a child of X ⇔ X' is the closure of X∪{i} for some i, and (cond) X'\X includes no item < i.
     - All children are found by taking the closure of X∪{i}.
     - (cond) can be checked in short time by using some algorithms.
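
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, occurrences are recomputed by scanning the whole database, and the closure is taken as the intersection of the occurrences. One assumption goes beyond the slide text: the sketch only tries items i larger than the item that generated X, as in the prefix-preserving closure extension of the published LCM algorithm, so that each closed set is reached exactly once.

```python
def occurrences(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return [t for t, items in enumerate(transactions) if itemset <= items]


def closure(itemset, transactions):
    """Maximal item set with the same occurrences (their intersection).

    Assumes `itemset` occurs in at least one transaction.
    """
    occ = occurrences(itemset, transactions)
    result = set(transactions[occ[0]])
    for t in occ[1:]:
        result &= transactions[t]
    return frozenset(result)


def enumerate_closed(transactions, minsup=1):
    """Depth-first enumeration of frequent closed item sets (sketch)."""
    transactions = [frozenset(t) for t in transactions]
    if not transactions or len(transactions) < minsup:
        return []
    items = sorted(set().union(*transactions))
    found = []

    def visit(X, start):
        found.append(X)
        for idx in range(start, len(items)):
            i = items[idx]
            if i in X:
                continue
            if len(occurrences(X | {i}, transactions)) < minsup:
                continue
            child = closure(X | {i}, transactions)
            # (cond): the closure added no item smaller than i.
            if all(j >= i for j in child - X):
                # Assumption: children of `child` only use items after i.
                visit(child, idx + 1)

    visit(closure(frozenset(), transactions), 0)  # root = closure of φ
    return found
```

Nothing in `found` is consulted during the search; the list only collects output, which matches the slide's point that obtained item sets need not be stored for duplicate detection.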

  6. Adaptive Hybrid Algorithm: Computation of the Occurrences of X∪{i} for Sparse and Dense Cases
     - In the sparse case, by tracing the items of each occurrence of X (occurrence deliver; possibly a known technique).
     - In the dense case, use diffsets (proposed by Zaki).
     - We choose the better one in each iteration according to estimates of the computation time.
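
As a rough illustration of the dense-case idea, the sketch below keeps, for each extension X∪{i}, only the occurrences of X that do not contain i (the diffset), and derives supports from diffset sizes. The helper names are hypothetical, Python sets stand in for whatever representation the authors used, and the estimate that drives the hybrid choice is not described on the slide, so it is not modeled here.

```python
def diffset(occ_X, item, transactions):
    """Occurrences of X that do NOT contain `item`: T(X) minus T(X ∪ {item})."""
    return {t for t in occ_X if item not in transactions[t]}


def support_from_diffset(support_X, d):
    """support(X ∪ {i}) = support(X) - |diffset of i relative to X|."""
    return support_X - len(d)


def extend_diffset(d_i, d_j):
    """Diffset of j relative to X ∪ {i}, from the diffsets of i and j relative to X.

    An occurrence of X ∪ {i} lacks j exactly when it is in d_j but not in d_i.
    """
    return d_j - d_i
```

In dense data most occurrences of X also contain i, so the diffsets are small and the set differences are cheap, which is the reason for preferring them in that case.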

  7. Maximal and All Frequent Sets
     - Maximal frequent sets are generated from closed item sets.
     - All frequent sets (hypercube decomposition):
       -- decompose classes of closed item sets into complete sublattices
       -- enumerate pairs of greatest/least elements of the sublattices
       -- generate the others from the pairs
     - [Figure: a closed item set class drawn as a lattice running from 000…0 to 111…1.]
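
The last step, generating the other sets from a (least, greatest) pair, can be sketched as follows. The function name is hypothetical, and the sketch assumes the pair has already been found and that both ends have the same occurrences; how the classes are decomposed into sublattices is not shown on the slide and is not modeled here.

```python
from itertools import combinations


def expand_sublattice(least, greatest):
    """Yield every item set S with least ⊆ S ⊆ greatest.

    If `least` and `greatest` have the same occurrences, so does every S in
    between, so all of them can be reported with a single frequency.
    """
    least = frozenset(least)
    free = sorted(frozenset(greatest) - least)
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            yield least | frozenset(extra)
```

A pair whose ends differ by k items stands for 2^k frequent sets, so the output can grow exponentially while the number of occurrence computations stays at one per pair.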

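  8. Result
     - [Figure: the data sets of the Motivation slide (IBMdatas, BMS-POS, retail, BMS-web1,2, kosarak, chess, connect, pumsb*, accidents, mushroom, pumsb), arranged from small supports to large supports, with regions labeled "fast", "fast or usual", "fast if support is small", and "slower than others".]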

  9. Conclusion
     - Fast without pruning, tries, or other existing methods.
     - For data sets such that #freq. closed sets << #freq. sets, exact enumeration of closed item sets and hypercube decomposition perform well:
       -- large business data sets: BMS-web1,2, retail
       -- machine learning data sets with small supports: UCI repository
     - For further speed-up:
       -- These techniques are orthogonal to other techniques (database reduction, pruning infrequent items, …), so we can do better for large supports / accidents (blue area).
       -- The parameter of the hybrid is not tuned, so it was not fast for kosarak and IBMdatas (now faster).

  10. We think…
     - What is the real problem (bottleneck)? Mining structured item sets (closed item sets, association rules with a threshold, …).
     - Is it only a counting problem? For mining all frequent item sets, yes. The problem is how to make the occurrences of an item set from other item sets (choose the best way, represent …).
     - Is the maximal item set useful? The closed item set is useful! It has applications in classification and association rule mining.

  11. Some Observations
     - Is pruning of infrequent sets really necessary?
       -- Computing occurrences for infrequent items from X: usually < 1/2 of the extensions. Do we really need to prune?
       -- [Figure: frequency of X and of X∪{1}, X∪{2}, X∪{3}, X∪{4}, X∪{5}.]
     - Is there a need for accelerating occurrence computation?
       -- Almost all computation is spent updating occurrences.
       -- There is a best item e for obtaining the occurrences of X from those of X − e.
       -- Can we design an algorithm that chooses e in each iteration? How do we find this e? Does this accelerate the computation? (We can evaluate the lower bound of occurrence computation.)

  12. Some Observations
     - Computing occurrences for infrequent items from X: usually < 1/2 of the extensions. Do we really need to prune?
     - [Figure: frequency of X and of X∪{10}, X∪{11}, X∪{12}, X∪{13}, X∪{14}.]

  13. Right First Sweep
     - Generate recursive calls in decreasing order of items.
     - Clear the memory after the recursive call.
     - Re-use the memory in the following recursive calls.
     - Child iterations need no additional memory.
     - [Figure: occurrences A to E delivered to buckets X∪{10}, X∪{11}, X∪{12}, X∪{13}, X∪{14}.]
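
One way to read the right-first sweep is sketched below, under several assumptions that are not on the slide: items are the integers 0..n_items-1, the search enumerates plain frequent item sets (not closed ones) by adding items larger than the current tail, and one bucket per item holds the occurrences delivered to it. The function name is hypothetical. Because items are processed from the largest down, a recursive call for item i writes only into buckets of items above i, which are already cleared, so a single shared array of buckets serves the whole recursion.

```python
def frequent_itemsets(transactions, n_items, minsup):
    """Enumerate frequent item sets using one shared bucket array (sketch)."""
    buckets = [[] for _ in range(n_items)]  # shared by all recursion levels
    found = []

    def visit(prefix, occ, tail):
        # Deliver each occurrence of `prefix` to the buckets of its items > tail.
        # On entry, every bucket above `tail` is empty.
        for t in occ:
            for j in transactions[t]:
                if j > tail:
                    buckets[j].append(t)
        # Right-first sweep: largest item first.
        for i in range(n_items - 1, tail, -1):
            if len(buckets[i]) >= minsup:
                found.append(prefix + [i])
                # The call reads buckets[i] and writes only buckets above i.
                visit(prefix + [i], buckets[i], i)
            buckets[i].clear()  # release the bucket for later calls

    visit([], range(len(transactions)), -1)
    return found
```

Buckets of items below i still hold pending occurrences while the call for i runs, and they are left untouched, which is what makes the decreasing order safe.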

  14. Occurrence Deliver
     - Compute T(X∪{i}) by tracing each occurrence of X.
     - In sparse cases, this is fast.
     - [Figure: occurrences A to E of X delivered to buckets X∪{10}, X∪{11}, X∪{12}, X∪{13}, X∪{14}.]
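
A minimal sketch of occurrence deliver, assuming transactions are Python sets and T(X) is given as a list of transaction indices; the function name and the dictionary of buckets are illustrative choices, not the authors' data structures. Each occurrence of X is scanned once and its index is appended to the bucket of every item it contains, which yields T(X∪{i}) for all i in a single pass.

```python
from collections import defaultdict


def occurrence_deliver(occ_X, transactions, X=frozenset()):
    """Return {i: T(X ∪ {i})} for every item i outside X that occurs with X."""
    buckets = defaultdict(list)
    for t in occ_X:                 # trace each occurrence of X
        for i in transactions[t]:   # deliver it to the bucket of each item
            if i not in X:
                buckets[i].append(t)
    return dict(buckets)
```

The work is proportional to the total size of the occurrences of X, which is small when the data is sparse.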

  15. Checking (cond) of Closure
     - Check (cond): (closure of X∪{i}) \ X includes no item < i.
     - In the sparse case, for every possible item j, find an occurrence that does not include j.
     - In the dense case, update the occurrences of all frequent X∪{j}, and compute T(X∪{i}∪{j}).
     - Much faster than computing the closure of X∪{i}.
     - [Figure: occurrences A, B, C, … delivered to buckets X∪{1}, X∪{2}, …, X∪{i}, …, X∪{14}.]
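
The sparse-case check can be sketched as follows, assuming T(X∪{i}) is a non-empty list of transaction indices, transactions are Python sets, and items are comparable; the function name is hypothetical. Only items j < i outside X that appear in the first occurrence can belong to the closure, and each such j is ruled out as soon as one occurrence without it is found, so the full closure is never built.

```python
def cond_holds(X, i, occ_Xi, transactions):
    """True if the closure of X ∪ {i} adds no item j < i that is missing from X."""
    first = transactions[occ_Xi[0]]
    # An item of the closure must appear in every occurrence, so in the first one.
    candidates = [j for j in first if j < i and j not in X]
    for j in candidates:
        # Look for one occurrence that does not include j.
        if not any(j not in transactions[t] for t in occ_Xi):
            return False  # j is in every occurrence, hence in the closure
    return True
```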

  16. Results
     - [Figure: results for all, closed, and maximal frequent item sets.]
