CanTree: a tree structure for efficient incremental mining of frequent patterns

Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque. ICDM '05. Presenter: 林靜怡, 2006/11/15.


Presentation Transcript


  1. CanTree: a tree structure for efficient incremental mining of frequent patterns • Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque • ICDM '05 • Presenter: 林靜怡 • 2006/11/15

  2. Introduction • Many existing incremental mining algorithms are Apriori-based • They are not easily adaptable to FP-tree based frequent-pattern mining

  3. Related Work • The FELINE Algorithm with the CATS Tree • The AFPIM Algorithm

  4. The FELINE Algorithm with the CATS Tree • CATS tree (Compressed and Arranged Transaction Sequences tree) • Allows frequent-pattern mining without the generation of candidate itemsets • requires one database scan to build the tree

  5. CATS Tree • New transactions are added at the root level • At each level, items of the new transaction are compared with the children (or descendant) nodes • If the same items exist in both: (1) the transaction is merged with the node at the highest frequency level; (2) the remainder of the transaction is then added to the merged nodes • This is repeated recursively until all common items are found

  6. CATS Tree • Any remaining items of the transaction are added as a new branch to the last merged node • The frequency of a node is lower than or equal to the frequencies of its ancestors • If the frequency of a node becomes higher than that of an ancestor, the node has to be swapped with that ancestor

  7. Weaknesses • tree construction could be computationally expensive • checks existing tree paths one-by-one until a mergeable one is found • extra cost is required for the swapping or merging of nodes.

  8. The AFPIM Algorithm • AFPIM: Adjusting FP-tree for Incremental Mining • All the "frequent" items are arranged in descending order of their global frequency • When the ordering changes, items in the tree need to be adjusted • When a previously infrequent item becomes "frequent" in the updated database, the algorithm needs to rescan the database and build a new FP-tree

  9. Example • preMinsup: 35% • minsup: 55% • 4 × 0.35 = 1.4
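
The arithmetic on this slide can be illustrated with a small sketch. It assumes that the 4 in "4 × 0.35 = 1.4" is the number of transactions in the database, which the transcript does not state. The transactions, the function name `classify_items`, and the three labels are hypothetical and only serve the illustration.

```python
from collections import Counter


def classify_items(transactions, pre_minsup, minsup):
    """Label each item by comparing its count with the two thresholds.

    Assumption: thresholds are fractions of the number of transactions,
    so with 4 transactions preMinsup = 35% means a count of at least 1.4.
    """
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    labels = {}
    for item, count in counts.items():
        if count >= minsup * n:
            labels[item] = "frequent"
        elif count >= pre_minsup * n:
            labels[item] = "pre-frequent"
        else:
            labels[item] = "infrequent"
    return labels


# Hypothetical database of 4 transactions (not taken from the slides).
db = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a", "d"]]
labels = classify_items(db, pre_minsup=0.35, minsup=0.55)
print(labels)
```

With these made-up transactions, item a (count 4) passes minsup, b and c (count 2) fall between the two thresholds, and d (count 1) is below both.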

  10. Weaknesses • the amount of computation spent on swapping, merging, and splitting tree nodes • requirement for an additional mining parameter preMinsup • finding an appropriate value for this parameter is not easy

  11. Weaknesses • when the database is updated, item frequencies may have changed. This results in changes in the ordering. • Both FELINE and AFPIM algorithms need lots of swapping, merging, and splitting of tree nodes

  12. Canonical-Order Tree (CanTree) • Requires only one database scan • Items are arranged according to some canonical order • The order can be lexicographic or alphabetical • It can also be some specific order depending on the item properties
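
A minimal sketch of how such a tree could be built is shown below. It is an illustration under assumptions, not the authors' implementation: the class names `Node` and `CanTree`, the `key` parameter, and the sample transactions are all hypothetical. Items of each transaction are sorted in the canonical order (lexicographic by default) and inserted as a path from the root, so no search for a mergeable path is needed.

```python
class Node:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> Node


class CanTree:
    """Prefix tree whose paths follow a fixed (canonical) item order."""

    def __init__(self, key=None):
        self.root = Node()
        # Sort key that defines the canonical order; None means the
        # natural (lexicographic) order of the items.
        self.key = key

    def insert(self, transaction):
        """Add one transaction in a single pass over its sorted items."""
        node = self.root
        for item in sorted(set(transaction), key=self.key):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child

    def paths(self):
        """Yield (path, count) for every node, for inspection."""
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for item, child in node.children.items():
                path = prefix + (item,)
                yield path, child.count
                stack.append((child, path))


tree = CanTree()
for t in [["a", "d", "b"], ["b", "a"], ["c", "a"]]:
    tree.insert(t)
counts = dict(tree.paths())
# counts == {('a',): 3, ('a', 'b'): 2, ('a', 'b', 'd'): 1, ('a', 'c'): 1}

# An incremental update is just another insert; existing nodes stay in place.
tree.insert(["e", "b"])
updated = dict(tree.paths())
```

After the update, the new transaction appears as a separate branch b → e, and none of the earlier nodes had to be swapped, merged, or split.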

  13. Properties • Property 1: The ordering of items is unaffected by the changes in frequency caused by incremental updates. • Property 2: The frequency of a node in the CanTree is at least as high as the sum of the frequencies of its children.
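
Property 1 can be illustrated by comparing a frequency-based item order with a canonical one before and after an update. The sketch below uses made-up transactions, and the helper names `frequency_order` and `canonical_order` are hypothetical; ties in frequency are broken lexicographically so the result is deterministic.

```python
from collections import Counter


def frequency_order(transactions):
    """Items in descending order of frequency (ties broken lexicographically)."""
    counts = Counter(item for t in transactions for item in set(t))
    return sorted(counts, key=lambda item: (-counts[item], item))


def canonical_order(transactions):
    """Items in lexicographic order, independent of their frequencies."""
    return sorted({item for t in transactions for item in t})


original = [["a", "b"], ["a", "c"], ["a", "b"]]
update = [["c"], ["c"], ["c", "b"]]

freq_before = frequency_order(original)           # ['a', 'b', 'c']
freq_after = frequency_order(original + update)   # ['c', 'a', 'b']
canon_before = canonical_order(original)          # ['a', 'b', 'c']
canon_after = canonical_order(original + update)  # ['a', 'b', 'c']
```

The frequency-based order changes once c becomes the most frequent item, which is what forces a frequency-ordered tree to be restructured. The canonical order is the same before and after the update.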

  14. CanTree • Transactions can be easily added to the CanTree without any extensive search for mergeable paths • Frequent patterns can be mined from the tree in a fashion similar to FP-growth (a divide-and-conquer approach)
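
The divide-and-conquer idea can be sketched as follows. This is a simplified pattern-growth miner, not FP-growth itself and not the authors' code: it works on lists of prefix paths read off the tree instead of building conditional trees, and the function names, the nested-dict tree layout, and the sample data are assumptions made for the illustration.

```python
from collections import defaultdict


def build_cantree(transactions):
    """Nested-dict tree: item -> [count, children], items in lexicographic order."""
    root = {}
    for t in transactions:
        children = root
        for item in sorted(set(t)):
            entry = children.setdefault(item, [0, {}])
            entry[0] += 1
            children = entry[1]
    return root


def ending_counts(children, prefix=()):
    """Yield (path, n) where n transactions end exactly at the node for path.

    n is the node count minus the sum of its children's counts.
    """
    for item, (count, sub) in children.items():
        path = prefix + (item,)
        ended = count - sum(c for c, _ in sub.values())
        if ended > 0:
            yield path, ended
        yield from ending_counts(sub, path)


def mine(db, min_count, suffix=()):
    """Return {pattern: support} for all patterns with support >= min_count.

    db is a list of (path, count) pairs with canonically ordered paths.
    For each item, the prefixes that precede it form its conditional
    database, which is mined recursively (divide and conquer).
    """
    conditional = defaultdict(list)
    for path, count in db:
        for i, item in enumerate(path):
            conditional[item].append((path[:i], count))
    result = {}
    for item, base in conditional.items():
        support = sum(count for _, count in base)
        if support >= min_count:
            pattern = (item,) + suffix
            result[pattern] = support
            result.update(mine(base, min_count, pattern))
    return result


tree = build_cantree([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]])
patterns = mine(list(ending_counts(tree)), min_count=2)
# patterns == {('a',): 3, ('b',): 3, ('c',): 3,
#              ('a', 'b'): 2, ('a', 'c'): 2, ('b', 'c'): 2}
```

Because items on every path follow the same canonical order, each itemset is generated exactly once, starting from its last item and growing toward earlier items.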

  15. • g: eg, deg, cdeg, bcdeg, abcdeg • e: de, cde, bcde, abcde, ce, bce, abce, de, bde, abde • f: ef, def, bdef, abdef • d: cd, bcd, abcd, bd, abd • c: bc, abc • b: ab

  16. Discussion • CanTrees can be used for incremental constrained mining • Efficiency and memory issues • On the surface, it appears that a CanTree may take a large amount of memory • A CanTree may not be as compact as the CATS tree, but it significantly reduces computation and time • We assume there is enough main memory space

  17. Experiment • Database: generated by the program developed at IBM Almaden Research Center • Consists of 1M records with an average transaction length of 10 items and a domain of 1000 items • Run in a time-sharing environment on a 1 GHz machine

  18-21. Experiment (result figures; not preserved in the transcript)

  22. Conclusion • CanTree provides the user with a simple but powerful tree structure for efficient FP-tree based incremental mining • CanTree can be easily maintained • It can be used for efficient incremental constrained mining
