From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns. Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu, Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02). Adviser: Jia-Ling Koh. Speaker: Yu-ting Kung.
Introduction • In this paper, the main tasks (for a multi-user environment) are: • 1. Constructing an initial tree for a transactional database (in memory) • 2. Mining using the tree constructed in memory • 3. Converting the in-memory tree into a disk-based tree • 4. Loading a portion of the tree on disk into main memory for mining (mining proceeds as in task 2)
Introduction (Cont.) • Data structure: PP-tree • A novel coded prefix-path tree • Two representations: • Memory-based PP-tree • Disk-based PP-tree • Mining algorithm: PP-Mine • Operates on the memory-based PP-tree • Outperforms FP-growth
Transaction Database • Example (min_sup threshold = 2) • Item counts: a:3, b:1, c:3, d:3, e:3, f:1, g:2, h:1, i:1 • Frequent items (count ≥ 2): a, c, d, e, g
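A minimal sketch of how these counts lead to the frequent 1-itemsets. The slide's transaction table did not survive, so the four transactions below are hypothetical: they were chosen only so that their item counts match the ones listed above. The function name `frequent_items` and the tie-breaking rule (alphabetical) are likewise assumptions for illustration.

```python
from collections import Counter

# Hypothetical transactions (NOT from the paper), chosen so that the item
# counts match the slide: a:3, b:1, c:3, d:3, e:3, f:1, g:2, h:1, i:1.
TRANSACTIONS = [
    ["a", "b", "c", "d", "e"],
    ["a", "c", "d", "f", "g"],
    ["a", "e", "g", "h"],
    ["c", "d", "e", "i"],
]
MIN_SUP = 2


def frequent_items(transactions, min_sup):
    """Return (item, count) pairs with count >= min_sup.

    Pairs are ordered by descending count, ties broken alphabetically
    (the tie-breaking rule is an assumption of this sketch).
    """
    # set(t) makes sure an item is counted once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, n) for item, n in counts.items() if n >= min_sup]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


F = frequent_items(TRANSACTIONS, MIN_SUP)
print(F)  # [('a', 3), ('c', 3), ('d', 3), ('e', 3), ('g', 2)]
```

Five items survive the threshold, which agrees with the rank N = 5 used on the following slide.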
A Coded Prefix-Path Tree • PP-tree: an ordered tree • F: the set of frequent 1-itemsets in a total order (for example, frequency order) • Node: labelled with a frequent item in F • Children of a node: listed following the order of F • The rank N of a PP-tree: the number of frequent 1-itemsets (here N = 5)
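The construction described above can be sketched as an ordinary prefix tree (trie) over the frequent items. This is an illustration under stated assumptions, not the paper's implementation: the transactions are hypothetical, the `Node` class and the helper names are invented here, and each node's count is assumed to be the number of transactions whose ordered frequent items pass through that node.

```python
class Node:
    """One node of the sketch tree: an item label, a count, and children."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node


def build_prefix_tree(transactions, order):
    """Insert every transaction as a path of its frequent items, in F order."""
    rank = {item: r for r, item in enumerate(order)}
    root = Node()
    for t in transactions:
        # Keep frequent items only and sort them by their rank in F.
        path = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def dump(node, order, prefix=""):
    """Yield (prefix-path, count) in pre-order, children listed in F order."""
    rank = {item: r for r, item in enumerate(order)}
    for item in sorted(node.children, key=rank.get):
        child = node.children[item]
        yield prefix + item, child.count
        yield from dump(child, order, prefix + item)


# Hypothetical data: F in a total order, and transactions matching the
# item counts of the example slide (a:3, c:3, d:3, e:3, g:2).
ORDER = ["a", "c", "d", "e", "g"]
TRANSACTIONS = [
    ["a", "b", "c", "d", "e"],
    ["a", "c", "d", "f", "g"],
    ["a", "e", "g", "h"],
    ["c", "d", "e", "i"],
]

ROOT = build_prefix_tree(TRANSACTIONS, ORDER)
for path, count in dump(ROOT, ORDER):
    print(f"{path}:{count}")
```

Infrequent items (b, f, h, i) never enter the tree, and shared prefixes such as a, ac, acd are stored once with an accumulated count.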
A Complete Prefix-Path Tree • Complete prefix-path tree (rank N): a PP-tree with … nodes • Each node is encoded by its position in a pre-order traversal • Shaded subtree (in the figure): a PP-tree
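A hedged sketch of pre-order encoding. It assumes that the complete tree of rank N contains one node per subset of the N ordered items, that the root (the empty path) receives code 0, and that items are identified by their 0-based rank in F. Under those assumptions the tree has 2^N nodes counting the root, and a code can be computed arithmetically without walking the tree. The function names are hypothetical, and the closed form is derived here rather than taken from the slides.

```python
from itertools import count


def preorder_codes(n):
    """Enumerate the complete prefix tree over ranks 0..n-1 in pre-order.

    Returns a dict mapping each path (tuple of ranks) to its pre-order code.
    """
    codes = {}
    counter = count()

    def visit(path, last):
        codes[path] = next(counter)
        # Children carry strictly larger ranks, listed in increasing order.
        for j in range(last + 1, n):
            visit(path + (j,), j)

    visit((), -1)
    return codes


def code_of(path, n):
    """Compute the same code arithmetically.

    A node whose last item has rank m roots a complete subtree of
    2 ** (n - 1 - m) nodes, so stepping to child j skips the subtrees
    of the earlier siblings last+1 .. j-1.
    """
    code, last = 0, -1
    for j in path:
        code += 1 + sum(2 ** (n - 1 - m) for m in range(last + 1, j))
        last = j
    return code


N = 5
CODES = preorder_codes(N)
# The closed form agrees with the explicit traversal for every node.
assert all(code_of(p, N) == c for p, c in CODES.items())
print(len(CODES))          # 32 nodes, counting the root
print(code_of((0, 3), N))  # 14: the path made of the 1st and 4th items
```

Because pre-order numbering places every subtree in one contiguous run of codes, any PP-tree that is a subtree of the complete tree can be stored as a sparse list of codes.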
PP-tree Representations • Memory-based representation: PPM-tree • Disk-based representation: PPD-tree • A PPD-tree is represented by: • T: the tree structure on disk • F: the N frequent 1-itemsets • I: an index giving the ranges of codes stored in each disk page • The min_sup used to build the PPD-tree on disk • See Figure 3 (next page)
PP-tree Representation (Figure 3) • Labels in the figure: item:count, range of codes, code:count
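One way the index I could work, sketched under assumptions: nodes are stored on disk as (code, count) records sorted by code, pages hold a fixed number of records, and the index keeps the first code of each page. The page layout, the function names, and the sample records are all hypothetical. Since a subtree occupies a contiguous range of pre-order codes, loading a portion of the tree reduces to a range lookup.

```python
from bisect import bisect_right


def build_pages(records, page_size):
    """Split (code, count) records, sorted by code, into fixed-size pages.

    Returns the pages and an index holding the first code on each page.
    """
    records = sorted(records)
    pages = [records[i:i + page_size] for i in range(0, len(records), page_size)]
    index = [page[0][0] for page in pages]
    return pages, index


def pages_for_range(index, lo, hi):
    """Return the page numbers that may hold codes in the range [lo, hi]."""
    first = max(bisect_right(index, lo) - 1, 0)
    last = bisect_right(index, hi) - 1
    return list(range(first, last + 1))


# Hypothetical (code, count) records for a small tree of rank 5.
RECORDS = [(1, 3), (2, 2), (3, 2), (4, 1), (6, 1),
           (14, 1), (15, 1), (17, 1), (18, 1), (19, 1)]

PAGES, INDEX = build_pages(RECORDS, page_size=4)
print(INDEX)                            # [1, 6, 18]
print(pages_for_range(INDEX, 14, 16))   # [1]
print(pages_for_range(INDEX, 17, 31))   # [1, 2]
```

Only the pages returned by `pages_for_range` need to be read, which is the point of keeping code ranges in the index.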
How to build a PPD-tree? • Construction • Build a PPM-tree in memory (task 1) • Conversion • PPM-tree → PPD-tree • Using the coding scheme
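The conversion step can be sketched as a pre-order walk that replaces each item label with a code. Everything here is an assumption made for illustration: the in-memory tree is modelled as nested dicts of the form `{item: (count, children)}`, the code of a node is taken to be its pre-order position in the complete tree of the same rank (root = 0), and the sample tree is hypothetical. Writing the resulting records to disk pages is left out.

```python
def to_disk_records(tree, order):
    """Flatten a nested-dict tree into (code, count) records in pre-order."""
    n = len(order)
    rank = {item: r for r, item in enumerate(order)}
    records = []

    def visit(children, code, last):
        for item in sorted(children, key=rank.get):
            count, grandchildren = children[item]
            j = rank[item]
            # In the complete tree, a node whose last item has rank m roots
            # 2 ** (n - 1 - m) nodes; skip the siblings last+1 .. j-1.
            child_code = code + 1 + sum(
                2 ** (n - 1 - m) for m in range(last + 1, j)
            )
            records.append((child_code, count))
            visit(grandchildren, child_code, j)

    visit(tree, 0, -1)
    return records


# Hypothetical in-memory tree over F = a, c, d, e, g (rank 5).
ORDER = ["a", "c", "d", "e", "g"]
PPM_TREE = {
    "a": (3, {
        "c": (2, {"d": (2, {"e": (1, {}), "g": (1, {})})}),
        "e": (1, {"g": (1, {})}),
    }),
    "c": (1, {"d": (1, {"e": (1, {})})}),
}

RECORDS = to_disk_records(PPM_TREE, ORDER)
print(RECORDS)
```

The records come out sorted by code, because the walk visits children in the order of F, so they can be appended to disk pages sequentially.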
PP-Mine: Mining in Memory • Based on two properties (i_j, i_k: single-item prefix-paths; other prefix-paths are general and possibly empty) • Property 1 (push-down)
PP-Mine (Cont.) • Property 2 (push-right) • Example: Figure 4 (next page)
Experiment (1) • Data sources • Sparse dataset: T25I20D100K (10K items) • Dense dataset: T40I10D1K (101 items) • Three algorithms compared • PP-Mine • FP-growth • H-Mine • Only the mining phase is compared
Experiment Result (2) • Data source: T40I10D100K (59 items) • = 50% • Two algorithms compared • PP-Mine • FP-growth • Quantities compared • t(FP): the time for FP-growth to construct an FP-tree • t(PP): the time for PP-load to load a sub PPD-tree plus the time to construct a small PPM-tree
Conclusion • The PP-Mine algorithm outperforms FP-growth • Reduces both I/O cost and CPU cost • The PP-Mine algorithm outperforms H-Mine • Minimizes counting cost
Coverage • Definition: the coverage of a prefix-path is defined as all the prefix-paths that contain it (including the prefix-path itself)