Mining Frequent Closed Cubes in 3D Datasets

Liping Ji, Kian-Lee Tan, Anthony K. H. Tung. Computer Science Department, National University of Singapore.

Presentation Transcript


  1. Mining Frequent Closed Cubes in 3D Datasets
     Liping Ji, Kian-Lee Tan, Anthony K. H. Tung
     Computer Science Department, National University of Singapore

  2. Motivation
     • Frequent Closed Pattern (FCP) mining: great importance, wide application
     • Previous work is all limited to 2D FCP mining (biological data: gene-time, gene-sample; market basket data: transaction-itemset)
     • Extend 2D FCP mining to the 3D context (biological data: gene-sample-time; marketing data: region-time-items)

  3. Background
     • Frequent Pattern (FP) and Frequent Closed Pattern (FCP)
     • Minimum support threshold: minsup = 2
     • Transactions and their itemsets:
       t1: a1 a2 a3 a5
       t2: a1 a2 a3
       t3: a1 a2 a3 a4
       t4: a3 a5

  5. Background
     • Frequent Pattern (FP) and Frequent Closed Pattern (FCP)
     • Minimum support threshold: minsup = 2
     • Transactions and their itemsets:
       t1: a1 a2 a3 a5
       t2: a1 a2 a3
       t3: a1 a2 a3 a4
       t4: a3 a5
     [Figure: the FPs and FCPs of these transactions are marked]
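
The FP/FCP distinction on this slide can be checked by brute force. The sketch below is an illustration only, not the authors' code: it uses the four transactions and minsup = 2 from the slide, and the helper names (`support`, `frequent_patterns`, `closed_patterns`) are made up for this example. It assumes the usual definition that a frequent pattern is closed when no proper superset has the same support.

```python
from itertools import combinations

# Transactions and threshold taken from the slide.
transactions = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
MINSUP = 2


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def frequent_patterns(minsup):
    """Brute force: test every non-empty itemset (fine for a toy example)."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo))
            if s >= minsup:
                result[frozenset(combo)] = s
    return result


def closed_patterns(fps):
    """Keep a pattern only if no proper superset has the same support."""
    return {
        p: s
        for p, s in fps.items()
        if not any(p < q and s == sq for q, sq in fps.items())
    }


fps = frequent_patterns(MINSUP)
fcps = closed_patterns(fps)
for pattern, s in sorted(fcps.items(), key=lambda kv: -kv[1]):
    print(sorted(pattern), s)
```

On these transactions the sketch finds nine frequent patterns, of which three are closed: {a3} with support 4, {a1, a2, a3} with support 3, and {a3, a5} with support 2.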

  6. Background
     • Binary Mapping
       t1: a1 a2 a3 a5
       t2: a1 a2 a3
       t3: a1 a2 a3 a4
       t4: a3 a5
     [Figure: the transactions T mapped against the items I as a binary matrix]
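
The binary mapping can be written out directly. This is a small sketch using the slide's transactions; the variable names are illustrative, and the row/column orientation (transactions as rows, items as columns) is an assumption, since the original figure is not in the transcript.

```python
# Transactions from the slide.
transactions = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
items = ["a1", "a2", "a3", "a4", "a5"]

# One row per transaction, one column per item; 1 means "item is present".
matrix = [
    [1 if item in transactions[t] else 0 for item in items]
    for t in sorted(transactions)
]

print("    " + " ".join(items))
for t, row in zip(sorted(transactions), matrix):
    print(t + "  " + "  ".join(str(v) for v in row))
```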

  8. Frequent Closed Cube
     • 3D Dataset
     [Figure: a 3D dataset with its height, row, and column dimensions and one slice labelled]

  9. Frequent Closed Cube
     • Slices by Height Dimension
     [Figure: slices h1, h2, h3]

  10. Frequent Closed Cube
      • Closed Cube: Maximal
      [Figure: slices h1, h2, h3]

  12. Frequent Closed Cube
      • Definition: Frequent Closed Cube (FCC)
      • Maximal: cannot be extended in any dimension
      • Frequent: satisfies the minH, minR, minC thresholds
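
The two conditions in this definition can be tested mechanically. The sketch below is only an illustration under stated assumptions: the 3 x 3 x 3 binary dataset `O` is hypothetical (the slide's own figure is not in the transcript), a cube is taken to be a triple of index sets (H, R, C) whose cells are all 1, and the function names are invented for this example.

```python
# Hypothetical 3D binary dataset, indexed as O[height][row][column].
O = [
    [[1, 1, 0], [1, 1, 0], [0, 1, 1]],  # slice h0
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],  # slice h1
    [[1, 1, 0], [1, 1, 1], [0, 0, 0]],  # slice h2
]


def all_ones(H, R, C):
    """True if every cell of the sub-cube H x R x C is 1."""
    return all(O[h][r][c] for h in H for r in R for c in C)


def is_frequent(H, R, C, min_h, min_r, min_c):
    """Frequent: each dimension meets its minimum size threshold."""
    return len(H) >= min_h and len(R) >= min_r and len(C) >= min_c


def is_closed(H, R, C):
    """Maximal: adding any further height, row, or column breaks all-ones."""
    if not all_ones(H, R, C):
        return False
    n_h, n_r, n_c = len(O), len(O[0]), len(O[0][0])
    grow_h = any(all_ones(H | {h}, R, C) for h in range(n_h) if h not in H)
    grow_r = any(all_ones(H, R | {r}, C) for r in range(n_r) if r not in R)
    grow_c = any(all_ones(H, R, C | {c}) for c in range(n_c) if c not in C)
    return not (grow_h or grow_r or grow_c)


print(is_closed({0, 1, 2}, {0, 1}, {0, 1}))  # cannot grow in any dimension
print(is_closed({0, 1}, {0, 1}, {0, 1}))     # can still be extended by h2
```

In this toy dataset the cube over all three heights, rows {0, 1} and columns {0, 1} is closed, while the same rows and columns over heights {0, 1} alone are not, because slice h2 extends them.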

  14. RSM vs. CubeMiner
      • Representative Slice Mining (RSM): extends existing 2D FCP mining algorithms to FCC mining
      • CubeMiner: operates on the 3D space directly

  15. RSM
      • Representative Slice (RS) generation: enumerate all possible combinations of slices
      • 2D FCP mining from each RS
      • Post-pruning to remove unclosed cubes: if a 2D FCP is contained in other slices besides its contributing slices, it is unclosed and hence removed; otherwise, it is retained.
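
The three RSM steps above can be sketched end to end on a toy input. This is a hedged illustration, not the authors' implementation. It assumes that a representative slice is the cell-wise AND of the chosen height slices, it substitutes a brute-force 2D closed-pattern search for a real 2D FCP miner, and it assumes all thresholds are at least 1. The dataset `O` and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical 3D binary dataset, indexed as O[height][row][column].
O = [
    [[1, 1, 0], [1, 1, 0], [0, 1, 1]],  # slice h0
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],  # slice h1
    [[1, 1, 0], [1, 1, 1], [0, 0, 0]],  # slice h2
]


def representative_slice(heights):
    """Assumed RS definition: AND the chosen slices cell by cell."""
    n_rows, n_cols = len(O[0]), len(O[0][0])
    return [
        [int(all(O[h][r][c] for h in heights)) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def closed_patterns_2d(slice_, min_r, min_c):
    """Brute-force stand-in for a 2D FCP miner: closed (rows, cols) pairs."""
    n_rows, n_cols = len(slice_), len(slice_[0])
    found = []
    for k in range(min_r, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            cols = [c for c in range(n_cols) if all(slice_[r][c] for r in rows)]
            if len(cols) < min_c:
                continue
            # Closed in 2D: no further row is all-ones on these columns.
            closure = tuple(
                r for r in range(n_rows) if all(slice_[r][c] for c in cols)
            )
            if closure == rows:
                found.append((rows, tuple(cols)))
    return found


def rsm(min_h, min_r, min_c):
    n_heights = len(O)
    cubes = []
    for k in range(min_h, n_heights + 1):
        for heights in combinations(range(n_heights), k):
            rs = representative_slice(heights)
            others = [h for h in range(n_heights) if h not in heights]
            for rows, cols in closed_patterns_2d(rs, min_r, min_c):
                # Post-pruning: drop the pattern if a non-contributing
                # slice also contains it.
                contained_elsewhere = any(
                    all(O[h][r][c] for r in rows for c in cols) for h in others
                )
                if not contained_elsewhere:
                    cubes.append((heights, rows, cols))
    return cubes


print(rsm(min_h=2, min_r=2, min_c=2))
```

With minH = minR = minC = 2, every pair of slices yields the 2D pattern rows {0, 1} x columns {0, 1}, but each time the remaining slice also contains it, so those candidates are pruned. Only the combination of all three slices survives.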

  16. RSM
      • Slices by Height Dimension
      [Figure: slices h1, h2, h3]

  17. RSM

  19. CubeMiner Principle

  21. CubeMiner: Cutters
      [Figure: slice h1 and the cutters derived from h1]
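
The transcript does not preserve how cutters are derived, so the following is a sketch under an explicit assumption: a cutter is a triple (height, row, columns) that collects the zero cells of one row of one slice. The slice values below are hypothetical. Rows r1 and r2 are chosen so that they reproduce the cutters (h1, r1, c4) and (h1, r2, c4c5) that appear on the later splitting-tree slides; rows r3 and r4 are invented.

```python
# Hypothetical contents of slice h1 (rows r1..r4, columns c1..c5).
ROW_NAMES = ["r1", "r2", "r3", "r4"]
COL_NAMES = ["c1", "c2", "c3", "c4", "c5"]
slice_h1 = [
    [1, 1, 1, 0, 1],  # r1: zero at c4
    [1, 1, 1, 0, 0],  # r2: zeros at c4 and c5
    [1, 1, 1, 1, 1],  # r3: no zeros, so no cutter (invented row)
    [0, 1, 1, 1, 1],  # r4: zero at c1 (invented row)
]


def cutters_from_slice(height, rows):
    """Assumed rule: one cutter per row that contains at least one zero."""
    cutters = []
    for row_name, row in zip(ROW_NAMES, rows):
        zero_cols = [c for c, v in zip(COL_NAMES, row) if v == 0]
        if zero_cols:
            cutters.append((height, row_name, zero_cols))
    return cutters


for cutter in cutters_from_slice("h1", slice_h1):
    print(cutter)
```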

  22. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)

  23. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Cutter Checking: A.
      Cutter Checking: check whether the cutter is applicable (A.)
      • Subset of the node: A.
      • Otherwise: N.A.

  24. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Left Tree: remove the cutter's left atom h1 from the parent node

  25. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Middle: (h1~h3, r2~r4, c1~c5)
      Middle Tree: remove the cutter's middle atom r1 from the parent node

  26. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Middle: (h1~h3, r2~r4, c1~c5)
      Right: (h1~h3, r1~r4, c1c2c3c5)
      Right Tree: remove the cutter's right atom c4 from the parent node

  27. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Middle: (h1~h3, r2~r4, c1~c5)
      Right: (h1~h3, r1~r4, c1c2c3c5)
      Next Cutter: checking (h1, r2, c4c5) against each node
      Left: N.A.  Middle: A.  Right: A.

  28. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Middle: (h1~h3, r2~r4, c1~c5), cutter (h1, r2, c4c5)
      Right: (h1~h3, r1~r4, c1c2c3c5), cutter (h1, r2, c4c5)
      Children of the middle node: (h2h3, r2~r4, c1~c5), (h1~h3, r3r4, c1~c5), (h1~h3, r2~r4, c1~c3)
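
The splitting steps on slides 22 to 28 can be replayed in a few lines. This is a sketch, not the authors' implementation; the `Node` and `Cutter` types and the function names are invented here. One assumption needs flagging: the slides describe an applicable cutter as a "subset of the node", yet slide 27 marks (h1, r2, c4c5) as applicable to the right node, which no longer contains c4. The sketch therefore treats a cutter as applicable when its height and row are in the node and its columns overlap the node's columns, which reproduces the N.A. / A. / A. result on slide 27.

```python
from typing import NamedTuple


class Node(NamedTuple):
    H: frozenset
    R: frozenset
    C: frozenset


class Cutter(NamedTuple):
    h: str
    r: str
    C: frozenset


def applicable(node, cutter):
    """Assumed reading of the check: every atom of the cutter meets the node."""
    return cutter.h in node.H and cutter.r in node.R and bool(cutter.C & node.C)


def split(node, cutter):
    """Each child removes one atom of the cutter from the parent node."""
    left = Node(node.H - {cutter.h}, node.R, node.C)
    middle = Node(node.H, node.R - {cutter.r}, node.C)
    right = Node(node.H, node.R, node.C - cutter.C)
    return left, middle, right


root = Node(
    frozenset({"h1", "h2", "h3"}),
    frozenset({"r1", "r2", "r3", "r4"}),
    frozenset({"c1", "c2", "c3", "c4", "c5"}),
)
first = Cutter("h1", "r1", frozenset({"c4"}))
second = Cutter("h1", "r2", frozenset({"c4", "c5"}))

left, middle, right = split(root, first)
checks = [applicable(n, second) for n in (left, middle, right)]
print(checks)  # left child has lost h1, so the second cutter does not apply

for child in split(middle, second):
    print(sorted(child.H), sorted(child.R), sorted(child.C))
```

The three children printed at the end correspond to the nodes listed under the middle node on slide 28.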

  29. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Middle: (h1~h3, r2~r4, c1~c5), cutter (h1, r2, c4c5)
      Right: (h1~h3, r1~r4, c1c2c3c5), cutter (h1, r2, c4c5)
      Children of the middle node: (h2h3, r2~r4, c1~c5), (h1~h3, r3r4, c1~c5), (h1~h3, r2~r4, c1~c3)
      [Figure label: Subset Cube]

  31. Mining FCC: CubeMiner
      Splitting Tree
      Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
      Cutter: (h1, r1, c4)
      Left: (h2h3, r1~r4, c1~c5)
      Middle: (h1~h3, r2~r4, c1~c5), cutter (h1, r2, c4c5)
      Right: (h1~h3, r1~r4, c1c2c3c5), cutter (h1, r2, c4c5)
      Children of the middle node: (h2h3, r2~r4, c1~c5), (h1~h3, r3r4, c1~c5), (h1~h3, r2~r4, c1~c3)
      [Figure label: Left Track Checking]

  32. Parallelism
      • RSM
        • Task: mining of each Representative Slice
      • CubeMiner
        • Task: mining of each branch
      • Processor
        • Initial: keeps a copy of the whole dataset
        • Independent and concurrent, with little communication cost
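
The task decomposition for parallel RSM can be sketched with Python's standard `concurrent.futures` module. This is an illustration under assumptions, not the authors' setup: the dataset `O` is hypothetical, the representative slice is assumed to be the cell-wise AND of the chosen slices, the 2D mining step is brute force, and a thread pool stands in for separate processors. Every worker reads the same full dataset, mirroring the "keeps a copy of the whole dataset" point above, and tasks exchange nothing until their results are collected.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical 3D binary dataset, indexed as O[height][row][column].
O = [
    [[1, 1, 0], [1, 1, 0], [0, 1, 1]],  # slice h0
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],  # slice h1
    [[1, 1, 0], [1, 1, 1], [0, 0, 0]],  # slice h2
]
N_H, N_R, N_C = len(O), len(O[0]), len(O[0][0])


def mine_representative_slice(heights, min_r=2, min_c=2):
    """One independent task: mine a single representative slice, then prune."""
    rs = [
        [all(O[h][r][c] for h in heights) for c in range(N_C)]
        for r in range(N_R)
    ]
    others = [h for h in range(N_H) if h not in heights]
    cubes = []
    for k in range(min_r, N_R + 1):
        for rows in combinations(range(N_R), k):
            cols = tuple(c for c in range(N_C) if all(rs[r][c] for r in rows))
            closure = tuple(
                r for r in range(N_R) if all(rs[r][c] for c in cols)
            )
            if len(cols) < min_c or closure != rows:
                continue
            # Post-pruning against the slices that did not contribute.
            if not any(
                all(O[h][r][c] for r in rows for c in cols) for h in others
            ):
                cubes.append((heights, rows, cols))
    return cubes


def parallel_rsm(min_h=2, workers=2):
    """Dispatch one task per height combination and merge the results."""
    tasks = [
        hs for k in range(min_h, N_H + 1) for hs in combinations(range(N_H), k)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(mine_representative_slice, tasks))
    return [cube for cubes in results for cube in cubes]


print(parallel_rsm())
```

Threads are used only to keep the sketch short and portable; for CPU-bound mining in CPython a process pool or separate machines would be the more realistic choice.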

  33. Mining FCC: Experiments
      • Real data: yeast cell-cycle regulated genes
        • Elutriation experiments: 14*9*7161
        • CDC15 experiments: 19*9*7761
      • Synthetic data: IBM data generator
        • Synthetic 1: H*R*C = (8~20)*20*1000
        • Synthetic 2: H*R*C = 100*100*10000

  34. Experiments: Optimize CubeMiner
      • Optimal: sort slices in decreasing order of zeros
      • Prune off infrequent cubes early
      [Figure: Elutriation (14*9*7161)]

  35. Experiments: Optimize RSM
      • Optimal: enumerate slices along the smallest dimension
      • Slice enumeration takes a relatively long processing time
      [Figure: Elutriation (14*9*7161)]

  36. Experiments: RSM vs. CubeMiner
      • As the smallest dimension grows, CubeMiner outperforms RSM
      [Figure: Synthetic data (varying the size of the height dimension)]

  37. Experiments: Parallelism
      • As the degree of parallelism increases, the response time decreases.
      • Optimal number of processors
      [Figure: CDC15 (varying the number of processors)]

  38. Conclusion
      • Notion of Frequent Closed Cube
      • RSM: efficient when one of the dimensions is small
      • CubeMiner: superior for large datasets
      • Parallel RSM and CubeMiner

  39. Thank You!