
gApprox: Mining Frequent Approximate Patterns from a Massive Network

This paper presents a method for mining frequent approximate patterns from large and complex networks, focusing on biological, social, and web networks. The algorithm explores pattern space and counts support, with experiments demonstrating its effectiveness. The study concludes that the technique is generalizable and can be adapted for faster mining of larger patterns.


Presentation Transcript


  1. gApprox: Mining Frequent Approximate Patterns from a Massive Network • Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han [ICDM 2007] • Reporter: Che-Wei Liang, 10/16

  2. Outline • Introduction • Problem Formulation • Algorithm • Pattern Space Exploration • Support Counting • Experiment • Conclusions

  3. Introduction • A set of graphs vs. a single network. • Recently, a large number of graphs with massive sizes and complex structures have appeared in many applications. • Examples: biological networks, social networks, the Web. • These demand powerful data mining methods. • The interest here is in patterns that appear frequently at many different places in a single network.

  4. Introduction • Example: a Protein-Protein Interaction (PPI) network, with degree of approximation △ = 5.

  5. Two major complications • 1. Mining frequent patterns in a single network: partition it into regions, each containing one occurrence of the pattern. • 2. Due to inherent noise and data diversity, it is crucial to account for approximations so that all potentially interesting patterns can be captured.


  7. Problem Formulation

  8. Approximate Pattern Occurrences • An approximate occurrence is given by an injective function m: Vp → VG mapping each vertex v ∈ Vp to m(v) ∈ VG. • Quantify the degree of approximation that m incurs. • Approximations can only happen within the matchable list.

  9. Approximate Pattern Occurrences

  10. Approximate Pattern Occurrences

  11. Approximate Pattern Occurrences

  12. Pattern Support with Approximation

  13. Pattern Support with Approximation

  14. Pattern Support with Approximation


  16. Algorithm • Two major issues: • 1. Pattern Space Exploration. • 2. Support Counting: enumerate the approximate occurrences of each pattern in the network, then decide the maximal number of disjoint occurrences.

  17. Pattern Space Exploration • Decompose the pattern space. • Find all connected vertex sets in G that contain vertex 1. • Remove vertex 1 from G, and find all connected vertex sets in the new graph G’ that contain vertex 2. • And so on for the remaining vertices.
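The decomposition above can be sketched in Python. This is a minimal illustration, not the paper's Algorithm 1: the function names, the adjacency-dictionary representation, and the example graph are all assumptions made here. Each root handles only the connected sets whose smallest vertex is that root, which is what removing earlier vertices from G achieves.

```python
def connected_sets_containing(adj, root, allowed):
    """Yield each connected vertex set that contains `root` and uses only
    vertices from `allowed`, exactly once. (Hypothetical helper.)"""
    def grow(current, frontier, excluded):
        yield frozenset(current)
        for i, v in enumerate(frontier):
            # Frontier vertices tried earlier are excluded in this branch,
            # so no vertex set can be produced by two different branches.
            skipped = excluded | set(frontier[:i])
            grown = current | {v}
            fresh = [w for w in sorted(adj[v])
                     if w in allowed and w not in grown
                     and w not in skipped and w not in frontier]
            yield from grow(grown, frontier[i + 1:] + fresh, skipped)

    start = [w for w in sorted(adj[root]) if w in allowed]
    yield from grow({root}, start, set())


def all_connected_vertex_sets(adj):
    """Decompose by root: sets containing 1, then sets containing 2 in
    the graph without 1, and so on."""
    remaining = set(adj)
    for root in sorted(adj):
        yield from connected_sets_containing(adj, root, remaining)
        remaining.discard(root)  # "remove the root from G"


# Assumed example graph: triangle 1-2-3 with a tail 3-4.
example_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(len(list(all_connected_vertex_sets(example_adj))))  # 12
```

The recursion is exponential in the worst case, as any full enumeration of connected vertex sets must be; it is meant only to show how the decomposition avoids duplicates.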

  18. Pattern Space Exploration • Example: generating all connected vertex sets starting from 1. • Stage 1: Start from 1 and mark 1. • Stage 2: Expand from 1 to reach 2, 5, 6. Mark 2, 5, 6. There are seven connected vertex sets in total at this stage: {1,2}, {1,5}, {1,6}, {1,2,5}, {1,2,6}, {1,5,6}, {1,2,5,6}. • Stage 3: Taking each of the seven connected vertex sets from stage 2 as a starting point, continue the expansion. • Stage 4: Repeat until there are no more unmarked vertices.
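The count of seven in stage 2 follows because each of 2, 5, 6 is adjacent to 1, so adding any non-empty subset of them to {1} keeps the set connected, giving 2^3 - 1 = 7 sets. A small sketch, where the helper name `stage_sets` is an assumption made here rather than something from the slides:

```python
from itertools import combinations


def stage_sets(marked, reached):
    """Combine the already marked set with every non-empty subset of the
    newly reached vertices. Valid only when each reached vertex is
    adjacent to some marked vertex. (Hypothetical helper.)"""
    reached = sorted(reached)
    result = []
    for size in range(1, len(reached) + 1):
        for extra in combinations(reached, size):
            result.append(set(marked) | set(extra))
    return result


stage2 = stage_sets({1}, {2, 5, 6})
print(len(stage2))  # 7
```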


  22. Theorem 1 • Explore() in Algorithm 1 is both complete and redundancy-free, i.e., given a network G: • (1) it only generates connected vertex sets in G; • (2) it can generate all connected vertex sets in G; • (3) it does not generate the same connected vertex set more than once.

  23. Support Counting • A pattern P’s support is defined as the maximal number of “disjoint” occurrences that can be chosen from P’s approximate occurrences in the network. • This is the maximum independent set problem, which is NP-complete. • Algorithm 2 can provide an upper bound.
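To make the definition concrete, here is a small Python sketch. It is not the paper's Algorithm 2, which is not reproduced in this transcript. It assumes occurrences are given as vertex sets and that "disjoint" means sharing no vertex. `exact_support` is a brute-force search usable only on tiny inputs, and `greedy_support` is a simple heuristic that yields a lower bound, in contrast to the upper bound mentioned on the slide. Both function names are hypothetical.

```python
from itertools import combinations


def exact_support(occurrences):
    """Largest number of pairwise vertex-disjoint occurrences.
    Exponential time; for illustration on tiny inputs only."""
    occs = [frozenset(o) for o in occurrences]
    for k in range(len(occs), 0, -1):
        for chosen in combinations(occs, k):
            if all(a.isdisjoint(b) for a, b in combinations(chosen, 2)):
                return k
    return 0


def greedy_support(occurrences):
    """Greedy lower bound: take occurrences smallest first (ties keep
    input order), skipping any that overlap one already chosen."""
    chosen = []
    for occ in sorted((frozenset(o) for o in occurrences), key=len):
        if all(occ.isdisjoint(c) for c in chosen):
            chosen.append(occ)
    return len(chosen)


# Assumed toy input: three occurrences along the path 1-2-3-4.
occurrences = [{2, 3}, {1, 2}, {3, 4}]
print(exact_support(occurrences))   # 2, from {1, 2} and {3, 4}
print(greedy_support(occurrences))  # 1, because {2, 3} is taken first
```

The gap between the two results on this input shows why the exact count is hard to obtain and why bounds are used in practice.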

  24. Support Counting

  25. gApprox • gApprox combines pattern space exploration with support counting. • See the conditional branch on the 3rd line of Algorithm 1’s DFS_horizontal() function.

  26. Experiment

  27. Conclusions • Gives an approximation measure and shows its impact on mining. • Counts a pattern’s support based on its approximate occurrences in the network. • The techniques are general and can be applied to networks from other domains. • Can be modified to reach bigger, more interesting patterns even faster, with some sacrifice in the completeness of the mining results.
