On Triangulation-based Dense Neighbourhood Graph Discovery

Explores dense graph patterns defined by both their size and the level of interaction between vertices, and how to locate them in large-scale graphs with limited resources. Covers related work, term definitions, the mining algorithms, an experimental study, and future directions.

  1. On Triangulation-based Dense Neighbourhood Graph Discovery • School of Computing, National University of Singapore

  2. Outline • Motivation • Related Work • Terms Definition • Triangulation based DN-graph mining • Semi-streaming DN-graph model • Experimental Study • Future Work and Conclusion

  3. Motivation • Define dense graph patterns from a perspective that considers both the size of the substructure and the minimum level of interaction between vertices. • Locate dense patterns in large-scale graphs under restricted resources.

  4. Related Work • Other Dense Patterns • Clique/Quasi-Clique • High Degree Patterns • Dense Bipartite Patterns • Heavy Patterns • Triangle Counting • CSV • Density-based closed-clique discovery with a linear visualization.

  5. Terms Definition

  6. Terms Definition (cont’d)

  7. DN-graph [Figure: example graph G with vertices a and b] • Proof

  8. DN-graph and Other Dense Patterns • Quasi-clique • Closed clique (a maximal clique) • DN-graph

  9. DN-graph and Closed Clique Proof

  10. Computation Bottleneck in DN-graph Mining • Most sub-graphs are not DN-graphs • Most of these operations are redundant

  11. How to tackle the bottleneck? • Reduce the number of joins • Local maximal feature: two DN-graphs share no edge. • A DN-graph consists of all edges that share common vertices and have the local maximal λ value. • Locating a DN-graph using λ(e) values • All edges within a DN-graph have equal λ(e), denoted λmax • All edges connecting to neighboring vertices have smaller λ values: λ(e) = λ(u,v) < λmax, where u is not in G’ and v is in G’ • Use approximation methods to compute λ(e) efficiently

  13. Graph Triangulation • Given a graph triangle, the upper bounds of two of its edges can be used to tighten the density estimate of the third edge. • [Figure: a triangle on vertices u, v, w with λ(u,w) = 3, λ(w,v) = 3, λ(u,v) = 5]
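One way to read the tightening step is sketched below in Python. The update rule is an assumption, not something the slides state: a common neighbour w is taken to back edge (u, v) only up to min(λ(u,w), λ(v,w)), and the new bound is the largest k for which at least k common neighbours back the edge at level k or higher. The names `edge_key`, `tighten_bound`, and the example data are hypothetical.

```python
def edge_key(a, b):
    """Canonical key for an undirected edge."""
    return (a, b) if a <= b else (b, a)


def tighten_bound(u, v, bound, neighbours):
    """Return a tightened upper bound for edge (u, v).

    ASSUMED rule (not taken from the slides): each common neighbour w
    closes a triangle (u, v, w) and can back the edge only up to the
    weaker bound of the two other edges of that triangle.
    """
    levels = sorted(
        (min(bound[edge_key(u, w)], bound[edge_key(v, w)])
         for w in neighbours[u] & neighbours[v]),
        reverse=True,
    )
    supported = 0
    for k, level in enumerate(levels, start=1):
        if level < k:
            break
        supported = k  # at least k neighbours back the edge at level >= k
    # Never raise the bound, only lower it.
    return min(bound[edge_key(u, v)], supported)


# Hypothetical example: edge (u, v) starts with bound 5 and has five
# common neighbours; each pair is (bound of (u, w), bound of (v, w)).
side = {"a": (3, 3), "b": (5, 4), "c": (2, 6), "d": (3, 5), "e": (1, 1)}
neighbours = {"u": set(side) | {"v"}, "v": set(side) | {"u"}}
bound = {edge_key("u", "v"): 5}
for w, (bu, bv) in side.items():
    neighbours[w] = {"u", "v"}
    bound[edge_key("u", w)] = bu
    bound[edge_key("v", w)] = bv

print(tighten_bound("u", "v", bound, neighbours))  # 3
```

In this example the backing levels are 4, 3, 3, 2, 1, so only three neighbours back the edge at level 3 or higher and the bound drops from 5 to 3.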

  14. Triangulation-Based DN-graph Mining • DN-graph Mining Algorithm • Step One: Sort vertices according to their degrees. • Step Two: Generate triangles in a streaming fashion. • Step Three: Obtain the local density information gradually along the triangle stream. • Initial Upper Bound: TC(e), the number of triangles that edge e participates in.
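Steps One and Two and the initial bound TC(e) can be sketched in Python as follows. This is a minimal in-memory illustration using the standard degree-ordering technique for triangle listing, not the authors' implementation; `build_adjacency`, `stream_triangles`, and `triangle_counts` are hypothetical names, and vertex ids are assumed to be mutually comparable (for example, all integers).

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def stream_triangles(adj):
    """Yield each triangle exactly once, as a generator."""
    # Step One: rank vertices by degree (ties broken by vertex id).
    order = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {x: i for i, x in enumerate(order)}
    # Orient every edge from the lower-ranked to the higher-ranked endpoint.
    out = {x: {y for y in adj[x] if rank[y] > rank[x]} for x in adj}
    # Step Two: emit triangles one at a time instead of storing them.
    for u in order:
        for v in out[u]:
            for w in out[u] & out[v]:
                yield u, v, w


def triangle_counts(edges):
    """TC(e): the number of triangles each edge participates in."""
    adj = build_adjacency(edges)
    tc = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    for u, v, w in stream_triangles(adj):
        for e in ((u, v), (u, w), (v, w)):
            tc[frozenset(e)] += 1
    return tc


# A 4-clique on {1, 2, 3, 4} plus a pendant edge (4, 5).
example_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
example_tc = triangle_counts(example_edges)
print(example_tc[frozenset((1, 2))], example_tc[frozenset((4, 5))])  # 2 0
```

Orienting edges by degree rank means each triangle is found only from its lowest-ranked vertex, which avoids emitting it three times and keeps the work at high-degree vertices small.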

  15. Counting of Supporting Nodes [Figure: an edge (a, b) with neighbouring vertices n1 to n4 and per-edge values; one vertex is labelled "Not Supporting Node"; the resulting count is 4]

  16. Convergence [Figure: example graph on vertices V1 to V6 with per-edge λ values, shown in four panels: Initialization, First Iteration, Second Iteration, Converge. Annotations: "Two Support Vertices", "One Support Vertex"; λ(V2V3), λ(V3V6), and λ(V2V6) each decrease by one; the local maximal neighborhood size is λ = 2]

  17. Semi-Streaming Graph Model • Graph vertices fit into main memory, while edges reside in secondary storage in the form of an adjacency list. • Random access is available in primary storage (i.e. memory), but only sequential access in secondary storage. • To be a feasible solution for a streaming graph G(V,E), an algorithm should not exceed log |V| scans of G’s adjacency list.

  18. DN-graph mining in semi-streaming model • Estimate the shared neighbor size using the min-wise independent set property. • Min-wise independent set property: given two sets A, B over a totally ordered universe X and a uniformly chosen permutation π over X, the probability that min(π(A)) = min(π(B)) equals the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|. • When A and B are neighbor sets, this can be used to estimate the shared neighbor size |A ∩ B|.
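The property above can be illustrated with a short Python sketch. It draws k explicit random permutations of the vertex universe (which fits in memory under the semi-streaming model), estimates J as the fraction of permutations whose minima agree, and recovers the shared size from J = I / (|A| + |B| - I), that is I = J (|A| + |B|) / (1 + J). The function names, the choice of k, and the example sets are assumptions for illustration; this is not the StreamDN code, and a practical implementation would likely use hash functions rather than stored permutations.

```python
import random


def make_permutations(universe, k, seed=0):
    """Draw k uniform permutations of the universe, stored as rank maps."""
    rng = random.Random(seed)
    items = list(universe)
    perms = []
    for _ in range(k):
        order = items[:]
        rng.shuffle(order)
        perms.append({x: i for i, x in enumerate(order)})
    return perms


def signature(items, perms):
    """Minimum rank of a non-empty set under each permutation."""
    return [min(p[x] for x in items) for p in perms]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of permutations on which the two minima coincide."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def estimate_shared(size_a, size_b, jaccard):
    """Invert J = I / (|A| + |B| - I) to get the intersection size I."""
    return jaccard * (size_a + size_b) / (1 + jaccard)


# Hypothetical neighbor sets with a true overlap of 500 vertices.
set_a = set(range(0, 1000))
set_b = set(range(500, 1500))
perms = make_permutations(range(1500), k=400)
j_hat = estimate_jaccard(signature(set_a, perms), signature(set_b, perms))
print(j_hat, estimate_shared(len(set_a), len(set_b), j_hat))
```

Only the k minima per vertex need to be kept in memory, and signatures can be updated during a sequential scan of the adjacency list, which is what makes the estimate suitable for the semi-streaming setting.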

  19. Experimental Setting • Quad-Core AMD Opteron(tm) processor 8356 • 128GB memory • 700 GB hard disk • OS: Windows Server 2003

  20. Experimental Study • Comparison with CSV on Stock Market Dataset

  21. Convergence • Dataset: Flickr graph (1.7 million vertices and 22.6 million edges) • Running time per iteration is between 55 minutes and 1 hour.

  22. StreamDN Performance on Flickr Dataset • StreamDN over-estimates by 72% relative to the BiTriDN algorithm’s results during the first 66 scans. • StreamDN can handle the streaming setting with reasonable accuracy.

  23. DN-graph Semantics in Various Domains

  24. Future work and Conclusion • DN-graph • DN-graph Mining Problem • Semi-streaming Approach • Future Work

  25. Thank You & Questions

  28. Proof: A DN-graph is a local maximum graph

  29. Proof: DN-graph and Closed Clique

