LLNL. Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs. Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad. Input. Output. Query Graph. Matching Subgraph. Attributed Data Graph. Terminology: "Conform". Matching Subgraph conforms.
LLNL • Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs • Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad • KDD 2007, San Jose
Input: Query Graph, Attributed Data Graph • Output: Matching Subgraph
Terminology: "Conform" • The matching subgraph conforms to the query graph
Terminology: "Interception" • Figure: matching subgraph (four matching nodes, one intermediate node) next to the query graph • Path 12-13-4 is an interception
Terminology: "Instantiate" • Matching subgraph: Ht • Query graph: Hq • Node 11 instantiates the SEC node • Ht instantiates Hq
Roadmap • Introduction • Problem Definition • Motivations • How to: Graph X-Ray • Experimental Results • Conclusion
Motivation: Why Not SQL? • Case 1: Exact match does not exist • Q: How to find approximate answer? • Case 2: Too many exact matches • Q: How to rank them?
Motivation: Why Not SQL? (Cont.) • Case 3: Exact match might not be the best answer • "Find CEO who has heavy contact with Accountant" • Q: How to find the right one? • Exact match: 1 direct connection • Inexact match: many indirect connections
Motivation: Efficiency • Why Not Subgraph Isomorphism? • Polynomial only for a fixed query pattern size • Q1: How to scale up linearly? • Q2: … and with a small slope?
Wish List • Effectiveness • Both exact match & inexact match • Ranking among multiple results • "Best" answer (proximity-based) • Efficiency • Scale linearly • Scale with small slope • G-Ray meets all!
Roadmap • Introduction • Problem Definition • Motivations • How to: Graph X-Ray • Experimental Results • Conclusion
Preliminary: Center-Piece Subgraph [Tong+] • Figure: original graph with query set Q (black: query nodes) • CePS is meta opt. in G-Ray!
Preliminary: Augmented Graph • Data nodes • 1, …, 13 • Attribute nodes • a • Footnote: Aug. Graph is crucial for computation!
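One way to picture the augmented graph is sketched below. It assumes that each attribute value becomes one extra node linked to every data node carrying that value. The function name `build_augmented_graph`, the `("attr", label)` naming scheme, and the tiny example graph are all hypothetical and only serve the illustration.

```python
def build_augmented_graph(edges, attributes):
    """Return an adjacency dict holding data nodes and attribute nodes.

    edges: iterable of (u, v) pairs between data nodes (undirected).
    attributes: dict mapping data node -> attribute label.
    Attribute nodes are keyed as ("attr", label) so they cannot collide
    with data-node ids (an assumption of this sketch).
    """
    adj = {}

    def link(u, v):
        # Store the undirected edge in both directions.
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for u, v in edges:
        link(u, v)
    for node, label in attributes.items():
        link(node, ("attr", label))
    return adj


# Hypothetical 4-node example, not the 13-node graph from the slides.
edges = [(1, 2), (2, 3), (3, 4)]
attributes = {1: "CEO", 2: "SEC", 3: "Manager", 4: "SEC"}
aug = build_augmented_graph(edges, attributes)
```

With this layout, `aug[("attr", "SEC")]` lists every data node labeled SEC, so a proximity computation that starts from an attribute node reaches all of its data nodes in one hop.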
G-Ray: quick overview (for a loop query) • Step 1: SF • Step 2: NE • Step 3: BR • Step 4: NE • Step 5: BR • Step 6: NE • Step 7: BR • Step 8: BR • SF: Seed-Finder • NE: Neighborhood-Expander • BR: Bridge
Seed-Finder • Q: How to instantiate the SEC node? • A: • Footnote: node 11 is close to some unknown data nodes for "CEO", "Account." and "Manager"
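The footnote suggests that the seed is a candidate data node that is close to the other attributes in the query. A minimal sketch of that idea follows. The scoring rule (a product of proximities), the name `find_seed`, the candidate node 2, and every number in `prox_to_label` are assumptions made for illustration, not values or formulas from the slides.

```python
def find_seed(candidates, neighbor_labels, prox_to_label):
    """Pick the candidate that is jointly closest to all given labels.

    candidates: data nodes that carry the seed's attribute (e.g. SEC).
    neighbor_labels: the other attributes required by the query.
    prox_to_label: dict node -> dict label -> proximity score.
    The product-of-proximities score is an assumption of this sketch.
    """
    def score(node):
        result = 1.0
        for label in neighbor_labels:
            result *= prox_to_label[node][label]
        return result

    return max(candidates, key=score)


# Made-up proximities for two hypothetical SEC candidates.
prox_to_label = {
    11: {"CEO": 0.20, "Accountant": 0.10, "Manager": 0.15},
    2: {"CEO": 0.05, "Accountant": 0.02, "Manager": 0.30},
}
seed = find_seed([11, 2], ["CEO", "Accountant", "Manager"], prox_to_label)
```

A product rewards candidates that are reasonably close to every required attribute, whereas a sum could be dominated by a single very close one.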
Neighborhood-Expander • Q: How to instantiate the CEO node? (Step 1 to Step 2) • A: • Footnote: • Step 3 to Step 4? • Step 5 to Step 6?
Bridge • Step 6: NE, Step 7: BR • Q: • A: Prim-like Alg. • To maximize • Should block nodes 11 and 7 • Footnote • Connection subgraph, or one single path?
Roadmap • Introduction • Problem Definition • Motivation • How to: Graph X-Ray • Experimental Results • Conclusion
Experimental Results • Datasets • DBLP • Node: author (315k) • Edge: co-authorship (1,800k) • Attribute: conference & year (13k) • KDD-2001, SIGMOD…
Effectiveness: star-query Query Result
Effectiveness: line-query Query Result
Effectiveness: loop-query Query Result
Efficiency • Figure: response time vs. # of edges • Scale linearly • Small slope • 3-5 seconds at ~2 M edges
Roadmap • Introduction • Problem Definition • Motivation • How to: Graph X-Ray • Experimental Results • Conclusion
Conclusion • Graph X-Ray (G-Ray) • Best effort pattern match • in large attributed graphs • Scale linearly • with small slope • More details in Poster Session • Monday (tonight) • board number 8
Thank you! www.cs.cmu.edu/~htong G-Ray X-Ray
Proximity on Graph • a.k.a. relevance, closeness • Multi-faceted • Punish long path • Edge weight • How to: random walk with restart • Figure: example graph with nodes 1-12
Random walk with restart • Nearby nodes, higher scores • Ranking vector: more red, more relevant • Figure: example graph (nodes 1-12) annotated with proximity scores ranging from 0.02 to 0.13
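Random walk with restart can be sketched as a simple power iteration. The restart probability of 0.15, the iteration count, the function name, and the 4-node path graph below are assumptions chosen for illustration; the slides do not state the parameters used in G-Ray.

```python
def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Approximate RWR scores from `start` by power iteration.

    adj: dict node -> dict neighbor -> edge weight.
    Returns a dict node -> score; the scores sum to 1.
    """
    nodes = set(adj)
    for nbrs in adj.values():
        nodes.update(nbrs)
    scores = {n: 0.0 for n in nodes}
    scores[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            nbrs = adj.get(u, {})
            total = sum(nbrs.values())
            walk_mass = (1.0 - restart) * scores[u]
            if total == 0:
                # Dead end: send the walker back to the start node.
                nxt[start] += walk_mass
                continue
            for v, w in nbrs.items():
                # Move to each neighbor in proportion to edge weight.
                nxt[v] += walk_mass * w / total
        # With probability `restart`, jump back to the start node.
        nxt[start] += restart
        scores = nxt
    return scores


# Hypothetical path graph 1 - 2 - 3 - 4 with unit edge weights.
path = {1: {2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
scores = random_walk_with_restart(path, start=1)
```

In this example the scores fall off from node 2 to node 3 to node 4, matching the "punish long path" bullet, and heavier edge weights pull more of the walk toward the corresponding neighbor.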
How to rank the results • Our goodness function • Measure the proximity between any two matching nodes if they are required to be connected (two-way) • Multiply them together • In G-Ray, we approximately optimize this goodness function • If we have multiple matching subgraphs, we can rank them according to this goodness function
How to rank the results • Figure: matching subgraph with four matching nodes (12, 4, 7, 11) • Goodness = Prox(12, 4) × Prox(4, 12) × Prox(7, 4) × Prox(4, 7) × Prox(11, 7) × Prox(7, 11) × Prox(12, 11) × Prox(11, 12)
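The goodness function above translates directly into code. The sketch below uses the four connections from the example; the proximity values in `prox` are placeholders made up for illustration, and the dictionary layout and the name `goodness` are assumptions.

```python
def goodness(matched_edges, prox):
    """Multiply the two-way proximities over every required connection.

    matched_edges: pairs of matching nodes that the query connects.
    prox: dict (source, target) -> proximity score.
    """
    score = 1.0
    for u, v in matched_edges:
        score *= prox[(u, v)] * prox[(v, u)]
    return score


# Connections from the slide example.
matched_edges = [(12, 4), (7, 4), (11, 7), (12, 11)]

# Placeholder proximities (made up for illustration, not from the slides).
prox = {}
for u, v in matched_edges:
    prox[(u, v)] = 0.5
    prox[(v, u)] = 0.25

example_score = goodness(matched_edges, prox)
```

To rank several matching subgraphs, compute this score for each one and sort in descending order.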