
Mining for Tree-Query Associations in a Graph

Mining for Tree-Query Associations in a Graph. Jan Van den Bussche Hasselt University, Belgium joint work with Bart Goethals (U Antwerp, Belgium) and Eveline Hoekx (U Hasselt, Belgium). Graph Data.


Presentation Transcript


  1. Mining for Tree-Query Associations in a Graph Jan Van den Bussche Hasselt University, Belgium joint work with Bart Goethals (U Antwerp, Belgium) and Eveline Hoekx (U Hasselt, Belgium)

  2. Graph Data A (directed) graph over a set of nodes N is a set G of edges: ordered pairs (i, j) with i, j ∈ N. Snapshot of a graph representing the metabolic pathway of a human. Applications: life sciences, biology, social sciences, WWW, ...

  3. Graph Mining Transactional category • dataset: set of many small graphs (transactions) • frequency: number of transactions in which the pattern occurs (at least once) • ILP: Warmr [AGM, FSG, TreeMiner, gSpan, FFSM, Horvath-Ramon-Wrobel] Single graph category • dataset: single large graph • frequency: number of copies of the pattern in the large graph [Subdue, Vanetik-Gudes-Shimony, SEuS, SiGraM, Jeh-Widom] Focus on pattern mining, little work on association rule mining!

  4. Tree-Query Pattern • powerful tree-shaped pattern • inspired by conjunctive database queries • special features: • existential nodes • parameterized nodes • an occurrence of the pattern in G is any homomorphism from the pattern into G • frequency: number of x such that ∃z: (0,z) ∈ G ∧ (z,8) ∈ G ∧ (z,x) ∈ G
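A minimal sketch of how the frequency of such a pattern could be computed by brute force. It assumes a toy graph and a three-edge pattern with two constant nodes (0 and 8), one existential node z, and one distinguished node x. The function name `frequency`, the node labels `a` and `b`, and the graph `G` are hypothetical and chosen only for illustration.

```python
from itertools import product

def frequency(pattern_edges, constants, distinguished, graph):
    """Count distinct images of the distinguished nodes over all homomorphisms."""
    pattern_nodes = sorted({n for edge in pattern_edges for n in edge})
    free_nodes = [n for n in pattern_nodes if n not in constants]
    graph_nodes = sorted({v for edge in graph for v in edge})
    answers = set()
    # Try every assignment of graph nodes to the non-constant pattern nodes.
    for combo in product(graph_nodes, repeat=len(free_nodes)):
        h = dict(constants)
        h.update(zip(free_nodes, combo))
        # h is a homomorphism if every pattern edge is mapped onto a graph edge.
        if all((h[a], h[b]) in graph for a, b in pattern_edges):
            answers.add(tuple(h[d] for d in distinguished))
    return len(answers)

# Toy graph (assumption): a set of directed edges.
G = {(0, 1), (1, 8), (1, 2), (1, 3), (0, 4), (4, 5)}
# Pattern: a -> z, z -> b, z -> x, with a fixed to 0 and b fixed to 8.
pattern = [("a", "z"), ("z", "b"), ("z", "x")]
freq = frequency(pattern, {"a": 0, "b": 8}, ["x"], G)
print(freq)
```

Because occurrences are homomorphisms rather than embeddings, x may coincide with the image of another pattern node, so x = 8 is counted here.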

  5. Association rules • fully fledged associations over tree-query patterns • example:

  6. Experimental results: Real-life datasets • Food web • nodes, edges • confidence = 89% • frequency = 176

  7. Experimental results: Real-life datasets • Food web • nodes, edges • confidence = 89% • frequency = 176

  8. Experimental results: Food web • nodes, edges • 45%, 55%

  9. Experimental results: Real-life datasets • Protein interactions graph • nodes, edges • confidence = 10%

  10. Experimental results: Protein interaction graph • nodes, edges • 90%

  11. Outline rest of the talk • Formal problem definition • Algorithm • overall approach • levelwise generation of tree patterns • generation of containment mappings • generation of parameter assignments • Equivalent association rules • Certhia • Performance and Experimental results • Future work

  12. Tree pattern

  13. Tree pattern

  14. Tree pattern

  15. Tree pattern

  16. Tree pattern select distinct G3.to as x from G G1, G G2, G G3 where G1.from=5 and G1.to=G2.from and G1.to=G3.from and G2.to=8
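The SQL query on this slide can be tried out with Python's built-in `sqlite3` module. The sketch below assumes an in-memory table `G` filled with hypothetical edges. Since `from` and `to` are SQL keywords, the column names are double-quoted here, which the slide's shorthand leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Edge table G(from, to); the column names are quoted because they are SQL keywords.
conn.execute('CREATE TABLE G ("from" INTEGER, "to" INTEGER)')
# Hypothetical toy edges, not taken from the talk.
edges = [(5, 1), (1, 8), (1, 2), (1, 3), (5, 4), (4, 6)]
conn.executemany("INSERT INTO G VALUES (?, ?)", edges)

# The tree pattern from the slide: 5 -> z, z -> 8, z -> x, returning x.
query = '''
SELECT DISTINCT G3."to" AS x
FROM G G1, G G2, G G3
WHERE G1."from" = 5 AND G1."to" = G2."from"
  AND G1."to" = G3."from" AND G2."to" = 8
'''
answers = sorted(row[0] for row in conn.execute(query))
pattern_frequency = len(answers)
print(answers, pattern_frequency)
```

The frequency of the pattern is the number of distinct answers returned by the query.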

  17. Matching

  18. Matching

  19. Matching

  20. Matching

  21. Matching

  22. Matching

  23. Matching

  24. Frequency   frequency = 3

  25. Tree Query • Q = (H, P) • H: head • P: body

  26. Association Rule • AR: Q1 ⇒ Q2 • Confidence(AR) = freq(Q2)/freq(Q1) • Q2 ⊆ Q1 • { (x1,x2,x3) | Q1(x1,x2,x3) in G } ⊇ { (x,x,6) | Q2(x,x,6) in G }
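As a worked illustration of the confidence formula, the sketch below treats each query as its set of answer tuples in G. The answer sets and the function name `confidence` are hypothetical; Q2 is modelled as the restriction of Q1 to tuples of the form (x, x, 6), matching the shape used on the slide.

```python
from fractions import Fraction

def confidence(answers_q1, answers_q2):
    """Confidence of Q1 => Q2 as freq(Q2) / freq(Q1), given both answer sets."""
    # The rule is only meaningful when Q2 is contained in Q1.
    if not answers_q2 <= answers_q1:
        raise ValueError("Q2 must be contained in Q1")
    return Fraction(len(answers_q2), len(answers_q1))

# Hypothetical answers of Q1(x1, x2, x3) in some graph G.
q1_answers = {(1, 1, 6), (2, 2, 6), (1, 2, 6), (3, 3, 7)}
# Q2(x, x, 6): the first two components are equal and the third is the constant 6.
q2_answers = {t for t in q1_answers if t[0] == t[1] and t[2] == 6}

conf = confidence(q1_answers, q2_answers)
print(conf)
```

Here two of the four answers of Q1 are also answers of Q2, so the rule has confidence 1/2.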

  27. Examples of Association Rules (1) (2)

  28. Association Rule • AR: Q1 ⇒ Q2 • Confidence(AR) = freq(Q2)/freq(Q1) • Q2 ⊆ Q1 • { (x1,x2,x3) | Q1(x1,x2,x3) in G } ⊇ { (x,x,6) | Q2(x,x,6) in G }

  29. Containment Mapping containment mapping

  30. Containment Mapping containment mapping

  31. Containment Mapping containment mapping

  32. Containment Mapping containment mapping

  33. Containment Mapping • Q2 ⊆ Q1 ⇔ there is a containment mapping from Q1 to Q2
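A brute-force sketch of searching for containment mappings between two small queries. It assumes a query is given as a pair (head tuple, set of body edges), that variables are strings and constants are integers, and that a containment mapping must fix constants, map the head onto the head, and map every body edge onto a body edge. The representation and the function name `containment_mappings` are illustrative assumptions, not the talk's implementation.

```python
from itertools import product

def is_constant(term):
    # Assumption of this sketch: constants are ints, variables are strings.
    return isinstance(term, int)

def containment_mappings(q1, q2):
    """Yield every containment mapping from query q1 to query q2."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return
    terms1 = {t for edge in body1 for t in edge} | set(head1)
    terms2 = sorted({t for edge in body2 for t in edge} | set(head2), key=str)
    variables = sorted(t for t in terms1 if not is_constant(t))
    for images in product(terms2, repeat=len(variables)):
        f = dict(zip(variables, images))

        def image(term):
            return term if is_constant(term) else f[term]

        # The head of q1 must be mapped exactly onto the head of q2.
        if tuple(image(t) for t in head1) != tuple(head2):
            continue
        # Every body edge of q1 must be mapped onto a body edge of q2.
        if all((image(a), image(b)) in body2 for a, b in body1):
            yield f

# Hypothetical queries shaped like Q1(x1, x2, x3) and Q2(x, x, 6).
Q1 = (("x1", "x2", "x3"), {("r", "x1"), ("r", "x2"), ("r", "x3")})
Q2 = (("x", "x", 6), {("r", "x"), ("r", 6)})

forward = list(containment_mappings(Q1, Q2))
backward = list(containment_mappings(Q2, Q1))
print(forward, backward)
```

A mapping exists from Q1 to Q2 but not in the other direction, which matches Q2 being the more specific query.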

  34. Problem statement: Mining tree queries Given a graph G and a threshold k, find all tree queries that have frequency at least k in G. Such queries are called frequent.

  35. Problem statement: Association rules • Input: • a graph G • minsup • Qleft frequent in G • minconf • Output: all association rules Qleft ⇒ Q that are • frequent in G • confident in G.

  36. Algorithm: mining tree queries Outer loop: Generate, incrementally, all possible trees of increasing sizes. Avoid generation of isomorphic trees. Inner loop: For each newly generated tree, generate all queries based on that tree, and test their frequency. ...

  37. Outer loop • It is well known how to efficiently generate all trees uniquely up to isomorphism • Based on canonical form of trees. • [Scions, Li-Ruskey, Zaki, Chi-Young-Muntz]
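To illustrate what a canonical form of a tree buys, here is a minimal sketch of the classic sorted-parenthesis encoding for rooted unordered trees. Two trees are isomorphic exactly when their encodings are equal. This is a textbook technique shown for illustration and is not necessarily the canonical form used in the cited works; the `children` dictionary representation is an assumption.

```python
def canonical_form(children, root):
    """Encode a rooted unordered tree as a string that is invariant under
    reordering of children."""
    child_codes = sorted(canonical_form(children, c) for c in children.get(root, []))
    return "(" + "".join(child_codes) + ")"

# Two drawings of the same tree: the root has a leaf child and a child with one leaf.
tree_a = {0: [1, 2], 1: [3]}
tree_b = {0: [1, 2], 2: [3]}
# A path on four nodes, which is a different tree.
tree_c = {0: [1], 1: [2], 2: [3]}

code_a = canonical_form(tree_a, 0)
code_b = canonical_form(tree_b, 0)
code_c = canonical_form(tree_c, 0)
print(code_a, code_b, code_c)
```

A generator that only emits trees whose encoding has not been seen before avoids producing isomorphic duplicates, at the cost of storing the encodings. The cited methods avoid that cost by generating canonical forms directly.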

  38. Inner loop: Levelwise approach • A query Q is characterized by • Π_Q: set of existential nodes • Σ_Q: set of selected nodes • λ_Q: labeling of the selected nodes by constants. • Q2 specializes Q1 if Π_Q1 ⊆ Π_Q2, Σ_Q1 ⊆ Σ_Q2, and λ_Q2 agrees with λ_Q1 on Σ_Q1. • If Q2 specializes Q1 then freq(Q2) ≤ freq(Q1) • Most general query: T = (∅, ∅, ∅)

  39. Inner loop: Candidate generation • CanTab_{Π,Σ} = {λ | (Π, Σ, λ) is a candidate query} • FreqTab_{Π,Σ} = {λ | (Π, Σ, λ) is a frequent query} • Q' = (Π', Σ', λ') is a parent of Q = (Π, Σ, λ) if either: • Π' = Π and Σ has precisely one more node than Σ', or • Σ' = Σ and Π has precisely one more node than Π' • Join Lemma: Each candidacy table can be computed by taking the natural join of its parent frequency tables.
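The Join Lemma can be illustrated with a small natural join over tables represented as lists of dictionaries. The sketch assumes three hypothetical parent frequency tables for a query with existential node x2 and selected nodes x1 and x3. The table contents and the names `natural_join`, `freq_x1_x3`, `freq_x1`, and `freq_x3` are made up for the example.

```python
def natural_join(left, right):
    """Natural join of two tables given as lists of dicts (column -> value)."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            # Rows combine when they agree on every shared column.
            if all(l_row[c] == r_row[c] for c in shared):
                row = {**l_row, **r_row}
                if row not in joined:
                    joined.append(row)
    return joined

# Hypothetical parent frequency tables (frequent labelings of the selected nodes).
freq_x1_x3 = [{"x1": 0, "x3": 8}, {"x1": 0, "x3": 9}]  # no existential node
freq_x1 = [{"x1": 0}, {"x1": 5}]                       # x2 existential, x1 selected
freq_x3 = [{"x3": 8}]                                  # x2 existential, x3 selected

# Candidate labelings must be frequent in every parent, hence the join.
candidates = natural_join(natural_join(freq_x1_x3, freq_x1), freq_x3)
print(candidates)
```

Only labelings that survive in every parent table remain candidates, which is the usual levelwise pruning argument: a query cannot be frequent if one of its generalizations is not.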

  40. Inner loop: Frequency counting • Each candidacy table can be computed by a single SQL query (ref. Join Lemma). • Suppose G(from, to) is a table in the database; then each frequency table can be computed with a single SQL query. • Case Σ = ∅: • formulate the query in SQL and count • Case Σ ≠ ∅: • formulate the query in SQL as E • natural join of E with CanTab_{Π,Σ} • group by Σ • count each group

  41. Inner loop: Example [figure: example tree]

  42. Inner loop: Example [figure: example tree] • Join expression: • CanTab{x2}{x1,x3} = FreqTab{}{x1,x3} ⋈ FreqTab{x2}{x1} ⋈ FreqTab{x2}{x3}

  43. Inner loop: Example [figure: example tree] • Join expression: • CanTab{x2}{x1,x3} = FreqTab{}{x1,x3} ⋈ FreqTab{x2}{x1} ⋈ FreqTab{x2}{x3}

  44. Inner loop: Example [figure: example tree] • Join expression: • CanTab{x2}{x1,x3} = FreqTab{}{x1,x3} ⋈ FreqTab{x2}{x1} ⋈ FreqTab{x2}{x3}

  45. Inner loop: Example [figure: example tree] • Join expression: • CanTab{x2}{x1,x3} = FreqTab{}{x1,x3} ⋈ FreqTab{x2}{x1} ⋈ FreqTab{x2}{x3}

  46. Inner loop: Example [figure: example tree] • Join expression: • CanTab{x2}{x1,x3} = FreqTab{}{x1,x3} ⋈ FreqTab{x2}{x1} ⋈ FreqTab{x2}{x3}

  47. Inner loop: Example [figure: example tree] • SQL expression E for the tree pattern: select distinct G1.from as x1, G2.to as x3, G3.to as x4 from G G1, G G2, G G3 where G1.to = G2.from and G3.from = G2.from

  48. Inner loop: Example [figure: example tree] • SQL expression for filling the frequency table: select distinct E.x1, E.x3, count(E.x4) from E, CanTab{x2}{x1,x3} as CT where E.x1 = CT.x1 and E.x3 = CT.x3 group by E.x1, E.x3 having count(E.x4) >= k
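The two SQL expressions from this example can be run end to end with Python's built-in `sqlite3` module. The sketch below assumes a hypothetical toy graph, a hypothetical candidacy table (named `CanTab` here, since braces are not valid in a plain SQL identifier), and a threshold k = 2. The expression E is stored as a view, and the keyword column names `from` and `to` are double-quoted.

```python
import sqlite3

k = 2  # frequency threshold (assumed value)
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE G ("from" INTEGER, "to" INTEGER)')
# Hypothetical toy edges.
conn.executemany(
    "INSERT INTO G VALUES (?, ?)",
    [(0, 1), (1, 8), (1, 2), (1, 3), (0, 4), (4, 8), (7, 4)],
)
# Hypothetical candidacy table: candidate labelings of the selected nodes x1, x3.
conn.execute("CREATE TABLE CanTab (x1 INTEGER, x3 INTEGER)")
conn.executemany("INSERT INTO CanTab VALUES (?, ?)", [(0, 8), (0, 2), (7, 8)])

# Expression E: the tree x1 -> x2, x2 -> x3, x2 -> x4 with x2 projected away.
conn.execute('''
CREATE VIEW E AS
SELECT DISTINCT G1."from" AS x1, G2."to" AS x3, G3."to" AS x4
FROM G G1, G G2, G G3
WHERE G1."to" = G2."from" AND G3."from" = G2."from"
''')

# Frequency table: join E with the candidates, group by the selected nodes, count.
freq_tab = conn.execute('''
SELECT E.x1, E.x3, COUNT(E.x4)
FROM E, CanTab AS CT
WHERE E.x1 = CT.x1 AND E.x3 = CT.x3
GROUP BY E.x1, E.x3
HAVING COUNT(E.x4) >= ?
ORDER BY E.x1, E.x3
''', (k,)).fetchall()
print(freq_tab)
```

Because E is declared with `DISTINCT`, each group counts distinct values of x4, so the candidate (7, 8), which has a single answer, falls below the threshold and is dropped.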

  49. Algorithm: Mining association rules Loop 1: Generate incrementally all possible trees T of increasing sizes. Loop 2: For each T, generate all frequent tree patterns P based on T. Loop 3: For each P, generate all containment mappings f from Pleft to P. Loop 4: For each f, generate Q = (f(Hleft), P) and all parameter instantiations for Qleft ⇒ Q.

  50. Pattern database • For each P, a table FreqTabP that contains all frequent parameter instantiations. ⇒ Pattern Database
