1 / 21

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases. Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004. Introduction. Graphs model a relations among data Inter-disciplinary research Huge number of recurring patterns To mining only maximal frequent subgraphs.

kathy
Download Presentation

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004

  2. Introduction • Graphs model a relations among data • Inter-disciplinary research • Huge number of recurring patterns • To mining only maximal frequent subgraphs. • None of its super graphs are frequent

  3. Advantages • Reducing the total number of mined subgraphs • Saving space and analysis effort • Reducing mining time • Non-maximal frequent subgraph can be reconstructed. • Maximal frequent subgraphs are of most interest in some appliations.

  4. Algorithm • Mining all frequent trees from a general graph database. • Tree normalization is simpler than graph. • In certain applications, most of the frequent subgraphs are really trees. • Use current subgraph mining algorithm • Mining subtrees from a forest

  5. Algorithm • Reconstruct all maximal subgraphs from the mined trees. • For each frequent tree T, find all frequent subgraphs whose canonical spanning tree are isomorphic to T • Enumerate the equvalence class of a tree T • Maximal subgraph mining

  6. Tree-based Equivalence Classes • A subtree T is a spanning tree of G if T contains all nodes in G. • Maximal one: canonical spanning tree • Group all frequent subgraphs in to equivalence classes based on spanning trees.

  7. Spanning tree

  8. Tree-based Equivalence Classes back

  9. b y x a b b a b a b b b b b a a y y x y y y y x x x y x a a a a a a a a a a a a x x x y x y y x a a a a a a a a x y a a 12 singletons group

  10. Enumerating Graphs from Trees • G C :{e1,e2,…,en} • If frequent -> edge C (candidate set) • Search space of G: G:C ={G+y|y 2C} GO

  11. Optimizations • Removing a set of frequent subgraphs that can not be maximal from a search space • Locally maximal:frequent subgraph G is maximal in its equivalence class • Globally maximal:maximal frequent in a graph database • Avoid enumerating subgraphs which are notlocally maximal.

  12. Bottom-up Pruning • G’ = G C • G’ is frequent : each graph in search space is a subgraph of G’ and not maximal

  13. Tail Shrink • Embedding of G in G’ is a subgraph isomorphism f from G to G’ • Two embeddings of L in P l1->P1, l2->P2, l3->P3, l4->P4 l1->P1, l2->P3 ,l3->P2 ,l4->P4 go

  14. Tail Shrink • candidate edge (i, j, el) is associative to a graph G • It appears in every embedding of G in a graph databases • If a tree T contains a set of associative edges, any maximal frequent graph G, a superset of T, must contains all associative edges.

  15. Tail Shrink • Remove associative edges from candidate sets and augment them to T without missing any maximal ones • Reducing the search space • Prune the entire equivalences class in certain cases • A set of associative edges C of a tree T is lethal • G’ = T C has a canonical spanning treedifferent from that of T go

  16. External-Edge Pruning • Remove one equivalence class without any knowledge about its candidate edges • External-edge for a graph G: it connects a node in G and a node not in G • (i, el, vl) is associative to a graph G • Every embedding f of G in a graph G’, G’ has a node v with the label vl • v connects to the node f(i) with an edge label el in G’ • Not exist node j V[G] such that v = f(j)

  17. Associative external edges

  18. Experiments • 2.8GHz Pentium Xeon, • 512KB L2 cache,2GB main memory • Red Hat Linux 7.3 • C++ Programming language

  19. Synthetic Dataset D10KT30L200I11V4E4

  20. DTP CA data set

  21. DTP CM data set

More Related