210 likes | 491 Views
SPIN: Mining Maximal Frequent Subgraphs from Graph Databases. Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004. Introduction. Graphs model a relations among data Inter-disciplinary research Huge number of recurring patterns To mining only maximal frequent subgraphs.
E N D
SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004
Introduction • Graphs model a relations among data • Inter-disciplinary research • Huge number of recurring patterns • To mining only maximal frequent subgraphs. • None of its super graphs are frequent
Advantages • Reducing the total number of mined subgraphs • Saving space and analysis effort • Reducing mining time • Non-maximal frequent subgraph can be reconstructed. • Maximal frequent subgraphs are of most interest in some appliations.
Algorithm • Mining all frequent trees from a general graph database. • Tree normalization is simpler than graph. • In certain applications, most of the frequent subgraphs are really trees. • Use current subgraph mining algorithm • Mining subtrees from a forest
Algorithm • Reconstruct all maximal subgraphs from the mined trees. • For each frequent tree T, find all frequent subgraphs whose canonical spanning tree are isomorphic to T • Enumerate the equvalence class of a tree T • Maximal subgraph mining
Tree-based Equivalence Classes • A subtree T is a spanning tree of G if T contains all nodes in G. • Maximal one: canonical spanning tree • Group all frequent subgraphs in to equivalence classes based on spanning trees.
b y x a b b a b a b b b b b a a y y x y y y y x x x y x a a a a a a a a a a a a x x x y x y y x a a a a a a a a x y a a 12 singletons group
Enumerating Graphs from Trees • G C :{e1,e2,…,en} • If frequent -> edge C (candidate set) • Search space of G: G:C ={G+y|y 2C} GO
Optimizations • Removing a set of frequent subgraphs that can not be maximal from a search space • Locally maximal:frequent subgraph G is maximal in its equivalence class • Globally maximal:maximal frequent in a graph database • Avoid enumerating subgraphs which are notlocally maximal.
Bottom-up Pruning • G’ = G C • G’ is frequent : each graph in search space is a subgraph of G’ and not maximal
Tail Shrink • Embedding of G in G’ is a subgraph isomorphism f from G to G’ • Two embeddings of L in P l1->P1, l2->P2, l3->P3, l4->P4 l1->P1, l2->P3 ,l3->P2 ,l4->P4 go
Tail Shrink • candidate edge (i, j, el) is associative to a graph G • It appears in every embedding of G in a graph databases • If a tree T contains a set of associative edges, any maximal frequent graph G, a superset of T, must contains all associative edges.
Tail Shrink • Remove associative edges from candidate sets and augment them to T without missing any maximal ones • Reducing the search space • Prune the entire equivalences class in certain cases • A set of associative edges C of a tree T is lethal • G’ = T C has a canonical spanning treedifferent from that of T go
External-Edge Pruning • Remove one equivalence class without any knowledge about its candidate edges • External-edge for a graph G: it connects a node in G and a node not in G • (i, el, vl) is associative to a graph G • Every embedding f of G in a graph G’, G’ has a node v with the label vl • v connects to the node f(i) with an edge label el in G’ • Not exist node j V[G] such that v = f(j)
Experiments • 2.8GHz Pentium Xeon, • 512KB L2 cache,2GB main memory • Red Hat Linux 7.3 • C++ Programming language
Synthetic Dataset D10KT30L200I11V4E4