SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

Jun Huan, Wei Wang, Jan Prins, Jiong Yang • KDD 2004

Introduction • Graphs model relations among data • Interdisciplinary research • Huge number of recurring patterns • Goal: mine only the maximal frequent subgraphs • A frequent subgraph is maximal if none of its supergraphs is frequent

Advantages • Reduces the total number of mined subgraphs • Saves space and analysis effort • Reduces mining time • Non-maximal frequent subgraphs can be reconstructed from the maximal ones • Maximal frequent subgraphs are of most interest in some applications

Algorithm • Mine all frequent trees from a general graph database • Tree normalization is simpler than graph normalization • In certain applications, most of the frequent subgraphs are in fact trees • Use an existing subgraph mining algorithm • Mining subtrees from a forest

Algorithm • Reconstruct all maximal subgraphs from the mined trees • For each frequent tree T, find all frequent subgraphs whose canonical spanning tree is isomorphic to T • Enumerate the equivalence class of a tree T • Maximal subgraph mining

Tree-based Equivalence Classes • A subtree T is a spanning tree of G if T contains all nodes of G • The maximal spanning tree of G (under the ordering on trees) is its canonical spanning tree • Group all frequent subgraphs into equivalence classes based on their canonical spanning trees
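The spanning-tree condition above can be checked mechanically. The following is a minimal sketch, assuming simple undirected graphs given as node lists and edge pairs; the function name `is_spanning_tree` is hypothetical, and choosing the canonical (maximal) spanning tree among all spanning trees is not shown.

```python
def is_spanning_tree(tree_edges, graph_nodes, graph_edges):
    """Return True if tree_edges form a spanning tree of the given graph.

    Assumes a simple undirected graph (no self-loops, no parallel edges)
    whose edges only use nodes listed in graph_nodes.
    """
    def norm(edge):
        # Undirected edge: (u, v) and (v, u) are the same edge.
        return tuple(sorted(edge))

    nodes = set(graph_nodes)
    g = {norm(e) for e in graph_edges}
    t = {norm(e) for e in tree_edges}
    if not nodes:
        return False
    # Every tree edge must exist in G, and a tree on n nodes has n - 1 edges.
    if not t <= g or len(t) != len(nodes) - 1:
        return False
    # With n - 1 edges, the tree spans G exactly when it is connected.
    adj = {v: set() for v in nodes}
    for u, v in t:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


# Toy graph: a square 0-1-2-3 with one diagonal 0-2.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(is_spanning_tree([(0, 1), (1, 2), (2, 3)], nodes, edges))
print(is_spanning_tree([(0, 1), (1, 2), (0, 2)], nodes, edges))
```

The second call uses three edges that form a cycle and leave node 3 uncovered, so it is rejected by the connectivity check rather than by the edge count.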

Spanning tree

Tree-based Equivalence Classes

[Figure: frequent subgraphs (node labels a, b; edge labels x, y) grouped into tree-based equivalence classes; 12 singleton groups]

Enumerating Graphs from Trees • Candidate edge set of G: C = {e1, e2, …, en} • If G ⊕ e is frequent, then e ∈ C (candidate set) • Search space of G: G:C = {G ⊕ y | y ∈ 2^C}
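The search space on this slide is the set of graphs obtained by adding every subset of the candidate edges to G. A minimal sketch of that enumeration, assuming graphs are represented only by their edge sets and that candidate edges are distinct and not already in G (the name `search_space` is hypothetical):

```python
from itertools import combinations


def search_space(graph_edges, candidate_edges):
    """Yield G plus y for every subset y of the candidate edge set C."""
    base = frozenset(graph_edges)
    cands = list(candidate_edges)
    # Subsets are generated by increasing size, so G itself comes first
    # and G with all candidate edges comes last.
    for k in range(len(cands) + 1):
        for y in combinations(cands, k):
            yield base | frozenset(y)


# Toy example: a path 0-1-2-3 with two candidate edges.
tree = [(0, 1), (1, 2), (2, 3)]
candidates = [(0, 2), (1, 3)]
for g in search_space(tree, candidates):
    print(sorted(g))
```

With n candidate edges the space has 2^n graphs, which is why the optimizations on the following slides try to avoid enumerating it in full.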

Optimizations • Remove from the search space sets of frequent subgraphs that cannot be maximal • Locally maximal: a frequent subgraph G is maximal within its equivalence class • Globally maximal: a frequent subgraph that is maximal in the whole graph database • Avoid enumerating subgraphs that are not locally maximal

Bottom-up Pruning • G’ = G ⊕ C • If G’ is frequent, every other graph in the search space is a proper subgraph of G’ and therefore not maximal
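The pruning rule can be illustrated as a single frequency test on the largest graph before any enumeration. This is a sketch under stated assumptions: graphs are plain edge sets, `is_frequent` is a hypothetical caller-supplied predicate, the tree itself is assumed frequent, and the exhaustive fallback is a naive stand-in rather than the search used by SPIN.

```python
from itertools import combinations


def locally_maximal(tree_edges, candidate_edges, is_frequent):
    """Return the locally maximal frequent graphs of one equivalence class."""
    base = frozenset(tree_edges)
    cands = frozenset(candidate_edges)
    full = base | cands
    # Bottom-up pruning: if the graph with all candidate edges is frequent,
    # every other graph in the search space is a proper subgraph of it.
    if is_frequent(full):
        return [full]
    # Naive fallback (an assumption, not the paper's search): enumerate
    # all subsets and keep the frequent graphs that have no frequent
    # proper supergraph in the class.
    frequent = []
    for k in range(len(cands) + 1):
        for y in combinations(sorted(cands), k):
            g = base | frozenset(y)
            if is_frequent(g):
                frequent.append(g)
    return [g for g in frequent if not any(g < h for h in frequent)]


tree = [(0, 1), (1, 2), (2, 3)]
candidates = [(0, 2), (1, 3)]

# Case 1: everything is frequent, so one test settles the whole class.
print(locally_maximal(tree, candidates, lambda g: True))

# Case 2: the two candidate edges never occur together.
both = frozenset(candidates)
print(locally_maximal(tree, candidates, lambda g: not both <= g))
```

In the first case the predicate is evaluated once instead of four times; the saving grows exponentially with the number of candidate edges.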

Tail Shrink • An embedding of G in G’ is a subgraph isomorphism f from G to G’ • Example: two embeddings of L in P: (l1->P1, l2->P2, l3->P3, l4->P4) and (l1->P1, l2->P3, l3->P2, l4->P4)
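To make the notion of an embedding concrete, the sketch below enumerates all label-preserving, edge-preserving injective mappings by brute force. The graphs L and P used here are hypothetical (the slide's figure is not available): a labeled 4-cycle chosen so that exactly the two mappings listed on the slide result. The representation and the function names are assumptions, and the exhaustive search is only practical for very small graphs.

```python
from itertools import permutations


def undirected(edges):
    """Key each labeled edge by an unordered node pair."""
    return {frozenset(pair): label for pair, label in edges.items()}


def embeddings(p_labels, p_edges, h_labels, h_edges):
    """Return all embeddings of a pattern graph in a host graph.

    Each embedding is a dict from pattern nodes to distinct host nodes
    that preserves node labels and maps every pattern edge onto a host
    edge with the same label (non-induced subgraph isomorphism).
    """
    host = undirected(h_edges)
    p_nodes = list(p_labels)
    found = []
    for image in permutations(h_labels, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[u] == h_labels[f[u]] for u in p_nodes)
        edges_ok = all(
            host.get(frozenset((f[u], f[v]))) == label
            for (u, v), label in p_edges.items()
        )
        if labels_ok and edges_ok:
            found.append(f)
    return found


# Hypothetical L and P: 4-cycles whose two middle nodes share a label.
L_labels = {"l1": "a", "l2": "b", "l3": "b", "l4": "c"}
L_edges = {("l1", "l2"): "x", ("l1", "l3"): "x",
           ("l2", "l4"): "x", ("l3", "l4"): "x"}
P_labels = {"P1": "a", "P2": "b", "P3": "b", "P4": "c"}
P_edges = {("P1", "P2"): "x", ("P1", "P3"): "x",
           ("P2", "P4"): "x", ("P3", "P4"): "x"}

for f in embeddings(L_labels, L_edges, P_labels, P_edges):
    print(f)
```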

Tail Shrink • A candidate edge (i, j, el) is associative to a graph G if it appears in every embedding of G in the graph database • If a tree T has a set of associative edges, any maximal frequent graph G that is a supergraph of T must contain all of those associative edges
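The associativity test can be sketched as a loop over all known embeddings. This assumes the embeddings of G have already been computed and are passed in together with each host graph's labeled edges; the data layout and the name `is_associative` are hypothetical.

```python
def is_associative(candidate, occurrences):
    """Check whether candidate edge (i, j, el) appears in every embedding.

    occurrences is a list of (host_edges, embeddings) pairs, one per
    database graph: host_edges maps frozenset({u, v}) to an edge label,
    and each embedding maps nodes of G to host nodes.
    """
    i, j, el = candidate
    seen_any = False
    for host_edges, embs in occurrences:
        for f in embs:
            seen_any = True
            # The edge must join the images of i and j with label el.
            if host_edges.get(frozenset((f[i], f[j]))) != el:
                return False
    # With no embeddings at all, the edge is not reported as associative.
    return seen_any


# G is a path 0-1-2; two database graphs, each with one embedding of G.
g1_edges = {frozenset((10, 11)): "x", frozenset((11, 12)): "x",
            frozenset((10, 12)): "y"}
g2_edges = {frozenset((20, 21)): "x", frozenset((21, 22)): "x",
            frozenset((20, 22)): "y"}
occurrences = [
    (g1_edges, [{0: 10, 1: 11, 2: 12}]),
    (g2_edges, [{0: 20, 1: 21, 2: 22}]),
]
print(is_associative((0, 2, "y"), occurrences))
```

An edge that passes this test can be added to the tree immediately instead of being kept as a candidate, which is what shrinks the search space.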

Tail Shrink • Remove associative edges from the candidate set and add them to T directly, without missing any maximal subgraph • Reduces the search space • Prunes the entire equivalence class in certain cases • A set of associative edges C of a tree T is lethal if G’ = T ⊕ C has a canonical spanning tree different from that of T

External-Edge Pruning • Removes an equivalence class without any knowledge of its candidate edges • An external edge of a graph G connects a node in G to a node not in G • An external edge (i, el, vl) is associative to G if, for every embedding f of G in a graph G’: • G’ has a node v with label vl • v is connected to the node f(i) by an edge labeled el in G’ • There is no node j ∈ V[G] such that v = f(j)
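The three conditions for an associative external edge translate directly into a check over embeddings. A minimal sketch, assuming precomputed embeddings and a hypothetical data layout (node-label dict, edge-label dict keyed by unordered pairs); the name `is_associative_external` is not from the source.

```python
def is_associative_external(ext_edge, occurrences):
    """Check whether external edge (i, el, vl) is associative to G.

    occurrences is a list of (labels, edges, embeddings) triples, one
    per database graph: labels maps host node -> node label, edges maps
    frozenset({u, v}) -> edge label, and each embedding maps nodes of G
    to host nodes.
    """
    i, el, vl = ext_edge
    seen_any = False
    for labels, edges, embs in occurrences:
        for f in embs:
            seen_any = True
            image = set(f.values())
            # Look for a host node v with label vl that lies outside the
            # image of f and is joined to f(i) by an edge labeled el.
            witness = any(
                labels[v] == vl
                and v not in image
                and edges.get(frozenset((f[i], v))) == el
                for v in labels
            )
            if not witness:
                return False
    return seen_any


# G is a single a-a edge (nodes 0 and 1); the host is a triangle.
labels = {1: "a", 2: "a", 3: "b"}
edges = {frozenset((1, 2)): "x", frozenset((2, 3)): "y",
         frozenset((1, 3)): "y"}
embs = [{0: 1, 1: 2}, {0: 2, 1: 1}]
print(is_associative_external((0, "y", "b"), [(labels, edges, embs)]))
```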

Associative external edges

Experiments • 2.8 GHz Pentium Xeon • 512 KB L2 cache, 2 GB main memory • Red Hat Linux 7.3 • C++ programming language

Synthetic Dataset D10KT30L200I11V4E4

DTP CA data set

DTP CM data set
