This paper presents an efficient algorithm for discovering frequent subgraphs in structural pattern datasets, particularly in the domains of biology and chemistry. The algorithm uses graph isomorphism, canonical labeling, and vertex invariants such as vertex degrees and labels to compute the frequency of subgraphs efficiently. Experimental results demonstrate its effectiveness on real and synthetic datasets.
An Efficient Algorithm for Discovering Frequent Subgraphs • Michihiro Kuramochi and George Karypis • ICDM, 2001 • Presenter: 蔡明瑾
Introduction • Structural pattern • Biology, chemistry • Chemical compounds • Graph • Vertex: item • Edge: relation between items • Undirected connected labeled graph • [Figure: a labeled graph with vertex labels a, a, b and edge labels x, x, y]
Graph Isomorphism • G1(V1,E1) and G2(V2,E2) are isomorphic if they are topologically identical to each other. • There is a mapping from V1 to V2 such that each edge in E1 is mapped to a single edge in E2 and vice versa, with vertex and edge labels preserved. • [Figure: two labeled graphs on vertices v0, v1, v2, shown as equal]
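The definition above can be checked directly on very small graphs by trying every vertex mapping. The following is a minimal sketch, not the method from the paper: the function name `is_isomorphic` and the graph encoding (a list of vertex labels plus a dict from vertex pairs to edge labels) are assumptions made for illustration, and the two example graphs are the triangle from the canonical labeling slides under two vertex numberings.

```python
from itertools import permutations

def is_isomorphic(labels1, edges1, labels2, edges2):
    """Brute-force test; edges are dicts {(u, v): edge_label} with u < v."""
    n = len(labels1)
    if n != len(labels2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(range(n)):
        # perm[i] is the vertex of graph 2 that vertex i of graph 1 maps to.
        if any(labels1[i] != labels2[perm[i]] for i in range(n)):
            continue  # vertex labels must be preserved
        mapped = {
            tuple(sorted((perm[u], perm[v]))): lab
            for (u, v), lab in edges1.items()
        }
        if mapped == edges2:  # every edge (with its label) maps onto an edge
            return True
    return False

# The same triangle with two different vertex numberings (illustrative data).
g1 = (["b", "a", "a"], {(0, 1): "x", (0, 2): "x", (1, 2): "y"})
g2 = (["a", "b", "a"], {(0, 1): "x", (0, 2): "y", (1, 2): "x"})
# Same vertex labels as g2, but vertex b now has one x edge and one y edge.
g3 = (["a", "b", "a"], {(0, 1): "x", (0, 2): "x", (1, 2): "y"})

print(is_isomorphic(*g1, *g2))  # True
print(is_isomorphic(*g1, *g3))  # False
```

The loop visits up to |V|! mappings, which is the cost that canonical labeling and vertex invariants are meant to reduce.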
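Canonical labeling • Adjacency matrix: the code is the vertex labels followed by the upper-triangular entries • Ordering 1: v0 = b, v1 = a, v2 = a; code = baaxxy • Ordering 2: v0 = a, v1 = b, v2 = a; code = abaxyx • [Figure: the two adjacency matrices of the same graph]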
Canonical labeling • Different permutations of the vertices lead to different codes. • |V|! permutations in the worst case • The canonical label is the largest code
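As a rough illustration of the two slides above, the sketch below builds the code for a given vertex ordering and takes the largest code over all |V|! orderings. The function names, the graph encoding, the use of "0" for a missing edge, and the restriction to single-character labels are assumptions for this example, not details confirmed by the slides.

```python
from itertools import permutations

def code(labels, edges, order):
    """Vertex labels in `order`, then upper-triangular entries column by column.

    edges is a dict {(u, v): edge_label} with u < v; "0" marks a missing edge
    (an assumption of this sketch).
    """
    n = len(order)

    def entry(u, v):
        return edges.get((min(u, v), max(u, v)), "0")

    vertex_part = "".join(labels[v] for v in order)
    edge_part = "".join(
        entry(order[i], order[j]) for j in range(1, n) for i in range(j)
    )
    return vertex_part + edge_part

def canonical_label(labels, edges):
    """Largest code over all vertex orderings (brute force, small graphs only)."""
    return max(code(labels, edges, p) for p in permutations(range(len(labels))))

labels = ["b", "a", "a"]                          # v0 = b, v1 = a, v2 = a
edges = {(0, 1): "x", (0, 2): "x", (1, 2): "y"}

print(code(labels, edges, (0, 1, 2)))   # baaxxy
print(code(labels, edges, (1, 0, 2)))   # abaxyx
print(canonical_label(labels, edges))   # baaxxy
```

Two graphs are isomorphic exactly when their canonical labels are equal, so a string comparison can replace a pairwise isomorphism test once the labels are computed.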
Vertex invariants • Properties that do not change across isomorphism mappings. • Vertex degree • Vertex label • Siblings
Vertex Degrees and Labels • Adjacency matrix • Partition the vertices by degree and label so that every partition contains vertices with the same degree and label
Vertex Degrees and Labels • Ordering: v0 = b, v1 = a, v2 = a; code = baaxxy • Partition by degree: p0 = {v0, v1, v2}: 2 • Partition by degree and label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
Vertex Degrees and Labels • Ordering with p0 before p1 (the two a vertices, then v0 = b); code = aabyxx • p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b) • Orderings to try: originally 3!, now 2! × 1!
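The saving from partitioning can be sketched as follows: vertices are grouped by (degree, label), the groups are placed in a fixed order, and only orderings that permute vertices inside a group are tried. This is an illustrative sketch under stated assumptions; the function names, the sort order of the partitions, and the "0" entry for a missing edge are choices made here, not taken from the slides.

```python
from itertools import permutations, product

def partitions_by_degree_and_label(labels, edges):
    """Group vertices by (degree, label); groups are returned in sorted key order."""
    degree = [0] * len(labels)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    groups = {}
    for v, lab in enumerate(labels):
        groups.setdefault((degree[v], lab), []).append(v)
    return [groups[key] for key in sorted(groups)]

def restricted_orderings(parts):
    """Orderings that keep the partition order and permute only within partitions."""
    for combo in product(*(permutations(p) for p in parts)):
        yield [v for block in combo for v in block]

def code(labels, edges, order):
    """Vertex labels in `order`, then upper-triangular entries column by column."""
    n = len(order)
    vertex_part = "".join(labels[v] for v in order)
    edge_part = "".join(
        edges.get((min(order[i], order[j]), max(order[i], order[j])), "0")
        for j in range(1, n) for i in range(j)
    )
    return vertex_part + edge_part

labels = ["b", "a", "a"]                          # v0 = b, v1 = a, v2 = a
edges = {(0, 1): "x", (0, 2): "x", (1, 2): "y"}

parts = partitions_by_degree_and_label(labels, edges)
orderings = list(restricted_orderings(parts))
label = max(code(labels, edges, o) for o in orderings)

print(parts)           # [[1, 2], [0]]
print(len(orderings))  # 2, i.e. 2! x 1! instead of 3! = 6
print(label)           # aabyxx
```

The result differs from the largest code over all 3! orderings, which is fine as long as every graph is labeled with the same partition ordering rule.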
Running example • minsup = 2 • [Figure: input graphs g0, g1, g2 and the frequent 1-subgraphs; drawing not recoverable]
Running example • minsup = 2 • [Figure: subgraphs labeled c0 to c3; drawing not recoverable]
Running example • [Figure: subgraphs labeled c1 to c4; drawing not recoverable] • Frequent 2-subgraphs
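Frequency computing • Each frequent subgraph keeps an id-list of the transactions that contain it • For a candidate, intersect the id-lists of the two k-subgraphs it was built from • Intersection of size at least minsup -> compute the support on those transactions only • Intersection smaller than minsup -> the candidate is pruned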
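The id-list step might look like the sketch below. The function name `prune_or_count` and the `contains` callback are hypothetical: `contains(t)` stands in for a subgraph isomorphism test of the candidate against transaction `t`, which is not implemented here.

```python
def prune_or_count(parent_id_lists, minsup, contains):
    """Return the ids of supporting transactions, or None if the candidate is pruned.

    parent_id_lists: id-lists of the k-subgraphs the candidate was built from.
    contains: hypothetical callback, True if transaction t contains the candidate.
    """
    common = set.intersection(*(set(ids) for ids in parent_id_lists))
    if len(common) < minsup:
        return None  # pruned without running any subgraph isomorphism test
    # Only transactions in the intersection can contain the candidate.
    supporting = sorted(t for t in common if contains(t))
    return supporting if len(supporting) >= minsup else None

# Illustrative id-lists; the lambda replaces the real containment test.
print(prune_or_count([[0, 1, 2], [1, 2]], 2, lambda t: True))  # [1, 2]
print(prune_or_count([[0, 1], [1, 2]], 2, lambda t: True))     # None
```

The intersection is an upper bound on the support, so a small intersection rules out a candidate cheaply, and a large one limits the expensive containment tests to the transactions that can still match.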
Candidate generation • Join two frequent k-subgraphs -> (k+1)-candidate subgraphs • The two subgraphs must share the same (k-1)-subgraph as a core • One join can produce several candidates because of: • Vertex labeling • Multiple cores • Multiple automorphisms
Candidate generation • [Figure: frequent 2-subgraphs c1 to c4 joined into candidates q0, q1, q2; drawing not recoverable] • Two of the candidates are marked: does not satisfy downward closure
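A downward closure check can be sketched as follows: a candidate is kept only if every connected subgraph obtained by deleting one edge is already known to be frequent. This is a minimal sketch under assumptions, not the paper's implementation: frequent subgraphs are stored as a set of canonical labels, labels are computed by brute force, "0" marks a missing edge, and the triangle data is illustrative because the graphs of the running example cannot be recovered from the slide.

```python
from itertools import permutations

def canonical_label(labels, edges):
    """Largest code over all vertex orderings (brute force, small graphs only)."""
    n = len(labels)
    best = ""
    for order in permutations(range(n)):
        vertex_part = "".join(labels[v] for v in order)
        edge_part = "".join(
            edges.get((min(order[i], order[j]), max(order[i], order[j])), "0")
            for j in range(1, n) for i in range(j)
        )
        best = max(best, vertex_part + edge_part)
    return best

def drop_edge(labels, edges, removed):
    """Remove one edge and any isolated vertex; return None if disconnected."""
    rest = {e: lab for e, lab in edges.items() if e != removed}
    kept = sorted({v for e in rest for v in e})
    if not kept:
        return None
    adj = {v: set() for v in kept}
    for u, v in rest:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {kept[0]}, [kept[0]]
    while stack:  # depth-first search for the connectivity check
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(kept):
        return None
    new_id = {v: i for i, v in enumerate(kept)}  # renumber vertices 0..m-1
    new_edges = {
        tuple(sorted((new_id[u], new_id[v]))): lab for (u, v), lab in rest.items()
    }
    return [labels[v] for v in kept], new_edges

def satisfies_downward_closure(labels, edges, frequent_labels):
    """True if every connected one-edge-smaller subgraph is in frequent_labels."""
    for e in edges:
        sub = drop_edge(labels, edges, e)
        if sub is not None and canonical_label(*sub) not in frequent_labels:
            return False
    return True

# Illustrative 3-edge candidate and two frequent 2-subgraphs (paths).
cand = (["b", "a", "a"], {(0, 1): "x", (0, 2): "x", (1, 2): "y"})
path_axbxa = canonical_label(["b", "a", "a"], {(0, 1): "x", (0, 2): "x"})
path_ayaxb = canonical_label(["a", "a", "b"], {(0, 1): "y", (1, 2): "x"})

print(satisfies_downward_closure(*cand, {path_axbxa, path_ayaxb}))  # True
print(satisfies_downward_closure(*cand, {path_axbxa}))              # False
```

Storing canonical labels in a set turns each membership test into a string lookup, at the cost of computing one label per deleted edge.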
Experiment • AMD 1.53GHz • 2GB main memory • Linux OS • Chemical compound datasets: • PTE (340 compounds), 66 atom types and four bond types, 27 edges/graph on average • DTP (223,644 compounds), 104 atom types and three bond types, 22 edges/graph on average • Synthetic datasets