An Efficient Algorithm for Discovering Frequent Subgraphs

Michihiro Kuramochi and George Karypis, ICDM 2001
Presenter: 蔡明瑾

Introduction
• Structural patterns arise in biology and chemistry, e.g. chemical compounds.
• Such data is modeled as a graph:
  • vertex: an item
  • edge: a relation between items
• The algorithm works on undirected, connected, labeled graphs.

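Graph Isomorphism
[Figure: two drawings of the same labeled triangle on v0, v1, v2, shown to be isomorphic]
• G1(V1, E1) and G2(V2, E2) are isomorphic if they are topologically identical to each other.
• That is, there is a one-to-one mapping from V1 to V2 such that each edge in E1 is mapped to an edge in E2 and vice versa; for labeled graphs the mapping must also preserve vertex and edge labels.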

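Canonical labeling
• Write the graph as an adjacency matrix: vertex labels on the diagonal, edge labels in the off-diagonal entries.
• The code concatenates the vertex labels in matrix order, followed by the upper-triangular edge labels.
• Ordering v0 = b, v1 = a, v2 = a: code = baaxxy
• Ordering v0 = a, v1 = b, v2 = a: code = abaxyx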

Canonical labeling
• Different permutations of the vertices lead to different codes.
• A graph with |V| vertices has |V|! permutations.
• The canonical label is the largest code over all permutations.
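The two canonical-labeling slides can be sketched by brute force: build the code for one vertex ordering, then keep the largest code over all |V|! orderings. This is a minimal sketch under stated assumptions, not the authors' implementation: labels are single characters, an edge is a `frozenset` of two vertex ids mapped to its label, `'0'` marks a missing edge, and the upper triangle is read column by column (for three vertices this coincides with row order). The names `code_for_order` and `canonical_label` are hypothetical.

```python
from itertools import permutations

def code_for_order(order, vlabels, elabels):
    """Code for one vertex ordering: the vertex labels in that order, then the
    upper-triangular adjacency-matrix entries read column by column.
    '0' marks a missing edge; labels are assumed to be single characters."""
    n = len(order)
    code = "".join(vlabels[v] for v in order)
    for j in range(1, n):          # column j of the upper triangle
        for i in range(j):         # rows 0..j-1 above the diagonal
            code += elabels.get(frozenset((order[i], order[j])), "0")
    return code

def canonical_label(vlabels, elabels):
    """Try all |V|! orderings and keep the lexicographically largest code."""
    return max(code_for_order(order, vlabels, elabels)
               for order in permutations(range(len(vlabels))))

# Triangle from the slides: v0 = b, v1 = a, v2 = a; b-a edges are x, the a-a edge is y.
vlabels = ["b", "a", "a"]
elabels = {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"}
print(code_for_order((0, 1, 2), vlabels, elabels))   # baaxxy
print(code_for_order((1, 0, 2), vlabels, elabels))   # abaxyx
print(canonical_label(vlabels, elabels))             # baaxxy
```

Because the label is a maximum over every ordering, two isomorphic graphs get the same string, so an isomorphism test reduces to comparing canonical labels. The factorial cost is what the vertex-invariant slides try to reduce.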

Vertex invariants
• Properties that do not change across isomorphism mappings, such as:
  • vertex degree
  • vertex label
  • siblings

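Vertex Degrees and Labels
• The code is still read from the adjacency matrix.
• Partition the vertices so that every partition contains vertices with the same degree and label.
• Order the rows and columns partition by partition; only permutations within each partition then need to be tried.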

Vertex Degrees and Labels
• Example ordering v0 = b, v1 = a, v2 = a (code = baaxxy)
• Partition by degree only: p0 = {v0, v1, v2}, degree 2
• Partition by degree and label: p0 = {v1, v2}: (2, a); p1 = {v0}: (2, b)

Vertex Degrees and Labels
• Rows and columns ordered by partition: v1 = a, v2 = a, v0 = b (code = aabyxx)
• p0 = {v1, v2}: (2, a); p1 = {v0}: (2, b)
• Orderings to try: originally 3! = 6; now 2! × 1! = 2
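The partitioning on the last three slides can be sketched as follows: group vertices by the invariant (degree, label), sort the groups by that key, and permute vertices only inside each group. This is a sketch under the same toy assumptions as a brute-force labeler (single-character labels, edges as `frozenset` keys, `'0'` for a missing edge); the function names are hypothetical. Because the group order is itself determined by invariants, isomorphic graphs still receive the same restricted label, even though it can differ from the unrestricted maximum.

```python
from itertools import permutations, product
from math import factorial, prod

def code_for_order(order, vlabels, elabels):
    """Vertex labels in the given order, then upper-triangle edge labels ('0' = no edge)."""
    n = len(order)
    code = "".join(vlabels[v] for v in order)
    for j in range(1, n):
        for i in range(j):
            code += elabels.get(frozenset((order[i], order[j])), "0")
    return code

def partitions(vlabels, elabels):
    """Group vertices by the invariant (degree, label); groups sorted by that key."""
    degree = [0] * len(vlabels)
    for edge in elabels:
        for v in edge:
            degree[v] += 1
    groups = {}
    for v, label in enumerate(vlabels):
        groups.setdefault((degree[v], label), []).append(v)
    return [groups[key] for key in sorted(groups)]

def search_space(parts):
    """Number of orderings left to try: the product of |p|! over the partitions."""
    return prod(factorial(len(p)) for p in parts)

def partition_orderings(parts):
    """Permute vertices only inside each partition; the partition order stays fixed."""
    for choice in product(*(permutations(p) for p in parts)):
        yield tuple(v for block in choice for v in block)

def partition_label(vlabels, elabels):
    """Largest code among the orderings allowed by the partitions."""
    parts = partitions(vlabels, elabels)
    return max(code_for_order(o, vlabels, elabels) for o in partition_orderings(parts))

vlabels = ["b", "a", "a"]
elabels = {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"}
parts = partitions(vlabels, elabels)
print(parts)                                          # [[1, 2], [0]]
print(factorial(len(vlabels)), search_space(parts))   # 6 2
print(partition_label(vlabels, elabels))              # aabyxx
```

The output reproduces the slide: p0 = {v1, v2}, p1 = {v0}, the search drops from 3! to 2! × 1!, and the resulting code is aabyxx.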

Running example (minsup = 2)
[Figure: input graphs g0, g1, g2 and their frequent 1-subgraphs]

Running example (minsup = 2)
[Figure: subgraphs c0, c1, c2, c3 joined pairwise, e.g. (c0, c1), (c3, c2), (c2, c3), into candidate 2-subgraphs]

[Figure: frequent 2-subgraphs of the running example, including the pair (c3, c4)]

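Frequency counting
• Each frequent k-subgraph keeps an id-list: the ids of the input graphs that contain it.
• For a (k+1)-candidate, intersect the id-lists of the two k-subgraphs it was built from.
• If the intersection is smaller than minsup, the candidate cannot be frequent and is pruned.
• Otherwise, find its support by testing only the graphs in the intersection.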
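The id-list step can be sketched in a few lines. This is a minimal sketch, not the authors' code: id-lists are assumed to be `frozenset`s of graph ids, and `contains` is a hypothetical stand-in for the subgraph-isomorphism test of the candidate against one input graph. The function name `candidate_idlist` is also hypothetical.

```python
from typing import Callable, FrozenSet, Optional

def candidate_idlist(
    idlist_a: FrozenSet[int],
    idlist_b: FrozenSet[int],
    minsup: int,
    contains: Callable[[int], bool],
) -> Optional[FrozenSet[int]]:
    """Return the id-list of a (k+1)-candidate built from two frequent k-subgraphs,
    or None if the candidate is not frequent.

    idlist_a, idlist_b: ids of the input graphs containing each joined k-subgraph.
    contains(gid): hypothetical subgraph-isomorphism test against input graph gid.
    """
    # The candidate can only occur in graphs that contain both k-subgraphs.
    possible = idlist_a & idlist_b
    # Even if it occurred in every one of them it would stay below minsup: prune.
    if len(possible) < minsup:
        return None
    # Otherwise run the expensive isomorphism test only on the surviving graphs.
    supported = frozenset(gid for gid in possible if contains(gid))
    return supported if len(supported) >= minsup else None

# Hypothetical data: the candidate actually occurs in graphs 1 and 2.
occurs_in = {1, 2}
print(candidate_idlist(frozenset({0, 1, 2}), frozenset({1, 2}), 2, lambda g: g in occurs_in))
# frozenset({1, 2})
print(candidate_idlist(frozenset({0, 1}), frozenset({2}), 2, lambda g: g in occurs_in))
# None (pruned without any isomorphism test)
```

Returning the surviving id-list, rather than only a count, lets a frequent candidate carry its own id-list into the next level.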

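Candidate generation
• Join two frequent k-subgraphs to form (k+1)-candidate subgraphs.
• The two k-subgraphs must share the same (k-1)-subgraph, called the core.
• One join can produce several distinct candidates because of:
  • vertex labeling
  • multiple cores
  • multiple automorphisms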
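The "same (k-1) core" condition can be checked by deleting one edge at a time from each k-subgraph, keeping only the connected results, and comparing their canonical labels. This is a sketch of that check under stated assumptions, not FSG's actual procedure (which does not perform the join here either): graphs use a toy representation (single-character labels, edges as `frozenset` keys, `'0'` for a missing edge), canonical labels are computed by brute force, and `cores` and `shared_cores` are hypothetical names. A result with more than one element illustrates the "multiple cores" case.

```python
from itertools import permutations

def canonical_label(vlabels, elabels):
    """Largest adjacency-matrix code over all vertex orderings (brute force)."""
    n = len(vlabels)
    best = ""
    for order in permutations(range(n)):
        code = "".join(vlabels[v] for v in order)
        code += "".join(elabels.get(frozenset((order[i], order[j])), "0")
                        for j in range(1, n) for i in range(j))
        best = max(best, code)
    return best

def is_connected(edges):
    """True if the edges form one connected graph (depth-first search)."""
    vertices = {v for e in edges for v in e}
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for e in edges:
            if v in e:
                (u,) = e - {v}
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
    return seen == vertices

def cores(vlabels, elabels):
    """Canonical labels of the connected (k-1)-edge subgraphs left after deleting one edge."""
    found = set()
    for removed in elabels:
        rest = {e: lab for e, lab in elabels.items() if e != removed}
        if not is_connected(rest):
            continue
        used = sorted({v for e in rest for v in e})   # drop vertices left isolated
        index = {v: i for i, v in enumerate(used)}    # renumber them 0..m-1
        sub_e = {frozenset(index[v] for v in e): lab for e, lab in rest.items()}
        found.add(canonical_label([vlabels[v] for v in used], sub_e))
    return found

def shared_cores(g1, g2):
    """Cores common to two k-subgraphs, each given as (vlabels, elabels)."""
    return cores(*g1) & cores(*g2)

# Two 2-edge paths, b-x-a-y-a and a-y-a-z-c, share the 1-edge core a-y-a.
g1 = (["b", "a", "a"], {frozenset((0, 1)): "x", frozenset((1, 2)): "y"})
g2 = (["a", "a", "c"], {frozenset((0, 1)): "y", frozenset((1, 2)): "z"})
print(shared_cores(g1, g2))   # {'aay'}
```

Comparing canonical labels, rather than edge sets, is what makes the check independent of how the vertices of each subgraph happen to be numbered.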

Vertex labeling

Multiple automorphisms

Multiple cores

[Figure: candidates generated by joining c3 and c4 over cores q0, q1, q2; two of the candidates violate downward closure and are pruned]
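Downward closure says that if a (k+1)-candidate is frequent, every connected k-subgraph of it is frequent too, so a candidate with an infrequent k-subgraph can be discarded before any counting. The check can be sketched as below, assuming the same toy representation (single-character labels, edges as `frozenset` keys, `'0'` for a missing edge) and brute-force canonical labels; `frequent_k` is a hypothetical set holding the canonical labels of the frequent k-subgraphs, and the function names are illustrative only.

```python
from itertools import permutations

def canonical_label(vlabels, elabels):
    """Largest adjacency-matrix code over all vertex orderings (brute force)."""
    n = len(vlabels)
    best = ""
    for order in permutations(range(n)):
        code = "".join(vlabels[v] for v in order)
        code += "".join(elabels.get(frozenset((order[i], order[j])), "0")
                        for j in range(1, n) for i in range(j))
        best = max(best, code)
    return best

def is_connected(edges):
    """True if the edges form one connected graph (depth-first search)."""
    vertices = {v for e in edges for v in e}
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for e in edges:
            if v in e:
                (u,) = e - {v}
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
    return seen == vertices

def k_subgraph_labels(vlabels, elabels):
    """Canonical labels of all connected subgraphs with one edge fewer."""
    labels = set()
    for removed in elabels:
        rest = {e: lab for e, lab in elabels.items() if e != removed}
        if not is_connected(rest):
            continue
        used = sorted({v for e in rest for v in e})
        index = {v: i for i, v in enumerate(used)}
        sub_e = {frozenset(index[v] for v in e): lab for e, lab in rest.items()}
        labels.add(canonical_label([vlabels[v] for v in used], sub_e))
    return labels

def passes_downward_closure(vlabels, elabels, frequent_k):
    """Keep the candidate only if every connected k-subgraph is frequent."""
    return k_subgraph_labels(vlabels, elabels) <= frequent_k

# Candidate triangle b, a, a with edges x, x, y; its 2-edge subgraphs are
# the path b-x-a-y-a ("baax0y") and the star a-x-b-x-a ("baaxx0").
tri_v = ["b", "a", "a"]
tri_e = {frozenset((0, 1)): "x", frozenset((0, 2)): "x", frozenset((1, 2)): "y"}
print(passes_downward_closure(tri_v, tri_e, {"baax0y", "baaxx0"}))   # True
print(passes_downward_closure(tri_v, tri_e, {"baax0y"}))             # False
```

The subset test on canonical labels is cheap compared with counting support, which is why pruning happens before the id-list and isomorphism steps.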

Experiment
• AMD 1.53 GHz, 2 GB main memory, Linux
• Chemical compound datasets:
  • PTE: 340 compounds, 66 atom types, four bond types, 27 edges per graph on average
  • DTP: 223,644 compounds, 104 atom types, three bond types, 22 edges per graph on average
• Synthetic datasets

PTE and DTP

Synthetic datasets

Synthetic datasets: |D| = 10000, |S| = 200, |LE| = 1, minsup = 2%
