Mining Closed Relational Graphs with Connectivity Constraints • Xifeng Yan, X. Jasmine Zhou and Jiawei Han • SIGKDD '05 • Presenter: 蔡明瑾 • 2005/12/09
Introduction • Relational graphs • Model large-scale networks • Biological networks • Social networks • Each node represents a distinct object • e.g., genes, enzymes • DBLP: co-author relations, citation relations • Graphs are large • e.g., 10K nodes, 1M edges • Goal: mine closed frequent graphs whose edge connectivity is at least K
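A minimal Python sketch of what "frequent" and "closed" mean for relational graphs, assuming each graph is stored as a set of undirected edges over one shared vertex set. Because every vertex is a distinct object, checking whether a pattern occurs in a graph reduces to edge-set inclusion. The helpers `norm`, `support`, and `is_closed` and the toy data are hypothetical, not taken from the paper.

```python
# Hypothetical representation: a relational graph is a frozenset of undirected
# edges over one shared vertex set (vertex labels such as gene names are unique).
Edge = tuple[str, str]


def norm(u: str, v: str) -> Edge:
    """Store an undirected edge in canonical (sorted) order."""
    return (u, v) if u <= v else (v, u)


def support(pattern: frozenset, graphs: list) -> int:
    """Number of graphs that contain every edge of `pattern`.

    Unique vertex labels turn subgraph isomorphism into edge-set inclusion.
    """
    return sum(1 for g in graphs if pattern <= g)


def is_closed(pattern: frozenset, graphs: list) -> bool:
    """True if no proper supergraph of `pattern` has the same support.

    Single-edge extensions suffice: if some supergraph keeps the support,
    so does `pattern` plus any one of that supergraph's extra edges.
    """
    sup = support(pattern, graphs)
    candidates = set().union(*graphs) - pattern
    return all(support(pattern | {e}, graphs) < sup for e in candidates)


# Hypothetical toy data: three relational graphs over vertices a, b, c, d.
graphs = [
    frozenset({norm("a", "b"), norm("b", "c"), norm("c", "a")}),
    frozenset({norm("a", "b"), norm("b", "c")}),
    frozenset({norm("a", "b"), norm("b", "c"), norm("c", "d")}),
]
p = frozenset({norm("a", "b"), norm("b", "c")})
print(support(p, graphs), is_closed(p, graphs))  # 3 True
```

The single edge a-b also has support 3, but it is not closed, because adding b-c keeps the support at 3.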
Edge Connectivity K(G) • Given a graph G • Edge cut Ec: a set of edges such that the graph with edge set E(G) − Ec is disconnected • Min cut: an edge cut of minimum size • K(G) = |min cut| • An edge cut Ec separates V(G) into two sets V and V', written V→V' • V ∩ V' = ∅ • V ∪ V' = V(G) • Every edge in Ec connects a vertex in V to a vertex in V'
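One standard way to compute K(G), sketched in Python for a small undirected graph given as a vertex list and an edge list: fix a vertex s and take the minimum s-t maximum flow over all other vertices t, with every edge carrying capacity 1 (Edmonds-Karp here). The function names are illustrative; the slides do not say how the authors compute minimum cuts.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp max flow; `cap` holds residual capacities and is mutated."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Bottleneck capacity along the path, then push flow along it.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def edge_connectivity(vertices, edges):
    """K(G) = min over t != s of maxflow(s, t), each undirected edge = capacity 1."""
    vertices = list(vertices)
    if len(vertices) < 2:
        return 0
    s, best = vertices[0], float("inf")
    for t in vertices[1:]:
        cap = defaultdict(lambda: defaultdict(int))
        for u, v in edges:  # parallel edges add up, as cut sizes require
            cap[u][v] += 1
            cap[v][u] += 1
        best = min(best, max_flow(cap, s, t))
    return best


print(edge_connectivity(range(4), [(i, j) for i in range(4) for j in range(i + 1, 4)]))  # 3
```

Fixing s is enough because every minimum cut puts some vertex t on the opposite side from s.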
Edge Connectivity K(G) (example) • Minimum cut: e1 • K(G) = 1 • Average degree: 3.25 • Minimum degree: 3
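The figure behind this slide is not reproduced. One graph that matches all three numbers is two K4 blocks joined by a single edge e1 (8 vertices, 13 edges); this reconstruction is an assumption. The sketch checks the numbers and finds e1 by testing which single edge disconnects the graph, showing that K(G) can sit well below the minimum degree.

```python
from collections import defaultdict


def is_connected(vertices, edges):
    """Iterative DFS from the first vertex; True if every vertex is reached."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    vertices = list(vertices)
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


# Hypothetical reconstruction: K4 on {0..3}, K4 on {4..7}, joined by e1 = (3, 4).
block_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
block_b = [(i + 4, j + 4) for i, j in block_a]
e1 = (3, 4)
V = range(8)
E = block_a + block_b + [e1]

degree = defaultdict(int)
for u, v in E:
    degree[u] += 1
    degree[v] += 1

avg_degree = 2 * len(E) / len(V)          # 26 / 8
min_degree = min(degree[v] for v in V)
# A single edge whose removal disconnects G is a min cut of size 1, so K(G) = 1.
bridges = [e for e in E if not is_connected(V, [f for f in E if f != e])]
print(avg_degree, min_degree, bridges)  # 3.25 3 [(3, 4)]
```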
Condensation • Let G ⊆ G'. G* is the graph formed from G' by condensing all vertices of G into a single vertex • If K(G) > K(G'), then K(G') = K(G*)
Condensation (cont.) • Let V→V' be a min cut of G' • Since K(G) > K(G'), V(G) must be a subset of V or of V' • [Figure: example graphs with edge connectivity 3, 2 and 2]
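The slide states the key step of the condensation argument without proof. A short derivation in the slides' notation, assuming G* keeps the parallel edges produced by condensing (cut sizes count them):

```latex
% Claim: G \subseteq G' and K(G) > K(G') imply K(G') = K(G^*).
\begin{align*}
&\text{Let } E_c \text{ be a min cut of } G' \text{ separating } V \text{ and } V',\ |E_c| = K(G').\\
&\text{If } V(G)\cap V \neq \emptyset \text{ and } V(G)\cap V' \neq \emptyset,
  \text{ then } E_c \cap E(G) \text{ is an edge cut of } G, \text{ so}\\
&\qquad K(G) \le |E_c \cap E(G)| \le |E_c| = K(G'),
  \text{ contradicting } K(G) > K(G').\\
&\text{Hence } V(G) \subseteq V \text{ or } V(G) \subseteq V', \text{ so } E_c \text{ is still a cut after condensation: } K(G^*) \le K(G').\\
&\text{Every edge cut of } G^* \text{ expands to an edge cut of } G' \text{ of the same size: } K(G') \le K(G^*).\\
&\text{Therefore } K(G') = K(G^*).
\end{align*}
```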
Usage of condensation • Reduces the cost of computing the edge connectivity of a graph g' when the edge connectivity of its subgraph g is already known • After condensing all vertices of g into a single vertex of g', only K(g*) needs to be computed • Since g* is smaller than g', the cost is reduced
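A small Python illustration of condensation on a hypothetical example: G is a K4 on vertices 0..3, and G' adds vertex 4 joined to 0 and 1. The helpers `condense` and `min_cut_size` are hypothetical, and the brute-force min cut is meant only for tiny graphs.

```python
from itertools import combinations


def condense(edges, group, label="g*"):
    """Contract every vertex in `group` into the single vertex `label`.

    Edges inside `group` become self-loops and are dropped; parallel edges
    to the new vertex are kept, because cut sizes count them.
    """
    group = set(group)
    out = []
    for u, v in edges:
        u2 = label if u in group else u
        v2 = label if v in group else v
        if u2 != v2:
            out.append((u2, v2))
    return out


def min_cut_size(vertices, edges):
    """Brute-force K(G): fewest crossing edges over all bipartitions."""
    vs = list(vertices)
    best = len(edges)
    for r in range(1, len(vs)):
        for side in combinations(vs[1:], r - 1):
            S = {vs[0], *side}  # vs[0] always on side S avoids mirror duplicates
            best = min(best, sum(1 for u, v in edges if (u in S) != (v in S)))
    return best


K4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # hypothetical G
G_prime = K4 + [(0, 4), (1, 4)]                           # hypothetical G'
G_star = condense(G_prime, range(4))
print(G_star)  # [('g*', 4), ('g*', 4)]
print(min_cut_size(range(4), K4), min_cut_size(range(5), G_prime),
      min_cut_size(["g*", 4], G_star))  # 3 2 2
```

Here K(G) = 3 > K(G') = 2, so the two-vertex G* is enough to obtain K(G') = 2 without searching the cuts of G' itself.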
Exclusion • Let G ⊆ G', and let Ec be an edge cut of G' with |Ec| < K • If K(G) ≥ K, then Ec ∩ E(G) = ∅ • Example: G1 has edge cut {e1, e2} • Any subgraph of G1 with edge connectivity at least 3 contains neither e1 nor e2
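The exclusion rule as a Python sketch, on a hypothetical stand-in for G1 (the slide's figure is not available): two K4 blocks joined by e1 = (3, 4) and e2 = (2, 5). With K = 3 the cut {e1, e2} is too small, so both edges are dropped. `exclude_small_cut` is an illustrative name; it first checks that the given edges really form a cut.

```python
from collections import defaultdict


def components(vertices, edges):
    """Connected components as a list of vertex sets (iterative DFS)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for w in adj[u] - seen:
                seen.add(w)
                comp.add(w)
                stack.append(w)
        comps.append(comp)
    return comps


def exclude_small_cut(vertices, edges, cut, k):
    """If `cut` is an edge cut with |cut| < k, drop its edges: no subgraph
    with edge connectivity >= k can contain any of them."""
    remaining = [e for e in edges if e not in cut]
    if len(components(vertices, remaining)) <= len(components(vertices, edges)):
        raise ValueError("`cut` is not an edge cut")
    return remaining if len(cut) < k else list(edges)


block_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
block_b = [(i + 4, j + 4) for i, j in block_a]
e1, e2 = (3, 4), (2, 5)
G1 = block_a + block_b + [e1, e2]  # hypothetical G1
pruned = exclude_small_cut(range(8), G1, {e1, e2}, k=3)
print(len(G1), len(pruned), components(range(8), pruned))
# 14 12 [{0, 1, 2, 3}, {4, 5, 6, 7}]
```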
K-Decomposition • Break a graph into non-overlapping subgraphs, each with edge connectivity at least K • Used when a closed frequent graph G has K(G) < K: find its subgraphs G' with K(G') ≥ K
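A recursive K-decomposition sketch in Python: find a minimum cut; if it has fewer than K edges, the exclusion rule says no K-connected subgraph crosses it, so split along it and recurse on each side. The brute-force `min_cut` stands in for a real min-cut routine (an assumption for readability), and the example graph is hypothetical: two K4 blocks joined by two edges, plus a pendant vertex 8.

```python
from itertools import combinations


def min_cut(vertices, edges):
    """Brute-force global min cut; returns (size, side S). Tiny graphs only."""
    vs = sorted(vertices)
    best = (len(edges) + 1, None)
    for r in range(1, len(vs)):
        for rest in combinations(vs[1:], r - 1):
            S = {vs[0], *rest}
            size = sum(1 for u, v in edges if (u in S) != (v in S))
            if size < best[0]:
                best = (size, S)
    return best


def k_decompose(vertices, edges, k):
    """Disjoint vertex sets whose induced subgraphs have edge connectivity >= k.

    `edges` must lie inside `vertices`. Single vertices are discarded.
    """
    vertices = set(vertices)
    if len(vertices) < 2:
        return []
    size, S = min_cut(vertices, edges)
    if size >= k:
        return [vertices]
    # Cut smaller than k: by exclusion, drop its edges and recurse on both sides.
    T = vertices - S
    left = [(u, v) for u, v in edges if u in S and v in S]
    right = [(u, v) for u, v in edges if u in T and v in T]
    return k_decompose(S, left, k) + k_decompose(T, right, k)


block_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
block_b = [(i + 4, j + 4) for i, j in block_a]
G = block_a + block_b + [(3, 4), (2, 5), (0, 8)]  # hypothetical graph
print(k_decompose(range(9), G, 3))  # [{0, 1, 2, 3}, {4, 5, 6, 7}]
```

With K = 2 the same graph yields the single set {0, ..., 7}, since only the pendant edge (0, 8) forms a cut smaller than 2.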
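Minimum Degree Constraint • For any graph, edge connectivity ≤ minimum degree (removing the edges of a minimum-degree vertex isolates it) • Hence a graph that satisfies the edge connectivity constraint must first satisfy the minimum degree constraint, which is much cheaper to check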
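The minimum degree constraint allows cheap pruning before any min-cut computation. This Python sketch uses k-core style peeling, one possible way to apply the constraint and not necessarily the paper's: it repeatedly deletes vertices whose degree is below K, and any subgraph with edge connectivity at least K survives. It treats the input as a simple graph; `min_degree_prune` is a hypothetical name.

```python
from collections import defaultdict


def min_degree_prune(edges, k):
    """Repeatedly delete vertices of degree < k and return the surviving edges.

    Any subgraph H with K(H) >= k has minimum degree >= k, so it is never
    touched by this peeling.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    queue = [v for v in adj if len(adj[v]) < k]
    removed = set()
    while queue:
        v = queue.pop()
        if v in removed:
            continue
        removed.add(v)
        for w in adj.pop(v):          # neighbours lose one degree each
            adj[w].discard(v)
            if len(adj[w]) < k and w not in removed:
                queue.append(w)
    return {(u, v) for u, v in edges if u in adj and v in adj}


K4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(sorted(min_degree_prune(K4 + [(0, 4), (4, 5)], 3)) == sorted(K4))  # True
```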
Shadow graph • Let G be a frequent graph and X a set of edges such that, for each e ∈ X, G ∪ {e} is connected and frequent • The graph G ∪ X is called the shadow graph of G • The degree of v in the shadow graph of G is written deg(v) • If deg(v) < K, remove all edges incident to v from X
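A Python sketch of shadow-graph pruning, assuming G and X are given as edge lists. It computes deg(v) in G ∪ X and removes the X edges of every vertex with deg(v) < K. Repeating until nothing changes is an assumption added here (the slide describes a single pass); it is safe because each removal can only lower other degrees. `prune_shadow` is a hypothetical name.

```python
from collections import Counter


def prune_shadow(g_edges, x_edges, k):
    """Drop X edges incident to any vertex whose shadow-graph degree is < k.

    Any graph grown from G with edges of X gives v at most its degree in
    G ∪ X, so a vertex below k can never reach degree k. Repeats until stable
    (an assumption beyond the slide's single pass).
    """
    x = set(x_edges)
    while True:
        deg = Counter()
        for u, v in list(g_edges) + list(x):
            deg[u] += 1
            deg[v] += 1
        low = {v for v in deg if deg[v] < k}
        kept = {(u, v) for u, v in x if u not in low and v not in low}
        if kept == x:
            return x
        x = kept


# Hypothetical G (a triangle) and candidate edges X.
G = [("a", "b"), ("b", "c"), ("a", "c")]
X = [("a", "d"), ("b", "d"), ("c", "d"), ("d", "e")]
print(sorted(prune_shadow(G, X, 3)))  # [('a', 'd'), ('b', 'd'), ('c', 'd')]
```

Vertex e has shadow degree 1, so (d, e) is removed; the remaining candidates can still grow the triangle into a K4.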
SPLAT • Based on row enumeration • Intersects relational graphs and decomposes the intersections to obtain highly connected graphs
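The core idea of row enumeration, as a naive Python sketch: a closed edge set equals the intersection of the graphs that contain it, so intersecting every group of at least min_sup graphs yields every closed frequent edge set. SPLAT's pruning and its decomposition of the intersections are not shown; `closed_frequent_by_rows` is a hypothetical name, and the loop is exponential in the number of graphs.

```python
from itertools import combinations


def closed_frequent_by_rows(graphs, min_sup):
    """Map each non-empty closed edge set with support >= min_sup to its support.

    Each intersection of a row group is closed: every graph containing it
    also contains the intersection of its own supporting rows, which equals it.
    """
    results = {}
    n = len(graphs)
    for r in range(max(min_sup, 1), n + 1):
        for rows in combinations(range(n), r):
            common = frozenset.intersection(*(graphs[i] for i in rows))
            if common:
                results[common] = sum(1 for g in graphs if common <= g)
    return results


# Hypothetical relational graphs as edge sets.
gs = [
    frozenset({("a", "b"), ("b", "c"), ("a", "c")}),
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
]
print(closed_frequent_by_rows(gs, 2))
```

In a full pipeline each closed graph found this way would then go through K-decomposition to keep only its highly connected parts.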
Experiments • Real dataset: 32 microarray experiments • Synthetic dataset • 2.5 GHz Intel Xeon • 3 GB main memory • Red Hat 9.0 • C++ with STL
Synthetic dataset • Density = average degree / number of vertices
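The density formula in code, using average degree = 2|E| / |V| for an undirected graph; the numbers in the usage line are made up for illustration.

```python
def density(num_vertices: int, num_edges: int) -> float:
    """Density = average degree / |V|, where average degree = 2|E| / |V|."""
    avg_degree = 2 * num_edges / num_vertices
    return avg_degree / num_vertices


print(density(10, 15))  # 0.3
```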
Real dataset • 32 microarray experiments • Nodes: 6,661 objects (yeast genes) • Edges: 600K
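Result (edge connectivity = 3, support > 19) • [Figure: subgraph with nodes COS3, UNKNOWN, COS1, COS5, COS6, COS2, COS7, COS4] • Apart from the UNKNOWN gene, the remaining 7 belong to one family of proteins located close together on the chromosome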
[Figure: cluster annotated Ribosomal biogenesis, Transcription, DNA, UNKNOWN, and Predicted involved in RNA processing]
[Figure: cluster annotated rRNA processing and UNKNOWN]