Counting subgraphs. Support measures for graphs Natalia Vanetik. This research was carried out under the supervision of Prof. Eyal S. Shimony and Prof. Ehud Gudes. Published in DAMI Journal vol. 13(2), September 2006. Research directions. Multiflows in graphs
Counting subgraphs Support measures for graphs Natalia Vanetik PhD seminar CS BGU
This research was carried out under the supervision of Prof. Eyal S. Shimony and Prof. Ehud Gudes. Published in DAMI Journal vol. 13(2), September 2006.
Research directions • Multiflows in graphs • Counting functions in graphs
Problem description Let D and G be graphs. We need to measure the statistical significance of G as a subgraph of D. To do so, we observe the instances (isomorphic copies) of G within D. [Figure: three database graphs D, in which G has zero significance, some significance, and high significance, respectively.]
Definition A counting function on graphs that measures the statistical significance of one graph G as a subgraph of another graph D is called a support measure. When G is not a subgraph of D, this function should return 0. Otherwise, it should return a value greater than 0.
Traditional support measure An item-set X in the relational model is a set of tuples (f1,v1),…,(fn,vn), where the fi are field names and the vi are values. A transaction T supports X if the value of fi in T equals vi for every i=1…n. The support of an item-set X is the number of transactions in the database that support X.
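The definition above can be sketched in a few lines of Python. This is only an illustration: the function names `supports` and `support`, the dictionary representation of a transaction, and the toy database are assumptions made for the example, not part of the original slides.

```python
def supports(transaction, item_set):
    """True if the transaction holds value v in field f for every (f, v) in item_set."""
    return all(transaction.get(f) == v for f, v in item_set)


def support(database, item_set):
    """Number of transactions in the database that support item_set."""
    return sum(1 for t in database if supports(t, item_set))


# Hypothetical toy database: each transaction maps field names to values.
database = [
    {"color": "red", "size": "L"},
    {"color": "red", "size": "M"},
    {"color": "blue", "size": "L"},
]

x = {("color", "red")}
y = {("color", "red"), ("size", "L")}  # y is a superset of x

print(support(database, x), support(database, y))
```

In this toy database the larger item-set y has no more support than its subset x, which is the behaviour the transaction-based definition guarantees.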
Admissibility It is important, especially for graph mining, that a support measure is admissible, that is, that it has the downward closure property (antimonotonicity): • the support of a graph cannot be smaller than the support of any of its supergraphs; • equivalently, the support of a graph cannot be larger than the support of any of its subgraphs.
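Stated as a formula, admissibility reads as follows. The symbol $\sigma_D$ for the support measure with respect to the database graph $D$ is notation assumed here for illustration; the slides do not fix a symbol.

```latex
\[
  g \subseteq G
  \;\Longrightarrow\;
  \sigma_D(g) \;\ge\; \sigma_D(G)
  \qquad \text{for all graphs } D,\ G,\ g .
\]
```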
Motivation • A significant amount of data in the world is graph-like rather than relational. • Graph data is usually represented by one or more large graphs. Transaction-like graph datasets are rare. • The traditional support definition is not admissible in this setting. • Admissible support measures are required for mining graph data and for other tasks.
Instance graph • We observe all the subgraphs of D that are isomorphic to G, called instances. • Two instances are considered connected if they have an edge/node/subgraph in common. • The graph whose nodes are the instances of G, with an edge between every pair of connected instances, is called the instance graph of G in D.
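A minimal sketch of the construction, assuming the instances have already been found and that two instances are connected when they share at least one edge (one of the overlap choices listed above). The helper names `edge` and `instance_graph` and the path example are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Undirected edge, represented as a frozenset of its two endpoints."""
    return frozenset((u, v))


def instance_graph(instances):
    """Build the instance graph as an adjacency dict.

    instances: list of edge sets, one per instance of the pattern.
    Two instances are joined when they have at least one edge in common.
    """
    adj = {i: set() for i in range(len(instances))}
    for i, j in combinations(range(len(instances)), 2):
        if instances[i] & instances[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


# Hypothetical example: D is the path 1-2-3-4-5 and G is a path with two edges.
instances = [
    {edge(1, 2), edge(2, 3)},
    {edge(2, 3), edge(3, 4)},
    {edge(3, 4), edge(4, 5)},
]

print(instance_graph(instances))
```

The first and third instances share only node 3, so under the edge-overlap convention they are not adjacent; a node-overlap convention would join them.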
Instance graph: an example [Figure: a pattern G, a database graph D, and the instance graph of G in D.]
Intuitive support measures • Just count the instances. • Perform some sort of weighted count. [Figure: a pattern G in a database graph D, with Count_D(G) = 3 and Wcount_D(G) = Count_D(G) / 3 = 1.]
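The problem with the intuitive approach is… that these measures are not admissible. [Figure: two examples with a pattern G and its subgraph g in D. In the first, Count_D(G) = 3 but Count_D(g) = 1. In the second, Wcount_D(G) = 1 + 1 = 2 but Wcount_D(g) = 1/2 + 1/2 + 1/2 = 3/2.] In both examples the subgraph g receives a smaller value than its supergraph G, which violates antimonotonicity.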
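Since the slide's figures are not reproduced here, the following is a different, self-made counterexample for plain instance counting, not the one from the slides. It assumes D is a star with four leaves, g is a single edge, and G is a path with two edges; all names are hypothetical.

```python
from itertools import combinations

# Hypothetical database graph D: a star with a hub and four leaves.
leaves = ["a", "b", "c", "d"]
star_edges = [frozenset(("hub", leaf)) for leaf in leaves]

# Instances of g (a single edge): every edge of D.
count_g = len(star_edges)

# Instances of G (a path with two edges): every pair of edges sharing a node.
# In a star, every pair of edges shares the hub.
count_G = sum(1 for e, f in combinations(star_edges, 2) if e & f)

print(count_g, count_G)
```

Here g is a subgraph of G, yet G has more instances than g, so counting instances is not antimonotone on this graph.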
What is going on? • A counting function can be viewed as acting on the instance graph. • A graph g and its supergraph G have different instance graphs I_g and I_G, and I_g is obtained from I_G by a series of graph operations. • If a counting function does not decrease under these operations, it is admissible (for the specific G and g, at least).
Operations on instance graphs We narrowed it down to the following three operations on instance graphs: • clique contraction, • node addition, • edge removal.
Clique contraction A clique is contracted into a single node. Another node is adjacent to the new node only if it was adjacent to all the nodes in the clique. [Figure: the intuition behind it, showing two instances of G and one instance of g.]
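A minimal sketch of clique contraction on an adjacency-dict graph. It adopts the strongest reading of the rule above: an outside node becomes adjacent to the new node exactly when it was adjacent to every clique node. The function name `contract_clique`, the node label "c", and the example graph are assumptions made for illustration.

```python
def contract_clique(adj, clique, new_node):
    """Return a new adjacency dict with `clique` contracted into `new_node`.

    An outside node is adjacent to `new_node` exactly when it was adjacent
    to every node of the clique. `new_node` must not already be in the graph.
    """
    clique = set(clique)
    # Check that the given nodes really are pairwise adjacent.
    for u in clique:
        if not (clique - {u}) <= adj[u]:
            raise ValueError("the given nodes do not form a clique")

    # Keep the outside nodes and drop their edges into the clique.
    result = {u: set(nbrs) - clique for u, nbrs in adj.items() if u not in clique}
    outside = list(result)
    result[new_node] = set()
    for u in outside:
        if clique <= adj[u]:
            result[u].add(new_node)
            result[new_node].add(u)
    return result


# Hypothetical instance graph: nodes 1 and 2 form a clique,
# node 3 is adjacent to both of them, node 4 only to node 2.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}
print(contract_clique(adj, {1, 2}, "c"))
```

Node 3 stays attached to the contracted node, while node 4 loses its connection because it was adjacent to only one clique node.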
Node addition A new node and some edges incident to this node are added. [Figure: the intuition behind it, showing two instances of G and three instances of g.]
Edge removal An edge is removed. [Figure: the intuition behind it, showing two instances of G and two instances of g.]
The main result Theorem. A support measure on graphs is admissible if and only if it does not decrease under the following operations on instance graphs: • clique contraction, • edge removal, • node addition.
Sufficiency To prove sufficiency for these three operations, we need to show that for every graph D and every pair of graphs G and g such that g is a subgraph of G, the instance graph I_g of g is obtained from the instance graph I_G of G by these operations alone.
Sufficiency: proof outline The proof is constructive (algorithmic). The main steps are: • Build a pair of mappings, the first from instances of G to instances of g and the second from instances of g to instances of G. • Perform clique contractions and node additions to obtain the vertex set of I_g from the vertex set of I_G. • Perform edge removals as necessary.
Necessity To prove necessity, we need to show that for every graph H and every operation (from the above list) that produces a graph h from H, there exist a database graph D and a pair of its subgraphs G and g, where g is a subgraph of G, such that H = I_G and h = I_g.
Necessity: proof outline • The proof is constructive. • Specific graphs G and g are constructed. For convenience, these graphs are labeled. • Intersection types for instances of G and g in D are defined. • D is constructed accordingly.
Necessity: the patterns [Figure: the labeled patterns G and g, with node labels a, b, c and d. The parts of the patterns are named Top, Arms, Bottom and Legs.]
Necessity: intersection The following intersection types are allowed in D: • Bottom overlap: all legs of two instances overlap. • Leg overlap: two instances have exactly one leg in common. • Arm overlap: two instances have exactly one arm in common.
Bottom overlap: for clique contraction [Figure: three instances G1, G2 and G3 with a bottom overlap.]
Leg overlap: for node addition [Figure: two instances G1 and G2 with a leg overlap.]
Arm overlap: for edge removal [Figure: two instances G1 and G2 with an arm overlap.]
Necessity: proof outline (continued) • Use instances of G to construct the database graph D. • Prove that no additional instances of G arise from the overlaps. • Show that the instance graph of g arises from the instance graph of G by applying the chosen operation.
MIS measure • The MIS measure is the size of a maximum independent set (anti-clique) in the instance graph. • It does not decrease under the three operations above, so it is admissible (a direct admissibility proof is also available). • It was used in several papers (Han, Kuramochi, etc.). • No other admissible support measure has been found to date.
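For small instance graphs the MIS measure can be computed by brute force, as in the sketch below. The function name `mis_size` and the example graphs are hypothetical. The search is exponential in the number of instances, which is expected, since finding a maximum independent set is NP-hard in general.

```python
from itertools import combinations


def mis_size(adj):
    """Size of a maximum independent set of a graph given as an adjacency dict.

    Tries candidate sets from largest to smallest and returns the size of the
    first one that contains no pair of adjacent nodes.
    """
    nodes = list(adj)
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return k
    return 0


# Hypothetical instance graphs: a path on three instances and a triangle.
path = {0: {1}, 1: {0, 2}, 2: {1}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}

print(mis_size(path), mis_size(triangle))
```

When all instances pairwise overlap, as in the triangle, the measure is 1 no matter how many instances there are.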
MIS: example [Figure: a pattern G, a database graph D, and the instance graph I_G of G in D, with MIS(I_G) = 1.]
Extensions • The necessary and sufficient conditions can be reformulated for different pattern intersection types (for example, a common node can be considered an intersection).
Open problems and conjectures • Is the computation of an admissible support measure an NP-hard problem, regardless of the measure chosen? • Is every admissible support measure a function of the MIS size? What kind of function?
Thank you!
Questions?