SimRank: A Measure of Structural-Context Similarity
Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Glen Jeh, Jennifer Widom
Outline
• Motivation
• Objective
• Introduction
• Basic Graph Model
• SimRank
• Random Surfer-Pairs Model
• Future Work
• Personal Opinion
Motivation
• The problem of measuring “similarity” of objects arises in many applications.
Objective
• An approach applicable in any domain with object-to-object relationships.
• Two objects are similar if they are related to similar objects.
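Basic Graph Model
• We model objects and relationships as a directed graph $G = (V, E)$, where nodes represent objects and edges represent relationships between them.
• For a node $v$ in the graph, we denote by $I(v)$ and $O(v)$ the sets of in-neighbors and out-neighbors of $v$, respectively.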
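One way to picture the graph model is as two lookup tables built from an edge list. The sketch below is only an illustration: the function name `build_neighbor_sets` and the four-node example graph are assumptions, not part of the slides.

```python
def build_neighbor_sets(edges):
    """Return (in_nbrs, out_nbrs) for a directed graph given as (p, q) edge pairs.

    in_nbrs[v] plays the role of I(v) and out_nbrs[v] the role of O(v).
    """
    in_nbrs, out_nbrs = {}, {}
    for p, q in edges:
        # Make sure both endpoints appear in both tables, even with empty sets.
        for node in (p, q):
            in_nbrs.setdefault(node, set())
            out_nbrs.setdefault(node, set())
        out_nbrs[p].add(q)  # q is an out-neighbor of p
        in_nbrs[q].add(p)   # p is an in-neighbor of q
    return in_nbrs, out_nbrs


# Hypothetical example: a -> c, b -> c, c -> d
in_nbrs, out_nbrs = build_neighbor_sets([("a", "c"), ("b", "c"), ("c", "d")])
```

Here `in_nbrs["c"]` holds `a` and `b`, while `in_nbrs["a"]` is empty because nothing points to `a`.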
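SimRank
• Basic SimRank Equation
• If $a = b$, then $s(a,b)$ is defined to be 1. Otherwise,

$$s(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr) \qquad (1)$$

• Here $C$ is a constant between 0 and 1, and $I_i(a)$ denotes the $i$-th in-neighbor of $a$.
• Set $s(a,b) = 0$ when $I(a) = \emptyset$ or $I(b) = \emptyset$.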
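As a small worked instance of equation (1), assume a hypothetical three-node graph in which node $u$ points to both $a$ and $b$ and there are no other edges, so that $I(a) = I(b) = \{u\}$ and $I(u) = \emptyset$. Under that assumption the double sum has a single term:

```latex
s(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\bigl(I_i(a), I_j(b)\bigr)
       = \frac{C}{1 \cdot 1}\, s(u,u)
       = C
```

In the same graph $s(u,a) = 0$, because $u$ has no in-neighbors.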
SimRank
• Bipartite SimRank
• Two types of objects.
• Example: a shopping graph $G$, with edges from persons to the items they purchase.
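SimRank
• Let $s(A,B)$ denote the similarity between persons $A$ and $B$; for $A \neq B$,

$$s(A,B) = \frac{C_1}{|O(A)|\,|O(B)|} \sum_{i=1}^{|O(A)|} \sum_{j=1}^{|O(B)|} s\bigl(O_i(A), O_j(B)\bigr) \qquad (2)$$

• Let $s(c,d)$ denote the similarity between items $c$ and $d$; for $c \neq d$,

$$s(c,d) = \frac{C_2}{|I(c)|\,|I(d)|} \sum_{i=1}^{|I(c)|} \sum_{j=1}^{|I(d)|} s\bigl(I_i(c), I_j(d)\bigr) \qquad (3)$$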
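The mutual reinforcement between person scores and item scores can be sketched as a fixed number of alternating updates. This is a minimal sketch under stated assumptions: the names `bipartite_simrank`, `avg_pair_score` and `buys`, the defaults `C1 = C2 = 0.8` and `K = 5`, and the sample purchases are all hypothetical choices for illustration.

```python
def avg_pair_score(sim, xs, ys, C):
    """C times the average of sim over all pairs (x, y) with x in xs, y in ys."""
    total = sum(sim[(x, y)] for x in xs for y in ys)
    return C * total / (len(xs) * len(ys))


def bipartite_simrank(buys, C1=0.8, C2=0.8, K=5):
    """buys maps each person to the set of items purchased (person -> item edges)."""
    persons = sorted(buys)
    items = sorted({c for cs in buys.values() for c in cs})
    # In-neighbors of an item: the persons who bought it.
    bought_by = {c: {p for p in persons if c in buys[p]} for c in items}
    # Start from 1 on the diagonal and 0 elsewhere.
    s_person = {(A, B): float(A == B) for A in persons for B in persons}
    s_item = {(c, d): float(c == d) for c in items for d in items}
    for _ in range(K):
        new_person = {}
        for A in persons:
            for B in persons:
                if A == B:
                    new_person[(A, B)] = 1.0
                elif not buys[A] or not buys[B]:
                    new_person[(A, B)] = 0.0  # no out-neighbors, score is 0
                else:
                    # Person similarity from the items they point to, as in (2).
                    new_person[(A, B)] = avg_pair_score(s_item, buys[A], buys[B], C1)
        new_item = {}
        for c in items:
            for d in items:
                if c == d:
                    new_item[(c, d)] = 1.0
                else:
                    # Item similarity from the persons pointing to them, as in (3).
                    new_item[(c, d)] = avg_pair_score(s_person, bought_by[c], bought_by[d], C2)
        s_person, s_item = new_person, new_item
    return s_person, s_item


# Hypothetical purchases, for illustration only.
s_person, s_item = bipartite_simrank({"A": {"sugar", "eggs"}, "B": {"eggs", "flour"}})
```

Both tables are updated from the previous iteration's values, so neither update sees a half-finished round.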
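SimRank
• Computing SimRank: Naive Method
• $R_k(a,b)$ is a lower bound on the actual SimRank score $s(a,b)$. The iteration starts from

$$R_0(a,b) = \begin{cases} 0 & (\text{if } a \neq b) \\ 1 & (\text{if } a = b) \end{cases}$$

• To compute $R_{k+1}$ from $R_k$:

$$R_{k+1}(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} R_k\bigl(I_i(a), I_j(b)\bigr) \qquad (4)$$

for $a \neq b$, and $R_{k+1}(a,b) = 1$ for $a = b$.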
SimRank
• The space required is simply $O(n^2)$ to store the results $R_k$.
• The time required is $O(K n^2 d_2)$.
• $n$: the number of nodes.
• $K$: the number of iterations.
• $d_2$: the average of $|I(a)||I(b)|$ over all node pairs $(a,b)$.
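The naive method above can be sketched as a direct loop over all node pairs. This is an illustrative sketch, not the authors' implementation: the function name `simrank_naive`, the defaults `C = 0.8` and `K = 5`, and the three-node example graph are assumptions, and `in_nbrs` is assumed to have an entry (possibly empty) for every node.

```python
def simrank_naive(in_nbrs, C=0.8, K=5):
    """Iterate the SimRank update K times over every node pair.

    in_nbrs maps each node v to its set of in-neighbors I(v).
    Returns a dict keyed by (a, b) holding the scores after K iterations.
    """
    nodes = list(in_nbrs)
    # Starting point: 1 on the diagonal, 0 everywhere else.
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(K):
        R_next = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    R_next[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    R_next[(a, b)] = 0.0  # a node without in-neighbors scores 0
                else:
                    total = sum(R[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    R_next[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        R = R_next  # the whole table is replaced at once
    return R


# Hypothetical graph: u -> a and u -> b.
R = simrank_naive({"u": set(), "a": {"u"}, "b": {"u"}})
```

The dictionary holds one entry per ordered node pair, which is where the quadratic space cost comes from; the inner sum over in-neighbor pairs is the source of the extra factor in the running time.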
SimRank
• Computing SimRank: Pruning
• Set the similarity between two nodes that are far apart to 0.
• Consider node-pairs only for nodes that are near each other.
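SimRank
• If we consider only node-pairs within a radius $r$ of each other, and there are on average $d_r$ such neighbors for a node, then there will be $n d_r$ node-pairs, where $n$ is the number of nodes.
• The time and space complexities become $O(K n d_r d_2)$ and $O(n d_r)$ respectively, with $K$ and $d_2$ as in the naive method.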
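A rough sketch of the pruning idea follows. It makes several assumptions that the slides do not state: "near" is measured by breadth-first search in the undirected version of the graph, the names `nodes_within_radius` and `simrank_pruned` are hypothetical, and `in_nbrs` has an entry for every node. Pairs that are not stored are treated as having similarity 0, so the result is an approximation.

```python
from collections import deque


def nodes_within_radius(adj, start, r):
    """Nodes reachable from start in at most r steps of the undirected graph adj."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if depth[u] == r:
            continue  # do not expand past the radius
        for w in adj[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return set(depth)


def simrank_pruned(in_nbrs, r=2, C=0.8, K=5):
    """SimRank iteration restricted to node pairs within radius r of each other."""
    # Undirected adjacency built from the in-neighbor sets.
    adj = {v: set(preds) for v, preds in in_nbrs.items()}
    for v, preds in in_nbrs.items():
        for p in preds:
            adj[p].add(v)
    near = {v: nodes_within_radius(adj, v, r) for v in in_nbrs}
    R = {(a, a): 1.0 for a in in_nbrs}
    for _ in range(K):
        R_next = {}
        for a in in_nbrs:
            for b in near[a]:  # only nearby pairs are ever stored
                if a == b:
                    R_next[(a, b)] = 1.0
                elif in_nbrs[a] and in_nbrs[b]:
                    # Missing pairs count as 0: they were pruned or start at 0.
                    total = sum(R.get((i, j), 0.0)
                                for i in in_nbrs[a] for j in in_nbrs[b])
                    R_next[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        R = R_next
    return R


# Hypothetical graph: u -> a, u -> b, a -> c, b -> d.
graph = {"u": set(), "a": {"u"}, "b": {"u"}, "c": {"a"}, "d": {"b"}}
pruned = simrank_pruned(graph, r=2)
```

With `r=2` the pair `(c, d)` is four undirected steps apart and is never stored, so its score is taken to be 0 even though the unpruned iteration would give it a positive value.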
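Random Surfer-Pairs Model
• Expected Distance
• Let $H$ be any strongly connected graph.
• Let $u, v$ be any two nodes in $H$.
• We define the expected distance $d(u,v)$ from $u$ to $v$ as

$$d(u,v) = \sum_{t:\, u \rightsquigarrow v} P[t]\, l(t) \qquad (5)$$

• The sum is over all tours $t$ that start at $u$ and end at $v$ without touching $v$ before the end; $P[t]$ is the probability that a random surfer travels $t$, and $l(t)$ is the length of $t$.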
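The expected distance is the average number of steps a random surfer needs to first reach $v$, so it can be approximated without enumerating tours. The sketch below assumes a surfer who picks an out-edge uniformly at random and uses a simple fixed-point iteration; the function name `expected_distance`, the number of sweeps, and both example graphs are assumptions made for illustration.

```python
def expected_distance(out_nbrs, target, sweeps=1000):
    """Approximate d(u, target) for every node u of a strongly connected graph.

    out_nbrs maps each node to a list of its out-neighbors. Each sweep applies
    h(u) = 1 + average of h(w) over out-neighbors w, with h(target) fixed at 0.
    Starting from all zeros, the values rise toward the expected distances.
    """
    h = {u: 0.0 for u in out_nbrs}
    for _ in range(sweeps):
        # The comprehension reads the old table h and builds a new one.
        h = {
            u: 0.0 if u == target
            else 1.0 + sum(h[w] for w in out_nbrs[u]) / len(out_nbrs[u])
            for u in out_nbrs
        }
    return h


# Hypothetical directed 3-cycle a -> b -> c -> a: the only tour from a to c has length 2.
cycle = expected_distance({"a": ["b"], "b": ["c"], "c": ["a"]}, target="c")

# Hypothetical graph where x has a self-loop: x -> x, x -> y, y -> x.
lazy = expected_distance({"x": ["x", "y"], "y": ["x"]}, target="y")
```

In the second graph the surfer at `x` leaves for `y` with probability 1/2 at each step, so the value for `x` approaches 2.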
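Random Surfer-Pairs Model
• Expected Meeting Distance (EMD)

$$m(a,b) = \sum_{t:\,(a,b) \rightsquigarrow (x,x)} P[t]\, l(t) \qquad (6)$$

• The sum is over tours of a pair of surfers who start at $a$ and $b$, move in lock-step, and stop when they first meet at some node $x$.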
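Random Surfer-Pairs Model
• Expected-f Meeting Distance
• Introduced to circumvent the “infinite EMD” problem.
• Maps all distances to a finite interval.
• Uses the exponential function $f(z) = c^z$, where $c \in (0,1)$ is a constant.

$$s'(a,b) = \sum_{t:\,(a,b) \rightsquigarrow (x,x)} P[t]\, c^{l(t)} \qquad (7)$$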
Random Surfer-Pairs Model
• Equivalence to SimRank
• Theorem. The SimRank score, with parameter $C$, between two nodes is their expected-f meeting distance traveling back-edges, for $f(z) = C^z$.
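The theorem can be checked numerically on a small graph. The sketch below follows two surfers backwards along in-edges, adds $C^k$ times the probability that they first meet at step $k$, and stops after $K$ steps; that truncated sum should agree with the result of $K$ rounds of the iterative SimRank update. The function names, the defaults `C = 0.8` and `K = 6`, and the four-node graph are assumptions made for illustration.

```python
def simrank_naive(in_nbrs, C=0.8, K=6):
    """K rounds of the iterative SimRank update over all node pairs."""
    nodes = list(in_nbrs)
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(K):
        R_next = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    R_next[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    R_next[(a, b)] = 0.0
                else:
                    total = sum(R[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    R_next[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        R = R_next
    return R


def expected_f_meeting_distance(in_nbrs, a, b, C=0.8, K=6):
    """Sum of P[t] * C**l(t) over backward tours from (a, b) that meet within K steps."""
    if a == b:
        return 1.0  # the only tour has length 0, and C**0 == 1
    unmet = {(a, b): 1.0}  # probability mass of surfer pairs that have not met
    score = 0.0
    for step in range(1, K + 1):
        next_unmet = {}
        for (x, y), p in unmet.items():
            if not in_nbrs[x] or not in_nbrs[y]:
                continue  # a surfer has no back-edge to follow; this pair never meets
            share = p / (len(in_nbrs[x]) * len(in_nbrs[y]))
            for i in in_nbrs[x]:
                for j in in_nbrs[y]:
                    if i == j:
                        score += share * C ** step  # first meeting at this step
                    else:
                        next_unmet[(i, j)] = next_unmet.get((i, j), 0.0) + share
        unmet = next_unmet
    return score


# Hypothetical graph given by in-neighbor sets.
graph = {"u": {"c"}, "a": {"u"}, "b": {"u", "a"}, "c": {"a", "b"}}
R = simrank_naive(graph)
largest_gap = max(
    abs(R[(a, b)] - expected_f_meeting_distance(graph, a, b))
    for a in graph for b in graph
)
```

`largest_gap` should be at the level of floating-point rounding, since both functions compute the same truncated quantity in different ways.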
Future Work
• Divide and conquer, then merge.
• Divide a corpus into chunks…
• Ternary (or more) relationships.
Personal Opinion
• We believe that the intuition behind SimRank can be applied in many domains that are based on object-to-object relationships.