390 likes | 479 Views
Measuring and Extracting Proximity in Networks. By - Yehuda Koren, Stephen C.North and Chris Volinsky - Rahul Sehgal. Introduction. Network Information hidden in a network
E N D
Measuring and Extracting Proximity in Networks By - Yehuda Koren, Stephen C.North and Chris Volinsky - Rahul Sehgal
Introduction • Network • Information hidden in a network • What is Proximity? • Why do we need proximity in a network? • Methods for measuring and extracting proximities in a network. • CFEC ( Cycle free effective conductance) and how to compute cycle-free escape probability • Extracting proximity through proximity graphs obtained by CFEC • Working with large networks • Experiments • Conclusion • Questions
Network • Collection of edges or links • Collection of nodes • These edges and links help in deciding the proximity in a given network.
Hidden information in a network C D E B A F
Hidden information in a network C • If two people speak on the phone to many common friends, the probability is high that they will talk to each other in the future, or perhaps that they already communicate through some other medium as email. D E B A F
Hidden information in a network C • If two nodes are connected to a common node then it might be possible that they have a strong relationship or they don’t have any relationship at all. for example: in a telephone network where many people call to service center but it might be possible that two people who are calling they don’t know each other at all. D E B A F
What is proximity? • Proximity is a method of measuring distance or closeness between different objects. • It is a method of finding hidden relationship between the objects. • It is a method of finding similarities and helps in clustering objects or nodes.
Need of proximity • It measures potential information exchange between two non-linked objects through intermediaries • It can measure the extent to which two nodes belong to each other. • It helps in knowing the likelihood that a link will exist in future. • In social network settings proximity helps to predict or track the propagation of an idea, product or disease . • It helps in discovering unexpected communities in a network.
Various Methods • Graph-theoretic distance: • It is length of shortest path connecting two nodes measured either by number of hops or sum of weight of edges Limitations • Proximity decays as nodes become farther apart. • Information may be lost due to friction or noise at a particular node. • This method doesn’t assume that proximity can exist via multiple paths.
Various Methods contd… • Network Flow (maximal network flow) • Limited capacity is assigned to each edge, depending upon the weight of that edge and then compute the maximal number of units that can be simultaneously delivered from node s to node t. • It prefers high weight edges and captures the premise that an increasing number of alternative paths increases the proximity. For example we consider the adjacent figure. a1 B A b1 A B a1
Various Methods contd… • Network Flow (maximal network flow) Limitations: • It disregards the length of the path. • It also follow that the maximal s-t flow (that is graph flow from node s to node t) in a graph equals the minimal s-t cut – that is the minimal edge capacity we need to remove to disconnect s from t, therefore we cannot implement it in a robust system.
Various Methods contd… • Effective Conductance (EC): • Modeling of networks as an electric circuit by treating the edges as resistors whose conductance is the given edge weight. • In this method we keep the starting node (s) as 1 and the end node (t) as 0. and then we solve the linear equations for getting voltage and current on each edge. • It accounts for both path length and number of alternative paths . • It avoids dependence on single shortest path and bottlenecks • It has a monotonicity property which states that in an electrical resistor network, increasing the conductance of any resistor or increasing the number of resistors increase conductance between any two nodes.
Various Methods contd… • Limitations of Effective Conductance- • Monotonicity has its limitations consider following example s1 a1 t1 same EC a1 t s
Various Methods contd… • Sink augmented effective conductance • Each node in a network is connected to a universal sink which is at voltage zero. • This universal sink competes with the node t. • Universal sink “tax” each node that absorbs a portion of out going current. Consequently it forces all the node to have degree greater than 1. So, our restriction for degree one node doesn’t exist any more.
Various Methods contd… • Limitations of sink augmented effective conductance • Its required to know how much current will flow through each node . Understanding how such parameter will influence the proximity is a difficult task. • It destroys the concept of monotonicity . It means whenever a new node is added to the network it has a direct link to the sink but not to t. It strengthens sink and it compete with node t and thus the proximity between s and t is lost. So, we look for a solution that can overcome the limitations of above methods…..
Random Walk • Definitions: • Random Walk is a transition from one state to another without depending on the previous state. The transition could be to the same state also. a3 t a4 s a2 a1
Random Walk • Random walk in network proximity is the infinite number of attempts that is made to reach from starting node s to end node t. It might be possible that when we traverse this path we might return back to the starting node s. s t
Random Walk • Explanation • In network it might be possible that during random walk we might go back to s without going to t. • As we can see in the diagram it might be possible that we return back to s via path s–a1-a2-a3-s without going to t. • We are having a cyclic path which is leading us back to starting point s. continued =>
Random Walk Diagram a3 a2 s a1 a5 t a4
What is our goal? • We have to improve the effective conductance measure and avoid any cyclic path…..
CFEC( Cycle Free Effective Conductance) • It considers random walk interpretation • DEFINITION: The cycle-free escape probability from s to t is the probability that a random walk originating at s will reach t without visiting any node more than once. • In random walk, a probability of transition from node i to node j is = probability of transition from node i to node j =weight of edge from i to j = degree of node i.
CFEC( Cycle Free Effective Conductance) • The probability that a random walk starting at v1 for path P= v1-v2-…-vr, will follow this path is given by:
Features of CFEC: • We have following equalities: Multiplying by the degree: R = set of simple paths from s to t, simple path are those that never visit the same node twice • CFEC discourages long paths as the probability of following the path decays exponentially with its length. It is given by • It supports proximity measure for multiple paths. • Degree-1 nodes dilute the significance of path from s to t. So we cut the main graph into sub-graph. And we preserve the original degree of the nodes.
Explanation of CFEC features CFEC discourages long paths: ck s t c1 c2 Proximity decreases
Computing Cycle Free Escape Probability • Restrict the sum in following equation to K most probable simple path. • Edge weights are transformed into edge lengths, establishing 1-1 correspondence between path probability and path length. • Path probability is given as
Extracting Proximity Through Proximity Graphs : • Extract a subgraph that maximizes the ratio: = subgraph = constant, • Find subset of R, which required for above problem. • Set of simple paths R is sorted in ascending order of weights of paths.
Extracting Proximity Through Proximity Graphs : • Optimizing the given formula using branch and bound path merging algorithm:
Working with large networks • Growing candidate graphs via {s, t} neighborhoods • Producing the candidate graph is to find a sub graph containing the highest weight paths originating at either s or t. • Now, the problem becomes , find a sub graph containing shortest paths originating at either s or t. Our objective is to expand the neighborhoods of s and t. This is done by Dijkstra’s algorithm for computing shortest path on graphs with non-negative edge lengths.
Working with large networks contd…. • Determining neighborhood size • we determine paths which are not longer than L – log(€). Path lengths greater than is are not useful. Here L is the length of shortest s-t path. • In practice, (L – log(€))/2 – neighborhoods might be too large.
Working with large networks contd…. • Pruning the neighborhood: • We prune the neighbor in such a way that dist(s,i)+dist(t,i) > β , where s is our starting node and t is terminating node. • Then, we exclude from the neighborhood any i for which dist(s,i)+dist(i,t) > L – log(€).
Experiments: • Online movie database IMDB. • Co-authorship graph. • Telecommunication graph.
Telecommunication Graph • Extract the candidate graph by growing neighborhoods from the two nodes of interest. • 2000 random pairs of telephone numbers to calculate the CFEC value. • For 1808(90%) of these pairs paths between them were found. Others do not have any known path between them, this could be due to, these number were not in frequent use. • We have values for as 1,5,10 and 50.
Conclusion: • CFEC proximity allows us to readily compute proximity graphs, which are small portions of the network that are aimed at capturing a related proximity value. • An analyst studying proximity in a graph has to focus on the most relevant part of the graph. • It is extension of connection graph which is capable of presenting compact relationship between objects of a network. • We can deduce relationship between more than two endpoints, the flexibility to handle edge direction, and the fact that they are obtained by solving an intuitively tunable optimization problem.