Doubling Dimension in Real-World Graphs Melitta Lorraine Geistdoerfer Andersen
Recap: Definition • A metric space is a set X together with a distance function d that gives a non-negative distance between any two points in X and satisfies three properties: • d(x,y) = 0 if and only if x = y • d(x,y) = d(y,x) • The triangle inequality holds: d(x,y) + d(y,z) ≥ d(x,z) • The doubling dimension of a metric space (X,d) is the least k such that any ball of radius R can be covered by 2^k balls of radius R/2. • So the doubling dimension is log2 of the maximum, over all centers and all radii, of the number of balls of half the radius needed to cover the ball with that center and radius.
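The definition above can be checked directly on a tiny finite point set. The following is a minimal sketch, not the method used for the measurements in these slides: the names `is_metric`, `min_half_cover`, and `doubling_dimension` are hypothetical, the search is brute force (exponential, so only usable on a handful of points), and it assumes that covering balls may be centered at any point of X.

```python
from itertools import combinations, product
from math import log2

def is_metric(points, d):
    """Check the three metric axioms on a finite point set."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False  # negative distance, or zero distance for distinct points
        if d(x, y) != d(y, x):
            return False  # not symmetric
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z):
            return False  # triangle inequality violated
    return True

def min_half_cover(points, d, center, radius):
    """Smallest number of radius/2 balls that cover B(center, radius)."""
    ball = [p for p in points if d(center, p) <= radius]
    for k in range(1, len(ball) + 1):
        # Assumption: covering balls may be centered anywhere in the point set.
        for centers in combinations(points, k):
            if all(any(d(c, p) <= radius / 2 for c in centers) for p in ball):
                return k

def doubling_dimension(points, d, radii):
    """log2 of the largest cover count over all centers and the given radii."""
    worst = max(min_half_cover(points, d, c, r) for c in points for r in radii)
    return log2(worst)

# Toy example: eight evenly spaced points on a line.
line = list(range(8))
line_dist = lambda x, y: abs(x - y)
print(is_metric(line, line_dist))
print(doubling_dimension(line, line_dist, [1, 2, 4, 8]))
```

Only the radii passed in are examined, so the result is exact only for that set of radii.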
An Example with a Set of Points • In this case, all of the points can be covered by 2^k = 2 balls of radius R/2. • Each of the balls also has a doubling dimension of 2. • And each of those contains no more than 2^2 points. • When the doubling dimension is a constant (i.e. bounded), the metric is called a doubling metric.
Some Uses of Doubling Dimension • Chan, Gupta, Maggs, and Zhou proved that for any network that has a metric with a bounded doubling dimension, a hierarchical routing structure can be imposed on it. • With this structure, the network can be addressed in such a way as to be able to get routing information from the addresses of the source and the destination. • This routing also achieves minimum or near-minimum path length. • There are also efficient nearest-neighbor algorithms that work with a graph of low doubling dimension.
Now We Can Apply It To A Graph • We found a 200,000-node router-level graph of the Internet at http://www.caida.org/tools/measurement/skitter/router_topology/. • This was an adjacency graph, so we treated every edge as having unit distance. • The doubling dimension was ~14.
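Treating every edge as unit distance means the distance between two routers is their hop count, so a ball of radius R is whatever a breadth-first search reaches within R hops. A minimal sketch follows; the adjacency-list format and the names `hop_distances`, `ball`, and `toy_adj` are assumptions for illustration, not the format of the CAIDA data.

```python
from collections import deque
from math import floor

def hop_distances(adj, source, max_hops=None):
    """Breadth-first search: hop count from source to every node it reaches."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if max_hops is not None and dist[u] >= max_hops:
            continue  # do not expand past the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ball(adj, center, radius):
    """Nodes within `radius` hops of `center` (radius may be fractional, e.g. R/2)."""
    return set(hop_distances(adj, center, max_hops=floor(radius)))

# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4.
toy_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(ball(toy_adj, 0, 2))
```

Stopping the search at the radius keeps each ball computation proportional to the size of the ball rather than the whole graph, which matters on a graph with hundreds of thousands of nodes.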
Average Covering for Each Radius • Plotted on a log scale (the x-axis is also on a log scale), the average number of balls increased nearly linearly until the radius reached 8. • One interpretation of the downturn is the finite nature of the graph. • At R=64, only one ball of radius 32 is required to cover the entire ball. Hence every node is within 32 hops of a single center, so the diameter of the graph is at most 64.
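One way such a per-radius average could be computed is sketched below. This is an assumption about the procedure, not a description of what was actually run: it uses a greedy cover (which can use more balls than the true minimum), takes covering centers from inside the ball, and expects a precomputed all-pairs distance table `dist[u][v]`. The names `greedy_half_cover` and `cover_profile` are hypothetical.

```python
from statistics import mean

def greedy_half_cover(dist, center, radius):
    """Greedy count of radius/2 balls needed to cover B(center, radius)."""
    uncovered = [v for v in dist[center] if dist[center][v] <= radius]
    count = 0
    while uncovered:
        c = uncovered[0]  # pick any uncovered node as the next ball center
        uncovered = [v for v in uncovered if dist[c][v] > radius / 2]
        count += 1
    return count

def cover_profile(dist, radii):
    """Average greedy cover size over all centers, for each radius."""
    return {r: mean(greedy_half_cover(dist, c, r) for c in dist) for r in radii}

# Hypothetical toy metric: hop distances on a 5-node path.
path_dist = {u: {v: abs(u - v) for v in range(5)} for u in range(5)}
profile = cover_profile(path_dist, [1, 2, 4, 8])
print(profile)
```

On the toy path the average falls to 1 once the radius is large enough for a single half-radius ball to hold the whole graph, which is the same finite-size effect described above.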
But What About Latencies? • This was all well and good for an adjacency graph, but for routing you actually want to know the fastest route. So we needed a weighted graph. • http://www.cs.cornell.edu/People/egs/meridian/data.php yielded a graph that measured latencies between 2,500 sites. • The doubling dimension of this weighted graph was ~9.
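With latencies as edge weights, a ball is no longer a hop-count neighborhood. A minimal sketch using Dijkstra's algorithm follows, assuming distance means shortest-path latency; the slides do not say whether the measured pairwise latencies were used directly instead, and directly measured latencies need not satisfy the triangle inequality. The function name, input format, site labels, and latency values are all hypothetical.

```python
import heapq

def weighted_ball(adj, center, radius):
    """Shortest-path distance to every node within `radius` of `center`.

    adj maps a node to a list of (neighbor, latency) pairs.
    """
    dist = {center: 0.0}
    heap = [(0.0, center)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical latencies in milliseconds between three sites.
latency_adj = {
    "a": [("b", 10.0), ("c", 50.0)],
    "b": [("a", 10.0), ("c", 15.0)],
    "c": [("a", 50.0), ("b", 15.0)],
}
print(weighted_ball(latency_adj, "a", 30.0))
```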
Covering for a Weighted Graph • Plotted on a log scale, the average number of balls formed a more symmetric curve than for the unweighted graph. • There were few nodes within range at the lower radii, and at the higher radii we again saw the effects of a finite graph. • One thing of note is the spike back up to 2 after the average had already dropped to 1.
A Possible Explanation • One thing that could cause the spike is a two-cluster graph. • Within each cluster, everything inside a ball of a certain radius can be covered by a single ball of half that radius. • But when you double that radius, you run into the other cluster, so 2 balls are required to cover the whole thing.
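The two-cluster explanation can be reproduced on a toy example. The sketch below uses made-up points on a line (two tight clusters about 100 units apart), a greedy cover, and the worst case over centers rather than the average shown on the slide; all names and values are illustrative assumptions.

```python
def greedy_half_cover(points, d, center, radius):
    """Greedy count of radius/2 balls needed to cover B(center, radius)."""
    uncovered = [p for p in points if d(center, p) <= radius]
    count = 0
    while uncovered:
        c = uncovered[0]
        uncovered = [p for p in uncovered if d(c, p) > radius / 2]
        count += 1
    return count

def line_distance(x, y):
    return abs(x - y)

# Two hypothetical clusters: {0, 1, 2} and {100, 101, 102}.
clusters = [0, 1, 2, 100, 101, 102]
radii = [2 ** i for i in range(9)]  # 1, 2, 4, ..., 256
worst = {
    r: max(greedy_half_cover(clusters, line_distance, c, r) for c in clusters)
    for r in radii
}
for r in radii:
    print(r, worst[r])
```

For radii from 4 to 64 a single half-radius ball covers each cluster, so the count is 1. At radius 128 the ball reaches the other cluster while a ball of radius 64 cannot span both, so the count jumps to 2 before falling back to 1 at radius 256.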
Infinite Graphs? • Another thing to note is that the doubling dimension is finite because the graph is finite. • If this were a section of an infinite doubling metric, the doubling dimension would eventually flatten out and become constant. • Though the graph does start to flatten out at the peak, we don’t know whether this merely indicates that the finite nature of the graph is affecting it.
Other Graphs • We had so much fun with doubling dimension on these graphs, we wanted to find other graphs to play with. But what other interesting graphs are out there? • The Citation Graph connects authors of papers by references. An edge indicates that one author cited a paper by the other author in one of their papers. • People use these graphs to study nearest-neighbor algorithms. • The doubling dimension of this graph is ~12.
The Citation Graph • This graph looks similar to the router graph. • The Citation Graph also has unit distances for the edges, so this similarity makes sense. • The earlier downward turn could be due to the high degree of each node. Many authors write many papers, and cite a large number of papers in them.
More Graphs • Doubling dimension can give us information about many types of graphs. • For instance, using the Internet Movie Database a graph of actors can be created with edges connecting two actors who were in the same movie. • The doubling dimension of this graph is ~14.
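Building such an actor graph amounts to connecting every pair of names that share a cast list. A minimal sketch follows, assuming the input is a mapping from movie to cast; the function name and the placeholder titles and actors are hypothetical and are not taken from the Internet Movie Database.

```python
from itertools import combinations

def co_appearance_graph(casts):
    """Adjacency sets linking any two actors who share at least one movie."""
    adj = {}
    for cast in casts.values():
        members = sorted(set(cast))
        for actor in members:
            adj.setdefault(actor, set())  # keep actors with no co-stars too
        for a, b in combinations(members, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Placeholder data for illustration only.
casts = {
    "movie_1": ["actor_a", "actor_b", "actor_c"],
    "movie_2": ["actor_c", "actor_d"],
    "movie_3": ["actor_e"],
}
graph = co_appearance_graph(casts)
print(sorted(graph["actor_c"]))
```

Every cast of size n contributes n(n-1)/2 edges, so large casts produce dense cliques in the resulting graph.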
Yet Another Signature Graph • This graph started its downward trend right away. • One possible explanation is that this graph is much denser than the router graph, so the balls of radius 2 cover many points that may not be within 1 hop of each other.
The Effects of Scaling • The actor graph had 400,000 nodes. This made it an interesting graph for experimentation with scaling. If we included only a portion of the nodes, what would that do to the dimension?
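One way to include only a portion of the nodes is to sample them uniformly at random and keep just the edges between sampled nodes. The slides do not say how the subset was chosen, so the sketch below is an assumption, and `sampled_subgraph` and `ring` are hypothetical names.

```python
import random

def sampled_subgraph(adj, fraction, seed=0):
    """Induced subgraph on a uniformly random `fraction` of the nodes."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    nodes = sorted(adj)
    keep = set(rng.sample(nodes, round(fraction * len(nodes))))
    # Keep only edges whose two endpoints were both sampled.
    return {u: [v for v in adj[u] if v in keep] for u in nodes if u in keep}

# Hypothetical toy graph: a ring of 10 nodes.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
half = sampled_subgraph(ring, 0.5, seed=1)
print(len(half))
```

A random induced subgraph can be disconnected, so distances between some remaining nodes may become undefined; any measurement on the sample has to decide how to treat unreachable pairs.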
Doubling Dimensions • Plotted on a log scale, the graph increases logarithmically until the maximum doubling dimension is reached.
Conclusions • Finite graphs have bounded doubling dimensions. • Different types of graphs have different signature cover graphs. • The number of nodes in a graph has some relation to the doubling dimension. • I like playing with graphs.
Future Work • Actually implementing the routing algorithm on a graph. • Measuring latencies of adjacent routers to get a more accurate picture to work with. • Figuring out bounds on how scaling affects doubling dimension, possibly working with some infinite graphs.