Internet Tomography and Geography

Internet Tomography and Geography • What is this area all about? • Related work in the area • Main Paper • WEBMAPPER’s features • How WEBMAPPER works • WEBMAPPER results summarized

Internet Tomography / Geography • Where, geographically, are my clients located (or servers)? • What’s the best server for me to get content from? • Given two IP’s, what’s the latency between them? • What IP’s all come from the same geographic location? • Generally, show me a complete and accurate map of the Internet

Related Work • B. Krishnamurthy and J. Wang, “On network-aware clustering of webclients,” in Proceedings of SIGCOMM ’00, August 2000 • Problem: Which IP’s are the same • Does BGP-table based clustering • Groups IP’s by common administrative control • Passive • S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, “On the placement of intenet instrumentation,” in Proceedings of INFOCOM ’00, March 2000, pp. 295–304. • Problem: What’s latency between 2 IP’s • Clusters the internet based on BGP address prefixes • Places “Tracers” everywhere in the Internet that actively ping each other

Related work • An Investigation of Geographic Mapping Techniques for Internet Hosts • Padmanabhan, Subramanian, SIGCOMM 2001 • Problem: geo-locate from IP • 3 solutions • GeoTrack: infer location by DNS / traceroute • GeoPing: probe target from known locations, triangulate [active] • GeoCluster: do passive BGP, IP-prefix based clustering

Related work • Predicting Internet Network Distance with Coordinate-Based Approaches • Problem: Given 2 IP’s, what’s latency • Ng, Zhang, CMU, INFOCOM 2002 • Active network of landmarks, pinging each other • Better math model (GNP) for interpolating distance than IDMaps • Proprietary Solutions (Akamai, etc) • Problem: What’s the closest server to an IP?

Clustering and Server Selection Using Passive Monitoring M. Andrews, B. Sheperd, A. Srinivasan, P. Winkler, F. Zane (Bell Labs), INFOCOM 2002

Problem: Server Selection • Given a client, and a set of servers, all with identical content, tell me the “best” server • For the client, “best” means lowest latency or highest throughput • For the whole system, it may mean something else

What is Passive Monitoring? • Content Servers don’t ping each other • Content Servers don’t ping clients • No pinging is done: no additional network traffic is introduced • Instead, servers record the Round-Trip Time of TCP handshake from a client

WEBMAPPER

WEBMAPPER: Clusterer • Clusterer uses the client-server latency pairs reported by the servers • Determines which address prefixes correspond to the same network location, and which don’t • Does more than just find the closest server to each cluster; assigns probabilities to each to balance the network flow

WEBMAPPER Output

WEBMAPPER: Big Tree • Giant Binary Tree of all IP addresses • Well, not all… assume last 8 bits always clustered • Root of tree: 0.0.0.0/0 • Children 0.0.0.0/1, 128.0.0.0/1 • Leaves 123.123.123.123/24

WEBMAPPER: Big Tree • Leaves also store • Sum of recorded distances (per server) • Squared sum of recorded distances • Number of recorded distances • Leaf data is periodically aged exponentially (I = I * 0.9)

WEBMAPPER: Small Tree • Clusters are formed from big tree by folding children into their parents when they’re “similar” to their siblings • Uses statistical test to determine “similar” (two-sample t-test) • Threshold based

Assigning Clusters to Servers Assigning servers to clusters is complicated • 1. Testing Index • Futzes with the probability of a cluster being assigned to a server • Based on multi-armed bandit solution

Assigning Clusters to Servers

Assigning Clusters to Servers • Calculate server capacity • Each cluster assignment has a latency cost • Try to minimize the total cost without breaking any server’s capacity • Graph theory to the rescue: min-cost flow

Results • Experiment 1: • 28 day log of busy web traffic • Recorded client IP, time, RTT • Clustering produced 17,270 clusters • Here are 12 example clusters for high traffic days

Results • Experiment 2: • Set up a west-coast and east-coast server • Force clients to download something from each (to get actual measurments) • Have clients download something from the “either-or” server (WEBMAPPER powered)

Opinions • Statistics and weighing formulae are cool • What’s a good way to tell if the clustering is any good aside from eyeing samples? • Two server test is all we can get out of Bell Labs?

Internet Tomography and Geography

Internet Tomography and Geography

Presentation Transcript

Tomography

Network Tomography

Tomography

Tomography

Probabilistic tomography

Computed Tomography

Computed Tomography

IPS tomography IPS-MHD tomography

Computed Tomography

Computed Tomography

Computed Tomography

Internet Tomography

Network Tomography

Geography on your PC – internet and GIS tools

The Geography and Governance of Internet Addresses

Parallel Tomography

Computed Tomography

Network Tomography for the Internet: Open Problems

Tomography and Reconstruction

Computed Tomography

The Geography and Governance of Internet Addresses

Tomography