220 likes | 228 Views
Explore the field of Internet Tomography and Geography, including related work, main research papers, and how WEBMAPPER can help analyze and visualize the geographic location of clients and servers.
E N D
Internet Tomography and Geography • What is this area all about? • Related work in the area • Main Paper • WEBMAPPER’s features • How WEBMAPPER works • WEBMAPPER results summarized
Internet Tomography / Geography • Where, geographically, are my clients located (or servers)? • What’s the best server for me to get content from? • Given two IP’s, what’s the latency between them? • What IP’s all come from the same geographic location? • Generally, show me a complete and accurate map of the Internet
Related Work • B. Krishnamurthy and J. Wang, “On network-aware clustering of webclients,” in Proceedings of SIGCOMM ’00, August 2000 • Problem: Which IP’s are the same • Does BGP-table based clustering • Groups IP’s by common administrative control • Passive • S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, and L. Zhang, “On the placement of intenet instrumentation,” in Proceedings of INFOCOM ’00, March 2000, pp. 295–304. • Problem: What’s latency between 2 IP’s • Clusters the internet based on BGP address prefixes • Places “Tracers” everywhere in the Internet that actively ping each other
Related work • An Investigation of Geographic Mapping Techniques for Internet Hosts • Padmanabhan, Subramanian, SIGCOMM 2001 • Problem: geo-locate from IP • 3 solutions • GeoTrack: infer location by DNS / traceroute • GeoPing: probe target from known locations, triangulate [active] • GeoCluster: do passive BGP, IP-prefix based clustering
Related work • Predicting Internet Network Distance with Coordinate-Based Approaches • Problem: Given 2 IP’s, what’s latency • Ng, Zhang, CMU, INFOCOM 2002 • Active network of landmarks, pinging each other • Better math model (GNP) for interpolating distance than IDMaps • Proprietary Solutions (Akamai, etc) • Problem: What’s the closest server to an IP?
Clustering and Server Selection Using Passive Monitoring M. Andrews, B. Sheperd, A. Srinivasan, P. Winkler, F. Zane (Bell Labs), INFOCOM 2002
Problem: Server Selection • Given a client, and a set of servers, all with identical content, tell me the “best” server • For the client, “best” means lowest latency or highest throughput • For the whole system, it may mean something else
What is Passive Monitoring? • Content Servers don’t ping each other • Content Servers don’t ping clients • No pinging is done: no additional network traffic is introduced • Instead, servers record the Round-Trip Time of TCP handshake from a client
WEBMAPPER: Clusterer • Clusterer uses the client-server latency pairs reported by the servers • Determines which address prefixes correspond to the same network location, and which don’t • Does more than just find the closest server to each cluster; assigns probabilities to each to balance the network flow
WEBMAPPER: Big Tree • Giant Binary Tree of all IP addresses • Well, not all… assume last 8 bits always clustered • Root of tree: 0.0.0.0/0 • Children 0.0.0.0/1, 128.0.0.0/1 • Leaves 123.123.123.123/24
WEBMAPPER: Big Tree • Leaves also store • Sum of recorded distances (per server) • Squared sum of recorded distances • Number of recorded distances • Leaf data is periodically aged exponentially (I = I * 0.9)
WEBMAPPER: Small Tree • Clusters are formed from big tree by folding children into their parents when they’re “similar” to their siblings • Uses statistical test to determine “similar” (two-sample t-test) • Threshold based
Assigning Clusters to Servers Assigning servers to clusters is complicated • 1. Testing Index • Futzes with the probability of a cluster being assigned to a server • Based on multi-armed bandit solution
Assigning Clusters to Servers • Calculate server capacity • Each cluster assignment has a latency cost • Try to minimize the total cost without breaking any server’s capacity • Graph theory to the rescue: min-cost flow
Results • Experiment 1: • 28 day log of busy web traffic • Recorded client IP, time, RTT • Clustering produced 17,270 clusters • Here are 12 example clusters for high traffic days
Results • Experiment 2: • Set up a west-coast and east-coast server • Force clients to download something from each (to get actual measurments) • Have clients download something from the “either-or” server (WEBMAPPER powered)
Opinions • Statistics and weighing formulae are cool • What’s a good way to tell if the clustering is any good aside from eyeing samples? • Two server test is all we can get out of Bell Labs?