On the Stability of Network Distance Monitoring

On the Stability of Network Distance Monitoring Yan Chen, Chris Karlof, Yaping Li and Randy Katz {yanchen, ckarlof, yaping, randy}@CS.Berkeley.EDU EECS Department UC Berkeley

Introduction • Lots of applications/services may benefit from end-to-end distance monitoring/estimation • Mirror Selection - VPN management/provisioning • Overlay Routing/Location - Peer-to-peer file system • Cache-infrastructure Configuration • Service Redirection/Placement • Problem formulation: Given N end hosts that may belong to different administrative domains, how to select a subset of them to be probes and build an overlay distance monitoring service without knowing the underlying topology? • Solution: Internet Iso-bar • Cluster of hosts that perceive similar performance to Internet • For each cluster, select a monitor for active and continuous probing • The first one for monitoring site selection and stability evaluation with real Internet measurement data • Compare with other distance estimation services: ID Maps, GNP

Related Work • Existing Internet E2E distance estimation systems fall in two categories: • Clustering based (service-centric): IDMaps, Network Distance Map, Internet Iso-bar • Coordinate based (end host-centric): Triangulated schemes, GNP • Pioneering work: IDMaps • Clustering with IP address prefix (not very accurate) • Based on triangulation inequality • Number of hops only - No dynamics nor stability addressed • Network Distance Map • Clustering based on network proximity rather than similarity • Fixed monitors, no monitor placement/selection • GNP: • Each client maintains its own coordinate • Distance estimated through certain distance function over the coordinates

Framework of Internet Iso-bar • Define correlation distance between each pair of hosts • Apply generic clustering methods below • Limit the diameter (max distance between any two hosts in the cluster) of a cluster, and minimize number of clusters • Limit the number of clusters, then minimize the max diameter of all clusters • Choose the center of each cluster as monitor • Periodically monitors measure distance among each other as well as the distance to the hosts in its cluster • Inter-cluster distance estimation dist(i,j) = dist(monitori, monitorj) • Intra-cluster distance estimation (i,j has same monitor m) dist(i,j) = (dist(i, m) + dist(j, m) ) / 2 • Inter-cluster estimation dominates • Given K evenly distributed clusters, ratio of inter- vs. intra- estimation is K-1

Correlation Distance • Network distance based • Using proximity: dij = measured network distance(pij) • Using Euclidean distance ofnetwork distance vector: Vi = [pi1, pi2, …, pin]T • Using cosine vector similarity ofnetwork distance vector: • Geographical distance based • Using proximity

Properties Comparison N: # of hosts; K: # of monitors: AP: # of address prefix; D: # of dimensions I: # of iterations for optimization, proportional to # of variables, could be very large

Evaluation Methodology • Experiments with NLANR AMP data set • 119 sites on US (106 after filtering out most off sites) • Traceroute between every pair of hosts every minute • Clustering uses daily geometric mean of round-trip time (RTT) • Evaluation uses daily 1440 measurement of RTT • Raw data: • 6/24/00 – • 12/3/01

Performance & Stability Evaluation • Compare 6 distance estimation schemes (denotations) • Clustering with proximity of network distance (net_p) • Clustering with Euclidean dist of network dist vector (net_ed) • Clustering with vector similarity of network dist vector (net_vs) • Clustering with proximity of geographical distance (geo_p) • GNP - All schemes above have 15 clusters/landmarks • Omniscient: using the original pij to predict future pij (omni) • Stability analysis • Clustering / coordinates calculation with day1 (birth date) measurement • Compute relative predict error (rpe) using day2 (estimation date) measurement

Stability CDF of relative errors for 1-month (left) & 6-month (right) Summary of 80th (left) & 90th (right) percentile relative error

Conclusion • Omniscient always works the best • RTT time overall is quite stable for the experimental sites and period, but need further verification • It can not report timely congestion • It requires full n * n IP distance matrix, inapplicable to scalability tricks, e.g. hierarchy • GNP has better performance and stability than clustering-based schemes • Has much more computation & communication cost when update • Using similarity of network distance forclustering works much better than using proximity • Geographical proximity based clustering works better than network proximity based clustering • Requires no measurement for clustering & monitor selection • Provides reasonably good performance & stability • But may biased with the dataset used

Current Work • Congestion/Failure Correlation of Clustered Hosts • Can Monitors report timely congestion/path outage? False-alarms? • Evaluation with Keynote Web Site Perspective Benchmarking (Collaboration with Dr. Chris Overton@Keynote) • Measure Web site performance from more than 100 agents on the Internet • Heterogeneouscore network: various ISPs • Heterogeneousaccess network: • Dial up 56K, DSL and high-bandwidth business connections • Choose 40 most popular Web servers for benchmarking • Problem: how to reduce the number of agents and/or servers, but still represent the majority of end-user performance for reasonably stable period?

Keynote Agent Locations • America (including Canada, Mexico): 67 agents • 29 cities: Houston, Toronto, LA, Minneapolis, DC, Boston, Miami, Dallas, NY, SF, Cleveland, Philadelphia, Milwaukee, Chicago, Cincinnati, Portland, Vancouver, Seattle, Phoneix, San Diego, Denver, Sunnyvale, McLean, Atlanta, Tampa, St. Louis, Mexico, Kansas City, Pleasonton • 14 ISPs: PSI, Verio, UUNET, C&W, Sprint, Qwest, Genuity, AT&T, XO, Exodus, Level3, Intermedia, Avantel, SBC • Europe: 25 agents • 12 cities: London, Paris, Frankfurt, Munich, Oslo, Copenhagen, Amsterdam, Helsinki, Milan, Stockholm, Madrid, Brussels • 16 ISPs: PSI, Cerbernet, Oleane, Level3, ECRC, Nextra, UUNET, TeleDanmark, KPNQwest, Inet, DPN, Xlink, Telia, Retevision, BT, Telephonica • Asia: 8 agents • 6 cities: Seoul, Singapore, Tokyo, Shanghai, Hongkong, Taipei • 8 ISPs: BORANet, SingTel, IIJ, ChinaTel, HKT, Kornet, NTTCOM, HiNet, • Australia: 3 agents • 3 cities: Sydney, Wellington, Melbourne • 3 ISPs: OzeMail, Telstra-Saturn, Optus

Evaluation of Generic Clustering Algorithms • Limit-number clustering and limit-diameter clustering gives similar results with Limit-number a bit better • Net_ed and Net_vs gives similar results with Net_vs a bit better • Use Limit-number clustering for the rest comparison

Performance Evaluation • Static and stability analysis in daily, tri-daily, weekly, bi-weekly, monthly, six-monthly intervals

On the Stability of Network Distance Monitoring