On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001, Anchorage, AK, April 2001
Outline • Overview • Related work • Our approach • Simulation methodology & results • Summary
Motivation • Growing interest in Web server replication • Exponential growth in Web usage • Content providers want to offer better service at lower cost • Solution: replication • Forms of Web server replication • Mirror sites • Content Distribution Networks (CDNs) • CDN: a network of servers • Examples: Akamai, Digital Island [Figure: content providers replicate content at servers across the Internet; clients are served by nearby replicas]
Placement of Web Server Replicas • Problem specification • Among a set of N potential sites, pick K sites as replicas to minimize users’ latency or bandwidth usage
Related Work • Placement of Web proxies [LGI+99] • Cache location [KRS00] • Placement of Internet instrumentation [JJJ+00]
Our Approach • Model the Internet as a graph • Parameterize the graph using measured inputs • # requests generated from each region • Distance between regions • Map the placement problem onto a graph optimization problem • Assumption: each client uses the single replica closest to it • Solve the graph optimization problem using various approximation algorithms
Minimum K-median Problem • Given a complete graph G = (V, E), demands d(j), and costs c(i, j) • d(j): # requests from node j • c(i, j): distance between nodes i and j • latency, hop count, or any other metric to be optimized • Find a subset V′ ⊆ V with |V′| = K that minimizes ∑_{v∈V} min_{w∈V′} d(v)·c(v, w) • NP-hard problem
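To make the objective concrete, here is a minimal Python sketch (ours, not the authors') of the K-median cost being minimized; the names `d`, `c`, and `placement` mirror the slide's notation, and both inputs are assumed to be plain dictionaries.

```python
def placement_cost(d, c, placement):
    """K-median objective: sum over clients v of d(v) * min_{w in V'} c(v, w).

    d: dict mapping node -> number of requests it generates
    c: dict mapping (i, j) -> distance between nodes i and j
    placement: the chosen replica set V'
    """
    # Each client is assumed to be served by its single closest replica.
    return sum(demand * min(c[(v, w)] for w in placement)
               for v, demand in d.items())
```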
Placement Algorithms • Tree-based algorithm [LGI+99] • Assumes the underlying topology is a tree and models placement as a dynamic programming problem • O(N³M²) for choosing M replicas among N potential sites • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load (a toy sketch follows)
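A hedged sketch of the hot-spot idea in its simplest form; treating each candidate site's "load" as just the requests its own cluster generates is our simplification, not necessarily the exact heuristic evaluated in the paper.

```python
def hotspot_placement(d, k):
    """Toy hot-spot heuristic: place replicas at the k nodes that
    generate the most requests.

    d: dict mapping node -> number of requests it generates
    """
    return set(sorted(d, key=d.get, reverse=True)[:k])
```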
Placement Algorithms (Cont.) • Greedy algorithm • Calculate the cost of assigning clients to each candidate replica • Select the replica with the lowest cost • Adjust costs based on the assignment; repeat until all K replicas are placed (see the sketch below) • Super-optimal algorithm • Lagrangian relaxation + subgradient method
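A naive sketch of this greedy loop, reusing the hypothetical `placement_cost` helper above; each round scans every remaining site, so it costs roughly K·N cost evaluations and is meant only to illustrate the idea, not to reproduce the authors' implementation.

```python
def greedy_placement(nodes, d, c, k):
    """Grow the replica set one site at a time, each round adding the
    site that yields the lowest total cost given the choices so far."""
    chosen = set()
    for _ in range(k):
        best = min((s for s in nodes if s not in chosen),
                   key=lambda s: placement_cost(d, c, chosen | {s}))
        chosen.add(best)
    return chosen
```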
Simulation Methodology • Network topology • Randomly generated topologies • Using the GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance metric • Relative performance: cost_practical / cost_super-optimal
Simulation Methodology (Cont.) • Simulate a network of N nodes (100 ≤ N ≤ 3000) • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to a cluster (a toy sketch follows) • A small number of popular clusters account for most requests • The top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively • Pick the top N clusters • Map them to different nodes
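A toy illustration of prefix-based clustering in the spirit of [KW00]; in the real method the prefixes come from BGP routing tables, whereas here `prefixes` is just an assumed input list and all names are ours.

```python
import ipaddress

def cluster_by_prefix(client_ips, prefixes):
    """Assign each client IP to its longest matching routing prefix."""
    # Sort prefixes longest-first so the first match is the longest match.
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: n.prefixlen, reverse=True)
    clusters = {}
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        for net in nets:
            if addr in net:
                clusters.setdefault(str(net), []).append(ip)
                break
    return clusters
```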
Simulation Results • Random trees • Random graphs • AS-level topologies • Sensitivity to errors in the input
Random Tree Topologies The tree-based algorithm performs well, as expected. The greedy algorithm performs equally well.
Random Graph Topologies The greedy and hot-spot algorithms outperform the tree-based algorithm.
Large Random Graph Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
AS-level Internet Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
Effects of Imperfect Knowledge about Input Data • Predicted workload (using a moving window average) • Perfect topology information Within 5% degradation when using the predicted workload
Effects of Imperfect Knowledge about Input Data (Cont.) • Predicted workload (using a moving window average) • Noisy topology information • Perturb the distance between nodes i and j by up to a factor of 2 Within 15% degradation when using the predicted workload and noisy topology information (see the sketch below)
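Two small sketches of these imperfect inputs under our assumptions: the window length and the uniform noise distribution are illustrative choices, since the slides specify only "moving window average" and "perturb by up to a factor of 2".

```python
import random

def predict_load(history, window=5):
    """Moving-window average: predict next-period load as the mean of
    the last `window` observed load values for a cluster."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def perturb_distances(c, factor=2.0):
    """Salt the topology input: scale each pairwise distance by a
    random factor between 1/factor and factor."""
    return {pair: dist * random.uniform(1.0 / factor, factor)
            for pair, dist in c.items()}
```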
Summary • One of the first experimental studies of the placement of Web server replicas • Knowledge of client workload and topology is needed for provisioning replicas • The greedy algorithm performs very well • Within a factor of 1.1 – 1.5 of the super-optimal • Insensitive to noise: stays within a factor of 2 of the super-optimal even when the salted (injected) error is a factor of 4 • The hot-spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of the super-optimal • Obtaining input data • Moving window average for load prediction • BGP routing data for topology information
Conclusion • Recommend using the greedy algorithm for deciding the placement of Web server replicas
Acknowledgement • Craig Labovitz • Yin Zhang • Ravi Kumar