On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001, Anchorage, AK, April 2001
Outline • Overview • Related work • Our approach • Simulation methodology & results • Summary
Motivation • Growing interest in Web server replication • Exponential growth in Web usage • Content providers want to offer better service at lower cost • Solution: replication • Forms of Web server replication • Mirror sites • Content Distribution Networks (CDNs) • CDN: a network of servers • Examples: Akamai, Digital Island [Figure: content providers replicate content at servers across the Internet; clients are served by nearby replicas]
Placement of Web Server Replicas • Problem specification • Among a set of N potential sites, pick K sites as replicas to minimize users’ latency or bandwidth usage
Related Work • Placement of Web proxies [LGI+99] • Cache location [KRS00] • Placement of Internet instrumentation [JJJ+00]
Our Approach • Model the Internet as a graph • Parameterize the graph using measured inputs • # requests generated from each region • Distance between regions • Map the placement problem onto a graph optimization problem • Assumption: each client uses the single replica closest to it • Solve the graph optimization problem using various approximation algorithms
Minimum K-median Problem • Given a complete graph G = (V, E), demands d(j), and costs c(i, j) • d(j): # requests from node j • c(i, j): distance between nodes i and j • latency, hop count, or any other metric to be optimized • Find a subset V′ ⊆ V with |V′| = K that minimizes ∑_{v∈V} min_{w∈V′} d(v)·c(v, w) • NP-hard problem
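To make the objective concrete, here is a minimal Python sketch (ours, not the authors') of the K-median cost being minimized; the names `d`, `c`, and `placement` mirror the slide's notation, and both inputs are assumed to be plain dictionaries.

```python
def placement_cost(d, c, placement):
    """K-median objective: sum over clients v of d(v) * min_{w in V'} c(v, w).

    d: dict mapping node -> number of requests it generates
    c: dict mapping (i, j) -> distance between nodes i and j
    placement: the chosen replica set V'
    """
    # Each client is assumed to be served by its single closest replica.
    return sum(demand * min(c[(v, w)] for w in placement)
               for v, demand in d.items())
```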
Placement Algorithms • Tree-based algorithm [LGI+99] • Assumes the underlying topology is a tree and models placement as a dynamic programming problem • O(N³M²) for choosing M replicas among N potential sites • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load (a toy sketch follows)
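A hedged sketch of the hot-spot idea in its simplest form; treating each candidate site's "load" as just the requests its own cluster generates is our simplification, not necessarily the exact heuristic evaluated in the paper.

```python
def hotspot_placement(d, k):
    """Toy hot-spot heuristic: place replicas at the k nodes that
    generate the most requests.

    d: dict mapping node -> number of requests it generates
    """
    return set(sorted(d, key=d.get, reverse=True)[:k])
```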
Placement Algorithms (Cont.) • Greedy algorithm • Calculate the cost of assigning clients to each candidate replica • Select the replica with the lowest cost • Adjust costs based on the assignment; repeat until all K replicas are placed (see the sketch below) • Super-optimal algorithm • Lagrangian relaxation + subgradient method
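A naive sketch of this greedy loop, reusing the hypothetical `placement_cost` helper above; each round scans every remaining site, so it costs roughly K·N cost evaluations and is meant only to illustrate the idea, not to reproduce the authors' implementation.

```python
def greedy_placement(nodes, d, c, k):
    """Grow the replica set one site at a time, each round adding the
    site that yields the lowest total cost given the choices so far."""
    chosen = set()
    for _ in range(k):
        best = min((s for s in nodes if s not in chosen),
                   key=lambda s: placement_cost(d, c, chosen | {s}))
        chosen.add(best)
    return chosen
```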
Simulation Methodology • Network topology • Randomly generated topologies • Using the GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance metric • Relative performance: cost_practical / cost_super-optimal
Simulation Methodology (Cont.) • Simulate a network of N nodes (100 ≤ N ≤ 3000) • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to a cluster (a toy sketch follows) • A small number of popular clusters account for most requests • The top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively • Pick the top N clusters • Map them to different nodes
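A toy illustration of prefix-based clustering in the spirit of [KW00]; in the real method the prefixes come from BGP routing tables, whereas here `prefixes` is just an assumed input list and all names are ours.

```python
import ipaddress

def cluster_by_prefix(client_ips, prefixes):
    """Assign each client IP to its longest matching routing prefix."""
    # Sort prefixes longest-first so the first match is the longest match.
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: n.prefixlen, reverse=True)
    clusters = {}
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        for net in nets:
            if addr in net:
                clusters.setdefault(str(net), []).append(ip)
                break
    return clusters
```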
Simulation Results • Random trees • Random graphs • AS-level topologies • Sensitivity to errors in the input
Random Tree Topologies The tree-based algorithm performs well, as expected. The greedy algorithm performs equally well.
Random Graph Topologies The greedy and hot-spot algorithms outperform the tree-based algorithm.
Large Random Graph Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
AS-level Internet Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
Effects of Imperfect Knowledge about Input Data • Predicted workload (using a moving window average) • Perfect topology information Within 5% degradation when using the predicted workload
Effects of Imperfect Knowledge about Input Data (Cont.) • Predicted workload (using a moving window average) • Noisy topology information • Perturb the distance between nodes i and j by up to a factor of 2 Within 15% degradation when using the predicted workload and noisy topology information (see the sketch below)
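Two small sketches of these imperfect inputs under our assumptions: the window length and the uniform noise distribution are illustrative choices, since the slides specify only "moving window average" and "perturb by up to a factor of 2".

```python
import random

def predict_load(history, window=5):
    """Moving-window average: predict next-period load as the mean of
    the last `window` observed load values for a cluster."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def perturb_distances(c, factor=2.0):
    """Salt the topology input: scale each pairwise distance by a
    random factor between 1/factor and factor."""
    return {pair: dist * random.uniform(1.0 / factor, factor)
            for pair, dist in c.items()}
```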
Summary • One of the first experimental studies of the placement of Web server replicas • Knowledge of client workload and topology is needed for provisioning replicas • The greedy algorithm performs very well • Within a factor of 1.1 – 1.5 of the super-optimal • Insensitive to noise: stays within a factor of 2 of the super-optimal even when the salted (injected) error is a factor of 4 • The hot-spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of the super-optimal • Obtaining input data • Moving window average for load prediction • BGP routing data for topology information
Conclusion • Recommend using the greedy algorithm for deciding the placement of Web server replicas
Acknowledgement • Craig Labovitz • Yin Zhang • Ravi Kumar