Geographically Distributed Datacenters with Load Reallocation

Indra Widjaja, Sem Borst, Iraj Saniee, Bell Labs
DIMACS Workshop on Cloud Computing, December 8-9, 2011

Datacenter Alternatives
• Geographically Centralized
• Geographically Distributed
[Figure: two copies of a five-site network (sites 1 to 5), one per alternative; the legend marks servers and potential DC sites.]

Challenge
• Centralized datacenters cannot uniformly offer low-latency services to all end-users.
• Distributed datacenters may not achieve elasticity.

Toy Example of Distributed DC with Reallocation
[Figure: the five-site network shown twice, without reallocation and with reallocation; site 1 is labeled with λ1 and m1, and the reallocation panel adds the labels q1,1 and q1,3.]
• λi = job arrival rate at site i, mi = processing capacity at site i
• qi,j = fraction of load reallocated from site i to site j

Formal Model of Load (Re)Allocation in Geographically Distributed Datacenter
Let λik be the arrival rate of type-k jobs at site i, bk the service time of a type-k job per server, and ti,j the round-trip delay between sites i and j. The optimization problem to solve is:
[Equations not preserved in this transcript; only their annotations remain.]
• Objective: weighted average delay
• Decision variable: fraction of load at i sent to j
• Subject to (s.t.) the constraints, where the formulation uses:
• normalized exogenous arrival rate at i
• total exogenous arrival rate at all sites
• total arrival rate at site j
• utilization at site j with Kj servers
• average processing delay with multiple-server approximation

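Toy Example of Distributed DC with Reallocation
Arrival rates: λ1 = 2, λ2 = 1.5, λ3 = 1, λ4 = 1.5, λ5 = 2
Q =
0.907  0  0.093  0  0
0      1  0      0  0
0      0  1      0  0
0      0  0      1  0
0      0  0.093  0  0.907
Weighted Delay = 0.7842
[Figure: the five-site network annotated with the arrival rates above; the remaining labels are all 1.]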
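The effect of the matrix Q on each site can be checked with a few lines of Python. This is a minimal sketch, assuming that row i of Q holds the fractions qi,j of site i's load sent to each site j (consistent with each row summing to 1). The helper name `site_loads` is illustrative and not from the slides. The sketch does not reproduce the weighted delay of 0.7842, since the delay formula is not preserved in this transcript.

```python
# Arrival rates and reallocation matrix taken from the toy example.
lam = [2.0, 1.5, 1.0, 1.5, 2.0]
Q = [
    [0.907, 0.0, 0.093, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.093, 0.0, 0.907],
]


def site_loads(lam, Q):
    """Total arrival rate at each site j: sum over i of lam[i] * Q[i][j]."""
    n = len(lam)
    return [sum(lam[i] * Q[i][j] for i in range(n)) for j in range(n)]


loads = site_loads(lam, Q)
for site, load in enumerate(loads, start=1):
    print(f"site {site}: total arrival rate {load:.3f}")
```

Under this reading, sites 1 and 5 each shed 9.3% of their load to site 3, which has the lowest exogenous arrival rate, and the total load across all sites is unchanged.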

Large-Scale Topology
32-node, 44-link network used in the experiment.
[Figure: network map with nodes SEA, SAI, BUF, MIL, ALB, DET, CLE, BOS, CHI, SPR, NYC, PIT, PHI, DEN, SAL, BAL, KAN, CIN, SFO, WAS, LAS, NAS, RAL, PHO, ATL, LOS, ELP, NOR, JAC, HOU, TAM, MIA; the per-link delay labels cannot be matched to links in this transcript.]
• Each link is associated with delay tij.
• The centralized datacenter is located in CHI.

Comparison of Delays
• Nearly-uniform job arrival rates: λi = 1.1λ if i is odd, λi = 0.9λ if i is even
• Non-uniform job arrival rates: λi = 1.5λ if i is odd, λi = 0.5λ if i is even
• mi = 1 for all i
[Plots for the two cases not preserved in this transcript.]

Comparison of Elasticities
• In each trial, λi = Uniform(0.25, 1) for each i under moderate load variation, and λi = Uniform(0, 1.5) for each i under high load variation.
• The λi are then rescaled so that system-wide utilization is fixed at 0.5.
• mi = 1 for each i
[Plots for moderate and high load variation not preserved in this transcript.]
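The sampling-and-rescaling step of each trial can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that system-wide utilization means total arrival rate divided by total processing capacity, and it uses 32 sites to mirror the 32-node topology. The function name `sample_arrival_rates` and its parameters are hypothetical.

```python
import random


def sample_arrival_rates(n, low, high, target_util=0.5, capacity=1.0, rng=None):
    """Draw n arrival rates from Uniform(low, high), then rescale them.

    Assumption: system-wide utilization = sum(rates) / (n * capacity),
    i.e. every site has the same capacity (mi = 1 in the slides).
    """
    if rng is None:
        rng = random.Random()
    raw = [rng.uniform(low, high) for _ in range(n)]
    scale = target_util * n * capacity / sum(raw)
    return [scale * x for x in raw]


# One trial for each load-variation setting (seeded for repeatability).
moderate = sample_arrival_rates(32, 0.25, 1.0, rng=random.Random(0))
high = sample_arrival_rates(32, 0.0, 1.5, rng=random.Random(0))
print(min(moderate), max(moderate))
print(min(high), max(high))
```

Because both settings are rescaled to the same total load, the two cases differ only in how unevenly that load is spread across sites.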

Multiple Job Types
• Type-independent: jobs are reallocated from i to j with fraction qi,j regardless of their types.
• Type-dependent: type-k jobs are reallocated from i to j with fraction qki,j.
Example with 2 job types: [figure not preserved in this transcript]

Distributed Algorithms for Load Reallocation
Basic idea: each site i computes the impact on the global objective function of sending an additional small fraction of jobs to each site j, i.e., the derivative ai,j [defining formula not preserved in this transcript].
• Min-rule: site i determines the site jmin(i) for which ai,jmin(i) is the minimum derivative. It then reallocates load from the other sites to site jmin(i).
• Max-rule: site i determines the site jmax(i) for which ai,jmax(i) is the maximum derivative. It then reallocates load from site jmax(i) to the other sites.

Distributed Algorithm with “min-rule”
• At site i: compute gi,j = ai,j - ai,jmin(i) for all j ∈ Ni, compute gi = ∑j∈Ni, j≠jmin(i) gi,j, and d = min{k, (1 - rjmin(i)) Kjmin(i) / (λi b gi)}, where jmin(i) = argminj∈Ni ai,j.
• At site i: evaluate hi,j = min{qi,j, d gi,j} for all j ≠ jmin(i), j ∈ Ni, and hi,jmin(i) = -∑j≠jmin(i), j∈Ni hi,j.
• At site i: update qi,j = qi,j - hi,j for all j ∈ Ni; qi,j = 0 for j ∉ Ni.
• Collect new measurement and go to the next site (e.g., i = i+1 mod N).
• Converged? If no, repeat from the first step. If yes, detect changes in delay and utilization.
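One pass of the min-rule at a single site can be sketched as below. This is a minimal sketch under stated assumptions: the derivatives ai,j are passed in by the caller (their formula is not preserved in this transcript), all utilizations are below 1, and an early return handles the case where all derivatives are equal, which the slide does not cover. The function name `min_rule_step` and the example numbers are hypothetical.

```python
def min_rule_step(q_i, a_i, lam_i, b, rho, K, k):
    """One min-rule update of site i's row q_i (dict: neighbor j -> fraction).

    a_i[j]: derivative of the objective w.r.t. q_i[j], supplied by the caller
    rho[j]: measured utilization at site j; K[j]: number of servers at site j
    lam_i:  arrival rate at site i; b: service time; k: step-size cap
    """
    neighbors = list(q_i)
    jmin = min(neighbors, key=lambda j: a_i[j])
    g = {j: a_i[j] - a_i[jmin] for j in neighbors}
    g_sum = sum(g[j] for j in neighbors if j != jmin)
    if g_sum <= 0.0:
        return dict(q_i)  # assumed handling: equal derivatives, nothing to move
    # Step size, capped by k and by the spare capacity at site jmin.
    d = min(k, (1.0 - rho[jmin]) * K[jmin] / (lam_i * b * g_sum))
    # Fraction taken away from each other site, never more than it holds.
    h = {j: min(q_i[j], d * g[j]) for j in neighbors if j != jmin}
    h[jmin] = -sum(h.values())
    return {j: q_i[j] - h[j] for j in neighbors}


# Hypothetical numbers: site i splits its load over neighbors 1, 2 and 3.
q_new = min_rule_step(
    q_i={1: 0.5, 2: 0.3, 3: 0.2},
    a_i={1: 1.0, 2: 0.4, 3: 0.7},
    lam_i=1.0, b=1.0,
    rho={1: 0.9, 2: 0.5, 3: 0.5},
    K={1: 1, 2: 1, 3: 1},
    k=0.1,
)
print(q_new)
```

In this example site 2 has the smallest derivative, so fractions move from sites 1 and 3 to site 2 in proportion to their derivative gaps, and the row still sums to 1.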

Distributed Algorithm with “max-rule”
• At site i: compute gi,j = max{ai,jmax(i) - ai,j, 0} for all j ∈ Ni, and compute nij = (1 - rj) Kj / (λi b) for all j ≠ jmax(i), j ∈ Ni, where jmax(i) = argmaxj: qi,j>0 ai,j.
• At site i: compute d = min{k, qi,jmax(i) / ∑j∈Ni gi,j}, evaluate hi,j = min{nij, d gi,j} for all j ≠ jmax(i), j ∈ Ni, and hi,jmax(i) = -∑j≠jmax(i), j∈Ni hi,j.
• At site i: update qi,j = qi,j + hi,j for all j ∈ Ni; qi,j = 0 for j ∉ Ni.
• Collect new measurement and go to the next site (e.g., i = i+1 mod N).
• Converged? If no, repeat from the first step. If yes, detect changes in delay and utilization.
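A single max-rule pass at one site might look like the following. As with any reading of a flowchart, this is a sketch under assumptions: the derivatives ai,j are supplied by the caller, all utilizations are below 1 so that the capacity bounds nij are non-negative, and the early return for equal derivatives is an added guard. The name `max_rule_step` and the example values are hypothetical.

```python
def max_rule_step(q_i, a_i, lam_i, b, rho, K, k):
    """One max-rule update of site i's row q_i (dict: neighbor j -> fraction).

    a_i[j]: derivative of the objective w.r.t. q_i[j], supplied by the caller
    rho[j]: measured utilization at site j; K[j]: number of servers at site j
    lam_i:  arrival rate at site i; b: service time; k: step-size cap
    """
    neighbors = list(q_i)
    # jmax is chosen only among sites that currently receive load from i.
    active = [j for j in neighbors if q_i[j] > 0.0]
    jmax = max(active, key=lambda j: a_i[j])
    g = {j: max(a_i[jmax] - a_i[j], 0.0) for j in neighbors}
    g_sum = sum(g.values())
    if g_sum <= 0.0:
        return dict(q_i)  # assumed handling: no site is better than jmax
    # Largest extra fraction each site j can absorb given its spare capacity.
    nu = {j: (1.0 - rho[j]) * K[j] / (lam_i * b) for j in neighbors if j != jmax}
    d = min(k, q_i[jmax] / g_sum)
    h = {j: min(nu[j], d * g[j]) for j in neighbors if j != jmax}
    h[jmax] = -sum(h.values())
    return {j: q_i[j] + h[j] for j in neighbors}


# Hypothetical numbers: site i splits its load over neighbors 1, 2 and 3.
q_new = max_rule_step(
    q_i={1: 0.5, 2: 0.3, 3: 0.2},
    a_i={1: 1.0, 2: 0.4, 3: 0.7},
    lam_i=1.0, b=1.0,
    rho={1: 0.9, 2: 0.5, 3: 0.5},
    K={1: 1, 2: 1, 3: 1},
    k=0.1,
)
print(q_new)
```

Here site 1 has the largest derivative, so part of its fraction is spread over sites 2 and 3. The cap d ≤ qi,jmax(i) / ∑ gi,j keeps the fraction at site jmax(i) from going negative.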

Scenario 1: Load Increases by 50% at One Site

Scenario 4: Two Back-to-Back Overloaded Sites

Scenario 5: Noisy versus Perfect Measurements

Conclusions and Further Work
• Load reallocation provides a key instrument for achieving elasticity and reducing latency simultaneously.
• Only processing-intensive applications have been considered so far; other applications will be considered in further work.
