ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing

Lucian Popa, Praveen Yalagandula*, Sujata Banerjee, Jeffrey C. Mogul+, Yoshio Turner, Jose Renato Santos (HP Labs; *Avi Networks; +Google)

Goals • Provide Minimum Bandwidth Guarantees in Clouds • Tenants can affect each other’s traffic • MapReduce jobs can affect the performance of user-facing applications • Large MapReduce jobs can delay the completion of small jobs • Bandwidth guarantees offer predictable performance

Goals • Provide Minimum Bandwidth Guarantees in Clouds • Hose model: the VMs of one tenant (X, Y, Z) connect to a virtual (imaginary) switch VS through links with bandwidth guarantees BX, BY, BZ • Other models build on the hose model, such as TAG [HotCloud’13]
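The hose model can be read as two constraints per VM: the total rate a VM sends through the virtual switch, and the total rate it receives, must each stay within that VM's guarantee. A minimal Python sketch of that check, assuming rates are given per VM-to-VM pair in Mbps (the function name fits_hose is hypothetical, not part of ElasticSwitch):

```python
from collections import defaultdict


def fits_hose(rates, guarantees):
    """Return True if per-pair rates respect every VM's hose guarantee.

    rates: dict mapping (src_vm, dst_vm) -> rate in Mbps
    guarantees: dict mapping vm -> hose guarantee in Mbps (same value
    bounds the sending and the receiving direction)
    """
    sent = defaultdict(float)
    received = defaultdict(float)
    for (src, dst), rate in rates.items():
        sent[src] += rate        # traffic leaving src toward the virtual switch
        received[dst] += rate    # traffic leaving the virtual switch toward dst
    vms = set(sent) | set(received)
    # A VM without a listed guarantee is treated as having 0 Mbps.
    return all(
        sent[vm] <= guarantees.get(vm, 0.0) and received[vm] <= guarantees.get(vm, 0.0)
        for vm in vms
    )
```

For example, with 100Mbps guarantees for X, Y and Z, X→Y at 60 plus Z→Y at 40 fits, but raising Z→Y to 50 exceeds Y's receive guarantee.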


Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • Tenants can use spare bandwidth from unallocated or underutilized guarantees • Significantly increases performance • Average traffic is low [IMC09, IMC10] • Traffic is bursty

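Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation [Figure: X-Y bandwidth over time with ElasticSwitch; the X-Y pair always gets at least its guarantee Bmin, uses free capacity when it is available, and falls back to Bmin when everything is reserved & used]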

Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • Be Practical • Topology independent: work with oversubscribed topologies • Inexpensive: per-VM/per-tenant queues are expensive → work with commodity switches • Scalable: a centralized controller can be a bottleneck → distributed solution • Hard to partition: VMs can cause bottlenecks anywhere in the network


Prior Work

Outline • Motivation and Goals • Overview • More Details • Guarantee Partitioning • Rate Allocation • Evaluation

ElasticSwitch Overview: Operates At Runtime • VM setup: the tenant selects bandwidth guarantees (models: Hose, TAG, etc.); VMs are placed, and admission control ensures that all guarantees can be met (Oktopus [SIGCOMM’11], Hadrian [NSDI’13], CloudMirror [HotCloud’13]) • Runtime: ElasticSwitch enforces bandwidth guarantees and provides work conservation

ElasticSwitch Overview: Runs In Hypervisors • Resides in the hypervisor of each host • Distributed: hypervisors communicate pairwise, following the data flows [Figure: several hosts, each with VMs on top of a hypervisor running ElasticSwitch, connected through the network]

ElasticSwitch Overview: Two Layers (both in the hypervisor) • Guarantee Partitioning: provides guarantees • Rate Allocation: provides work conservation

ElasticSwitch Overview: Guarantee Partitioning 1. Guarantee Partitioning: turns the hose model into VM-to-VM pipe guarantees • VM-to-VM control is necessary; coarser granularity is not enough

ElasticSwitch Overview: Guarantee Partitioning 1. Guarantee Partitioning (intra-tenant): turns the hose model into VM-to-VM pipe guarantees [Figure: hose model with virtual switch VS and guarantees BX, BY, BZ, mapped to pipe guarantees BXY and BXZ] • VM-to-VM guarantees → bandwidths as if the tenant communicated over a physical hose network

ElasticSwitch Overview: Rate Allocation 1. Guarantee Partitioning: turns the hose model into VM-to-VM pipe guarantees 2. Rate Allocation (inter-tenant): uses rate limiters; increases the X-Y rate above BXY when there is no congestion between X and Y → work-conserving allocation [Figure: a rate limiter in X's hypervisor sets RateXY ≥ BXY, letting X-Y traffic use unreserved/unused capacity]

ElasticSwitch Overview: Periodic Application (in the hypervisor) • Guarantee Partitioning: applied periodically and on new VM-to-VM pairs • Rate Allocation: applied periodically, more often • The two layers exchange VM-to-VM guarantees and demand estimates

Guarantee Partitioning – Overview [Figure: tenant VS1 with VMs X, Y, Z, T, Q; VM-to-VM pairs X→Z, X→Y, T→Y, Q→Y with guarantees BXZ, BXY, BTY, BQY] • Example: BX = … = BQ = 100Mbps; max-min allocation gives BXZ = 66Mbps and BXY = BTY = BQY = 33Mbps • Goals: • Safety – don’t violate the hose model • Efficiency – don’t waste guarantee • No Starvation – don’t block traffic
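The 33/66 split in this example is the max-min fair allocation over the hose constraints (each VM's sending side and receiving side). The centralized progressive-filling sketch below reproduces it. ElasticSwitch reaches its allocation in a distributed way inside the hypervisors, so this only illustrates the target allocation, not the system's algorithm; max_min_pairs and its data layout are hypothetical.

```python
def max_min_pairs(pairs, guarantees):
    """Max-min fair VM-to-VM rates under hose constraints (progressive filling).

    pairs: list of (src, dst); guarantees: vm -> Mbps, bounding both directions.
    """
    def constraints_of(pair):
        src, dst = pair
        return [(src, "out"), (dst, "in")]  # sender's uplink, receiver's downlink

    alloc = {p: 0.0 for p in pairs}
    remaining = {c: guarantees[c[0]] for p in pairs for c in constraints_of(p)}
    active = set(pairs)
    while active:
        # How many still-growing pairs use each constraint.
        users = {}
        for p in active:
            for c in constraints_of(p):
                users[c] = users.get(c, 0) + 1
        # Largest equal increment before the tightest constraint saturates.
        delta = min(remaining[c] / n for c, n in users.items())
        for p in active:
            alloc[p] += delta
        for c, n in users.items():
            remaining[c] -= delta * n
        saturated = {c for c in users if remaining[c] <= 1e-9}
        # Freeze every pair that crosses a saturated constraint.
        active = {p for p in active if not saturated.intersection(constraints_of(p))}
    return alloc


pairs = [("X", "Z"), ("X", "Y"), ("T", "Y"), ("Q", "Y")]
alloc = max_min_pairs(pairs, {vm: 100.0 for vm in "XYZTQ"})
for (src, dst), rate in alloc.items():
    print(f"{src}->{dst}: {rate:.1f} Mbps")
```

Y's receive side saturates first (three pairs at 33.3 each), which freezes X→Y, T→Y and Q→Y; X→Z keeps growing until X's send side is full, ending near 66.7Mbps.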

Guarantee Partitioning – Operation [Figure: tenant VS1; X's guarantee BX is split into shares $B_X^{XZ}$ and $B_X^{XY}$; Y's guarantee BY is split into $B_Y^{XY}$, $B_Y^{TY}$, $B_Y^{QY}$] • The hypervisor divides the guarantee of each hosted VM among its VM-to-VM pairs, in each direction • The source hypervisor uses the minimum of the source and destination shares: $B_{XY} = \min(B_X^{XY}, B_Y^{XY})$

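Guarantee Partitioning – Safety [Figure: per-VM shares of BX and BY for pairs X→Z, X→Y, T→Y, Q→Y; $B_{XY} = \min(B_X^{XY}, B_Y^{XY})$] • Safety: hose-model guarantees are not exceeded, because each pair guarantee is at most the share assigned at either endpoint, and a VM's shares add up to at most its guarantee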
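A short derivation of the safety property, using only what the slides state: each hypervisor divides a VM's guarantee among that VM's pairs (so the shares in one direction sum to at most the guarantee), and a pair's guarantee is the minimum of its two endpoint shares. Summing over X's outgoing pairs and over Y's incoming pairs:

```latex
B_{XV} = \min\left(B_X^{XV}, B_V^{XV}\right) \le B_X^{XV}
\;\Longrightarrow\;
\sum_{V} B_{XV} \le \sum_{V} B_X^{XV} \le B_X,
\qquad
\sum_{U} B_{UY} \le \sum_{U} B_Y^{UY} \le B_Y
```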

Guarantee Partitioning – Operation [Figure: tenant VS1 with VMs X, Y, Z, T, Q] • Example with BX = … = BQ = 100Mbps: X splits its guarantee between two pairs, $B_X^{XZ} = B_X^{XY} = 50$; Y splits its guarantee among three pairs, $B_Y^{XY} = B_Y^{TY} = B_Y^{QY} = 33$ • Hence $B_{XY} = \min(B_X^{XY}, B_Y^{XY}) = \min(50, 33) = 33$ (Mbps)
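The arithmetic of this example, as a Python sketch. It assumes every pair has enough demand that each hypervisor simply splits a VM's guarantee equally among that VM's active pairs, as the numbers on the slide suggest; equal_shares is a hypothetical helper. Because the destination Y is the bottleneck, part of X's share for X→Y stays unused, which is the "unallocated guarantee" problem raised on the Efficiency slide.

```python
def equal_shares(peers, guarantee):
    """Split one VM's hose guarantee (Mbps) equally among its peers, one direction."""
    return {peer: guarantee / len(peers) for peer in peers}


# Slide example: every VM has a 100 Mbps hose guarantee.
x_out = equal_shares(["Z", "Y"], 100.0)       # X's sending shares: 50 each
y_in = equal_shares(["X", "T", "Q"], 100.0)   # Y's receiving shares: about 33.3 each

# The source hypervisor takes the minimum of the two endpoint shares.
b_xy = min(x_out["Y"], y_in["X"])
unused = x_out["Y"] - b_xy  # part of X's share that X->Y cannot use
print(f"B_XY = {b_xy:.1f} Mbps, X's share left unused = {unused:.1f} Mbps")
```

This prints B_XY = 33.3 Mbps with 16.7 Mbps of X's share left idle.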

Guarantee Partitioning – Efficiency (example: BX = … = BQ = 100Mbps; initially $B_X^{XZ} = B_X^{XY} = 50$, $B_Y^{XY} = B_Y^{TY} = B_Y^{QY} = 33$, $B_{XY} = 33$) 1. What happens when flows have low demands? • The hypervisor divides guarantees max-min based on demands (future demands are estimated from history) 2. How to avoid unallocated guarantees? • The source considers the destination’s allocation when the destination is the bottleneck: X sets $B_X^{XY} = 33$ and gives the remaining 66 to $B_X^{XZ}$ • Guarantee Partitioning converges
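Both efficiency fixes on this slide can be read as a single max-min (water-filling) division of a VM's guarantee at the source, where each pair is capped by the smaller of its estimated demand and the destination's share when the destination is the bottleneck. A Python sketch under that assumption; water_fill and the cap rule are our illustration, not code from ElasticSwitch.

```python
INF = float("inf")


def water_fill(capacity, caps):
    """Max-min divide `capacity` (Mbps) among pairs, never exceeding a pair's cap.

    caps: pair -> cap, e.g. min(estimated demand, destination's share).
    """
    alloc = {}
    remaining = capacity
    pending = sorted(caps.items(), key=lambda kv: kv[1])  # smallest cap first
    while pending:
        fair = remaining / len(pending)
        pair, cap = pending[0]
        if cap <= fair:
            # This pair cannot use its fair share: give it its cap, redistribute the rest.
            alloc[pair] = cap
            remaining -= cap
            pending.pop(0)
        else:
            # No remaining cap binds: split what is left equally.
            for p, _ in pending:
                alloc[p] = fair
            break
    return alloc


# Destination Y is the bottleneck for X->Y: cap that pair at Y's share.
print(water_fill(100.0, {"XZ": INF, "XY": 100.0 / 3}))  # XY about 33.3, XZ about 66.7
# Low demand: X->Z only needs 10 Mbps, so X->Y can get the other 90.
print(water_fill(100.0, {"XZ": 10.0, "XY": INF}))
```

Sorting by cap lets each saturated pair release its leftover share to the others in one pass, which is the standard way to compute a single-resource max-min allocation.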

Rate Allocation RXY Spare bandwidth Fully used X BXY Guarantee Partitioning BXY Time Congestion data Rate Allocation Rate RXY Limiter Y

Rate Allocation RXY= max(BXY, RTCP-like) X Guarantee Partitioning BXY Congestion data Rate Allocation Rate RXY Limiter Y

Rate Allocation RXY= max(BXY, RTCP-like) Another Tenant Guarantee

Rate Allocation RXY= max(BXY, Rweighted-TCP-like) Weight is the BXY guarantee L = 1Gbps RXY = 333Mbps X Y BXY = 100Mbps RXT = 666Mbps Z T BZT = 200Mbps

Rate Allocation – Congestion Detection • Detect congestion through dropped packets • Hypervisors add/monitor sequence numbers in packet headers • Use ECN, if available
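A receiver-side sketch of loss detection from hypervisor-added sequence numbers. The slides do not specify the header format or how reordering is handled, so the per-pair counter, the LossDetector class, and treating any gap as loss are all assumptions made for illustration.

```python
class LossDetector:
    """Hypothetical receiver-side tracker: a gap in per-pair sequence numbers counts as loss."""

    def __init__(self):
        self.expected = {}  # (src, dst) -> next expected sequence number
        self.lost = {}      # (src, dst) -> packets inferred lost so far

    def on_packet(self, pair, seq):
        exp = self.expected.get(pair, seq)  # first packet of a pair sets the baseline
        if seq > exp:
            # Packets exp .. seq-1 never arrived; reordering would be miscounted here.
            self.lost[pair] = self.lost.get(pair, 0) + (seq - exp)
        if seq >= exp:
            self.expected[pair] = seq + 1
        # seq < exp: late or duplicate packet, ignored in this sketch

    def congested(self, pair):
        return self.lost.get(pair, 0) > 0


d = LossDetector()
for s in [0, 1, 2, 5, 6]:  # packets 3 and 4 were dropped
    d.on_packet(("X", "Y"), s)
print(d.lost, d.congested(("X", "Y")))
```

In a deployment, the receiving hypervisor would report such loss counts back to the sender's Rate Allocation; where switches support ECN, marks would replace the loss signal.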

Rate Allocation – Adaptive Algorithm • Use Seawall [NSDI’11] as the rate-allocation algorithm • TCP-Cubic-like • Essential improvements (when congestion is detected through dropped packets) • Problem: many flows probing for spare bandwidth affect the guarantees of others • Hold-increase: hold probing for free bandwidth after a congestion event; the holding time is inversely proportional to the guarantee [Figure: rate vs. time; after a congestion event the rate holds for a time inversely proportional to the guarantee, then increases again]
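A toy Python controller showing the hold-increase idea. Seawall's real increase is Cubic-like; here it is replaced by a fixed additive step for brevity, and the constants k, decrease and step are hypothetical. Only the hold time of k / guarantee, which makes pairs with larger guarantees resume probing sooner, mirrors the slide.

```python
class HoldIncreaseLimiter:
    """Toy rate controller for one VM-to-VM pair with hold-increase."""

    def __init__(self, guarantee, rate, k=1000.0, decrease=0.5, step=10.0):
        self.guarantee = guarantee  # B_XY in Mbps
        self.rate = rate            # current limiter rate in Mbps
        self.k = k                  # hypothetical: hold time (ms) = k / guarantee
        self.decrease = decrease    # hypothetical multiplicative decrease factor
        self.step = step            # hypothetical additive increase per update (Mbps)
        self.hold_until = 0.0       # time (ms) before which increases are suppressed

    def update(self, now_ms, congested):
        if congested:
            # Back off, but never below the guarantee: R = max(B, R_TCP-like).
            self.rate = max(self.guarantee, self.rate * self.decrease)
            self.hold_until = now_ms + self.k / self.guarantee
        elif now_ms >= self.hold_until:
            self.rate += self.step  # probe for spare bandwidth
        return self.rate


small = HoldIncreaseLimiter(guarantee=100.0, rate=300.0)  # holds for 10 ms
big = HoldIncreaseLimiter(guarantee=400.0, rate=600.0)    # holds for 2.5 ms
for lim in (small, big):
    lim.update(0.0, congested=True)  # same congestion event for both pairs
# 5 ms later, only the pair with the larger guarantee has resumed probing.
print(small.update(5.0, congested=False), big.update(5.0, congested=False))
```

After the shared congestion event, the 100Mbps pair is still holding at 150 while the 400Mbps pair has already increased to 410, so pairs with small guarantees probe less aggressively.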

Evaluation Setup • Implementation in Linux • Logic in user-space: controls rate limiters, sends control packets • Modified kernel OVS • Testbed • ~100 servers • 1Gbps tree network

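Evaluation – Many-to-one • Tenant VS1: VM X has a 450Mbps guarantee and receives a TCP flow • Tenant VS2: VM Z has a 450Mbps guarantee and receives UDP traffic from many senders (…) • Both share a bottleneck link L = 1Gbps, at the edge or in the core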

Evaluation – Many-to-one [Figure: throughput (Mbps) of X and Z vs. number of senders to Z] • No Protection: traffic to VM Z takes all the bandwidth • Static Reservation (e.g., Oktopus): provides guarantees but wastes bandwidth • ElasticSwitch: provides guarantees and is work-conserving; shown against the ideal behavior

Evaluation – MapReduce • Setup • 44 servers, 4x oversubscribed topology, 4 VMs/server • Each tenant runs one job; all VMs of all tenants have the same guarantee • Two scenarios: • Light • 10% of VM slots are either a mapper or a reducer • Randomly placed • Heavy • 100% of VM slots are either a mapper or a reducer • Mappers are placed in one half of the datacenter

Evaluation – MapReduce [Figure: CDF of worst-case shuffle completion time normalized to static reservation, for No Protection and ElasticSwitch; panels: Light Setup, Heavy Setup] • Light setup: work conservation pays off; ElasticSwitch finishes faster than static reservation and reduces the longest completion compared with No Protection • Heavy setup: ElasticSwitch enforces guarantees in the worst case, while No Protection reaches up to 160X; guarantees are useful in reducing worst-case shuffle completion

ElasticSwitch Summary • Properties • Bandwidth Guarantees: hose model or derivatives • Work-conserving • Practical: oversubscribed topologies, commodity switches, decentralized • Design: two layers • Guarantee Partitioning: provides guarantees by transforming hose-model guarantees into VM-to-VM guarantees • Rate Allocation: enables work conservation by increasing rate limits above guarantees when no congestion HP Labs is hiring!
