Network Sharing
The story thus far: Sharing • Omega + Mesos • How to share end-host resources • Think CPU, memory, I/O • Different ways to share: • Fair sharing: idealist view. • Everyone should get equal access • Proportional sharing: ideal for the public cloud • Get access to an amount equal to how much you pay • Priority/deadline-based sharing: ideal for the private data center. • Companies care about completion times. • What about the network? • Isn't this important?
Network Caring is Network Sharing • The network is important to a job's completion time. • The default network sharing mechanism is TCP • Vague notion of fair sharing • Fairness is based on individual flows • Work-conserving • Per-flow sharing is biased • VMs with many flows get a greater share of the network (see the sketch below)
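To make the bias concrete, here is a small illustrative sketch (not from the slides) of idealized per-flow fair sharing on a single bottleneck link: a VM that opens more flows ends up with a proportionally larger share.

```python
# Illustrative only: idealized per-flow fairness on one bottleneck link.
def per_flow_shares(link_gbps, flows_per_vm):
    """Each flow gets an equal share, so a VM's share grows with its flow count."""
    total_flows = sum(flows_per_vm.values())
    per_flow = link_gbps / total_flows
    return {vm: n * per_flow for vm, n in flows_per_vm.items()}

# VM "A" opens 9 flows, VM "B" opens 1: A receives 90% of the 10 Gbps link.
print(per_flow_shares(10, {"A": 9, "B": 1}))   # {'A': 9.0, 'B': 1.0}
```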
What is the best form of Network Sharing? • Fair sharing: • Per-source fairness? • Reducers can cheat: many flows converge on one destination. • Per-destination fairness? • Mappers can cheat: one source fans out to many destinations. • Fairness === Bad: • No one can predict anything. • And we like things to be predictable: we like short and predictable latency • Min-bandwidth guarantees • Perfect! But: • Implementation can lead to inefficiency • How do you predict bandwidth demands?
How can you share the network? • Endhost sharing schemes • Use default TCP? Never! • Change the hypervisor! • Requires virtualization • Change the endhost's TCP stack • Invasive and undesirable • In-network sharing schemes • Use queues and rate limiters • Limited to enforcing only 7-8 different guarantees • Utilize ECN • Requires switches that support the ECN mechanism • Other switch modifications • Expensive and highly unlikely, except perhaps OpenFlow.
ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing • Lucian Popa, Praveen Yalagandula*, Sujata Banerjee, Jeffrey C. Mogul+, Yoshio Turner, Jose Renato Santos • HP Labs, *Avi Networks, +Google
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Tenants can affect each other's traffic • MapReduce jobs can affect the performance of user-facing applications • Large MapReduce jobs can delay the completion of small jobs • Bandwidth guarantees offer predictable performance
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Hose model: the VMs of one tenant connect to a virtual (imaginary) switch, each through a link with a bandwidth guarantee (B_X, B_Y, B_Z) • Other models are based on the hose model, such as TAG [HotCloud'13]
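As a rough aid, the following is a minimal, hypothetical encoding of hose-model guarantees (names and structure are mine, not from the paper): each VM has a guarantee to/from an imaginary non-blocking switch, so the bandwidth that can be guaranteed between two VMs is at most the smaller of their two hose guarantees.

```python
from dataclasses import dataclass, field

@dataclass
class HoseModel:
    """Hypothetical representation of one tenant's hose-model guarantees."""
    tenant: str
    vm_guarantee_mbps: dict = field(default_factory=dict)  # VM name -> B_VM

    def max_pair_guarantee(self, src: str, dst: str) -> float:
        # Traffic between two VMs can be guaranteed at most min(B_src, B_dst).
        return min(self.vm_guarantee_mbps[src], self.vm_guarantee_mbps[dst])

hose = HoseModel("tenant-1", {"X": 100, "Y": 100, "Z": 200})
print(hose.max_pair_guarantee("X", "Z"))   # 100
```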
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • Tenants can use spare bandwidth from unallocated or underutilized guarantees
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • Tenants can use spare bandwidth from unallocated or underutilized guarantees • Significantly increases performance • Average traffic is low [IMC09,IMC10] • Traffic is bursty
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • [Figure: X→Y bandwidth over time; when everything is reserved and used, ElasticSwitch holds the Bmin guarantee, and when there is free capacity X→Y runs above Bmin.]
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • Be Practical • Topology independent: work with oversubscribed topologies • Inexpensive: per-VM/per-tenant queues are expensive, so work with commodity switches • Scalable: a centralized controller can become a bottleneck, so use a distributed solution • Hard to partition: VMs can cause bottlenecks anywhere in the network
Goals • Provide Minimum Bandwidth Guarantees in Clouds • Work-Conserving Allocation • Be Practical
Outline • Motivation and Goals • Overview • More Details • Guarantee Partitioning • Rate Allocation • Evaluation
ElasticSwitch Overview: Operates At Runtime • Tenant selects bandwidth guarantees (models: hose, TAG, etc.) • VM setup: VMs are placed, and admission control ensures all guarantees can be met (Oktopus [SIGCOMM'11], Hadrian [NSDI'13], CloudMirror [HotCloud'13]) • Runtime: ElasticSwitch enforces bandwidth guarantees and provides work-conservation
ElasticSwitch Overview: Runs In Hypervisors • Resides in the hypervisor of each host • Distributed: communicates pairwise, following data flows • [Figure: an ElasticSwitch instance in each host's hypervisor, below the VMs, connected through the network.]
ElasticSwitch Overview: Two Layers • Guarantee Partitioning: provides guarantees • Rate Allocation: provides work-conservation • Both layers run in the hypervisor
ElasticSwitch Overview: Guarantee Partitioning • 1. Guarantee Partitioning: turns hose-model guarantees into VM-to-VM pipe guarantees • VM-to-VM control is necessary; coarser granularity is not enough
ElasticSwitch Overview: Guarantee Partitioning • 1. Guarantee Partitioning: turns hose-model guarantees into VM-to-VM pipe guarantees • [Figure: the intra-tenant hose model (virtual switch with B_X, B_Y, B_Z) is decomposed into VM-to-VM guarantees such as B_XY and B_XZ.] • VM-to-VM guarantees provide bandwidth as if the tenant communicated on a physical hose network
ElasticSwitch Overview: Rate Allocation • 1. Guarantee Partitioning: turns hose-model guarantees into VM-to-VM pipe guarantees • 2. Rate Allocation: uses rate limiters; increases the rate between X and Y above B_XY when there is no congestion between X and Y (work-conserving allocation) • [Figure: the hypervisor's rate limiter enforces Rate_XY ≥ B_XY, letting the pair use unreserved/unused inter-tenant capacity.]
ElasticSwitch Overview: Periodic Application • Guarantee Partitioning: applied periodically and on new VM-to-VM pairs; produces VM-to-VM guarantees from demand estimates • Rate Allocation: applied periodically, more often • Both run in the hypervisor (see the control-loop sketch below)
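A minimal control-loop sketch of this periodic structure, assuming hypothetical guarantee_partitioning() and rate_allocation() callables and made-up periods; the actual ElasticSwitch timers and logic are more involved.

```python
import time

def hypervisor_loop(vm_pairs, guarantee_partitioning, rate_allocation,
                    gp_period=0.5, ra_period=0.05):
    """Run Guarantee Partitioning periodically and Rate Allocation more often."""
    last_gp = 0.0
    while True:
        now = time.monotonic()
        if now - last_gp >= gp_period:
            # Re-divide each hosted VM's hose guarantee into VM-to-VM guarantees,
            # using demand estimates gathered since the last round.
            guarantee_partitioning(vm_pairs)
            last_gp = now
        # Adjust the per-pair rate limiters: never below the guarantee,
        # above it when no congestion is observed.
        rate_allocation(vm_pairs)
        time.sleep(ra_period)
```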
Outline • Motivation and Goals • Overview • More Details • Guarantee Partitioning • Rate Allocation • Evaluation
Guarantee Partitioning – Overview • [Figure: VM X sends to Z and Y; Y also receives from T and Q; each hose guarantee is divided max-min into pair guarantees B_XZ, B_XY, B_TY, B_QY.] • Goals: • Safety: don't violate the hose model • Efficiency: don't waste guarantee • No starvation: don't block traffic
Guarantee Partitioning – Overview • [Figure: with B_X = … = B_Q = 100 Mbps, the max-min allocation shown gives roughly 33 Mbps each to X→Y, T→Y, Q→Y and 66 Mbps to X→Z.] • Goals: • Safety: don't violate the hose model • Efficiency: don't waste guarantee • No starvation: don't block traffic
Guarantee Partitioning – Operation • Each hypervisor divides the guarantee of each hosted VM between its VM-to-VM pairs, in each direction (X's guarantee yields B_X^XY and B_X^XZ; Y's guarantee yields B_Y^XY, B_Y^TY, and B_Y^QY) • The source hypervisor uses the minimum of the source and destination guarantees: B_XY = min(B_X^XY, B_Y^XY) • (A sketch follows the worked example below.)
Guarantee Partitioning – Safety • Safety: hose-model guarantees are not exceeded • Because B_XY = min(B_X^XY, B_Y^XY), the pair guarantees assigned to a VM never add up to more than its hose guarantee
Guarantee Partitioning – Operation • Example: B_X = … = B_Q = 100 Mbps • X splits its guarantee across its two pairs: B_X^XY = B_X^XZ = 50 • Y splits its guarantee across its three pairs: B_Y^XY = B_Y^TY = B_Y^QY = 33 • B_XY = min(B_X^XY, B_Y^XY) = min(50, 33) = 33
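A simplified sketch of this operation, using an equal split per VM for clarity (the paper's partitioning is demand-aware); the numbers reproduce the example above.

```python
def split_guarantee(hose_guarantee_mbps, peers):
    """Divide one VM's hose guarantee equally across its active VM-to-VM pairs."""
    share = hose_guarantee_mbps / max(len(peers), 1)
    return {peer: share for peer in peers}

def pair_guarantee(src_alloc_mbps, dst_alloc_mbps):
    """B_XY = min(B_X^XY, B_Y^XY): take the smaller of the two sides."""
    return min(src_alloc_mbps, dst_alloc_mbps)

# Every VM has a 100 Mbps hose guarantee; X sends to Y and Z, while Y also
# receives from T and Q.
bx = split_guarantee(100, ["Y", "Z"])        # X's side: 50 Mbps per pair
by = split_guarantee(100, ["X", "T", "Q"])   # Y's side: ~33 Mbps per pair
print(pair_guarantee(bx["Y"], by["X"]))      # B_XY = min(50, 33.3...) ≈ 33 Mbps
```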
Guarantee Partitioning – Efficiency • 1. What happens when flows have low demands? • The hypervisor divides guarantees max-min based on demands (future demands are estimated from history); see the sketch after these slides
Guarantee Partitioning – Efficiency • 1. What happens when flows have low demands? • 2. How do you avoid unallocated guarantees? (In the example, X→Y is capped at 33 by Y's side, leaving part of X's guarantee unused.)
Guarantee Partitioning – Efficiency • The source considers the destination's allocation when the destination is the bottleneck (X shifts guarantee from X→Y, capped at 33, to X→Z, which grows toward 66) • Guarantee Partitioning converges
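A hedged sketch of demand-aware max-min partitioning (illustrative, not the paper's exact algorithm): a pair never receives more guarantee than its estimated demand, and whatever low-demand pairs leave unused is redistributed to the others.

```python
def max_min_split(guarantee_mbps, demand_mbps):
    """Divide a VM's hose guarantee among its pairs, max-min subject to demands."""
    alloc = {pair: 0.0 for pair in demand_mbps}
    active = set(demand_mbps)
    remaining = guarantee_mbps
    while active and remaining > 1e-9:
        share = remaining / len(active)
        satisfied = {p for p in active if demand_mbps[p] - alloc[p] <= share}
        if not satisfied:
            # No pair is demand-limited: split the rest equally and stop.
            for p in active:
                alloc[p] += share
            remaining = 0.0
        else:
            # Cap demand-limited pairs at their demand and redistribute the rest.
            for p in satisfied:
                remaining -= demand_mbps[p] - alloc[p]
                alloc[p] = demand_mbps[p]
            active -= satisfied
    return alloc

# Y's 100 Mbps guarantee: the Q->Y pair only demands 1 Mbps, so X->Y and T->Y
# pick up the slack instead of leaving part of the guarantee unallocated.
print(max_min_split(100, {"X->Y": 200, "T->Y": 200, "Q->Y": 1}))
# -> {'X->Y': 49.5, 'T->Y': 49.5, 'Q->Y': 1}
```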
Outline • Motivation and Goals • Overview • More Details • Guarantee Partitioning • Rate Allocation • Evaluation
Rate Allocation • [Figure: Guarantee Partitioning supplies B_XY to the Rate Allocation layer, which combines it with congestion data to set the rate limiter R_XY between X's and Y's hypervisors; over time R_XY stays at B_XY when the path is fully used and rises into spare bandwidth otherwise.]
Rate Allocation • R_XY = max(B_XY, R_TCP-like) • [Figure: the Rate Allocation layer sets the X→Y rate limiter from the guarantee B_XY and congestion data.] (See the sketch below.)
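A minimal sketch of this rule, assuming a separately computed TCP-like probed rate: the limiter never drops below the pair's guarantee and rises above it only when probing finds spare capacity.

```python
def next_rate_mbps(guarantee_mbps, tcp_like_rate_mbps):
    """R_XY = max(B_XY, R_TCP-like): a guaranteed floor plus work-conserving headroom."""
    return max(guarantee_mbps, tcp_like_rate_mbps)

print(next_rate_mbps(100, 40))    # congestion pushed the probed rate down: hold at 100
print(next_rate_mbps(100, 750))   # spare capacity found: run at 750
```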
Rate Allocation • R_XY = max(B_XY, R_TCP-like) • [Figure: the probed TCP-like rate competes only for capacity beyond another tenant's guarantee.]
Rate Allocation • R_XY = max(B_XY, R_weighted-TCP-like), where the weight is the B_XY guarantee • Example: a link with L = 1 Gbps is shared by X→Y with B_XY = 100 Mbps and Z→T with B_ZT = 200 Mbps; the rates converge to R_XY = 333 Mbps and R_ZT = 666 Mbps (arithmetic sketch below)
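Illustrative arithmetic for the weighted allocation on the slide: when the same link is the only bottleneck, each pair converges to a share of the capacity proportional to its guarantee (its weight).

```python
def weighted_shares(link_mbps, guarantees_mbps):
    """Split link capacity in proportion to each pair's guarantee."""
    total = sum(guarantees_mbps.values())
    return {pair: link_mbps * g / total for pair, g in guarantees_mbps.items()}

# A 1 Gbps link shared by X->Y (100 Mbps guarantee) and Z->T (200 Mbps guarantee)
# splits roughly 333 / 666 Mbps.
print(weighted_shares(1000, {"X->Y": 100, "Z->T": 200}))
```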
Rate Allocation – Congestion Detection • Detect congestion through dropped packets • Hypervisors add/monitor sequence numbers in packet headers • Use ECN, if available
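A hedged sketch of the loss-detection idea (structure and names are mine): the sending hypervisor stamps consecutive per-pair sequence numbers, and the receiving hypervisor treats gaps in that stream as dropped packets, i.e. a congestion signal; packet reordering is ignored here for simplicity.

```python
class LossDetector:
    """Count losses from gaps in hypervisor-added per-pair sequence numbers."""
    def __init__(self):
        self.expected = 0
        self.lost = 0

    def on_packet(self, seq: int) -> bool:
        """Return True if this packet reveals one or more missing packets."""
        gap = seq - self.expected
        self.expected = seq + 1
        if gap > 0:
            self.lost += gap
            return True
        return False

detector = LossDetector()
for seq in [0, 1, 2, 5, 6]:   # packets 3 and 4 never arrived
    detector.on_packet(seq)
print(detector.lost)           # 2
```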
Rate Allocation – Adaptive Algorithm • Use Seawall [NSDI'11] as the rate-allocation algorithm • TCP-Cubic-like • Essential improvements (for when using dropped packets) • Problem: many flows probing for spare bandwidth affect the guarantees of others
Rate Allocation – Adaptive Algorithm • Use Seawall [NSDI'11] as the rate-allocation algorithm • TCP-Cubic-like • Essential improvements (for when using dropped packets) • Hold-increase: hold off probing for free bandwidth after a congestion event; the holding time is inversely proportional to the guarantee • [Figure: holding time decreases as the guarantee increases.] (See the sketch below.)
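A sketch of the hold-increase idea; the constant is an assumption for illustration, not a value from the paper.

```python
HOLD_CONSTANT_MBPS_SEC = 10.0   # assumed tuning constant, not from the paper

def holding_time_sec(guarantee_mbps):
    """After a congestion event, wait before probing again; bigger guarantees wait less."""
    return HOLD_CONSTANT_MBPS_SEC / guarantee_mbps

print(holding_time_sec(100))  # 0.1 s: large guarantee resumes probing quickly
print(holding_time_sec(10))   # 1.0 s: small guarantee holds back longer
```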
Outline • Motivation and Goals • Overview • More Details • Guarantee Partitioning • Rate Allocation • Evaluation
Evaluation – MapReduce • Setup • 44 servers, 4x oversubscribed topology, 4 VMs/server • Each tenant runs one job; all VMs of all tenants have the same guarantee • Two scenarios: • Light • 10% of VM slots are either a mapper or a reducer • Randomly placed • Heavy • 100% of VM slots are either a mapper or a reducer • Mappers are placed in one half of the datacenter
Evaluation – MapReduce • [Figure: CDF of worst-case shuffle completion time, normalized to a static reservation.]
Evaluation – MapReduce (light setup) • [Figure: CDF of worst-case shuffle completion time normalized to static reservation, for No Protection vs. ElasticSwitch.] • ElasticSwitch reduces the longest completion time compared to No Protection • Work-conservation pays off: jobs finish faster than with a static reservation
Evaluation – MapReduce (heavy setup) • [Figure: CDF of worst-case shuffle completion time normalized to static reservation, for No Protection vs. ElasticSwitch.] • ElasticSwitch enforces guarantees even in the worst case; with No Protection, worst-case completion is up to 160x the static reservation • Guarantees are useful in reducing worst-case shuffle completion time
ElasticSwitch Summary • Properties • Bandwidth guarantees: hose model or derivatives • Work-conserving • Practical: oversubscribed topologies, commodity switches, decentralized • Design: two layers • Guarantee Partitioning: provides guarantees by transforming hose-model guarantees into VM-to-VM guarantees • Rate Allocation: enables work conservation by increasing rate limits above the guarantees when there is no congestion • HP Labs is hiring!
Future Work • Reduce overhead • ElasticSwitch: on average 1 core per 15 VMs; worst case 1 core per VM • Multi-path solution • Single-path reservations are inefficient • No existing solution works on multi-path networks • VM placement • Placing VMs in different locations impacts the guarantees that can be made.
Open Questions • How do you integrate network sharing with endhost sharing? • What are the implications of combining different sharing mechanisms? • How does the network architecture affect network sharing? • How do you do admission control? • How do you detect demand? • How does payment fit into this? And if it does, when VMs from different customers communicate, who dictates the price and who gets charged?