Surviving Failures in Bandwidth-Constrained Datacenters
Peter Bodik, Ishai Menache @ Microsoft
Pradeepkumar Mani, David A. Maltz @ Microsoft Research
Mosharaf Chowdhury, Ion Stoica @ UC Berkeley
Presenter: Xin Li
How to allocate services to physical machines
Example: a simplified search engine with a web server, a database server, and an indexing server.
Given the network topology and a VM allocation:
- Is it optimal?
- What is optimal?
What Matters
- Revenue
- User experience: availability, smoothness, transparency
Availability
[Figure: Services A and B placed on racks under aggregation switches and a core switch.]
WCS (Worst-Case Survival): the smallest fraction of a service's machines that remain functional during a single failure in the datacenter.
FT (Fault Tolerance): the average WCS across all services.
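The two definitions above can be sketched in a few lines of Python. This is only an illustration: the machine names, the rack and aggregation-switch layout, and the function names are assumptions made for the example, not taken from the slides.

```python
def worst_case_survival(service_machines, fault_domains):
    """Smallest surviving fraction of a service over all single failures.

    service_machines: set of machine ids running the service.
    fault_domains: dict mapping a failure name to the set of machines it takes down.
    """
    total = len(service_machines)
    worst = 1.0
    for failed in fault_domains.values():
        surviving = len(service_machines - failed)
        worst = min(worst, surviving / total)
    return worst


def fault_tolerance(allocation, fault_domains):
    """Average worst-case survival across all services."""
    scores = [worst_case_survival(m, fault_domains) for m in allocation.values()]
    return sum(scores) / len(scores)


# Hypothetical layout: four racks of two machines, two racks per aggregation switch.
fault_domains = {
    "rack1": {"m1", "m2"}, "rack2": {"m3", "m4"},
    "rack3": {"m5", "m6"}, "rack4": {"m7", "m8"},
    "agg1": {"m1", "m2", "m3", "m4"},
    "agg2": {"m5", "m6", "m7", "m8"},
}
allocation = {
    "A": {"m1", "m2", "m3"},        # entirely under agg1
    "B": {"m4", "m5", "m7", "m8"},  # spread across both aggregation switches
}

wcs_a = worst_case_survival(allocation["A"], fault_domains)
wcs_b = worst_case_survival(allocation["B"], fault_domains)
ft = fault_tolerance(allocation, fault_domains)
```

In this made-up layout, Service A loses every machine if agg1 fails, while Service B keeps one machine out of four when agg2 fails, so spreading a service across fault domains raises its WCS.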
Smoothness
[Figure: Services A and B (1000 MB/s each) placed on racks under aggregation switches and a core switch. Labeled link loads: 1000*2 + 1000*2 + 1000*2 + 1000*2 = 8000 MB/s; 6000 MB/s; 4000 MB/s; 1000*3 + 1000*3 = 6000 MB/s. Result: congestion.]
Oversubscription Ratio
[Figure: n servers (Server 1, Server 2, ..., Server n), each with bandwidth B, sharing one upper link with bandwidth UB.]
Oversubscription Ratio = B*n/UB
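As a quick worked example of the ratio B*n/UB, here is a minimal sketch. The function name and the numbers (40 servers at 1000 MB/s behind a 10000 MB/s uplink) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def oversubscription_ratio(server_bandwidth, num_servers, uplink_bandwidth):
    """Total downstream server bandwidth divided by the upper link bandwidth."""
    return server_bandwidth * num_servers / uplink_bandwidth


# Hypothetical rack: 40 servers at 1000 MB/s each, one 10000 MB/s uplink.
ratio = oversubscription_ratio(1000, 40, 10000)
```

A ratio above 1 means the servers can collectively offer more traffic than the upper link can carry, which is why traffic crossing the core has to be kept low.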
Smoothness
BW (Bandwidth): aggregate bandwidth usage on core links.
Smoothness
[Figure: the same topology with a different placement of Services A and B (1000 MB/s each). Labeled link loads: 6000 MB/s, 6000 MB/s, 4000 MB/s.]
Transparency
• Optimizing the two metrics above (availability and smoothness) should come at a low cost.
Transparency
[Figure: Services A and B on racks under aggregation switches and a core switch, with servers moved between racks.]
NM (Number of Moves): the number of server moves needed to reach the target allocation.
What this paper aims to do: optimize the datacenters running Bing.com
Optimizing for one metric degrades the other
[Figure: scatter plot with reduction in BW usage on the horizontal axis and a percentage change on the vertical axis. Marked groups: initial allocations, allocations optimizing only worst-case survival, allocations optimizing only core bandwidth, and the goal region.]
Results from 6 Microsoft datacenters
Motivation for combined solution
Service communication matrix
[Figure: communication matrix over (App, Service) pairs, where an application is a set of services; the cluster manager is one labeled service.]
- only 2% of service pairs communicate
- 1% of services generate 64% of traffic
(lot more in the paper)
[Figure: the same topology with Service A at 1000 MB/s and Service B at 20 MB/s. Labeled link loads: 6000 MB/s and 4000 MB/s.]
Problem Statement: the framework
• Metrics:
• Bandwidth (BW): the sum of the rates on the core links, used as the overall measure of bandwidth usage at the core of the network.
• Fault Tolerance (FT): the average of Worst-Case Survival (WCS) across all services.
• Number of Moves (NM): the number of servers that have to be re-imaged to get from the initial datacenter allocation to the proposed allocation.
• Optimization:
Maximize FT – α BW
Subject to NM ≤ N0
α: tunable positive parameter
N0: upper limit on the number of moves
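To make the framework concrete, the sketch below scores a few candidate allocations with FT – α BW and discards those that exceed the move budget N0. It assumes FT and BW have already been computed for each candidate. The candidate names, their FT and BW values, and the value of α are invented for illustration, and NM is counted here as the number of machines whose assigned service changes.

```python
def number_of_moves(initial, proposed):
    """Count machines whose assigned service differs between two allocations."""
    return sum(1 for machine, service in initial.items()
               if proposed[machine] != service)


def objective(ft, bw, alpha):
    """Value to maximize: fault tolerance minus weighted core bandwidth."""
    return ft - alpha * bw


def best_candidate(initial, candidates, alpha, max_moves):
    """Pick the highest-scoring candidate that stays within the move budget."""
    feasible = [c for c in candidates
                if number_of_moves(initial, c["allocation"]) <= max_moves]
    return max(feasible,
               key=lambda c: objective(c["ft"], c["bw"], alpha),
               default=None)


# Hypothetical data: machine -> service, with made-up FT and BW per candidate.
initial = {"m1": "A", "m2": "A", "m3": "B", "m4": "B"}
candidates = [
    {"name": "unchanged", "allocation": dict(initial), "ft": 0.0, "bw": 8000},
    {"name": "one_swap",
     "allocation": {"m1": "A", "m2": "B", "m3": "A", "m4": "B"},
     "ft": 0.5, "bw": 6000},
    {"name": "full_reshuffle",
     "allocation": {"m1": "B", "m2": "B", "m3": "A", "m4": "A"},
     "ft": 0.5, "bw": 4000},
]

tight = best_candidate(initial, candidates, alpha=1e-4, max_moves=2)
loose = best_candidate(initial, candidates, alpha=1e-4, max_moves=4)
```

With a budget of two moves the single swap wins, while a budget of four admits the full reshuffle. This is the role N0 plays: it trades the quality of the final allocation against the cost of getting there.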
Metric #1: Fault Tolerance
FT = fraction (%) of a service available during a single worst-case failure
[Figure: sources of failure in the datacenter hierarchy: network core, switches (EoR/Agg), containers, racks (ToR), power distribution.]
Fault domain: the set of all machines affected by a single (any) failure
* Fault domains are complex
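Fault Tolerance
• Cells: a cell is a subset of physical machines that belong to exactly the same fault domains. Grouping machines into cells reduces the size of the optimization problem.
• A per-cell variable (its symbol is missing from this transcript) indicates the number of machines within cell n allocated to service k.
[Figure: machines grouped by power supply and rack.]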
Formal Definitions
• I: the indicator function
• I(n1,n2) = 1 if traffic from n1 to n2 traverses a core link, and I(n1,n2) = 0 otherwise.
• Bandwidth is given by a formula (missing from this transcript) in terms of the required BW between a pair of machines from services k1 and k2.
• To define FT, a variable (symbol missing from this transcript) denotes the total number of machines allocated to service k that are affected by fault j. FT is then given by a formula that is also missing.
• K: total number of services.
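The two formulas on this slide did not survive in the transcript. A plausible reconstruction, consistent with the definitions stated on this and the preceding slides, is sketched below. The symbol names are assumptions: x_{n,k} for the number of machines in cell n allocated to service k, R_{k1,k2} for the required bandwidth between a pair of machines from services k1 and k2, and z_{k,j} for the number of machines of service k affected by fault j. Whether each cell pair is counted once or twice is a convention this sketch does not settle.

```latex
\mathrm{BW} = \sum_{n_1, n_2} I(n_1, n_2) \sum_{k_1, k_2} x_{n_1,k_1}\, x_{n_2,k_2}\, R_{k_1,k_2}

\mathrm{FT} = \frac{1}{K} \sum_{k=1}^{K} \min_{j} \left( 1 - \frac{z_{k,j}}{\sum_{n} x_{n,k}} \right)
```

The first expression adds up the traffic of every machine pair whose cells are separated by a core link. The second averages, over the K services, the smallest surviving fraction across all faults, which matches the WCS and FT definitions given earlier in the deck.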
But…. Maximize FT – α BW Subject to NM ≤ N0
Min-Cut
[Figure: example weighted graph with edge weights 10, 1, 1, 8, 6, 1, 9, 1, 1, 7 and its minimum cut.]
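For readers unfamiliar with the idea, a minimum cut splits a weighted graph into two groups so that the total weight of the edges crossing between them is as small as possible. The sketch below finds it by brute force on a tiny graph. It is a plain 2-way cut, not the k-way cut discussed on the following slides, and the graph is a hypothetical one, not the graph from the figure.

```python
from itertools import combinations


def min_cut(nodes, weights):
    """Brute-force minimum 2-way cut of a small undirected weighted graph.

    weights: dict mapping an edge (u, v) to its weight.
    Returns (cut cost, set of nodes on the side containing the first node).
    """
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best_cost, best_side = float("inf"), None
    # Keep `first` on one side so each partition is examined once;
    # r < len(rest) guarantees the other side is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            cost = sum(w for (u, v), w in weights.items()
                       if (u in side) != (v in side))
            if cost < best_cost:
                best_cost, best_side = cost, side
    return best_cost, best_side


# Hypothetical graph: two tightly connected groups joined by two light edges.
weights = {
    ("a", "b"): 10, ("b", "c"): 8, ("a", "c"): 9,
    ("d", "e"): 6, ("e", "f"): 7, ("d", "f"): 6,
    ("c", "d"): 1, ("a", "f"): 1,
}
cost, side = min_cut("abcdef", weights)
```

Heavy edges stay inside a group and only the two light edges are cut. Read as a communication graph, this keeps chatty machines together so that little traffic has to cross the core. Enumeration is exponential in the number of nodes, so this is for intuition only.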
Optimizing for BW only
Considered previously in [Meng et al., INFOCOM’10]
[Figure: network topology + machine communication graph, partitioned by a k-way min cut.]
k-way min graph cut:
- ignores the number of moves: reshuffles almost all machines (99% migration chance)
- ignores FT: can’t be easily extended
[Figure: BW optimization via k-way graph cut.]
FT algorithm
Input: service map + VM, fault domain map
1. Calculate the initial FTC of the DC
2. For every possible swap:
   a. calculate the new FTC after the swap
   b. ∆FTC = FTC_old - FTC_new
3. Execute the swap with max(∆FTC)
- symmetry => many “good” swaps exist
- only evaluate a small, random set of swaps (~1000)
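The steps above can be sketched as a greedy swap loop. The slides do not define FTC, so the cost used here is a stand-in assumption: the sum of squared per-service machine counts in each fault domain, where lower means better spread. The function names, the four-machine layout, and the sample size are also invented for the example.

```python
import random
from collections import Counter


def ftc(assignment, fault_domains):
    """Stand-in cost (assumption): penalize services concentrated in a fault domain."""
    cost = 0
    for machines in fault_domains.values():
        counts = Counter(assignment[m] for m in machines)
        cost += sum(c * c for c in counts.values())
    return cost


def improve_once(assignment, fault_domains, samples=1000, rng=random):
    """Try a random sample of swaps and apply the one with the largest cost drop.

    Returns the achieved reduction (0 if no sampled swap improves the cost).
    """
    machines = list(assignment)
    old = ftc(assignment, fault_domains)
    best_delta, best_pair = 0, None
    for _ in range(samples):
        a, b = rng.sample(machines, 2)
        if assignment[a] == assignment[b]:
            continue  # swapping two machines of the same service changes nothing
        assignment[a], assignment[b] = assignment[b], assignment[a]
        delta = old - ftc(assignment, fault_domains)
        assignment[a], assignment[b] = assignment[b], assignment[a]  # undo trial
        if delta > best_delta:
            best_delta, best_pair = delta, (a, b)
    if best_pair is not None:
        a, b = best_pair
        assignment[a], assignment[b] = assignment[b], assignment[a]
    return best_delta


# Hypothetical layout: two racks, each initially holding a single service.
fault_domains = {"rack1": ["m1", "m2"], "rack2": ["m3", "m4"]}
assignment = {"m1": "A", "m2": "A", "m3": "B", "m4": "B"}

rng = random.Random(0)
first_delta = improve_once(assignment, fault_domains, samples=50, rng=rng)
while improve_once(assignment, fault_domains, samples=50, rng=rng) > 0:
    pass  # keep swapping until no sampled swap helps
```

In this toy case every cross-service swap is equally good, which illustrates the symmetry remark on the slide: because many swaps give the same gain, sampling a small random set is enough to find one.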
Non-solution: FT + cut
Not good enough!
Conclusion
Study of communication patterns of Bing.com
- sparse communication matrix
- very skewed communication pattern
Principled optimization of both BW and FT
- exploits communication patterns
- can handle arbitrary fault domains
Reduction in BW: 20 - 50%
Improvement in FT: 40 - 120%