
Surviving Failures in Bandwidth-Constrained Datacenters



Presentation Transcript


  1. Surviving Failures in Bandwidth-Constrained Datacenters
    Peter Bodik, Ishai Menache @ Microsoft; Pradeepkumar Mani, David A. Maltz @ Microsoft Research; Mosharaf Chowdhury, Ion Stoica @ UC Berkeley
    Presenter: Xin Li

  2. How to allocate services to physical machines
    A simplified search-engine example: web server, database server, indexing server.
    Given the network topology and a VM allocation: is it optimal? What is optimal?

  3. What Matters
    Revenue depends on user experience, which depends on smoothness and availability; transparency of the reallocation matters as well.

  4. Availability
    [Figure: services A and B placed on racks under aggregation switches and a core switch C.]

  5. Availability
    [Figure: same topology as above.]
    WCS (Worst-Case Survival): the smallest fraction of a service's machines that remain functional during any single failure in the datacenter.
    FT (Fault Tolerance): the average WCS across all services.
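Both definitions translate directly into code. A minimal Python sketch, assuming an allocation map (service to machines) and fault domains given as machine sets; the names are illustrative, not the paper's:

```python
# Minimal sketch of the two availability metrics (illustrative names).

def wcs(machines: set, fault_domains: list) -> float:
    """Smallest fraction of a service's machines surviving any single failure."""
    return min(len(machines - fd) for fd in fault_domains) / len(machines)

def ft(allocation: dict, fault_domains: list) -> float:
    """Fault Tolerance: average WCS across all services."""
    return sum(wcs(set(ms), fault_domains) for ms in allocation.values()) / len(allocation)

# Service A spans both racks; service B sits entirely in one rack.
allocation = {"A": [0, 1, 2, 3], "B": [2, 3]}
fault_domains = [{0, 1}, {2, 3}]          # two racks as fault domains
print(ft(allocation, fault_domains))      # A: 0.5, B: 0.0 -> FT = 0.25
```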

  6. Smoothness
    [Figure: services A and B each send 1000 MB/s per machine pair; one core link carries 1000*2 + 1000*2 + 1000*2 + 1000*2 = 8000 MB/s, aggregation links carry 6000 MB/s and 4000 MB/s, and another link carries 1000*3 + 1000*3 = 6000 MB/s, causing congestion.]

  7. Oversubscription Ratio
    [Figure: n servers, each with downlink bandwidth B, share an uplink of bandwidth UB.]
    Oversubscription Ratio = B*n / UB
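As a quick worked example of the formula (illustrative function name, not from the paper):

```python
def oversubscription_ratio(server_bw: float, n_servers: int, uplink_bw: float) -> float:
    """B*n / UB: total downstream server bandwidth over the shared uplink capacity."""
    return server_bw * n_servers / uplink_bw

# 40 servers at 1000 MB/s behind a 10000 MB/s uplink -> 4:1 oversubscription.
print(oversubscription_ratio(1000, 40, 10000))  # 4.0
```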

  8. Smoothness
    [Figure: the same traffic example as slide 6.]
    BW (Bandwidth): the aggregate bandwidth usage on the core links.

  9. Smoothness
    [Figure: after reallocating machines, the links carry 6000 MB/s, 6000 MB/s, and 4000 MB/s, relieving the core.]

  10. Transparency
    • When we optimize the above two metrics, the cost of the reallocation should be low.

  11. Transparency
    [Figure: services A and B in the initial allocation.]

  12. Transparency
    [Figure: the allocation after moving machines.]

  13. Transparency
    [Figure: same topology as above.]
    NM (Number of Moves): the number of server moves needed to reach the target allocation.
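NM is simple to compute once allocations are written down. A hedged sketch, assuming both allocations map machine id to service (illustrative structures):

```python
def number_of_moves(initial: dict, target: dict) -> int:
    """NM: machines whose service assignment differs between the two allocations."""
    return sum(1 for m in initial if initial[m] != target.get(m))

initial = {0: "A", 1: "A", 2: "B", 3: "B"}
target  = {0: "A", 1: "B", 2: "B", 3: "A"}
print(number_of_moves(initial, target))  # 2 machines must be re-imaged
```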

  14. What does this paper want to do? Optimize a datacenter running Bing.com.

  15. Challenges

  16. Optimizing for one metric degrades the other
    [Plot: improvement in worst-case survival (-80% to 160%) vs. reduction in BW usage (-20% to 80%). Starting from the initial allocations, allocations optimizing only worst-case survival increase BW usage, while allocations optimizing only core bandwidth degrade survival; the "GOAL" region improves both. Results from 6 Microsoft datacenters.]

  17. Motivation for a combined solution
    [Figure: a cluster manager hosts sets of (App, Service) pairs; the services of an application form its communication matrix.]
    Only 2% of service pairs communicate, and 1% of services generate 64% of the traffic (lots more in the paper).

  18. [Figure: the earlier example, but Service A sends 1000 MB/s while Service B sends only 20 MB/s; core load 6000 MB/s, aggregation load 4000 MB/s.]

  19. Problem Statement: the framework
    • Metrics:
      • Bandwidth (BW): the sum of the rates on the core links; the overall measure of bandwidth usage at the core of the network.
      • Fault Tolerance (FT): the average Worst-Case Survival (WCS) across all services.
      • Number of Moves (NM): the number of servers that must be re-imaged to get from the initial datacenter allocation to the proposed allocation.
    • Optimization: maximize FT - α·BW subject to NM ≤ N0, where α is a tunable positive parameter and N0 is the upper limit on the number of moves.

  20. Metric #1: Fault Tolerance
    FT = fraction (%) of a service that remains available during a single worst-case failure.
    Fault domain: the set of all machines affected by a single (any) failure. Fault domains include the network core, switches (EoR/Agg), containers, racks (ToR), and power distribution, and they can be complex.

  21. Fault Tolerance
    • Cells: subsets of physical machines that belong to exactly the same fault domains (e.g. the same rack and the same power supply). Working with cells instead of individual machines reduces the size of the optimization problem.
    • x_{n,k} denotes the number of machines within cell n allocated to service k.
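A small sketch of the cell construction, assuming each machine is tagged with the set of fault-domain ids it belongs to (all names illustrative):

```python
from collections import defaultdict

def compute_cells(machine_domains: dict) -> list:
    """Group machines into cells: machines that belong to exactly the same fault domains."""
    cells = defaultdict(set)
    for machine, domains in machine_domains.items():
        cells[frozenset(domains)].add(machine)
    return list(cells.values())

# Machines 0 and 1 share rack R1 and power P1, so they form one cell.
machine_domains = {0: {"R1", "P1"}, 1: {"R1", "P1"}, 2: {"R2", "P1"}, 3: {"R2", "P2"}}
print(compute_cells(machine_domains))  # [{0, 1}, {2}, {3}]
```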

  22. Formal Definitions
    • I is the indicator function: I(n1, n2) = 1 if traffic from cell n1 to cell n2 traverses a core link, and I(n1, n2) = 0 otherwise.
    • Bandwidth: BW = Σ_{n1,n2} I(n1, n2) · Σ_{k1,k2} x_{n1,k1} · x_{n2,k2} · b_{k1,k2}, where b_{k1,k2} is the required bandwidth between a pair of machines from services k1 and k2.
    • Fault tolerance: let f_{k,j} be the total number of machines allocated to service k that are affected by fault j, and m_k the total number of machines of service k; then FT = (1/K) · Σ_k min_j (m_k - f_{k,j}) / m_k, where K is the total number of services.
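The BW formula above maps directly onto nested loops over cells and services. A hedged sketch, with x[n][k] as the number of machines of service k in cell n, b[k1][k2] as the per-machine-pair rate, and a crosses_core predicate standing in for the indicator I (all illustrative):

```python
def bandwidth(x, b, crosses_core) -> float:
    """BW: sum of x[n1][k1] * x[n2][k2] * b[k1][k2] over cell pairs crossing the core."""
    total = 0.0
    for n1 in range(len(x)):
        for n2 in range(len(x)):
            if not crosses_core(n1, n2):
                continue
            for k1 in range(len(b)):
                for k2 in range(len(b)):
                    total += x[n1][k1] * x[n2][k2] * b[k1][k2]
    return total

# Two cells in different racks; services 0 and 1 exchange 10 MB/s per machine pair.
x = [[2, 0], [0, 3]]                             # cell 0: 2 machines of svc 0; cell 1: 3 of svc 1
b = [[0, 10], [10, 0]]
print(bandwidth(x, b, lambda n1, n2: n1 != n2))  # 2*3*10, counted in both directions: 120.0
```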

  23. But… how do we actually solve: maximize FT - α·BW subject to NM ≤ N0?
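Evaluating the objective and checking the move budget is the easy part; the hard part is searching the allocation space. A minimal sketch with illustrative names:

```python
def objective(ft_score: float, bw_score: float, alpha: float) -> float:
    """The combined objective to maximize: FT - alpha * BW."""
    return ft_score - alpha * bw_score

def within_move_budget(initial: dict, target: dict, n0: int) -> bool:
    """The NM <= N0 constraint; both dicts map machine id to service."""
    return sum(1 for m in initial if initial[m] != target.get(m)) <= n0
```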

  24. Min-Cut
    [Figure: a weighted graph (edge weights 10, 8, 6, 9, 7 and several 1s) illustrating a minimum cut.]

  25. Optimizing for BW only
    k-way min graph cut over the machine communication graph, considered previously in [Meng et al., INFOCOM'10]. [Figure: network topology and communication graph partitioned by a k-way min cut.]
    But the k-way min graph cut:
    • ignores NM: it reshuffles almost all machines (a 99% chance of migration per machine);
    • ignores FT: it can't easily be extended to cover fault tolerance.
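For intuition, here is a plain 2-way global minimum cut on a small weighted communication graph, using networkx's Stoer-Wagner implementation; the k-way partitioning used in the cited work is more involved, so treat this only as an illustration:

```python
import networkx as nx

# Communication graph: heavy edges inside service groups, light edges between them.
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 10), ("c", "d", 8), ("e", "f", 9),                # heavy intra-group traffic
    ("a", "c", 1), ("b", "d", 1), ("c", "e", 1), ("d", "f", 1),  # light cross-traffic
])
cut_value, (side1, side2) = nx.stoer_wagner(G)
print(cut_value, side1, side2)  # cut value 2: the cut crosses only weight-1 edges
```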

  26. [Chart: core bandwidth (BW) achieved by the k-way graph cut.]

  27. What about FT?

  28. FT algorithm
    Input: service map + VM allocation, fault domain map
    1. Calculate the initial FTC (fault-tolerance cost) of the datacenter.
    2. For every possible swap:
       a. calculate the new FTC after the swap
       b. ∆FTC = FTC_old - FTC_new
    3. Execute the swap with max(∆FTC).
    Because of symmetry, many "good" swaps exist, so it suffices to evaluate only a small, random set of swaps (~1000).
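A hedged sketch of one step of this hill-climbing loop; evaluate_ftc is a hypothetical stand-in for the fault-tolerance-cost computation, and the candidate sampling follows the slide's "small random set of swaps" idea:

```python
import random

def ft_swap_step(allocation: dict, machines: list, evaluate_ftc, n_candidates: int = 1000):
    """Sample random machine swaps, score each by the drop in FTC, apply the best."""
    ftc_old = evaluate_ftc(allocation)
    best_delta, best_swap = 0.0, None
    for _ in range(n_candidates):
        m1, m2 = random.sample(machines, 2)
        if allocation[m1] == allocation[m2]:
            continue                              # same service: the swap changes nothing
        allocation[m1], allocation[m2] = allocation[m2], allocation[m1]
        delta = ftc_old - evaluate_ftc(allocation)  # positive delta = improvement
        if delta > best_delta:
            best_delta, best_swap = delta, (m1, m2)
        allocation[m1], allocation[m2] = allocation[m2], allocation[m1]  # undo
    if best_swap is not None:
        m1, m2 = best_swap
        allocation[m1], allocation[m2] = allocation[m2], allocation[m1]
    return best_delta
```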

  29. Non-solution: FT + cut
    Simply combining the FT swap algorithm with the graph cut is not good enough.

  30. Conclusion
    Study of the communication patterns of Bing.com:
    • sparse communication matrix
    • very skewed communication pattern
    Principled optimization of both BW and FT:
    • exploits the communication patterns
    • can handle arbitrary fault domains
    Reduction in BW: 20-50%. Improvement in FT: 40-120%.

  31. Thanks!
