
Flyways in Data Centers

Srikanth Kandula, Jitendra Padhye and Victor Bahl, Microsoft Research

Presentation Transcript


  1. Flyways in Data Centers
     Srikanth Kandula, Jitendra Padhye and Victor Bahl, Microsoft Research

  2. Data Center Networking
     • Networking is a major cost of building large data centers
        • Switches, routers, cabling complexity, management, …
        • Expensive equipment: aggregation switches cost > $250K
     • Tradeoff: provide “good” connectivity at “low” cost
     • A lot of recent interest from industry and academia

  3. Traditional Data Center Networks
     • 20-40 machines per rack
     • 1Gbps links to the top-of-rack (ToR) switch
     • 160 ToRs per aggregation switch
     • ToRs connected to aggregation switches with 10Gbps links
     [Figure: racks of servers with 1Gbps x 20 links to each ToR; ToRs with 10Gbps x 160 links to the aggregation switches; aggregation switches connect to the rest of the DC network]

  4. The oversubscription problem
     • As one goes up the hierarchy, link capacity does not scale with the number of servers
        • 20 servers with 1Gbps links to the ToR switch
        • 10Gbps uplink from ToR to aggregation switch
        • 1:2 oversubscription
     [Figure: 1Gbps x 20 server links under a ToR with a 10Gbps uplink to the aggregation switch]

  5. The oversubscription problem
     • As one goes up the hierarchy, link capacity does not scale with the number of servers
        • 20 servers with 1Gbps links to the ToR switch
        • 10Gbps uplink from ToR to aggregation switch
        • 1:2 oversubscription
     • Implications:
        • Potential for congestion when communicating between racks
        • So, applications minimize such communication
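
The arithmetic behind the "1:2" figure can be checked with a short sketch. The function name `oversubscription` and its parameters are illustrative and do not come from the slides; the ratio is simply the servers' aggregate link capacity divided by the uplink capacity.

```python
def oversubscription(servers: int, server_gbps: float, uplink_gbps: float) -> float:
    """Ratio of the servers' aggregate link capacity to the ToR uplink capacity."""
    return servers * server_gbps / uplink_gbps


# The rack from the slide: 20 servers at 1Gbps behind one 10Gbps uplink.
ratio = oversubscription(servers=20, server_gbps=1.0, uplink_gbps=10.0)
print(f"servers can offer {ratio:g}x the uplink capacity")
```

A ratio of 2 corresponds to what the slides write as 1:2 oversubscription; a ratio of 1 or less would mean the uplink is not oversubscribed.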

  6. Possible solutions
     • Fewer servers per ToR
        • More switches, higher cost
     • Use higher-bandwidth links for ToR uplinks
        • Technological limitations
        • Today, the largest practical link is 40Gbps (4x10Gbps)
        • Expensive
     • Either use more aggregation switches
     • Or build ones with sufficient backplane bandwidth
     • Clos networks

  7. Non-traditional networks
     • Lots of inexpensive hardware
     • Create multiple paths between ToRs
     • Carefully managed routing
     • FatTree, VL2, BCube etc.
     [Figures: FatTree and VL2 topologies]

  8. Note that …
     • Key goal of all proposed solutions is to eliminate oversubscription
        • There are various other advantages as well
     • Why?
        • Need to move VMs anywhere in the network
        • Network is no longer a bottleneck
        • Needed for all-pairs-shuffle workload
     • Is there an alternative to eliminating oversubscription?

  9. 1:1 oversubscription may not always be necessary
     • Studied application demands from a production cluster
     • Short-lived, localized congestion
     • If we can add capacity to “hotspots” as they form, we may not need to eliminate oversubscription

  10. Data set
      • Production cluster of 1500 servers
      • Data-mining workload
      • 1:2 oversubscribed tree
         • 20 servers per rack (75 racks in total)
         • 1Gbps links from server to ToR
         • 10Gbps uplink from ToR to aggregation switch
      • Socket-level traces over several weeks
      • Demands computed by averaging traffic over 5-minute windows
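
One way to turn socket-level records into per-window ToR-to-ToR demands is sketched below. The slides do not describe the trace format, so the record layout `(timestamp_s, src_server, dst_server, nbytes)`, the server-to-rack mapping (`server_id // 20`) and the function name `demand_matrices` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_S = 300          # 5-minute averaging window, as on the slide
SERVERS_PER_RACK = 20   # assumed mapping: rack = server_id // 20

Record = Tuple[float, int, int, float]  # hypothetical: (timestamp_s, src, dst, bytes)


def demand_matrices(records: Iterable[Record]) -> Dict[int, Dict[Tuple[int, int], float]]:
    """Return {window index: {(src ToR, dst ToR): average rate in Gbps}}."""
    totals: Dict[int, Dict[Tuple[int, int], float]] = defaultdict(lambda: defaultdict(float))
    for ts, src, dst, nbytes in records:
        src_tor, dst_tor = src // SERVERS_PER_RACK, dst // SERVERS_PER_RACK
        if src_tor == dst_tor:
            continue  # intra-rack traffic never crosses the ToR uplink
        totals[int(ts // WINDOW_S)][(src_tor, dst_tor)] += nbytes
    # Convert byte totals into an average rate over the window.
    return {
        window: {pair: b * 8 / WINDOW_S / 1e9 for pair, b in pairs.items()}
        for window, pairs in totals.items()
    }


records = [
    (10.0, 0, 25, 3.75e9),   # rack 0 -> rack 1
    (200.0, 5, 30, 3.75e9),  # rack 0 -> rack 1, same window
    (310.0, 0, 1, 1e9),      # both servers in rack 0: ignored
    (320.0, 40, 0, 7.5e9),   # rack 2 -> rack 0, next window
]
for window, matrix in sorted(demand_matrices(records).items()):
    print(window, matrix)
```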

  11. Only a few ToRs are hot, and most of their traffic goes to a few other ToRs

  12. These hot ToRs hold up completion time of the demand matrix

  13. Our idea
      • Build a slightly oversubscribed base network
         • Significant cost savings
      • Add links between ToRs as and when needed
         • “Flyways”
      [Figure: aggregation switch above a row of ToRs]

  14. Questions …
      • How to realize flyways?
         • Wireless
            • Radio: 802.11n, 60GHz, free-space optics
         • Wired
            • Randomized links, optical switches
      • Which flyways to enable?
         • Flyway between which ToRs?
         • What capacity do flyways need?
      • …

  15. 60GHz technology
      • 57-64 GHz
         • 7GHz bandwidth (802.11b/g has only 80MHz)
         • Available worldwide
      • High bandwidth
         • 1-4 Gbps links are already available
      • Low range (1-10 meters)
         • Advantageous in data center environments
         • Improves spatial reuse
         • Line of sight easy to achieve in data center
            • Antennas on top of the racks
      • Small form-factor antennas
         • Steerable directional antennas are feasible
      • Recent advances in CMOS technology are bringing cost down
         • Sayana, SiBeam, Wilocity, MediaTek
      [Figures: IBM/MediaTek 60GHz chip; SiBeam WirelessHD Ref Kit]

  16. Which flyways to enable?
      We propose an algorithm for this …

  17. Need modest bandwidth
      Flyways need to carry only a small fraction of a ToR's uplink traffic

  18. Evaluation
      • Trace-driven numerical simulations
         • 1500 servers, 75 racks, 1:2 oversubscribed tree
         • Flyways do not carry transit traffic
         • Real data center layout
      • Wireless:
         • Ignore interference
         • Vary range, capacity etc.
      • Metric:
         • Completion time of the demand matrix (CTD)
         • Normalized by completion time in a non-oversubscribed network
         • CTD == 2 → no improvement
         • CTD == 1 → equivalent to non-oversubscribed network
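
The normalization in the metric can be illustrated with a toy calculation. The slides do not say how completion time is computed, so this sketch assumes a simple model in which the busiest ToR uplink (egress or ingress, full duplex) determines when the demand matrix finishes. The names `completion_time` and `ctd`, and the sample demands, are hypothetical.

```python
from typing import Dict, Tuple

Demand = Dict[Tuple[int, int], float]  # (source ToR, destination ToR) -> gigabits


def completion_time(demand: Demand, uplink_gbps: float) -> float:
    """Seconds until the busiest ToR uplink drains (assumed uplink-only model)."""
    egress: Dict[int, float] = {}
    ingress: Dict[int, float] = {}
    for (src, dst), gbits in demand.items():
        egress[src] = egress.get(src, 0.0) + gbits
        ingress[dst] = ingress.get(dst, 0.0) + gbits
    busiest = max(list(egress.values()) + list(ingress.values()), default=0.0)
    return busiest / uplink_gbps


def ctd(time_s: float, demand: Demand,
        servers_per_rack: int = 20, server_gbps: float = 1.0) -> float:
    """Normalize a completion time by the non-oversubscribed baseline,
    where the uplink matches the rack's aggregate server capacity."""
    baseline = completion_time(demand, servers_per_rack * server_gbps)
    return time_s / baseline


demand = {(0, 1): 30.0, (0, 2): 10.0, (1, 2): 5.0}  # made-up demands
tree_time = completion_time(demand, uplink_gbps=10.0)
print(ctd(tree_time, demand))
```

Under this model the 1:2 oversubscribed tree with no flyways gives a CTD of 2, which matches the "no improvement" end of the scale on the slide.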

  19. Algorithm for placing flyways
      1. Start with no flyways
      2. Solve demand matrix
      3. Find worst laggard in demand matrix
      4. Add flyway for worst pair (modulo constraints)
      5. Go to (2)
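
The greedy loop above might look roughly like the following sketch. It is not the authors' implementation: the slides do not define how the demand matrix is "solved" or what the constraints are. Here each ToR is assumed to be limited only by its egress, a flyway carries traffic for its own ToR pair only (no transit traffic), the "worst pair" is taken to be the laggard's largest demand without a flyway, and the only constraint is a flyway budget. All names (`tor_time`, `place_flyways`) are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]
Demand = Dict[Pair, float]  # (source ToR, destination ToR) -> gigabits


def tor_time(src: int, demand: Demand, flyways: Set[Pair],
             uplink_gbps: float, flyway_gbps: float) -> float:
    """Seconds for `src` to drain its egress demand (toy egress-only model)."""
    egress = sum(v for (s, _), v in demand.items() if s == src)
    offload = [demand.get(p, 0.0) for p in flyways if p[0] == src]
    lo, hi = 0.0, egress / uplink_gbps
    for _ in range(60):  # bisection on the finishing time t
        t = (lo + hi) / 2
        # Uplink carries uplink_gbps * t; each flyway carries at most its pair's demand.
        carried = uplink_gbps * t + sum(min(d, flyway_gbps * t) for d in offload)
        if carried >= egress:
            hi = t
        else:
            lo = t
    return hi


def place_flyways(demand: Demand, n_tors: int, budget: int,
                  uplink_gbps: float = 10.0, flyway_gbps: float = 1.0) -> Set[Pair]:
    flyways: Set[Pair] = set()                      # 1. start with no flyways
    while len(flyways) < budget:
        times = {i: tor_time(i, demand, flyways, uplink_gbps, flyway_gbps)
                 for i in range(n_tors)}            # 2. solve the demand matrix
        laggard = max(times, key=times.get)         # 3. find the worst laggard
        candidates = [(v, p) for p, v in demand.items()
                      if p[0] == laggard and p not in flyways and v > 0]
        if not candidates:
            break                                   # laggard cannot be helped further
        flyways.add(max(candidates)[1])             # 4. add flyway for the worst pair
    return flyways                                  # 5. the loop repeats from step 2


demand = {(0, 1): 30.0, (0, 2): 10.0, (1, 2): 5.0}  # made-up demands
print(sorted(place_flyways(demand, n_tors=3, budget=2)))
```

With the sample demands, ToR 0 is the laggard in both iterations, so both flyways go to its outgoing pairs. A fuller model would also account for ingress at the destination ToR and for wireless range limits.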

  20. 1Gbps flyways, no range restriction
      With 50 flyways, performance is comparable to a non-oversubscribed network

  21. Impact of flyway capacity
      1Gbps flyway capacity appears to be sufficient

  22. Flyway range = 10m
      Flyways are beneficial even with limited range

  23. Conclusions
      • A new paradigm for data center networks
         • Slightly oversubscribed base network with dynamic capacity addition using flyways
      • Today, 60GHz wireless appears to be a good choice for flyways

  24. Backup

  25. Bandwidth needs
      • Flyways carry far less traffic than the uplink
         • In our model, only ToR-to-ToR (1 hop)
      • 60GHz band is 9x wide compared to 802.11b/g band
      • With better encodings, significantly higher capacity is possible
