
A Scalable, Commodity Data Center Network Architecture



Presentation Transcript


  1. A Scalable, Commodity Data Center Network Architecture Jingyang Zhu

  2. Outline • Motivation • Background • Fat Tree Architecture • Topology • Routing • Fault Tolerance • Results

  3. Motivation • MapReduce • Large Data Shuffle

  4. Intuitive Approach • High End Hardware (e.g., InfiniBand)

  5. Alternative Approach • A dedicated interconnection network • Scalability • Cost • Compatibility (i.e., app, OS, hardware)

  6. Typical Topology

  7. Clos Network • Example: (m, n, r) = (5, 3, 4) • Strictly non-blocking if m >= 2n - 1 • Rearrangeably non-blocking if m >= n
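As a rough illustration of the two conditions on this slide, the sketch below classifies a Clos network from m and n. The function name `classify_clos` is hypothetical, and it assumes the usual convention (not spelled out on the slide) that m is the number of middle-stage switches and n is the number of inputs per ingress switch.

```python
def classify_clos(m: int, n: int) -> str:
    """Classify a Clos(m, n, r) network by its blocking behaviour.

    Assumed convention: m = middle-stage switches, n = inputs per ingress switch.
    r (number of ingress/egress switches) does not affect the classification.
    """
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"


# The slide's example: (m, n, r) = (5, 3, 4), and 5 >= 2*3 - 1.
print(classify_clos(5, 3))
```

With the slide's example, m = 5 meets the stricter bound exactly, so the network falls in the strictly non-blocking class.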

  8. Benes Network • A Clos Network with 2x2 switches

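  9. Fat Tree • Multi-path • Routing: uplink (right) + downlink (left) • Oversubscription: ideal bandwidth / actual bandwidth available at the host end, e.g., 1:1 is good (hosts can use their full bandwidth); 5:1 is bad (only 20% of host bandwidth is available) • Example: Node 1 (0001) -> Node 6 (0110): 2 possible paths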

  10. Topology of Data Center - Hierarchy • [Figure: conventional topology in a data center, built from 10 GigE links and GigE links] • Multi-path = 2

  11. Topology of Data Center - Fat Tree • [Figure: fat-tree topology, k = 4] • (k/2)^2 k-port core switches • k pods, each with k/2 k-port aggregation switches and k/2 k-port edge switches • # of hosts: k^3 / 4, e.g., k = 48 => 27648 hosts (Scalability!!!)
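The counts on this slide can be checked with a few lines of arithmetic. This is only a sketch; the function name `fat_tree_size` and the dictionary keys are made up for illustration, and it assumes every switch has exactly k ports with k even.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a fat tree built from k-port switches (k even)."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "core_switches": half * half,      # (k/2)^2
        "aggregation_switches": k * half,  # k/2 per pod
        "edge_switches": k * half,         # k/2 per pod
        "hosts": k * half * half,          # k^3 / 4
    }


print(fat_tree_size(4)["hosts"])
print(fat_tree_size(48)["hosts"])
```

For k = 48 this reproduces the 27648 hosts quoted on the slide, and for k = 4 it gives the 16 hosts of the small example topology.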

  12. Addressing - Compatibility!!! • Pod switches: 10.pod #.switch #.1 • Core switches: 10.k.j.i (k - switch radix; <j, i> - coordinate of the core switch; j, i = 1, 2, ..., k/2) • [Figure labels: 10.1.2.1 10.1.3.1 10.3.2.1 10.3.3.1 10.0.3.1 10.2.3.1 10.2.1.1 10.1.1.1 10.3.0.1 10.3.1.1 10.1.0.1 10.0.0.1]

  13. Addressing (cont'd) • Host: 10.pod #.switch #.ID • [Figure labels: switch 0 and switch 1 in each pod; hosts 10.0.0.2, 10.0.0.3, 10.0.1.3, 10.1.0.2, 10.1.0.3, 10.1.1.2, 10.1.1.3] • The addressing format is chosen to support the routing scheme that follows
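The addressing rules on the last two slides can be sketched as three small generators. Several details are assumptions rather than statements from the slides: pods are numbered 0 to k-1, switches within a pod are numbered 0 to k-1 with the edge switches in the lower half, and host IDs run from 2 to k/2 + 1 (consistent with the .2 and .3 hosts shown for k = 4). All function names are hypothetical.

```python
def pod_switch_addresses(k: int) -> list:
    """10.pod.switch.1 for every switch in every pod."""
    return [f"10.{pod}.{sw}.1" for pod in range(k) for sw in range(k)]


def core_switch_addresses(k: int) -> list:
    """10.k.j.i with j, i = 1 .. k/2."""
    half = k // 2
    return [f"10.{k}.{j}.{i}"
            for j in range(1, half + 1)
            for i in range(1, half + 1)]


def host_addresses(k: int) -> list:
    """10.pod.switch.ID; assumes edge switches are 0 .. k/2 - 1 and IDs are 2 .. k/2 + 1."""
    half = k // 2
    return [f"10.{pod}.{sw}.{host_id}"
            for pod in range(k)
            for sw in range(half)
            for host_id in range(2, half + 2)]


print(core_switch_addresses(4))
print(host_addresses(4)[:4])
```

Under these assumptions, k = 4 yields the pod switch addresses shown in the figure labels (for example 10.1.3.1) and hosts such as 10.0.1.3.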

  14. 2-level table routing - pod switch • Downlink to host: match on the 24 most significant bits (prefix) • Uplink to core: match on the 8 least significant bits (suffix) • Traffic diffusion occurs only in the first half of a packet’s journey • [Figure labels: 10.2.1.2, 10.2.1.3]
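A minimal sketch of the two-level lookup described on this slide: try the prefix table first (traffic heading down to a host), and fall back to a suffix match on the last octet (traffic heading up to the core). The table contents below are a hypothetical example for an upper-layer pod switch in pod 2 with k = 4; the port numbers and the function name `two_level_lookup` are assumptions.

```python
import ipaddress


def two_level_lookup(dst: str, prefixes: list, suffixes: dict) -> int:
    """Return the output port for dst.

    prefixes: list of (network string, port), matched on the leading bits.
    suffixes: dict mapping the last octet of dst to a port, used on a prefix miss.
    """
    addr = ipaddress.ip_address(dst)
    best = None
    for net_str, port in prefixes:
        net = ipaddress.ip_network(net_str)
        # Keep the longest matching prefix.
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    if best is not None:
        return best[1]
    # No prefix matched: diffuse upward based on the 8 least significant bits.
    return suffixes[int(addr) & 0xFF]


# Hypothetical tables: ports 0-1 face down into pod 2, ports 2-3 face the core.
PREFIXES = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1)]
SUFFIXES = {2: 2, 3: 3}

print(two_level_lookup("10.2.1.3", PREFIXES, SUFFIXES))  # intra-pod destination
print(two_level_lookup("10.3.0.3", PREFIXES, SUFFIXES))  # inter-pod destination
```

Matching on the host ID suffix spreads inter-pod traffic across uplinks while keeping all packets to the same destination host on the same path.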

  15. Generation of routing table • addPrefix(pod switch, prefix, port) • addSuffix(pod switch, suffix, port)
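One way the addPrefix / addSuffix calls could be driven for the upper-layer switches of every pod is sketched below. The numbering conventions are assumptions (upper-layer switches are k/2 .. k-1, downlink port i leads to subnet i, uplink ports are k/2 .. k-1), and the suffix port formula is one plausible choice for spreading host IDs across uplinks differently on each switch; the slide itself only names the two operations.

```python
def build_upper_pod_tables(k: int):
    """Build prefix and suffix tables for the upper-layer switches of all pods."""
    half = k // 2
    prefixes = {}  # switch address -> list of (prefix, port)
    suffixes = {}  # switch address -> list of (suffix, port)

    def add_prefix(switch, prefix, port):
        prefixes.setdefault(switch, []).append((prefix, port))

    def add_suffix(switch, suffix, port):
        suffixes.setdefault(switch, []).append((suffix, port))

    for pod in range(k):
        for z in range(half, k):  # assumed numbering of upper-layer switches
            switch = f"10.{pod}.{z}.1"
            for subnet in range(half):
                # Downlink: one /24 per edge switch subnet in this pod.
                add_prefix(switch, f"10.{pod}.{subnet}.0/24", subnet)
            for host_id in range(2, half + 2):
                # Uplink: rotate the host-ID-to-port mapping by the switch number.
                port = (host_id - 2 + z) % half + half
                add_suffix(switch, f"0.0.0.{host_id}/8", port)
    return prefixes, suffixes


prefix_tables, suffix_tables = build_upper_pod_tables(4)
print(prefix_tables["10.2.2.1"])
print(suffix_tables["10.2.2.1"])
```

Rotating the suffix mapping by the switch number means two upper-layer switches in the same pod send a given host ID to different uplink ports, which helps spread load across the core.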

  16. 1-level table routing - core switch
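A core switch only needs to pick the destination pod, so a single table of pod prefixes is enough. The sketch below assumes one /16 prefix per pod and that port p of every core switch connects to pod p; both are illustrative assumptions, as are the function names.

```python
import ipaddress


def build_core_table(k: int) -> list:
    """One terminating prefix per pod: 10.pod.0.0/16 -> port pod (assumed wiring)."""
    return [(ipaddress.ip_network(f"10.{pod}.0.0/16"), pod) for pod in range(k)]


def core_lookup(table: list, dst: str) -> int:
    """Return the port leading to the pod that contains dst."""
    addr = ipaddress.ip_address(dst)
    for net, port in table:
        if addr in net:
            return port
    raise LookupError(f"no pod prefix matches {dst}")


table = build_core_table(4)
print(core_lookup(table, "10.2.0.3"))
```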

  17. Routing Table Implementation • Content Addressable Memory (CAM) • Input: data; output: match / mismatch

  18. Routing Table Implementation (cont'd) • [Figure: CAM lookup example - host address 10.2.0.3 matches an entry, which yields a RAM address; other labels: 00, 1001]
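To make the CAM idea concrete, here is a software sketch of a ternary lookup: each entry is a value plus a mask of "care" bits, the first matching entry wins, and its RAM address selects the output port. The use of a ternary (masked) CAM, the entry contents, and the RAM layout are all assumptions chosen to line up with the 10.2.0.3 example; real hardware compares all entries in parallel rather than in a loop.

```python
import ipaddress


def ip(s: str) -> int:
    """Dotted-quad string to a 32-bit integer."""
    return int(ipaddress.ip_address(s))


# (value, mask, RAM address); prefix entries come first so they take priority.
TCAM = [
    (ip("10.2.0.0"), ip("255.255.255.0"), 0b00),  # 10.2.0.X
    (ip("10.2.1.0"), ip("255.255.255.0"), 0b01),  # 10.2.1.X
    (ip("0.0.0.2"), ip("0.0.0.255"), 0b10),       # X.X.X.2
    (ip("0.0.0.3"), ip("0.0.0.255"), 0b11),       # X.X.X.3
]

# RAM address -> output port (hypothetical port numbers).
RAM = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 3}


def cam_lookup(dst: str):
    """Return the output port for dst, or None on a mismatch."""
    key = ip(dst)
    for value, mask, ram_addr in TCAM:
        if (key & mask) == value:
            return RAM[ram_addr]
    return None


print(cam_lookup("10.2.0.3"))
```

In this sketch 10.2.0.3 hits the first entry, so the lookup resolves to RAM address 00 even though the X.X.X.3 suffix entry would also match.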

  19. Routing Example: Hierarchical Tree • 10.0.1.2 -> 10.2.0.3 • 10.0.1.3 -> 10.2.0.2

  20. Routing Example: Fat Tree • 10.0.1.2 -> 10.2.0.3 • 10.0.1.3 -> 10.2.0.2 • No contention!!!

  21. Dynamic Routing • Up to now, the routing algorithm is based on static tables... any improvement??? • Yes, using dynamic routing • Two techniques: flow classification and flow scheduling

  22. Dynamic Routing 1 - Flow Classification • Flow: a sequence of packets whose order must be preserved • Dynamic routing goals: avoid reordering within the same flow; reassign a minimum number of flows to minimize the disparity between ports • Flow classifier: identifies flows

  23. Flow Classification • Check source & destination addresses: packets of the same flow go out on the same port (avoids reordering) • Balance the port load DYNAMICALLY: every t seconds, rearrange at most 3 flows • Rearranging carries some risk of reordering a flow!!! - the classifier is there for performance, not for correctness
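The behaviour described on this slide can be sketched as a small class: new flows go to the least-loaded uplink, known flows stick to their port, and a periodic pass moves at most three flows from the busiest port to the idlest one. The class and method names are hypothetical, byte counts stand in for "load", and the rule of moving the largest flow that is smaller than the load gap is an assumption about how "minimize the disparity" is achieved.

```python
class FlowClassifier:
    """Per-switch flow table keyed by (source, destination)."""

    def __init__(self, uplink_ports):
        self.ports = list(uplink_ports)
        self.flow_port = {}   # flow -> assigned port
        self.flow_bytes = {}  # flow -> bytes seen so far

    def port_load(self, port):
        """Total bytes of all flows currently assigned to port."""
        return sum(size for flow, size in self.flow_bytes.items()
                   if self.flow_port[flow] == port)

    def incoming_packet(self, src, dst, size):
        """Return the output port; packets of one flow always share a port."""
        flow = (src, dst)
        if flow not in self.flow_port:
            self.flow_port[flow] = min(self.ports, key=self.port_load)
            self.flow_bytes[flow] = 0
        self.flow_bytes[flow] += size
        return self.flow_port[flow]

    def rearrange_flows(self, max_moves=3):
        """Called every t seconds; moves at most max_moves flows."""
        for _ in range(max_moves):
            p_max = max(self.ports, key=self.port_load)
            p_min = min(self.ports, key=self.port_load)
            gap = self.port_load(p_max) - self.port_load(p_min)
            candidates = [f for f, p in self.flow_port.items()
                          if p == p_max and self.flow_bytes[f] < gap]
            if not candidates:
                break
            # Move the largest flow that still narrows the gap.
            biggest = max(candidates, key=self.flow_bytes.get)
            self.flow_port[biggest] = p_min
```

A caller would invoke `incoming_packet` for each packet and run `rearrange_flows` from a timer; the moved flow may see reordering once, which is the risk the slide mentions.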

  24. Dynamic Routing 2 - Flow Scheduling • Large flows are critical - schedule the large flows independently

      // edge switches
      if (length(flow_in) > threshold)
          notify central scheduler
      else
          route as normal

      // central scheduler
      if (receive notification)
          foreach possible path
              if (path not reserved)
                  reserve the path & notify switches along the path
                  break
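The pseudocode on this slide might translate into something like the following sketch. Paths are modelled as lists of node names and a path counts as reserved if any of its links is already reserved; that link-level interpretation, the first-fit search, and every name in the code are assumptions made for illustration.

```python
class CentralScheduler:
    """Tracks reserved links and places large flows on conflict-free paths."""

    def __init__(self):
        self.reserved_links = set()

    def place_flow(self, candidate_paths):
        """Reserve and return the first path with no reserved link, else None."""
        for path in candidate_paths:
            links = set(zip(path, path[1:]))
            if not (links & self.reserved_links):
                self.reserved_links |= links
                return path  # the switches along this path would be notified here
        return None


def edge_switch_handle(flow_bytes, threshold, scheduler, candidate_paths):
    """Edge-switch side: only flows above the threshold involve the scheduler.

    Returns the reserved path, or None to mean "route as normal".
    """
    if flow_bytes > threshold:
        return scheduler.place_flow(candidate_paths)
    return None


scheduler = CentralScheduler()
paths = [
    ["e0", "a0", "c0", "a2", "e2"],
    ["e0", "a1", "c2", "a3", "e2"],
]
print(edge_switch_handle(10_000_000, 1_000_000, scheduler, paths))
print(edge_switch_handle(10_000_000, 1_000_000, scheduler, paths))
```

In the usage above the second large flow is pushed onto the second path because the first path's links are already reserved; a small flow would bypass the scheduler entirely.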

  25. Discussion • Which one is better? • Flow classification: locally, within each pod switch • Flow scheduling: globally, among all the paths and switches

  26. Fault Tolerance • How to detect that a link or switch has failed? Bidirectional Forwarding Detection (BFD)

  27. Fault Tolerance (cont'd) • Basic ideas • Mark the failed link as unavailable when routing, e.g., set its load to infinity in flow classification • Broadcast the fault to other switches so that they avoid routing through it
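The "set its load to infinity" idea can be sketched in a few lines: a least-loaded port selection never picks a port whose load is infinite. The function name and the dictionary-based load table are hypothetical.

```python
import math


def pick_uplink(port_load: dict, failed_ports: set) -> int:
    """Choose the least-loaded uplink, treating failed ports as infinitely loaded."""
    effective = {port: (math.inf if port in failed_ports else load)
                 for port, load in port_load.items()}
    best = min(effective, key=effective.get)
    if math.isinf(effective[best]):
        raise RuntimeError("no usable uplink")
    return best


print(pick_uplink({2: 10, 3: 50}, failed_ports={2}))
```

Port 2 carries less traffic, but once it is marked failed the selection falls through to port 3 without any special-case logic in the balancing code.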

  28. Cost 1:1

  29. Power & Heat • [Figure: power and heat for different switches, including 10 GigE]

  30. Performance • [Figure: percentage of ideal bisection bandwidth for different benchmarks]

  31. Conclusion • Fat tree for data center interconnection • Scalable • Cost efficient • Compatible • Routing details, locally & globally • Fault tolerant
