A Scalable, Commodity Data Center Network Architecture Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat Presented by Gregory Peaker and Tyler Maclean
Overview • Structure and Properties of a Data Center • Desired properties in a DC Architecture • Fat tree based solution • Evaluation of fat tree approach
Problems with the common DC topology • Single point of failure • Oversubscription of links higher up in the topology • Typical oversubscription is between 2.5:1 and 8:1 • Trade-off between cost and provisioning
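The oversubscription ratio above is just downstream host capacity divided by upstream uplink capacity. A minimal sketch (the switch/uplink counts below are illustrative, not from the paper):

```python
# Oversubscription ratio: total host bandwidth behind a switch layer
# divided by the bandwidth of its uplinks toward the core.
def oversubscription(num_hosts, host_gbps, num_uplinks, uplink_gbps):
    """Ratio > 1 means hosts cannot all send at line speed at once."""
    return (num_hosts * host_gbps) / (num_uplinks * uplink_gbps)

# e.g. 48 GigE hosts behind 4 x 10 GigE uplinks -> 1.2:1 oversubscribed
print(oversubscription(48, 1, 4, 10))  # -> 1.2
```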
Properties of a solution • Compatible with Ethernet and TCP/IP • Cost effective • Low power consumption & heat emission • Cheap infrastructure • Commodity hardware • Allows host communication at line speed • Oversubscription ratio of 1:1
Review of Layer 2 & Layer 3 • Layer 2 • Data Link Layer • Ethernet • MAC addresses • One spanning tree for the entire network • Prevents looping • Ignores alternate paths • Layer 3 • Network Layer • IP • Shortest-path routing between source and destination • Best-effort delivery
Fat-tree based solution • Connect end hosts together using a fat-tree topology • Infrastructure consists of cheap devices • Every port is the same speed • All devices can transmit at line speed if packets are distributed along existing paths • A fat tree of k-port switches can support k³/4 hosts
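The k³/4 figure follows directly from the topology: k pods, each with k/2 edge and k/2 aggregation switches, k/2 hosts per edge switch, and (k/2)² core switches. A quick sketch of the counts from the paper:

```python
# Sizes of a k-ary fat tree built from identical k-port switches.
def fat_tree_sizes(k):
    assert k % 2 == 0, "k must be even"
    hosts = k ** 3 // 4        # k pods * (k/2 edge switches) * (k/2 hosts each)
    core = (k // 2) ** 2       # (k/2)^2 core switches
    pod_switches = k * k       # k pods * (k/2 edge + k/2 aggregation)
    return hosts, core, pod_switches

# With commodity 48-port switches: 27,648 hosts from 2,880 switches total
print(fat_tree_sizes(48))  # -> (27648, 576, 2304)
```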
Problems with a vanilla fat tree • Layer 3 will only use one of the existing equal-cost paths • Packet reordering occurs if layer 3 blindly takes advantage of path diversity • Creates overhead at the host, as TCP must reorder the packets
Fat tree modified • Enforce a special addressing scheme in the DC • Allows hosts attached to the same switch to route only through that switch • Allows intra-pod traffic to stay within the pod • unused.PodNumber.switchnumber.Endhost • Use two-level lookups to distribute traffic and maintain packet ordering.
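Concretely, the paper embeds the topology into addresses drawn from the unused 10.0.0.0/8 block: pod switches get 10.pod.switch.1 and hosts get 10.pod.switch.hostID with hostID in [2, k/2 + 1]. A small sketch of that scheme:

```python
# Sketch of the paper's location-encoding addresses in 10.0.0.0/8.
def switch_addr(pod, switch):
    """Pod switch address: 10.pod.switch.1"""
    return f"10.{pod}.{switch}.1"

def host_addr(pod, switch, host_id, k):
    """Host address: 10.pod.switch.hostID, hostID in [2, k/2 + 1]."""
    assert 2 <= host_id <= k // 2 + 1, "host ID out of range for this k"
    return f"10.{pod}.{switch}.{host_id}"

print(switch_addr(3, 0))      # -> 10.3.0.1
print(host_addr(3, 0, 2, 4))  # -> 10.3.0.2
```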
Two-level lookups • First level is a prefix lookup • Used to route down the topology to the end host • Second level is a suffix lookup • Used to route up toward the core • Diffuses and spreads out traffic • Maintains packet ordering by using the same port for the same end host
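The two levels can be sketched as a prefix table consulted first, with a suffix table as the fallback (the table contents and port numbers below are illustrative assumptions, not the paper's exact encoding):

```python
# Minimal two-level lookup sketch: a /24 prefix match routes down toward
# a known subnet; a miss falls through to a suffix match on the host byte,
# which spreads upward traffic across uplinks while pinning each
# destination host to one port (preserving packet order per host).
def route(dst, prefixes, suffixes):
    a, b, c, d = (int(x) for x in dst.split("."))
    port = prefixes.get((a, b, c))   # first level: prefix -> downward port
    if port is not None:
        return port
    return suffixes[d % len(suffixes)]  # second level: suffix -> upward port

prefixes = {(10, 2, 0): 0, (10, 2, 1): 1}  # local subnets, route down
suffixes = [2, 3]                          # two uplinks toward the core

print(route("10.2.0.3", prefixes, suffixes))  # intra-pod -> port 0
print(route("10.4.1.2", prefixes, suffixes))  # inter-pod -> an uplink port
```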
Diffusion optimizations • Flow classification • Eliminates local congestion • Assigns traffic to ports on a per-flow basis instead of a per-host basis • Flow scheduling • Eliminates global congestion • Prevents long-lived flows from sharing the same links • Assigns long-lived flows to different links
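Per-flow assignment can be sketched as hashing a flow's 5-tuple to an uplink, so every packet of one flow takes the same path (no reordering) while distinct flows spread across ports. A hypothetical sketch, not the paper's exact classifier (which also reassigns flows to balance port load):

```python
import zlib

# Hash a flow's 5-tuple to pick an uplink port deterministically.
def classify(src, dst, sport, dport, proto, num_uplinks):
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return zlib.crc32(key) % num_uplinks

p1 = classify("10.0.1.2", "10.2.0.3", 5000, 80, 6, 2)
p2 = classify("10.0.1.2", "10.2.0.3", 5000, 80, 6, 2)
assert p1 == p2  # same flow always maps to the same uplink
```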
Implementation • NetFPGA: • 4 Gigabit ports, 36 Mb SRAM • 64 MB DDR2, 3 Gb/s SATA port • Implemented elements in Click router software • Two-level table • Initialized with preconfigured information • Flow classifier • Distributes output evenly across local ports • Flow reporter + flow scheduler • Communicates with a central scheduler
Evaluation • Purpose: measure bisection bandwidth • Fat tree: 10 machines connected to 48-port switches • Hierarchical: 8 machines connected to a 48-port switch
Related work • Myrinet: popular for cluster-based supercomputers • Benefit: low latency • Cost: proprietary; the host is responsible for load balancing • InfiniBand: used in high-performance computing environments • Benefit: proven to scale, with high bandwidth • Cost: imposes its own layer 1–4 protocols • Uses a fat tree • Many massively parallel computers, such as Thinking Machines and SGI systems, use fat trees
Conclusion • The good: cost per gigabit and energy per gigabit are going down • The bad: data centers are growing faster than commodity Ethernet devices • Our fat-tree solution • Is better: a 27k-node cluster is technically infeasible with a hierarchical 10 GigE design (roughly $690M); a fat tree of commodity switches makes it practical • Is faster: equal or greater bandwidth in tests • Increases fault tolerance • Is cheaper: 20k hosts cost $37M for a hierarchical design versus $8.67M for a fat tree (1 GigE) • KOs the competing data center designs