A Scalable, Commodity Data Center Network Architecture
Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat
SIGCOMM'08. Reporter: Fuchao Zhou
Problem
• How to design a data center network architecture that provides:
  -- Scalable interconnection bandwidth
  -- Low cost, without requiring tremendously expensive equipment
  -- Compatibility with hosts running Ethernet and IP
Existing solutions
• Using specialized hardware and communication protocols such as InfiniBand and Myrinet
  -- More expensive, because they rely on high-end switches
  -- Not natively compatible with TCP/IP applications
• Using commodity Ethernet switches and routers to interconnect cluster machines
  -- Requires an appropriate network topology
  -- Bandwidth scales poorly with cluster size
  -- Cost increases non-linearly with cluster size
Existing solutions
• Typical architectures today
  -- Two-level trees of switches or routers (support 5K to 8K hosts)
  -- Three-level trees of switches or routers
• Disadvantages
  -- They support only 50% of the bandwidth available at the edge of the network
  -- They incur tremendous cost ($37M to support 27,648 hosts)
Proposed solution
• k-ary fat-tree topology
  -- k pods, each containing two layers (edge and aggregation) of k/2 switches
  -- (k/2)^2 k-port core switches
  -- supports k^3/4 hosts (a 48-ary fat-tree supports 27,648 hosts)
• Advantages
  -- non-blocking
  -- all switching elements are identical ($8.64M to support 27,648 hosts)
  -- compatible with hosts running Ethernet and IP
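The element counts on this slide follow directly from the construction. A short calculation makes the arithmetic explicit. This is a sketch, and the function name `fat_tree_size` is hypothetical.

```python
def fat_tree_size(k: int) -> dict:
    """Count the elements of a k-ary fat-tree built from identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches in each pod
        "aggregation_switches": k * half,  # k/2 aggregation switches in each pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k/2 hosts per edge switch, i.e. k^3/4
    }


sizes = fat_tree_size(48)
print(sizes["hosts"], sizes["core_switches"])  # 27648 576
```

For k = 4 the same function gives 16 hosts and 20 switches in total (5k^2/4), which is the size of the 4-pod prototype mentioned under Limitations.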
Static Routing method
• Two-level routing table
  -- goal: maximum bisection bandwidth in this network
• IP addressing
  -- Core switches: 10.k.j.i
  -- Pod switches: 10.pod.switch.1
  -- Hosts: 10.pod.switch.ID
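The addressing scheme can be enumerated for a small k to see which addresses exist. This is a minimal sketch under assumptions not stated on the slide: pod switches are numbered 0 to k-1 with the edge switches first, core coordinates j and i run from 1 to k/2, and host IDs run from 2 to k/2+1. The helper name `fat_tree_addresses` is hypothetical.

```python
def fat_tree_addresses(k: int):
    """Hypothetical helper listing addresses under the 10.x.y.z scheme."""
    half = k // 2
    # Core switches: 10.k.j.i, with (j, i) a position in the (k/2) x (k/2) core grid (assumed 1-based).
    core = [f"10.{k}.{j}.{i}" for j in range(1, half + 1) for i in range(1, half + 1)]
    # Pod switches: 10.pod.switch.1; switches 0..k/2-1 are edge, k/2..k-1 aggregation (assumption).
    pod_switches = [f"10.{pod}.{sw}.1" for pod in range(k) for sw in range(k)]
    # Hosts attach only to edge switches; IDs start at 2 because .1 is the switch itself (assumption).
    hosts = [
        f"10.{pod}.{sw}.{host_id}"
        for pod in range(k)
        for sw in range(half)
        for host_id in range(2, half + 2)
    ]
    return core, pod_switches, hosts


core, pod_switches, hosts = fat_tree_addresses(4)
print(core)        # ['10.4.1.1', '10.4.1.2', '10.4.2.1', '10.4.2.2']
print(len(hosts))  # 16
```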
Static Routing example
[Figure: two example paths through the fat-tree with switch port numbers; labeled nodes include 10.0.3.1, 10.2.3.1, 10.0.1.3 and 10.2.1.3]
• Packet from 10.0.1.2 to host 10.2.0.3
• Packet from 10.0.1.3 to host 10.2.0.2
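A two-level table first tries a prefix match and, when only the default 0.0.0.0/0 prefix matches, falls through to a secondary table keyed on the host-ID suffix. The sketch below models this for one aggregation switch 10.pod.z.1. It assumes ports 0 to k/2-1 face the pod's edge switches, ports k/2 to k-1 face the core, and the suffix rule (ID - 2 + z) mod (k/2) + k/2; these port conventions and the helper names are assumptions for illustration.

```python
from typing import Dict, Tuple

PrefixTable = Dict[Tuple[int, int, int], int]  # (10, pod, subnet) for a /24 -> output port
SuffixTable = Dict[int, int]                   # host ID (last octet) -> output port


def aggregation_table(k: int, pod: int, z: int) -> Tuple[PrefixTable, SuffixTable]:
    """Build a hypothetical two-level table for aggregation switch 10.pod.z.1 (k/2 <= z < k)."""
    half = k // 2
    # Primary table: one /24 prefix per edge switch in this pod, sent down on port i.
    prefixes = {(10, pod, i): i for i in range(half)}
    # Secondary table, reached through the 0.0.0.0/0 entry: host ID picks an upward port.
    # The offset z makes different aggregation switches spread the same host ID differently.
    suffixes = {hid: (hid - 2 + z) % half + half for hid in range(2, half + 2)}
    return prefixes, suffixes


def lookup(table: Tuple[PrefixTable, SuffixTable], dst: str) -> int:
    prefixes, suffixes = table
    a, b, c, d = (int(octet) for octet in dst.split("."))
    if (a, b, c) in prefixes:  # /24 match: destination is inside this pod
        return prefixes[(a, b, c)]
    return suffixes[d]         # /0 match: fall through to the suffix table


table = aggregation_table(k=4, pod=2, z=2)  # switch 10.2.2.1
print(lookup(table, "10.2.1.2"), lookup(table, "10.3.0.3"))  # 1 3
```

Because the suffix table depends only on the host ID, the choice of upward port is fixed per destination host, so packets to one host stay on one path while different hosts are spread over the core.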
Dynamic Routing methods
• Flow classification
  1. Recognize subsequent packets of the same flow and forward them on the same outgoing port, to avoid packet reordering;
  2. Periodically reassign output ports to keep flows fairly distributed across output ports as flow sizes change dynamically.
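The two rules above can be sketched as a per-switch classifier. This is a minimal sketch, not the authors' implementation; the policy details are assumptions: flows are keyed by (source, destination), a new flow goes to the least-loaded upward port, and each periodic rebalance moves the largest flow on the busiest port that is smaller than the load gap to the idlest port. The class name `FlowClassifier` is hypothetical.

```python
from collections import defaultdict


class FlowClassifier:
    """Hypothetical per-switch flow classifier (names and policy are assumptions)."""

    def __init__(self, up_ports):
        self.up_ports = list(up_ports)
        self.flow_port = {}                 # flow key -> assigned output port
        self.flow_bytes = defaultdict(int)  # flow key -> bytes seen in current interval

    def port_load(self, port):
        return sum(b for f, b in self.flow_bytes.items() if self.flow_port[f] == port)

    def forward(self, src, dst, size):
        flow = (src, dst)
        if flow not in self.flow_port:
            # New flow: place it on the currently least-loaded upward port.
            self.flow_port[flow] = min(self.up_ports, key=self.port_load)
        self.flow_bytes[flow] += size
        return self.flow_port[flow]  # same flow -> same port, so no reordering

    def rebalance(self):
        # Run every t seconds: move one flow from the busiest port to the idlest one.
        loads = {p: self.port_load(p) for p in self.up_ports}
        p_max = max(loads, key=loads.get)
        p_min = min(loads, key=loads.get)
        gap = loads[p_max] - loads[p_min]
        candidates = [f for f, p in self.flow_port.items()
                      if p == p_max and self.flow_bytes[f] < gap]
        if candidates:
            biggest = max(candidates, key=self.flow_bytes.get)
            self.flow_port[biggest] = p_min
        self.flow_bytes.clear()  # start a new measurement interval
```

Only flows smaller than the load gap are moved, so a single move narrows the imbalance rather than flipping it to the other port.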
Dynamic Routing methods
• Flow scheduling (with a central scheduler)
  -- Method 1 (notification)
     1. Edge switches detect any large outgoing flow;
     2. They periodically send notifications to a central scheduler;
     3. The central scheduler orders a reassignment.
  -- Method 2 (monitoring)
     1. A central scheduler tracks all active large flows;
     2. It assigns them non-conflicting paths where possible;
     3. The scheduler maintains Boolean state for all links, signifying whether each link is available to carry a large flow.
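The scheduler's Boolean link state can be sketched as a dictionary of per-link flags. This is a simplified sketch: it assumes the candidate paths for a flow (for example, one per core switch) are computed elsewhere and passed in, and the class name and switch labels such as "e0", "a0", "c0" are hypothetical.

```python
from typing import Dict, Hashable, Optional, Sequence, Tuple

Link = Tuple[str, str]
Path = Tuple[Link, ...]


class CentralScheduler:
    """Hypothetical central scheduler keeping one Boolean 'in use' flag per link."""

    def __init__(self) -> None:
        self.in_use: Dict[Link, bool] = {}
        self.paths: Dict[Hashable, Path] = {}

    def place(self, flow: Hashable, candidates: Sequence[Path]) -> Optional[Path]:
        # Give the large flow the first candidate path that shares no link with
        # another large flow; return None if every candidate conflicts.
        for path in candidates:
            if not any(self.in_use.get(link, False) for link in path):
                for link in path:
                    self.in_use[link] = True
                self.paths[flow] = path
                return path
        return None

    def release(self, flow: Hashable) -> None:
        # When a large flow ends, its links become available again.
        for link in self.paths.pop(flow, ()):
            self.in_use[link] = False


# Two inter-pod paths that differ in the aggregation and core switches used.
via_c0 = (("e0", "a0"), ("a0", "c0"), ("c0", "a2"), ("a2", "e2"))
via_c2 = (("e0", "a1"), ("a1", "c2"), ("c2", "a3"), ("a3", "e2"))
```

A first-fit search keeps the sketch short; a real scheduler could instead rank candidate paths, but the conflict check against the Boolean link state would be the same.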
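Fault-Tolerance
• Simple failure broadcast protocol
  -- Each switch maintains a Bidirectional Forwarding Detection (BFD) session with each of its neighbors (D. Katz, D. Ward. BFD for IPv4 and IPv6, 2008)
• Two classes of failures
  -- between a lower-layer and an upper-layer switch inside a pod
  -- between an upper-layer switch and a core switch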
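BFD declares a session down when no control packet arrives within a detection time, which is the negotiated packet interval multiplied by a detect multiplier. The sketch below models only that timer for one neighbor, with a fixed interval and multiplier as simplifying assumptions; it is not an implementation of the BFD protocol, and the class name `NeighborMonitor` is hypothetical.

```python
class NeighborMonitor:
    """Simplified BFD-style liveness timer for one neighbor (not a full BFD implementation)."""

    def __init__(self, interval_ms: int, detect_mult: int) -> None:
        self.detection_time = interval_ms * detect_mult  # silence tolerated before failure
        self.last_rx = 0
        self.up = True

    def on_control_packet(self, now_ms: int) -> None:
        # Any control packet from the neighbor refreshes the session.
        self.last_rx = now_ms
        self.up = True

    def poll(self, now_ms: int) -> bool:
        # Returns True exactly once, when the session goes down; this is the point
        # at which the switch would start broadcasting the failure.
        if self.up and now_ms - self.last_rx > self.detection_time:
            self.up = False
            return True
        return False


m = NeighborMonitor(interval_ms=50, detect_mult=3)  # 150 ms detection time
m.on_control_packet(0)
print(m.poll(100), m.poll(151))  # False True
```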
Fault-Tolerance based on the flow classification (1)
A failed link between a lower-layer and an upper-layer switch affects:
• Outgoing inter- and intra-pod traffic originating from the edge switch
• Intra-pod traffic using the upper-layer switch as an intermediary
• Inter-pod traffic coming into the upper-layer switch
Fault-Tolerance based on the flow classification (2)
A failed link between an upper-layer switch and a core switch affects:
• Outgoing inter-pod traffic
• Incoming inter-pod traffic
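With flow classification, a switch that learns a link is down can stop using the corresponding port and move the affected flows elsewhere. The sketch below shows one way to do that under stated assumptions: affected flows are re-homed on the surviving port that currently carries the fewest flows. The helper name `reroute_on_failure` and this balancing rule are hypothetical, not taken from the slides.

```python
from collections import Counter


def reroute_on_failure(flow_port: dict, up_ports: list, failed_port: int) -> dict:
    """Hypothetical helper: move every flow off a failed upward port."""
    alive = [p for p in up_ports if p != failed_port]
    if not alive:
        raise RuntimeError("no upward port left")
    # Flows per surviving port (Counter returns 0 for ports with no flows).
    counts = Counter(p for p in flow_port.values() if p != failed_port)
    new_map = {}
    for flow, port in flow_port.items():
        if port == failed_port:
            # Re-home the flow on the surviving port with the fewest flows.
            target = min(alive, key=lambda p: counts[p])
            counts[target] += 1
            new_map[flow] = target
        else:
            new_map[flow] = port
    return new_map


print(reroute_on_failure({"a": 2, "b": 3, "c": 2, "d": 4}, [2, 3, 4], failed_port=2))
```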
Fault-Tolerance based on the flow scheduling
• Simpler
• The scheduler marks any link reported to be down as busy or unavailable, so any path that includes it is no longer considered
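Marking a failed link as busy is enough to keep it out of every new assignment, because path selection already skips busy links. A minimal sketch, assuming the scheduler's link state is a dictionary of Boolean flags and candidate paths are supplied by the caller; the function names and switch labels are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Link = Tuple[str, str]
Path = Tuple[Link, ...]


def mark_down(link_busy: Dict[Link, bool], link: Link) -> None:
    # A failed link is recorded exactly like a busy one, so no new flow can use it.
    link_busy[link] = True


def first_free_path(link_busy: Dict[Link, bool], candidates: Sequence[Path]) -> Optional[Path]:
    # Return the first candidate whose links are all free, or None if every path is blocked.
    for path in candidates:
        if not any(link_busy.get(link, False) for link in path):
            return path
    return None


via_c0 = (("e0", "a0"), ("a0", "c0"), ("c0", "a2"), ("a2", "e2"))
via_c2 = (("e0", "a1"), ("a1", "c2"), ("c2", "a3"), ("a3", "e2"))
busy: Dict[Link, bool] = {}
mark_down(busy, ("a0", "c0"))
print(first_free_path(busy, [via_c0, via_c2]) == via_c2)  # True
```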
Limitations
• Performance was evaluated only on a small prototype of the architecture: 4 pods (16 hosts)
• The fat-tree topology has significant wiring overhead
  -- 3k^3/4 cables for a k-ary fat-tree
  -- e.g. k = 48, supporting 27,648 hosts: 3 × 48^3/4 = 82,944 cables
• The changes required in commodity switches should be considered
  -- they don't support the dynamic routing techniques
  -- they don't support the two-level routing table
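The 3k^3/4 figure is the sum of three tiers of links, each with k^3/4 cables. A short calculation, with the hypothetical function name `fat_tree_cables`, makes the breakdown explicit.

```python
def fat_tree_cables(k: int) -> dict:
    """Cables per tier in a k-ary fat-tree; each tier needs k^3/4 links."""
    half = k // 2
    return {
        "host_to_edge": k * half * half,         # k/2 hosts on each of the k*(k/2) edge switches
        "edge_to_aggregation": k * half * half,  # full (k/2) x (k/2) mesh inside each of k pods
        "aggregation_to_core": k * half * half,  # k/2 uplinks from each of k*(k/2) aggregation switches
    }


cables = fat_tree_cables(48)
print(sum(cables.values()))  # 82944
```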
Limitations
• Dynamic routing techniques also have limitations
  -- the flow classifier has only local knowledge available
  -- a centralized scheduler with global knowledge may be infeasible for large, arbitrary networks
• The two-level routing solution cannot avoid local congestion without a dynamic routing technique