SEATTLE - A Scalable Ethernet Architecture for Large Enterprises

SEATTLE - A Scalable Ethernet Architecture for Large Enterprises M.Sc. Pekka Hippeläinen IBM phippela@gmail T-110.6120 – Special Course in Future Internet Technologies

SEATTLE • Based on, and pictures borrowed from: Kim, C.; Caesar, M.; Rexford, J. Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises • Is it possible to build a protocol that maintains the same configuration-free properties as Ethernet bridging, yet scales to large networks?

Contents • Motivation: network management challenge • Ethernet features: ARP and DHCP broadcasts • 1) Ethernet Bridging • 2) Scaling with Hybrid networks • 3) Scaling with VLANs • Distributed Hashing • SEATTLE approach • Results • Conclusions

Network management challenge • IP networks require massive effort to configure and manage • As much as 70% of an enterprise network’s cost goes to maintenance and configuration • Ethernet is much simpler to manage • However, Ethernet does not scale well beyond small LANs • The SEATTLE architecture aims to provide the scalability of IP with the simplicity of Ethernet management

Why is Ethernet so wonderful? • Easy to set up, easy to manage • DHCP server, some hubs, plug and play

Flooding query 1: DHCP requests • Let's say node A joins the Ethernet • To get or confirm an IP address, node A sends a DHCP request as a broadcast • The request floods through the broadcast domain

Flooding query 2: ARP • For node A to communicate with node B in the same broadcast domain, the sender needs the MAC address of node B • Let's assume that the IP address of node B is known • Node A sends an Address Resolution Protocol (ARP) broadcast to find out the MAC address of node B • As with the DHCP broadcast, the request is flooded through the whole broadcast domain • This is basically an {IP -> MAC} mapping

Why is flooding bad? • Large Ethernet deployments contain a vast number of hosts and thousands of bridges • Ethernet was not designed for such a scale • Virtualization and mobile deployments can cause many dynamic events, which generate control traffic • Broadcast messages must be processed at the end hosts, interrupting the CPU • Bridge forwarding tables grow roughly linearly with the number of hosts

1) Ethernet bridging • Ethernet consists of segments, each comprising a single physical layer • Ethernet bridges interconnect segments into a multi-hop network, i.e. a LAN • This forms a single broadcast domain • A bridge learns how to reach a host by inspecting incoming frames and associating the source MAC with the incoming port • A bridge stores this information in a forwarding table and uses the table to forward frames in the correct direction
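
The learning behaviour described on this slide can be sketched in a few lines of Python. This is a toy model, not a real bridge implementation: the class name `LearningBridge`, the port numbers and the MAC strings are all made up for illustration.

```python
class LearningBridge:
    """Toy learning bridge: maps source MAC -> port, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port on which it was last seen

    def receive(self, src_mac, dst_mac, in_port):
        """Return the set of ports the frame is sent out on."""
        # Learn: the source is reachable through the incoming port.
        self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood on every port except the incoming one.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination sits on the segment the frame came from: filter it.
            return set()
        return {out_port}


bridge = LearningBridge(ports=[1, 2, 3])
first = bridge.receive("mac_a", "mac_b", in_port=1)  # mac_b unknown: flooded
reply = bridge.receive("mac_b", "mac_a", in_port=2)  # mac_a was learned on port 1
again = bridge.receive("mac_a", "mac_b", in_port=1)  # mac_b now known on port 2
```

Only the first frame is flooded; once both hosts have sent a frame, the bridge forwards on a single port. The table holds one entry per host seen, which is the linear growth mentioned on the flooding slide.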

Bridge spanning tree • One bridge is elected (or configured) to be the root bridge • The other bridges collectively compute a spanning tree based on their distance to the root • Thus traffic is not forwarded along the shortest path but along the spanning tree • This approach avoids broadcast storms
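
To see why a spanning tree gives up shortest paths, here is a simplified sketch. It assumes hop count as the only metric and builds the tree with a breadth-first search from the root; the real spanning tree protocol also uses bridge ids and port costs. The four-bridge ring and the names `B1`..`B4` are hypothetical.

```python
from collections import deque


def spanning_tree(links, root):
    """Return {bridge: parent} for a breadth-first tree rooted at `root`."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbours[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


def tree_path(parent, src, dst):
    """Path from src to dst that only uses tree links."""
    def to_root(node):
        path = [node]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        return path

    up, down = to_root(src), to_root(dst)
    # Drop the shared part above the lowest common ancestor.
    while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
        up.pop()
        down.pop()
    return up + down[-2::-1]


ring = [("B1", "B2"), ("B2", "B3"), ("B3", "B4"), ("B4", "B1")]
parent = spanning_tree(ring, root="B1")
path = tree_path(parent, "B3", "B4")
```

B3 and B4 are directly connected, but the link between them is not part of the tree, so traffic takes three hops through the root.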

2) Hybrid IP/Ethernet • In this approach multiple LANs are interconnected with IP routing • In hybrid networks each LAN contains at most a few hundred hosts that form an IP subnet • Each IP subnet is associated with an IP prefix • Assigning IP prefixes to subnets and associating subnets with router interfaces is a manual process • Unlike a MAC address, which is a host identifier, an IP address denotes the host's current location in the network

Drawbacks of Hybrid approach • Biggest drawback is the configuration overhead • Router interfaces must be configured • Each host must have an IP address that matches the subnet in which it is located (DHCP can be used) • Networking policies are usually defined per network prefix, i.e. by topology • When the network changes, the policies must be updated • Limited mobility support • Mobile users & virtualized hosts at datacenters • If the IP address must stay constant, the user has to stay on the same subnet

3) Virtual LANs • Overcome some problems of Ethernet and IP networks • Administrators can logically group hosts into the same broadcast domain • VLANs can be configured to overlap by configuring the bridges, not the hosts • Broadcast overhead is reduced by the isolated domains • Mobility is simplified: an IP address can be retained while moving between bridges

Virtual LANs • Traffic from B1 to B2 can be ‘trunked’ over multiple bridges • Inter-domain traffic needs to be routed

Drawbacks of VLANs • Trunk configuration overhead • Extending a VLAN across multiple bridges requires the VLAN to be configured at each participating bridge, often manually • Limited control plane scalability • Bridges keep forwarding table entries and carry broadcast traffic for every active host in every VLAN visible to them • Insufficient data plane efficiency • A single spanning tree is still used within each VLAN • Inter-VLAN traffic must be routed via IP gateways

Distributed Hash Tables • Hash tables are used to store {key -> value} pairs • With multiple nodes there is a nice way to • Keep the nodes symmetric • Distribute the hash table entries evenly among the nodes • Keep reshuffling of entries small when nodes are added or removed • The idea is to calculate H(key) and map it to a host; one can visualize this as mapping to an angle (or to a point on a circle)

Distributed Hash Tables • Each node is mapped to randomly distributed points on the circle • Thus each node owns multiple buckets • One calculates H(key) and stores the entry at the node owning that bucket • If a node is removed, its entries are reassigned to the next buckets • If a node is added, entries are moved to the new buckets
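
The circle picture can be turned into a minimal consistent-hashing sketch. Everything here is an assumption made for illustration: SHA-1 as H, a 32-bit ring, 50 points per node, and the names `Ring`, `s1`..`s4` and `key0`..`key999`.

```python
import bisect
import hashlib


def h(value):
    """Map a string to a point on the ring (an integer in [0, 2**32))."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % 2**32


class Ring:
    def __init__(self, nodes, points_per_node=50):
        self.points_per_node = points_per_node
        self.points = []  # sorted list of (ring position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several pseudo-random points on the circle.
        for i in range(self.points_per_node):
            bisect.insort(self.points, (h(f"{node}#{i}"), node))

    def remove(self, node):
        self.points = [p for p in self.points if p[1] != node]

    def owner(self, key):
        """Node owning the first point at or after H(key), wrapping around."""
        i = bisect.bisect_left(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]


ring = Ring(["s1", "s2", "s3", "s4"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.remove("s4")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that were stored at the removed node change owner; entries held by the other nodes stay where they are, which is the small reshuffling the slides refer to.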

SEATTLE approach 1/2 • 1) Switches calculate shortest paths among themselves • This is a link-state protocol, basically Dijkstra • Switch-level discovery protocol: Ethernet hosts do not respond • Switch topology is much more stable than the host level • Much more scalable than at host level • Each switch has an ID: the MAC address of one of its interfaces
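
As a reminder of what each switch computes from the link-state database, here is a plain Dijkstra sketch. The four-switch topology, the link costs and the function name `shortest_paths` are hypothetical; the slides do not specify how link costs are assigned.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra: return ({switch: cost}, {switch: previous hop}) from source."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


# Hypothetical switch-level topology: {switch: {neighbour: link cost}}.
topology = {
    "s1": {"s2": 1, "s3": 4},
    "s2": {"s1": 1, "s3": 1, "s4": 5},
    "s3": {"s1": 4, "s2": 1, "s4": 1},
    "s4": {"s2": 5, "s3": 1},
}
dist, prev = shortest_paths(topology, "s1")
```

Because only switches appear in the graph, its size is independent of the number of hosts, which is why the slide calls this more scalable than host-level discovery.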

SEATTLE approach 2/2 • 2) DHT used in switches • {IP->MAC} mapping • This essentially answers ARP requests without flooding • {MAC->location} mapping • Once the switch is located, routing along the shortest path can be used • DHCP service location can also be stored • SEATTLE thus reduces flooding, allows usage of shortest paths and offers a nice way to locate the DHCP service

SEATTLE • Control overhead reduced with consistent hashing • When the set of switches changes due to network failure or recovery, only some entries must be moved • Balancing load with virtual switches • A more powerful switch can represent itself as many virtual switches and thereby take on more load • Enabling flexible service discovery • This is mainly DHCP, but could be something like {“PRINTER”->location}
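
The virtual-switch idea can be illustrated with a small experiment. The switch names, the virtual-id counts (20, 20, 80) and the 3000 host keys are invented for this sketch, and SHA-1 stands in for whatever hash a deployment would use.

```python
import bisect
import hashlib
from collections import Counter


def h(value):
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)


def build_ring(virtual_counts):
    """virtual_counts: {switch: number of virtual switch ids it advertises}."""
    return sorted(
        (h(f"{switch}/v{i}"), switch)
        for switch, count in virtual_counts.items()
        for i in range(count)
    )


def resolver(ring, key):
    """Switch owning the first virtual id at or after H(key), wrapping around."""
    i = bisect.bisect_left(ring, (h(key), ""))
    return ring[i % len(ring)][1]


# "big" advertises four times as many virtual ids as each small switch.
ring = build_ring({"small1": 20, "small2": 20, "big": 80})
load = Counter(resolver(ring, f"host{i}") for i in range(3000))
```

With these counts `big` should end up resolving roughly two thirds of the keys, so load follows the number of virtual ids a switch chooses to advertise.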

Topology changes • Adding and removing switches/links can alter the topology • Switch/link failures and recoveries can also lead to partitioning events (rarer) • Non-partitioning link failures are easy to handle: the resolver for a hash entry does not change

Switch failures • If a switch fails or recovers, hash entries need to be moved • The switch that published a value monitors the liveness of its resolver and republishes the entry when needed • The entries have a TTL
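
A minimal sketch of TTL-based entries with republishing, assuming an explicit clock value passed in as `now` and a made-up TTL of 30 time units. The class name `Resolver` and the keys are illustrative; the slides do not give the actual timer values.

```python
class Resolver:
    """Holds published entries that expire unless they are refreshed."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, expiry time)

    def publish(self, key, value, now):
        # Publishing again simply pushes the expiry time forward.
        self.entries[key] = (value, now + self.ttl)

    def lookup(self, key, now):
        value, expires = self.entries.get(key, (None, 0))
        return value if now < expires else None


resolver = Resolver(ttl=30)
resolver.publish("mac_a", "s_a", now=0)
fresh = resolver.lookup("mac_a", now=10)    # still valid
expired = resolver.lookup("mac_a", now=40)  # TTL ran out, nobody refreshed
resolver.publish("mac_a", "s_a", now=40)    # publisher republishes
again = resolver.lookup("mac_a", now=50)
```

The same republish call covers the failure case: when the publisher notices that its resolver is gone, it publishes the entry again at whichever switch the hash now selects.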

Partitioning events • Each switch must also keep track of its locally stored location entries • If switch s_old is removed or becomes unreachable, all switches need to remove the location entries that point to s_old • This approach correctly handles partitioning events

Scaling: location • Hosts use the directory service to publish and maintain {mac->location} mappings • When host a with mac_a arrives, it accesses switch s_a (steps 1-3) • Switch s_a publishes {mac_a,location} by calculating the correct bucket F(mac_a), i.e. the switch acting as resolver • When node b wants to send a message to node a • F(mac_a) is calculated to fetch the location • ’Reactive resolution’: even cache misses do not lead to flooding
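
The publish and lookup steps can be sketched as follows. This is a simplified model under several assumptions: `F` is a hypothetical stand-in built on SHA-1, every switch knows all other switches (as it would from the link-state protocol), and the resolver simply answers the query instead of forwarding traffic. The names `Switch`, `s_a`, `s_b` and `s_r` are illustrative.

```python
import hashlib


def F(key, switches):
    """Hypothetical stand-in for SEATTLE's hash: pick the resolver for `key`.

    Chooses the switch whose hashed id is the closest successor of H(key).
    """
    def H(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    ring = sorted((H(s), s) for s in switches)
    target = H(key)
    for point, switch in ring:
        if point >= target:
            return switch
    return ring[0][1]  # wrap around the circle


class Switch:
    def __init__(self, name, network):
        self.name = name
        self.network = network  # shared {name: Switch}
        self.directory = {}     # entries this switch resolves: mac -> location
        self.cache = {}         # locations learned from earlier lookups

    def host_arrives(self, mac):
        """The access switch publishes {mac -> its own name} at the resolver."""
        resolver = self.network[F(mac, self.network)]
        resolver.directory[mac] = self.name

    def locate(self, mac):
        """Return the access switch of `mac`; ask the resolver on a cache miss."""
        if mac not in self.cache:
            resolver = self.network[F(mac, self.network)]
            self.cache[mac] = resolver.directory[mac]  # one unicast query, no flood
        return self.cache[mac]


network = {}
for name in ["s_a", "s_b", "s_r"]:
    network[name] = Switch(name, network)

network["s_a"].host_arrives("mac_a")
location = network["s_b"].locate("mac_a")
```

A cache miss at `s_b` costs one query to a single, computable switch rather than a broadcast, which is the point of reactive resolution.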

Scaling: ARP • When node b makes an ARP request, SEATTLE converts it to a {F(IP_a) -> mac_a} request • The resolver/switch for F(IP_a) is usually different from the one for F(mac_a) • Optimization for hosts making ARP requests • The F(IP_a) address resolver can also store mac_a and S_a • When node b makes the F(IP_a) ARP request, the mac_a->S_a mapping is also cached at S_b • The shortest path (-> path 10) can now be used
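
The ARP optimization can be sketched with plain dictionaries. To keep it short, `F` here uses a simple modulo over a fixed switch list instead of consistent hashing; that shortcut, the switch names and the address `10.0.0.1` are assumptions made for this example.

```python
import hashlib

SWITCHES = ["s_a", "s_b", "s_1", "s_2"]  # hypothetical switch ids


def F(key):
    """Hypothetical hash: map a key to one of the switches (modulo, for brevity)."""
    digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return SWITCHES[digest % len(SWITCHES)]


directory = {s: {} for s in SWITCHES}       # per-switch resolver tables
location_cache = {s: {} for s in SWITCHES}  # per-switch mac -> access switch


def publish_host(ip, mac, access_switch):
    # The IP resolver stores the MAC together with the host's location.
    directory[F(ip)][ip] = (mac, access_switch)
    directory[F(mac)][mac] = access_switch


def arp(ingress_switch, ip):
    """Turn a host's ARP broadcast into one unicast lookup at F(ip)."""
    mac, access_switch = directory[F(ip)][ip]
    # Piggy-backed location: later frames to `mac` need no second lookup.
    location_cache[ingress_switch][mac] = access_switch
    return mac


publish_host("10.0.0.1", "mac_a", "s_a")
answer = arp("s_b", "10.0.0.1")
```

After the single ARP lookup, `s_b` already knows that `mac_a` lives behind `s_a` and can send the first data frame along the shortest path.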

Handling host dynamics • Location change • Wireless handoff • VM moved but retaining MAC • Host MAC address changes • NIC replaced • Failover event • VM migration forcing MAC change • Host changes IP • DHCP lease expires • Manual reconfiguration

Insert, delete and update • Location change • Host h moves from s_old to s_new • s_new updates the existing mac-to-location entry • MAC change • IP-to-MAC update • MAC-to-location deletion (old) and insertion (new) • IP change • S_h deletes old IP-to-MAC and inserts new IP-to-MAC

Ethernet: Bootstrapping hosts • Host discovered by access switches • SEATTLE switches snoop ARP requests • Most OSes generate an ARP request at boot-up or when an interface comes up • DHCP messages or host-down events can also be used • Host configuration without broadcast • The DHCP server hashes the string “DHCP_SERVER” and stores its location at the switches • The “DHCP_SERVER” string is used to locate the service • No need to broadcast for ARP or DHCP

Scalable and flexible VLANs • To support broadcasts, the authors suggest using groups • Similar to a VLAN, a group is defined as a set of hosts that share the same broadcast domain • The groups are not limited to layer-2 reachability • Multicast-based group-wide broadcasting • Multicast tree with a broadcast root for each group • F(group_id) used for broadcast root location

Simulations • 1) Campus ~40 000 students • 517 routers and switches • 2) AP-Large (Access Provider) • 315 routers • 3) Datacenter (DC) • 4 core routers with 21 aggregation switches • Routers were converted to SEATTLE switches

Cache timeout and AP-large with 50k hosts • The shortest-path cache timeout has an impact on the number of location lookups • Even with a 60 s timeout, 99.98% of packets were forwarded without a lookup • Control overhead (blue) decreases very fast, whereas the table size increases only moderately • The shortest path is used for the majority of routing in these simulations

Table size increase in DC • Ethernet bridges store an entry for each destination: ~O(sh) behavior across the network (s switches, h hosts) • SEATTLE requires only ~O(h) state, since only access and resolver switches need to store location information for each host • With this topology the table size was reduced by a factor of 22 • In the AP-large case the factor increased to 64
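
A back-of-the-envelope version of the O(sh) versus O(h) argument, using made-up sizes rather than the simulated topologies. The model assumes the worst case for bridging (every bridge learns every host) and exactly two stored copies per host in SEATTLE (access switch and resolver), ignoring caches; both function names are illustrative.

```python
def ethernet_entries(switches, hosts):
    """Worst case for bridging: every bridge learns every host, O(s*h)."""
    return switches * hosts


def seattle_entries(hosts, copies_per_host=2):
    """Assumed model: one entry at the access switch, one at the resolver, O(h)."""
    return hosts * copies_per_host


s, h = 100, 10_000  # made-up sizes, not the simulated topologies
reduction = ethernet_entries(s, h) / seattle_entries(h)
```

Under this model the saving grows with the number of switches (here 1,000,000 entries against 20,000). The factors of 22 and 64 quoted on the slide are measured simulation results and are not reproduced by this arithmetic.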

Control overhead in AP-large • Measured as the number of control messages over all links in the topology, divided by the number of switches and the duration of the trace • SEATTLE significantly reduces control overhead in the simulations • This is mainly because Ethernet generates network-wide floods for a significant number of packets

Effect of switch failure in DC • Switches were allowed to fail randomly • The average recovery time was 30 seconds • SEATTLE can use all the links in the topology, whereas Ethernet is restricted to the spanning tree • Ethernet must recompute the tree, causing outages

Effect of host mobility in Campus • Hosts were randomly moved between access switches • For high mobility rates, SEATTLE's loss rate was lower than Ethernet's • On Ethernet it takes some time for switches to evict the stale location information and relearn the new location • SEATTLE provided low loss and low broadcast overhead

What was omitted • The authors suggest multi-level one-hop DHTs • With large dynamic networks it can be beneficial that entries are stored close by • This is achieved with regions and a backbone: border switches connect to the backbone switches • With topology changes • An approach to seamless mobility is described in the paper • Updating remote host caches is required, with switch-based MAC revocation lists • Some simulation results • The authors also built a sample implementation

Conclusions • Operators today face challenges in managing and configuring large networks. This is largely due to the complexity of administering IP networks. • Ethernet is not a viable alternative • poor scaling and inefficient path selection • SEATTLE promises scalable self-configuring routing • Simulations suggest efficient routing, low latency with quick recovery • Host mobility supported with low control overhead • Ethernet stacks at end hosts are not modified

Thank you for your attention! Questions? Comments?
