1. SPAIN: Multipath Forwarding For COTS Ethernet
Presenter: Praveen Yalagandula, HP Labs, Palo Alto
Collaborators:
Jayaram Mudigonda & Jeff Mogul, HP Labs
Mohammad Al-Fares, UCSD
2. OC Testbed Shuffle Experiment
29 May 2012
Shuffle experiment with about 80 servers
Approximates Map-Reduce or a large join
500 MB transfer from every node to all other nodes
At most 5 simultaneous transfers at a time
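To get a feel for the scale of this shuffle, the total data volume can be worked out directly. The sketch below assumes exactly 80 nodes (the slide says "about 80") and counts every ordered sender/receiver pair once; `shuffle_volume` is a hypothetical helper name, not something from the talk.

```python
def shuffle_volume(num_nodes: int, transfer_mb: int) -> tuple[int, int]:
    """Return (number of transfers, total megabytes) for an all-to-all shuffle."""
    # Every node sends to every other node, so count ordered pairs.
    transfers = num_nodes * (num_nodes - 1)
    return transfers, transfers * transfer_mb


# Assumption: exactly 80 nodes, 500 MB per transfer.
transfers, total_mb = shuffle_volume(num_nodes=80, transfer_mb=500)
print(transfers)        # 6320 transfers
print(total_mb / 1e6)   # 3.16 (terabytes, taking 1 TB = 10^6 MB)
```

Under these assumptions the experiment moves roughly 3 TB across the network, which is why the choice of forwarding paths matters.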
3. Utilization
4. OC Testbed Shuffle Experiment
Shuffle experiment with about 80 servers
Approximates Map-Reduce or a large join
500 MB transfer from every node to all other nodes
At most 5 simultaneous transfers at a time
5. Goal
Scalable, cheap, high-performance L2 datacenter networks
Why L2?
Cheaper switches (vs. L3 routers)
Less configuration required
Flat addressing makes VM migration much easier
6. What about Ethernet?
Ethernet is (mostly) wonderful:
COTS technology
Increasingly high bandwidth
Self-configuring with Spanning Tree Protocol (STP)
Avoids packet-forwarding loops
But: Spanning Tree creates problems:
Only links within the spanning tree are actually used
Limits bisection BW and flexibility in node placement
Forces the use of expensive core switches
7. SPAIN: Smart Path Assignment In Networks
Multipath forwarding
Exploits all physical links, not just spanning tree
Over arbitrary topologies
does not require fat-tree, hypercube, etc.
Uses unmodified, COTS Ethernet switches
as long as they have VLAN support
8. Other approaches
Designs requiring specific, regular topologies
PortLand, VL2, DCell, BCube
Designs focusing on shortest-path but not multipath
TRILL, SmartBridge, SEATTLE
Also require changes to switches (HW/SW/Both)
SPAIN is unique because it supports multiple paths between nodes in arbitrary topologies, uses unmodified COTS switches, and is incrementally deployable
9. SPAIN in one slide
Central, off-line manager:
Discovers network topology
Pre-computes a set of paths that best exploit the redundancy of the physical network links
Merges paths into a minimal set of trees
Maps trees onto VLANs and installs these into switches
End-host modifications:
Download mapping table from manager
Classify packets into flows (e.g., TCP connections)
Choose correct VLAN for new outgoing flows
“Chirping” protocol for efficient fault detection (etc.)
Table maintenance on packet transmission/reception
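The end-host steps above (classify packets into flows, choose a VLAN for each new flow) could be sketched as follows. This is only an illustration: the `VlanChooser` class, the 5-tuple flow key, and the layout of the mapping table (destination to list of VLAN IDs) are assumptions, not the actual SPAIN driver, and the VLAN is picked uniformly at random for simplicity.

```python
import random


class VlanChooser:
    """Hypothetical end-host helper that pins each flow to one VLAN."""

    def __init__(self, vlan_table, seed=None):
        # vlan_table: destination address -> list of usable VLAN IDs.
        # Stands in for the mapping table downloaded from the manager;
        # its real layout is not given in the slides.
        self.vlan_table = vlan_table
        self.flow_vlan = {}              # flow 5-tuple -> chosen VLAN
        self.rng = random.Random(seed)

    def vlan_for(self, flow):
        # flow = (src_ip, dst_ip, proto, src_port, dst_port)
        if flow not in self.flow_vlan:
            # New flow: pick one of the VLANs that reach the destination.
            candidates = self.vlan_table[flow[1]]
            self.flow_vlan[flow] = self.rng.choice(candidates)
        # Existing flow: reuse the earlier choice so its packets stay in order.
        return self.flow_vlan[flow]


chooser = VlanChooser({"10.0.0.2": [10, 20, 30]}, seed=1)
flow = ("10.0.0.1", "10.0.0.2", "tcp", 40000, 80)
first = chooser.vlan_for(flow)
print(first in (10, 20, 30))             # True
print(chooser.vlan_for(flow) == first)   # True: same flow, same VLAN
```

Keeping the choice per flow rather than per packet means a TCP connection always follows one path, while different connections can spread over different VLANs.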
10. SPAIN in Action
11. SPAIN needs to answer three questions
What paths to use?
Goal: Utilize the topological redundancy well
Challenge: Network graphs can be extremely large
Approach: Trim the graphs, use link-disjoint paths greedily
How to map paths to VLANs?
Goal: Maximize the number of installed paths
Challenge: Limited switch resources; only 4096 VLANs
Approach: Graph-coloring based algorithms
What should the end-point do?
Goal: Balance load well
Challenge: Limited view of the network
Approach: Pick a VLAN randomly (uniform or biased by path length)
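As an illustration of the first approach (use link-disjoint paths greedily), one simple way to do it is to repeatedly take a shortest path and then remove its links before searching again. The sketch below is a simplification under that assumption and is not claimed to be SPAIN's actual path-computation algorithm; the function names and the small example topology are hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):   # sorted for determinism
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def greedy_disjoint_paths(links, src, dst, max_paths=4):
    """Greedily collect paths from src to dst that share no link."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    paths = []
    while len(paths) < max_paths:
        path = shortest_path(adj, src, dst)
        if path is None:
            break
        paths.append(path)
        # Remove the links just used so later paths cannot reuse them.
        for a, b in zip(path, path[1:]):
            adj[a].discard(b)
            adj[b].discard(a)
    return paths


# Hypothetical topology: hosts A and B attached to two interconnected switches.
links = [("A", "S1"), ("A", "S2"), ("S1", "B"), ("S2", "B"), ("S1", "S2")]
print(greedy_disjoint_paths(links, "A", "B"))
# [['A', 'S1', 'B'], ['A', 'S2', 'B']]
```

A greedy search like this does not always find the largest possible set of link-disjoint paths, but it is cheap to run on large graphs.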
12. SPAIN: Simulations
Considered different topologies
Regular: Fat Trees
Arbitrary: AS topologies from the RocketFuel project
Up to 1600 switches, 13K links, 80K end hosts
Compared SPAIN and Spanning Tree
Metrics
Path Set Quality: Link Coverage, (Potential) Reliability
Feasibility: #VLANs required
Effectiveness: Aggregate throughput
With a large number of flows (up to 10 million)
13. SPAIN: Simulation Results
Link Coverage:
SPAIN: 100% in all cases considered
Spanning Tree: varied from 2% to 51%
Reliability: With a link failure probability of 0.04
Spanning Tree: prob. of path failure varies from 0.16 to 0.45
SPAIN: prob. of path failure varies from 0.00 to 0.04
Feasibility:
#VLANs required is less than 4K for all topologies except one case
Aggregate Throughput:
Arbitrary topologies: 0.67X to 10X improvement over Spanning Tree
Fat tree: 24X improvement over Spanning Tree
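To see why multiple paths help reliability, consider a back-of-the-envelope model. It assumes links fail independently with probability 0.04 and that the alternative paths share no links; the slides do not describe the failure model used in the simulations, so this is an illustration rather than a reproduction of the reported numbers. The helper names are hypothetical.

```python
def path_failure(p_link: float, hops: int) -> float:
    """Probability that at least one link on a path of `hops` links fails."""
    return 1 - (1 - p_link) ** hops


def all_paths_fail(p_link: float, hop_counts) -> float:
    """Probability that every one of several link-disjoint paths fails."""
    prob = 1.0
    for hops in hop_counts:
        prob *= path_failure(p_link, hops)
    return prob


# Assumption: a single 4-link path versus two disjoint 4-link paths.
print(round(path_failure(0.04, 4), 4))         # 0.1507
print(round(all_paths_fail(0.04, [4, 4]), 4))  # 0.0227
```

Under this model, a single path through a tree fails far more often than a pair of disjoint paths, which is the qualitative gap the Spanning Tree and SPAIN figures show.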
14. SPAIN OpenCirrus testbed
15. SPAIN: Prototype experiments
Quick-and-dirty Linux implementation
Adds lots of unnecessary overhead; we know how to fix it
Deployed and tested on OpenCirrus testbed
80 hosts dispersed across 3 racks/3 switches
Shuffle experiment (similar to Map-Reduce or large join)
500 MB transfer from every node to all other nodes
At most 5 simultaneous transfers at a time
Results:
Improvements limited by software and servers, not by SPAIN design
30% improvement in the aggregate throughput
22% reduction in the mean completion time
Demonstrated incremental deployability of SPAIN
Demonstrated fault-tolerance capability of SPAIN
16. SPAIN on OpenCirrus: Shuffle Experiment
Photo of OpenCirrus?
Animation of the data
17. Incremental deployability
18. Next Steps
Open SPAIN enhancements for other users of the testbed
OpenCirrus topology planned enhancements
A two-level Fat-Tree + Mesh topology
Connect all 8 racks at HP Labs
No over-subscription (full bisection bw)
19. Questions?
“SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies”
Jayaram Mudigonda and Praveen Yalagandula, HP Labs; Mohammad Al-Fares, UCSD; Jeffrey C. Mogul, HP Labs
To appear at NSDI 2010, San Jose, CA