
Scalable Multi-module Switches with Quality of Service Thesis Defense

Santosh Krishnan (sk@cs.columbia.edu), May 1, 2006. Advisor: Prof. Henning G. Schulzrinne. Co-advisor: Dr. Fabio M. Chiussi.


Presentation Transcript


  1. Scalable Multi-module Switches with Quality of Service Thesis Defense Santosh Krishnan sk@cs.columbia.edu May 1, 2006 Advisor: Prof. Henning G. Schulzrinne Co-advisor: Dr. Fabio M. Chiussi

  2. Outline • Problem Definition • Motivations, list of contributions • Switching Model: Components • Related work: Formal methods in switching • Buffered Clos Switches • Concept of functional equivalence • BCS: Throughput and Quality of Service • Single-path BCS: CIOQ, aggregation, pipelining • Multi-path BCS: Parallelization • Conclusions

  3. Problem Definition Goals: • How to methodically construct a high-capacity switch? • How to design high-performance algorithms for such switches? Importance: • Physical layer improvements: 10-G Ethernet, OC-768 • Converged network requiring QoS: IPTV, MPLS VPN • Case for modular design: component reuse What exists: • Ad-hoc approach to switch design • No benchmarks, varying performance satisfaction • Non-blocking, 100% throughput, nominal capacity

  4. Contributions • Taxonomy of multi-module switches: Buffered Clos Switches • Performance framework: Functional equivalence with an ideal switch, mimicking circuit-switching rigor • Combined I/O Queueing: QoS via online maximal matching; throughput via critical matching; strict stability via maximal matching and SOQF; Switched Fair Airport matching • Aggregation: Shadow CIOQ and Decompose; Virtual Element Queueing • Pipelining: Striping and Equal Dispatch; Concurrent Dispatch via 3D matching • Parallelization: Flow-based PPS via Clos fitting; cell-based PPS via Striping and Equal Dispatch • Memory-Space-Memory: Combination methods; Recursive BCS

  5. Switching Model • Basic property: Contention • Flows: Guaranteed QoS, Best-effort • Ideal Switch: Provide bandwidth trunks, sustain link capacity • Black box for network engineering purposes • [Figure: router model with a CPU on the slow path; input and output PPUs connected through the switch fabric on the fast path]

  6. Switching Model: Components • Memory element: buffers; constraint: memory bandwidth; monolithic example: OQ switch (ideal) • Space element: mesh with the conflict-free property; matching (2D link scheduling); constraints: full-mesh circuitry, matching complexity; monolithic example: IQ switch • Architecture: Interconnect memory and space elements • Algorithms: Meaningfully emulate the ideal switch for throughput and QoS

  7. Background: Clos Networks • Strictly non-blocking: K ≥ 2M − 1 (Clos theorem) • Rearrangeably non-blocking: K ≥ M (Slepian-Duguid) • [Figure: three-stage Clos network with M inputs or outputs per outer element and K middle elements; one circuit highlighted] • Recognize: space-time duality; fitting as matrix decomposition (fitting algorithms) • Inspiration: Replace selected elements with memory
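
The rearrangeable condition K ≥ M can be made concrete. The sketch below is an illustration, not the thesis's fitting algorithm: it routes a full permutation through a three-stage Clos network with exactly K = M middle elements by repeatedly peeling a perfect matching off the M-regular bipartite multigraph between first- and third-stage elements (König's theorem guarantees each one exists) and giving each matching its own middle element. The function name `route_rearrangeable` and the port numbering (port p on outer element p // M) are assumptions.

```python
from collections import defaultdict


def route_rearrangeable(perm, M):
    """Route a full permutation through a Clos network with K = M middle elements.

    Hypothetical helper. perm[p] is the output port of input port p. Input port p
    sits on first-stage element p // M and output port q on third-stage element
    q // M. Returns mid, where mid[p] is the middle element carrying connection p.
    """
    N = len(perm)
    assert N % M == 0 and sorted(perm) == list(range(N))
    r = N // M                       # number of first-stage (and third-stage) elements
    remaining = set(range(N))        # connections not yet assigned a middle element
    mid = [None] * N

    for k in range(M):
        # Remaining connections form an (M - k)-regular bipartite multigraph
        # between first- and third-stage elements, so a perfect matching exists.
        edges_from = defaultdict(list)
        for p in sorted(remaining):
            edges_from[p // M].append(p)
        owner = {}                   # third-stage element -> connection matched to it

        def augment(a, seen):
            """Kuhn's augmenting-path search starting at first-stage element a."""
            for p in edges_from[a]:
                b = perm[p] // M
                if b in seen:
                    continue
                seen.add(b)
                if b not in owner or augment(owner[b] // M, seen):
                    owner[b] = p
                    return True
            return False

        for a in range(r):
            if not augment(a, set()):
                raise RuntimeError("no perfect matching found")

        # The r matched connections share middle element k without conflict.
        for p in owner.values():
            mid[p] = k
            remaining.discard(p)
    return mid


perm = [5, 3, 0, 1, 4, 2]            # N = 6 ports, M = 2 ports per outer element
print(route_rearrangeable(perm, 2))  # [1, 0, 0, 1, 0, 1]
```

Each middle element ends up carrying exactly one connection per first-stage and per third-stage element, which is the conflict-free condition; existing circuits may move between middle elements when a new permutation is routed, which is what "rearrangeable" allows.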

  8. Background: CIOQ Switches • Pro: Low memory bandwidth • Con: Complexity of matching (switch size, frequency, reconfiguration rate) • Example: queue state [3 0 5; 7 0 1; 0 5 0] with configuration [0 0 1; 1 0 0; 0 1 0] • Matching options: offline templates; maximum, maximal, or critical matching; heuristics • What performance results when these are applied to a changing queue state?

  9. Background: CIOQ Switch Results • Based on combinatorics and stability theory, covering QoS (Weller-Hajek ’97) and throughput • Auxiliary results: Envelope matching (Kar ’00), Packet-mode matching (Marsan ’02)

  10. Framework: Buffered Clos Switches • Definition: switch size, type of elements, number in first stage, number in second stage, speedup • Aggregate (smaller elements): CIOQ-A, G-MSM • Pipeline (lower speed and complexity): CIOQ-P, G-MSM • Parallelize (pool memory resources): PPS • Isomorphism: Non-blocking Clos network • Properties: Multi-stage, fully connected, symmetric, uniform

  11. Framework: Functional Equivalence • Characterize relative performance via functional equivalence: • f1: Allocate known rates (shaped bandwidth trunks) • f2: Relative stability for admissible traffic (the literature's "100% throughput") • f3: Per-output relative stability (work conserving) • f4: Strict relative stability for all pairs • f5: Exact emulation • Emulate an ideal switch: exact, asymptotic • Bandwidth trunks, independent throughput optimization
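
Criterion f2 presumes admissible traffic, meaning that no input and no output is offered more than its line rate. A minimal check of that standard definition is sketched below; `is_admissible` is a hypothetical helper, and rates are assumed to be given as fractions of the line rate.

```python
from fractions import Fraction


def is_admissible(rates, strict=False):
    """Return True if no input (row) or output (column) is oversubscribed.

    Hypothetical helper. rates[i][j] is the long-term arrival rate from input i
    to output j as a fraction of the line rate. With strict=True every line sum
    must stay below 1 instead of at most 1.
    """
    n = len(rates)
    row_loads = [sum(row) for row in rates]
    col_loads = [sum(rates[i][j] for i in range(n)) for j in range(n)]
    loads = row_loads + col_loads
    return all(x < 1 for x in loads) if strict else all(x <= 1 for x in loads)


# Exact fractions avoid float rounding right at the boundary load of 1.
rates = [[Fraction(1, 2), Fraction(1, 4)],
         [Fraction(1, 4), Fraction(3, 4)]]
print(is_admissible(rates), is_admissible(rates, strict=True))  # True False
```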

  12. CIOQ: Bandwidth Trunks • Shaping plus online matching is sufficient for bandwidth guarantees • Offline: rate matrix → BVN templates; cons: template storage, centralized rate processing • Online: arbitrary arrivals → shape/batch → VOQ → weight scheduler; online matching: maximal (s=2) or critical (s=1) • Split time into intervals: T = GCD(R) • Batch traffic in each interval: simple counters • Extension of Weller-Hajek maximal matching theorem • Clos analogy: Maximal matching as a strategy for orderly assignments
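
The maximal-matching option (s=2) can be checked on a single batch. This is a minimal sketch under assumed parameters, not the thesis's scheduler: a batch is taken to be any matrix of cell counts whose row and column sums are at most the interval length T, which is what shaping traffic to its reserved rates over one interval is assumed to produce. For any non-empty pair (i, j), row i and column j together hold at most 2T - 1 cells, and every maximal matching removes at least one of them (otherwise (i, j) could be added), so 2T - 1 matchings, which fit in T slots at speedup 2, clear the batch. The helpers `greedy_maximal_matching`, `random_batch`, and `matchings_to_clear` are hypothetical.

```python
import random


def greedy_maximal_matching(q):
    """Return a maximal matching {input: output} over the non-empty entries of q."""
    used_out, match = set(), {}
    for i in range(len(q)):
        for j in range(len(q)):
            if q[i][j] > 0 and j not in used_out:
                match[i] = j
                used_out.add(j)
                break
    return match


def random_batch(n, T, rng):
    """Hypothetical shaped batch: random cell counts, every row and column sum <= T."""
    b = [[0] * n for _ in range(n)]
    for _ in range(4 * n * T):
        i, j = rng.randrange(n), rng.randrange(n)
        if sum(b[i]) < T and sum(b[k][j] for k in range(n)) < T:
            b[i][j] += 1
    return b


def matchings_to_clear(batch):
    """Count maximal matchings (one cell per matched pair) until the batch is empty."""
    q = [row[:] for row in batch]
    count = 0
    while any(any(row) for row in q):
        for i, j in greedy_maximal_matching(q).items():
            q[i][j] -= 1
        count += 1
    return count


rng = random.Random(1)
for _ in range(100):
    T = rng.randint(1, 6)
    # Assumed speedup 2: two matchings per slot, so 2T - 1 matchings fit in T slots.
    assert matchings_to_clear(random_batch(4, T, rng)) <= 2 * T - 1
```

The bound does not depend on which maximal matching is chosen, which is why a simple greedy arbiter is enough here.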

  13. CIOQ: Admissible Traffic • Best throughput results: • No speedup: MWM (McKeown et al.) • Speedup 2: Maximal (Dai-Prabhakar) • Can a simple maximum size matching suffice for admissible traffic? A red herring! Critical matching suffices for asymptotic 100% throughput (f2) • [Figure: example queue state, with a maximum size matching (MSM) augmented into a critical matching] • [Figure: intuition for the 2x2 case, line buckets R1, R2, C1, C2 with the maximum marked]
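
A critical matching can be found by brute force on tiny examples. The sketch below is an illustration, not the thesis's algorithm: it scores every permutation by how many critical lines (rows or columns whose sum equals the maximum line sum) its non-empty pairs touch, and keeps the best. It takes O(n!) time, so it is only useful for checking small cases by hand. `critical_lines` and `critical_matching` are hypothetical names, and the code only finds such a matching; it does not demonstrate the throughput result.

```python
from itertools import permutations


def critical_lines(q):
    """Return (critical rows, critical columns): lines whose sum is the maximum."""
    n = len(q)
    rows = [sum(r) for r in q]
    cols = [sum(q[i][j] for i in range(n)) for j in range(n)]
    peak = max(rows + cols)
    return ({i for i in range(n) if rows[i] == peak},
            {j for j in range(n) if cols[j] == peak})


def critical_matching(q):
    """Brute-force a matching {input: output} that touches every critical line."""
    n = len(q)
    if not any(any(row) for row in q):
        return {}
    crit_rows, crit_cols = critical_lines(q)
    best, best_score = {}, -1
    for perm in permutations(range(n)):
        # Keep only pairs with queued cells; the rest of the permutation idles.
        match = {i: perm[i] for i in range(n) if q[i][perm[i]] > 0}
        # Each matched pair covers at most one row and one column.
        score = sum((i in crit_rows) + (j in crit_cols) for i, j in match.items())
        if score > best_score:
            best, best_score = match, score
    if best_score < len(crit_rows) + len(crit_cols):
        raise ValueError("no matching covers every critical line")
    return best


Q = [[3, 0, 5], [7, 0, 1], [0, 5, 0]]   # column 0 is the only critical line (sum 10)
print(critical_matching(Q))
```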

  14. CIOQ: Strict Relative Stability • Maximal matching: Keeps under-subscribed outputs stable (f3) (s=2) • Shortest Output-Queue First (SOQF): (f4) (s=3) • Output element scheduler: Identical to the one in the emulated switch • Intuition: Give preference to less congested pairs at the output • Asymptotic emulation of an ideal switch: long-term fairness

  15. Switched Fair Airport • Integrate two policies M1 and M2: • M1: Provides bandwidth trunks given rate reservations • M2: Optimizes throughput independent of the above rates • Combination modes: multi-phase and exclusive • [Figure: speedup required for M1 and M2 under each mode] • Maximal matching is additive to any other policy, hence needs the least speedup

  16. CIOQ-A: Aggregation • Advantages: smaller space element, lower arbitration complexity, heterogeneous subports • Shadow-Decompose: CIOQ emulation (f5) • VEQ matching: less complex, only for admissible traffic (f2)

  17. CIOQ-P: Pipelining • Advantages: slower space element, lower arbitration complexity • Sequential Dispatch: CIOQ emulation (f5) • Concurrent Dispatch: • Limited candidates, stale-state issues • 3D Maximal Matching for relative stability • Striping: Shadow on an envelope basis • Equal Dispatch: • Explicitly equalize load • Separate occupancy counters for each SE • Implement arbitrarily complex policies!

  18. G-MSM: Combination • Combination methods: CIOQ-A/P • No need for independent analysis • Recursion possible

  19. PPS: Architecture • [Figure: demultiplexers, core switches, multiplexers] • Advantages: Reuse low-capacity core switch; implement arbitrarily slow memories • Memoryless first and third stages • Performance: Emulates OQ switch • Pool the resources on several switching paths • Dual of a CIOQ-P switch • Matching algorithm replaced by load balancing • Sequence control might be necessary

  20. PPS: Flow-based • Model for clustered routers • Per-flow path assignment: explicit or hashed • No need for sequence control • Memory in first stage • High speedup (Clos fitting) • Unbalanced load assignment • Requires knowledge of loads • Split flows
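
Hashed per-flow path assignment can be sketched in a few lines. SHA-256 over a flow identifier string is an assumed stand-in for whatever hash a clustered router would use, the rates are made-up integers (for example Mb/s), and `assign_path` and `plane_loads` are hypothetical helpers. The second function shows why unbalanced load assignment appears on the slide: the per-switch totals depend on where flows happen to hash, not on their rates.

```python
import hashlib


def assign_path(flow_id: str, K: int) -> int:
    """Map a flow identifier onto one of K core switches (hypothetical helper).

    Every cell of a flow hashes to the same core switch, so cells of one flow
    never take different paths and need no resequencing at the output.
    """
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % K


def plane_loads(flow_rates, K):
    """Sum the rates hashed onto each core switch; nothing forces these to balance."""
    loads = [0] * K
    for flow_id, rate in flow_rates.items():
        loads[assign_path(flow_id, K)] += rate
    return loads


print(plane_loads({"flowA": 300, "flowB": 500, "flowC": 200}, 2))
```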

  21. PPS: Cell-based • Uniformly distribute the load of each flow • Premise: Each core element receives 1/K cells of each flow • Equal dispatch and striping suffice for asymptotic OQ emulation • Bandwidth trunks: Large buffers required

  22. Summary: A Recipe Book • Taxonomy of multi-module switches: Buffered Clos Switches • Performance framework: Functional equivalence with an ideal switch • Combined I/O Queueing: QoS via online maximal matching; throughput via critical matching; strict stability via maximal matching and SOQF; Switched Fair Airport matching • Aggregation: Shadow and Decompose; Virtual Element Queueing • Pipelining: Striping and Equal Dispatch; Concurrent Dispatch via 3D matching • Parallelization: Flow-based PPS via Clos fitting; cell-based PPS via Striping and Equal Dispatch • Memory-Space-Memory: Combination methods; Recursive BCS

  23. Avenues for Follow-on Research • Efficient policies for multicast • Similar treatment on other interconnection networks • Theory of backpressure: • Recent interest in buffered crossbars • Quality of stability: Average delay analysis • Short-timescale equivalence • Emulation of a finite-memory ideal switch • Interplay of buffer management with matching algorithms

  24. Supporting Slides

  25. Relevant Publications Switching algorithms: • Dynamic Partitioning: Switch Memory Management, Infocom ’99 • Packet Switches with QoS Support, Hot Interconnects ’00 • Feedback Control for Distributed Scheduling, Globecom ’00 • Buffered Clos Switches, Columbia TR ’02 Parallel switches: • Inverse Multiplexing for Switches, Globecom ’98 • Switched Connections Inverse Multiplexing, Intl. Conf. ATM ’99 • Recognition of Parallel Packet Switches, GBN, Infocom ’01 • Stability Analysis of Parallel Packet Switches, ICC ’01 • Open-loop Schemes for Multi-path Switches, ICC ’03

  26. Proposal Conjectures Proposal: six conjectures • Maximal matching is sufficient to isolate oversubscribed outputs: DONE • SOQF is sufficient for strict relative stability: DONE • Equal dispatch for strict stability in CIOQ-P: DONE • Equal dispatch plus decomposition for strict stability in G-MSM: DONE • Rate shaping plus maximal matching suffices for QoS in CIOQ: DONE • SOQF suffices for long-term fairness in CIOQ: DONE Plus many more to round out the work

  27. Additional Contributions • Background: Survey of formal methods in switching, a new perspective • Applications (Combined I/O Queueing, Aggregation, Pipelining, Parallelization): • Maximal Matching: Delay analysis • Perfect Sequences: Uniform Traffic • Multicast support using Recycling • Batch Decomposition (Optical) • Support for Heterogeneous Subports • Concurrent Dispatch: BVN and SPS • SMM Switches: PPS without backpressure • Fractional Dispatch for memoryless inputs

  28. Matching Flavors • Maximal matching: Non-idling, greedy • Maximum-size matching: Maximum flow in a bipartite graph (Ford-Fulkerson, Hopcroft-Karp) • Invariant of a maximal matching: for every non-empty queue, at least one connection lies in its row or column (the marked lines) • Example queue state: [3 0 6; 7 0 1; 0 5 0]
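
The two flavors differ in a way that is easy to see in code. The sketch below contrasts a greedy maximal matching with a maximum-size matching found by Kuhn's augmenting paths (the unit-capacity special case of Ford-Fulkerson, not Hopcroft-Karp), and checks the maximal-matching invariant directly. All function names are hypothetical.

```python
def greedy_maximal(q):
    """Non-idling greedy matching: each input takes its first free non-empty output."""
    used_out, match = set(), {}
    for i in range(len(q)):
        for j in range(len(q)):
            if q[i][j] > 0 and j not in used_out:
                match[i] = j
                used_out.add(j)
                break
    return match


def maximum_size(q):
    """Maximum-size matching via augmenting paths on the non-empty entries."""
    n = len(q)
    owner = [None] * n                      # owner[j] = input matched to output j

    def augment(i, seen):
        for j in range(n):
            if q[i][j] > 0 and j not in seen:
                seen.add(j)
                if owner[j] is None or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, set())
    return {owner[j]: j for j in range(n) if owner[j] is not None}


def is_maximal(q, match):
    """Invariant: every non-empty queue (i, j) has input i or output j matched."""
    outs = set(match.values())
    return all(i in match or j in outs
               for i in range(len(q)) for j in range(len(q)) if q[i][j] > 0)


q = [[1, 1], [1, 0]]
print(greedy_maximal(q), maximum_size(q))   # {0: 0} {1: 0, 0: 1}
```

In the 2x2 example the greedy choice (0, 0) blocks both other pairs, so the maximal matching has half the size of the maximum one; a maximal matching is always at least half the maximum size, which is one reason it needs speedup.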

  29. Matching Flavors (continued) • Critical Matching: Covers all critical rows and columns • Critical line: A line with the maximum sum • Perfect Matching: Each configuration is a permutation • Maximum Weight Matching: Use queue length as weights • Optimization problem: simplex method • Template Matchings: • BVN: Decompose rate matrix as convex combination of permutations • Double: Lower number of permutations, wasted slots • Min: N permutations will cover all entries, large number of wasted slots • Stable Matching: Gale-Shapley algorithm
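
The BVN template idea can be sketched with exact arithmetic. Assuming the rate matrix has already been scaled or padded so that every row and column sums to 1 (a doubly stochastic matrix), Birkhoff's theorem guarantees that its positive entries contain a permutation; subtracting that permutation, weighted by its smallest entry, zeroes at least one entry, so the loop ends after at most n² steps. Kuhn's augmenting-path search stands in for whatever matching routine a real template generator would use, and `perfect_matching` and `bvn_decompose` are hypothetical names.

```python
from fractions import Fraction


def perfect_matching(support):
    """Kuhn's algorithm on a boolean n x n support; returns perm[i] = j, or None."""
    n = len(support)
    owner = [None] * n                      # owner[j] = row matched to column j

    def augment(i, seen):
        for j in range(n):
            if support[i][j] and j not in seen:
                seen.add(j)
                if owner[j] is None or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not augment(i, set()):
            return None
    perm = [None] * n
    for j, i in enumerate(owner):
        perm[i] = j
    return perm


def bvn_decompose(d):
    """Write a doubly stochastic matrix as a list of (coefficient, permutation) terms."""
    d = [[Fraction(x) for x in row] for row in d]
    n = len(d)
    terms = []
    while any(x > 0 for row in d for x in row):
        perm = perfect_matching([[x > 0 for x in row] for row in d])
        assert perm is not None, "input was not doubly stochastic"
        coeff = min(d[i][perm[i]] for i in range(n))
        for i in range(n):
            d[i][perm[i]] -= coeff          # at least one entry drops to zero
        terms.append((coeff, perm))
    return terms


half = Fraction(1, 2)
print(bvn_decompose([[half, half, 0], [half, 0, half], [0, half, half]]))
```

Each coefficient is the fraction of slots in which the corresponding permutation is used as a template, so the coefficients sum to 1.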

  30. Stability Theory • Lyapunov functions: Kumar-Meyn ’95 • Mechanism to extend Foster’s criterion to a system of queues • Weighted cartesian product of queue lengths • Symmetric and co-positive • Fluid limits: Dai-Prabhakar ’00 • Function of discrete time: interpolate • Limit: scale time to infinity, $F(t) = \lim_{r \to \infty} \frac{1}{r} f(rt)$ • The scaling parameter may be drawn from an increasing sequence $r_n$
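
The fluid scaling can be tried on the simplest case, a single discrete-time queue. In this sketch the arrival and service probabilities are assumptions chosen for illustration: arrivals are Bernoulli with probability `p_arrive`, and the server completes a cell with probability `p_serve` whenever the queue is non-empty. Sampling f(rt)/r at a large r approximates F(t), which for this queue is max(p_arrive - p_serve, 0)·t; a zero fluid limit is the kind of evidence the fluid method turns into a stability proof. `queue_path` and `fluid_sample` are hypothetical helpers, and only one large r from the sequence r_n is sampled.

```python
import random


def queue_path(steps, p_arrive, p_serve, rng):
    """Queue length after each slot: Bernoulli arrivals, Bernoulli service when non-empty."""
    q, path = 0, [0]
    for _ in range(steps):
        if rng.random() < p_arrive:
            q += 1
        if q > 0 and rng.random() < p_serve:
            q -= 1
        path.append(q)
    return path


def fluid_sample(path, r, t):
    """Scaled sample f(r t) / r of the queue-length path (requires r * t <= len(path) - 1)."""
    return path[int(r * t)] / r


r = 200_000
overloaded = queue_path(r, 0.6, 0.5, random.Random(7))
stable = queue_path(r, 0.4, 0.5, random.Random(7))
# Expect roughly 0.1 * t for the overloaded queue and roughly 0 for the stable one.
print(fluid_sample(overloaded, r, 1.0), fluid_sample(stable, r, 1.0))
```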

  31. CIOQ: Bandwidth Trunks • Arrivals into GQ: bounded admissible • Bandwidth trunk timescale: 1/GCD(R) • Covers all entries in GQ before the next batch • Delay comparable to BVN rate decomposition

  32. CIOQ: Perfect Sequences • Sub-maximal Perfect Sequence: • A sequence of N permutations that together cover every entry of the N x N matrix (the all-ones matrix) • A repeating sequence guarantees 1/N of the slots to each pair • Suffices for 100% throughput to uniform traffic • Simple implementation: Staggered round-robin • Not even maximal! • Concurrent SPS for CIOQ-P: K turns in KN slots • Basis for iSLIP • Basis for Atlanta arbitration
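
A staggered round-robin can be written as cyclic shifts. The sketch below assumes the simplest staggering, in which slot k connects input i to output (i + k) mod N; the slide does not specify the exact offsets, and `staggered_round_robin` and `serve` are hypothetical helpers. The second function applies a configuration without looking at queue state, which is why the sequence is not even maximal.

```python
def staggered_round_robin(n):
    """Slot k connects input i to output (i + k) mod n: n permutations in total."""
    return [[(i + k) % n for i in range(n)] for k in range(n)]


def serve(queues, config):
    """Apply one configuration blindly; pairs with empty queues simply idle."""
    served = 0
    for i, j in enumerate(config):
        if queues[i][j] > 0:
            queues[i][j] -= 1
            served += 1
    return served


seq = staggered_round_robin(3)
print(seq)                                        # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
# Not maximal: cell (0, 0) waits while configuration [1, 0] serves nothing.
print(serve([[1, 0], [0, 0]], staggered_round_robin(2)[1]))   # 0
```

Over one pass of N slots every input-output pair is connected exactly once, which gives each pair the 1/N share quoted on the slide.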

  33. Hierarchical Scheduling

  34. CIOQ-P: Equal Dispatch • Explicitly equalize the load for each input-output pair • Implemented as counters • No mis-sequencing issues
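
The counters behind equal dispatch can be sketched as a per-pair rotation. This is an assumed reading of "implemented as counters": each input-output pair keeps a pointer to the element that receives its next cell and advances it by one per cell, so after any number of cells the K elements hold counts for that pair that differ by at most one. `EqualDispatcher` and its fields are hypothetical, and the sketch does not model the sequencing argument.

```python
from collections import defaultdict


class EqualDispatcher:
    """Per (input, output) pair round-robin over K elements (hypothetical sketch)."""

    def __init__(self, K):
        self.K = K
        self.next_elem = defaultdict(int)   # (i, j) -> element that gets the next cell
        self.sent = defaultdict(int)        # (i, j, k) -> cells of pair (i, j) sent to k

    def dispatch(self, i, j):
        """Return the element for the next cell of pair (i, j) and advance its counter."""
        k = self.next_elem[(i, j)]
        self.next_elem[(i, j)] = (k + 1) % self.K
        self.sent[(i, j, k)] += 1
        return k


d = EqualDispatcher(3)
print([d.dispatch(0, 1) for _ in range(5)])   # [0, 1, 2, 0, 1]
```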

  35. CIOQ-P: 3D Maximal Matching • Concurrent traversal of the queue state matrix • Pointers do not coincide with each other

  36. Recursive G-MSM • Memory element of a G-MSM: Replace with a CIOQ switch • Virtual Element Queues, organized per space element • [Figure labels: any matching; SPS]

  37. PPS: Data Path
