QoS in an Ethernet world
Bill Lynch, Founder & CTO
QoS
• Why is it needed? (Or is it?)
• What does it do? (Or not do?)
• Gotchas…
• Why is it hard to deploy?
Triple play data networks
[Diagram: broadband homes, VPN A/B CEs, and a computational particle physicist connect through an Edge PE and Distribution PE to an IP, MPLS, or λ core; centralized headends deliver VOD, CONF, and data services; interface content mirroring for security requirements]
Edge PE:
• High-speed Ethernet Edge
• Assured QoS
• DOS prevention
Centralized Headend:
• Video, voice, data over Ethernet
• QoS across thousands of subscribers
• SLAs and differential pricing
Triple play data characteristics
• Voice: many connections, low BW per connection, latency/jitter requirements
• Video: few sources, higher BW, latency requirements
• Data: many connections, unpredictable BW, best effort (BE) generally okay
• Computational particle physicist: very few connections, very high peak BW & duration
Router QoS
[Diagram: router with multiple physical ports]
• QoS == which packet goes first
• Only matters under congestion
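A minimal sketch of "QoS == which packet goes first": a strict-priority dequeue across per-class queues. The class names and the strict-priority discipline are illustrative; the deck does not specify a scheduling algorithm at this point.

```python
from collections import deque

# One FIFO per class; earlier in PRIORITY = served first (illustrative split).
queues = {"voice": deque(), "video": deque(), "best_effort": deque()}
PRIORITY = ["voice", "video", "best_effort"]

def enqueue(cls, packet):
    queues[cls].append(packet)

def dequeue():
    """The whole QoS decision: under congestion, the highest-priority
    non-empty queue sends first; everything else waits."""
    for cls in PRIORITY:
        if queues[cls]:
            return queues[cls].popleft()
    return None  # link idle: without congestion, QoS changes nothing

enqueue("best_effort", "data-1")
enqueue("voice", "voip-1")
print(dequeue())  # -> voip-1, even though data-1 arrived first
```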
Router QoS
• Inherent packet jitter
• Worse: N simultaneous arrivals
• Bad: per hop!
• Worse: bigger MTU
Inherent jitter (per hop!)
[Diagram: multi-hop path with FE, GE, OC-12, OC-12, and OC-192 links]
• Fundamental conclusion: QoS more important at edge
• Edge also more likely to congest
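To put numbers on the jitter bullets: a short calculation of the worst-case serialization delay added at one hop when N packets arrive simultaneously ahead of yours. The 1500-byte MTU and N=8 are illustrative; the link rates are the nominal values for the speeds named on the slide.

```python
# Worst-case added delay at one hop: (N-1) full-MTU packets must be
# serialized ahead of yours. Delay = bits_queued / link_rate.
MTU_BITS = 1500 * 8  # illustrative 1500-byte MTU

links_bps = {"FE": 100e6, "GE": 1e9, "OC-12": 622e6, "OC-192": 9.95e9}

def worst_case_jitter_us(link_bps, n_simultaneous):
    return (n_simultaneous - 1) * MTU_BITS / link_bps * 1e6

for name, bps in links_bps.items():
    # e.g. 8 ports dumping onto one output at the same instant
    print(f"{name}: {worst_case_jitter_us(bps, 8):.1f} us per hop")
# FE: ~840 us per hop vs OC-192: ~8 us. Slow edge links dominate
# the jitter budget, which is why QoS matters most at the edge.
```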
Gotchas…
• Already no guarantees from simultaneous arrival… but hope the total worst case is < 10ms?
• And what if your router wasn’t perfect?
What is Queue Sharing?
[Diagram: four physical ports on an input linecard feeding shared HI and LO queues toward four output ports]
Queue Sharing is when multiple physical or switch fabric connections must share queues. Example: each input linecard has two queues (HI and LO) for each output linecard. All packets in a shared queue are treated equally.
What is Head of Line Blocking?
[Diagram: shared HI and LO queues between input and output ports; the output side backs up]
When an output linecard becomes congested, traffic backs up on the input linecard. Traffic control (W/RED) must be performed at the input VOQ.
What is Head of Line Blocking? (cont.)
[Diagram: the shared queue holds traffic for both the congested and the uncongested output port]
The output linecard cannot process all of the output traffic. Because all traffic in a shared queue (VOQ) is treated equally, we have affected traffic on the uncongested port.
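A small simulation of the failure mode above, under the deck's stated assumption of one queue shared by traffic to multiple output ports. The port numbers, rates, queue limit, and tail-drop policy are all illustrative; the point is only that when port 2 is oversubscribed, losses land on traffic bound for the uncongested port 1 as well.

```python
import random
from collections import deque

random.seed(1)
QUEUE_LIMIT = 50     # shared input-side queue depth (illustrative)
PORT_RATE = 1        # each output port drains 1 packet per tick
TICKS = 10_000

shared_q = deque()   # ONE queue shared by traffic to both output ports
sent = {1: 0, 2: 0}
offered = {1: 0, 2: 0}

for tick in range(TICKS):
    # Offered load: port 1 gets ~0.5 pkt/tick (well under its link),
    # port 2 gets 2 pkt/tick (oversubscribed 2:1).
    arrivals = [1] * (random.random() < 0.5) + [2, 2]
    for dst in arrivals:
        offered[dst] += 1
        if len(shared_q) < QUEUE_LIMIT:
            shared_q.append(dst)       # tail-drop once the SHARED queue fills
    # Each port drains at most PORT_RATE per tick, but only from the
    # head of the shared queue: head-of-line blocking.
    budget = {1: PORT_RATE, 2: PORT_RATE}
    while shared_q and budget[shared_q[0]] > 0:
        dst = shared_q.popleft()
        budget[dst] -= 1
        sent[dst] += 1

print(f"uncongested port 1: {100*sent[1]/offered[1]:.0f}% of offered traffic delivered")
print(f"congested   port 2: link {100*sent[2]/TICKS:.0f}% utilized")
# Port 1 loses traffic even though its own link never congested,
# matching the queue-sharing test results on the next slide.
```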
Queue Sharing Test Results
• Congested port (Flows C, D, E) remained at 100% throughput
• Uncongested port (Flows A, B) was penalized because of Queue Sharing
• Traffic on adjacent ports was dropped!
The effects of Queue Sharing
With Queue Sharing present, congestion can severely affect the performance of non-congested ports. Congestion is caused by:
• Topology changes
• Routing instability
• Denial of service attacks
• High service demand
• Misconfiguration of systems or devices
Output Queued Architectures - PRO/8000
[Diagram: physical ports surrounding a centralized shared memory switch fabric]
• Only one queuing location exists in the entire system
• Over 36,000 unique hardware queues
• Bandwidth is protected on a per-queue basis
• Incoming packets are immediately placed into a unique output queue
Output Queued Architectures - PRO/8000 (cont.)
• Traffic control (W/RED) is performed on each output queue individually
• Protected bandwidth for every single queue
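A sketch of how W/RED-style early drop runs on one output queue: drop probability ramps linearly once a smoothed queue depth crosses a threshold. The thresholds, max drop probability, and EWMA weight below are illustrative profile values, not PRO/8000 configuration.

```python
import random

# Classic (W)RED on a single queue: probability of early drop ramps
# linearly between MIN_TH and MAX_TH on an EWMA of queue depth.
MIN_TH, MAX_TH, MAX_P = 20, 60, 0.10   # illustrative per-queue profile
WQ = 0.002                              # EWMA weight

avg = 0.0

def red_admit(queue_len):
    """Return True to enqueue the arriving packet, False to early-drop."""
    global avg
    avg = (1 - WQ) * avg + WQ * queue_len     # smoothed queue depth
    if avg < MIN_TH:
        return True                            # no congestion signal
    if avg >= MAX_TH:
        return False                           # hard early drop
    p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() >= p                # probabilistic drop

print(red_admit(queue_len=30))  # admit/drop decision for one arrival

# Because this runs per OUTPUT queue, a drop decision can only shed
# load from this queue's own flows, never from flows that merely
# shared an input-side VOQ with them.
```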
PRO/8812 Test Results
• Congested port (Flows C, D, E) remained at 100% throughput
• Uncongested port (Flows A, B) remained at 100% throughput
QoS Deployment Issues
• Political: peers; QoS is end to end
• Equipment: many queues/port, many shapers/port, fast diffserv/remarking, computation expense
• Operational: must deploy everywhere, must police at the edge
• Commercial: easier short-term solutions to problems, cheaper alternatives
• Applications: not tuned or aware; QoS not ‘required’ for the application
• Geographical: last mile technologies, single provider network, green field deployments
Summary
• Triple play requires QoS
• Services drive quality
• Most routers aren’t perfect
• Shared queues mean you can’t provision a port independently
• Political and deployment problems remain
• Some geographic areas better suited
Architecture
Never underestimate the power of Moore’s Law
• SC: 297 sq mm (17.26mm x 17.26mm), 30.5M transistors, 47M contacts, 50 KBytes of memory
• LCU: 425 sq mm (20.17mm x 21.07mm), 137M transistors, 188M contacts, 950 KBytes of memory
• NPU: 429 sq mm (20.17mm x 21.29mm), 156M transistors, 265M contacts, 1.2 MBytes of memory
• Striper: 429 sq mm (20.17mm x 21.29mm), 214M transistors, 400M contacts, 2.6 MBytes of memory
• GA: 389 sq mm (19.05mm x 20.4mm), 106M transistors, 188M contacts, 1.2 MBytes of memory
• MCU: 225 sq mm (15.02mm x 15.02mm), 83M transistors, 136M contacts, 900 KBytes of memory
NPU – 40G QoS lookups
[Diagram: NPU pipeline blocks IPA, PxU, LxU (with FTSRAM), QxU, PBU, pacman]
• VLIW systolic array
• Packet advances every cycle
• Named bypassing
• > 200 processors
• 4 ops/cycle/processor
• 12 loads every cycle (1Tb memory BW)
• 36 loads/packet
NPU
[Diagram: same NPU pipeline blocks IPA, PxU, LxU (with FTSRAM), QxU, PBU, pacman]
• VLIW systolic array
• Normal instruction set: arithmetic, logical, branch, load
• Simple programming model
• Deterministic performance
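A toy model of the "packet advances every cycle" claim: in a systolic pipeline each stage holds a different packet every cycle, so throughput is one packet per cycle regardless of pipeline depth, and latency is deterministic. The stage names follow the block labels on the slide; the model of what happens inside each stage is invented for illustration.

```python
# Toy systolic pipeline: once the pipe fills, one packet enters and
# one exits per cycle, so per-packet latency (pipe-depth cycles)
# never reduces throughput. Stage names from the slide.
STAGES = ["IPA", "PxU", "LxU", "QxU", "PBU"]

def run(packets, cycles):
    pipe = [None] * len(STAGES)
    arrivals = iter(packets)
    for cycle in range(cycles):
        done = pipe[-1]                # packet leaving the last stage
        pipe[1:] = pipe[:-1]           # every packet advances one stage
        pipe[0] = next(arrivals, None) # next packet enters, if any
        if done is not None:
            print(f"cycle {cycle}: {done} exits")

run([f"pkt{i}" for i in range(8)], cycles=12)
# After a 5-cycle fill, one packet completes every cycle:
# deterministic performance, no per-packet scheduling decisions.
```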
Memory Controller – Service Level Queueing
• High BW: 16 DRAM chips, independent memory banks, BW distributed across banks
• 36K queues
• Memory management: write-once multicast, preserve ordering
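One way to read the "BW distributed across banks" bullet: successive packet-buffer writes are striped across independent DRAM banks, so while one bank is busy with its row cycle the next writes land elsewhere and no single bank's cycle time caps throughput. The bank count matches the slide's 16 chips; the round-robin placement policy itself is an assumption for illustration.

```python
# Stripe successive packet-buffer writes across independent DRAM
# banks so accesses overlap rather than serialize on one bank.
NUM_BANKS = 16  # per the slide's 16 DRAM chips

class StripedBuffer:
    def __init__(self):
        self.banks = [[] for _ in range(NUM_BANKS)]
        self.next_bank = 0

    def write(self, cell):
        bank = self.next_bank
        self.banks[bank].append(cell)
        self.next_bank = (bank + 1) % NUM_BANKS  # round-robin striping
        return bank

buf = StripedBuffer()
placement = [buf.write(f"cell{i}") for i in range(32)]
print(placement)  # 0..15, 0..15: each bank sees 1/16 of the write load
```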
Basic Router Architecture Elements
[Diagram: linecard, switch fabric, linecard]
Three classes of switch fabric architecture:
- Input Queued (IQ)
- Output Queued (OQ)
- Combined Input/Output Queued (CIOQ)
Input Queued (IQ) Fabrics
[Diagram: input linecard, switch fabric, output linecard; queues sit on the input side]
Input queued switch fabrics:
- Inefficient use of memory
- Require complex scheduling
Combined Input/Output Queued (CIOQ) Fabrics
[Diagram: input linecard, switch fabric, output linecard; queues on both sides]
CIOQ switch fabrics:
- Generally built around a point-to-point fabric in the middle (crossbar, multi-stage (Clos), torus)
- Require complex scheduling
- Queues shared to reduce complexity
Output Queued (OQ) Fabrics
[Diagram: input linecard, switch fabric, output linecard; queues on the output side]
OQ switch fabrics:
- Require extremely high speed memory access
- Do not share queues
- Efficient multicast replication
- Protected bandwidth per queue
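"Protected bandwidth per queue" can be pictured with a deficit round robin scheduler: each output queue earns a byte quantum per round, so one queue's backlog can never starve another. DRR is named here as an illustrative discipline; the quanta and queue names are invented, and the deck does not say which scheduler the PRO/8000 actually uses.

```python
from collections import deque

# Deficit round robin: per-queue byte quanta protect each queue's
# share of the output link (quanta and queues are illustrative).
queues = {"gold": deque(), "bronze": deque()}
quantum = {"gold": 1500, "bronze": 500}   # bytes earned per round
deficit = {"gold": 0, "bronze": 0}

def drr_round():
    sent = []
    for name, q in queues.items():
        deficit[name] += quantum[name]
        while q and q[0][1] <= deficit[name]:   # entries are (pkt_id, size)
            pkt, size = q.popleft()
            deficit[name] -= size
            sent.append(pkt)
        if not q:
            deficit[name] = 0                   # idle queues don't hoard credit
    return sent

queues["gold"].extend((f"g{i}", 1500) for i in range(4))
queues["bronze"].extend((f"b{i}", 500) for i in range(4))
for r in range(3):
    print(f"round {r}:", drr_round())
# gold moves ~3x the bytes of bronze per round: its bandwidth stays
# protected no matter how deep the bronze backlog grows.
```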
Terabit Centralized Shared Memory Routers
April 20, 2004, Bill Lynch, CTO
Whither QoS?
April 20, 2004, Bill Lynch, CTO
Concurrent Services
[Diagram: research, education, grid, and supercomputing sites plus broadband homes and VPN A/B CEs connect through an Edge PE and Distribution PE to an IP, MPLS, or λ core; centralized headends deliver VOD, CONF, and data services; interface content mirroring for security requirements]
Edge PE:
• High-speed Ethernet Edge
• Assured QoS
• DOS prevention
Centralized Headend:
• Video, voice, data over Ethernet
• QoS across thousands of subscribers
• SLAs and differential pricing
(More Bill’s Slides Here)
• (As much detail on the switch fabric and chips as you are comfortable saying in a multi-vendor environment!)
• No scheduling
• 36K service level queues
• NPU for fast lookup, policing, shaping
• SW abstraction based on service performed, not provided knobs
• Many, many, many DRAM banks; however, ½ as many as CIOQ architectures
• 40G NPU for line rate: policing, remarking, DA, AS, other lookups
• SW interface focused on service, not knobs
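The "policing, remarking" bullets suggest a token-bucket policer; here is a minimal single-rate sketch. The rate, burst size, and remark-to-best-effort action are illustrative choices, not PRO/8000 configuration.

```python
import time

class TokenBucketPolicer:
    """Single-rate token-bucket policer: conforming packets keep their
    marking, excess packets are remarked to best effort (illustrative
    parameters and action, not PRO/8000 configuration)."""
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0     # credit accrues in bytes/second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def police(self, size_bytes, dscp):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return dscp                # in-profile: keep the marking
        return 0                       # out-of-profile: remark to BE (DSCP 0)

policer = TokenBucketPolicer(rate_bps=2_000_000, burst_bytes=10_000)
print(policer.police(1500, dscp=46))   # 46 (EF) while under contract
```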
(Insert Bill’s Slides Here)
• Self Introduction
• Problem Statement (Bill): "Layer 3 QoS at the right scale/price is elusive." Throwing more bandwidth at lower layers only makes networking researchers commodity bandwidth brokers. That is fine for R&E, but commercially it is too expensive, so there appears to be a growing disconnect between R&E and commercial. It will be important not to slam the current L2/L1 vogue lest we upset the locals :)
• Numerous commercial implementations starting now: single network countries, high BW to the home, triple play
• Assertion (Bill): "System architecture greatly contributes to the proper operation of network-wide QoS." Current system architectures are completely unfocused on network-wide QoS and focused instead on per-hop behaviors. This forces networkers to tweak 100 knobs to get the desired behavior. Why not architect the system to protect a flow through the router, so that behaviors are predictable in every circumstance?
• End to end: any problem is exacerbated by TCP.
Abilene Network Map
Source: http://abilene.internet2.edu/new/upgrade.html
Internet Growth Predictions
• “117% yearly growth through 2006”
• “Video will drive traffic growth over the next 10 years”
Source: Yankee Group, April 2004
Network Reference Design
[Diagram: single element core (cluster) with a concurrent services edge; intradomain QoS within the network, interdomain QoS toward peers]
PRO/8000™ Concurrent Services Routers
• Highest performance and density: 960 Gbps, 2 per rack
• Ultra-compact: 80 Gbps, 8 per rack
PRO/8000 Series Logical Architecture
[Diagram: Procket VLSI forwarding plane; line cards with media adapters connect through Switch Cards (2+1), with Route Processors (1+1) in the control plane]
• Fully redundant Switch Cards and Route Processors
• All components hot-swappable in-service
• No single point of failure
• Strictly non-blocking