QoS in an Ethernet world

Bill Lynch, Founder & CTO

QoS
• Why is it needed? (Or is it?)
• What does it do? (Or not do?)
• Gotchas…
• Why is it hard to deploy?

Triple play data networks
[Diagram: headends (including a centralized headend), VPN A and VPN B CE sites, a computational particle physicist, and broadband homes, attached through edge PE and distribution PE routers to an IP, MPLS, or λ core. Services shown: VOD, CONF, data services.]
• Interface content mirroring for security requirements
• High-speed Ethernet edge
• Assured QoS
• DoS prevention
• Video, voice, data over Ethernet
• QoS across thousands of subscribers
• SLAs and differential pricing

Triple play data characteristics
Voice
• Many connections
• Low BW/connection
• Latency/jitter requirements
Video
• Few sources
• Higher BW
• Latency
Data
• Many connections
• Unpredictable BW
• BE generally okay
Computational particle physicist
• Very high peak BW & duration
• Very few connections

Router QoS
[Diagram: a router with eight physical ports]

Router QoS
• QoS == which packet goes first
• Only matters under congestion
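The "which packet goes first" point can be illustrated with a minimal strict-priority sketch. This is not any particular router's scheduler; the class name, queue names, and packet labels are hypothetical and chosen only for illustration.

```python
from collections import deque


class StrictPriorityPort:
    """Hypothetical output port with a HI and a LO queue; HI always drains first."""

    def __init__(self):
        self.queues = {"HI": deque(), "LO": deque()}

    def enqueue(self, name, prio):
        self.queues[prio].append(name)

    def dequeue(self):
        # The QoS decision: choose the next packet to put on the wire.
        for prio in ("HI", "LO"):
            if self.queues[prio]:
                return self.queues[prio].popleft()
        return None


ARRIVALS = [("data1", "LO"), ("voice1", "HI"), ("data2", "LO")]


def uncongested():
    """Each packet is sent before the next one arrives: no backlog, no choice."""
    port = StrictPriorityPort()
    sent = []
    for name, prio in ARRIVALS:
        port.enqueue(name, prio)
        sent.append(port.dequeue())
    return sent


def congested():
    """All packets are waiting at once: the scheduler decides the order."""
    port = StrictPriorityPort()
    for name, prio in ARRIVALS:
        port.enqueue(name, prio)
    sent = []
    packet = port.dequeue()
    while packet is not None:
        sent.append(packet)
        packet = port.dequeue()
    return sent
```

Without a backlog the packets leave in arrival order regardless of priority; only when several packets wait at once does the voice packet jump ahead.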

Router QoS
• Inherent packet jitter
• Worse: N simultaneous arrivals
• Bad: Per hop!
• Worse: Bigger MTU
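A rough worked example of the jitter sources listed on this slide, assuming a simplified model in which N MTU-sized packets arrive at the same instant for one output link and are sent back to back. The function names and the Gigabit Ethernet rate are assumptions made for the sketch, not figures from the talk.

```python
def serialization_delay_us(size_bytes, link_bps):
    """Time to clock one packet onto the wire, in microseconds."""
    return size_bytes * 8 * 1e6 / link_bps


def worst_case_wait_us(n_arrivals, mtu_bytes, link_bps):
    """Wait seen by the last of n_arrivals MTU-sized packets that arrive
    together for the same output link (it waits for the other n - 1)."""
    return (n_arrivals - 1) * serialization_delay_us(mtu_bytes, link_bps)


GE_BPS = 1e9  # assumed link rate for the example

for mtu in (1500, 9000):
    for n in (2, 8):
        wait = worst_case_wait_us(n, mtu, GE_BPS)
        print(f"MTU {mtu} B, {n} simultaneous arrivals: {wait:.0f} us")
```

Under these assumptions, eight simultaneous 1500-byte arrivals cost the unluckiest packet 84 µs at one hop, and 504 µs with a 9000-byte MTU; the same penalty can recur at every hop.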

Inherent jitter (per hop!)
[Diagram: links labeled FE, GE, OC-12, OC-12, OC-192]
• Fundamental conclusion: QoS more important at edge
• Edge also more likely to congest
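Why the edge dominates can be seen from a small calculation. The sketch below treats the slide's link labels as consecutive hops on one path (an assumption), uses nominal line rates, and counts only the time a packet waits behind a single 1500-byte packet at each hop.

```python
# Nominal line rates in bits per second.
LINK_BPS = {
    "FE": 100e6,
    "GE": 1e9,
    "OC-12": 622.08e6,
    "OC-192": 9953.28e6,
}

# Assumption: the slide's link labels form one edge-to-core path.
PATH = ["FE", "GE", "OC-12", "OC-12", "OC-192"]
MTU_BYTES = 1500


def one_packet_wait_us(link, size_bytes=MTU_BYTES):
    """Delay from waiting behind one MTU-sized packet on this link."""
    return size_bytes * 8 * 1e6 / LINK_BPS[link]


per_hop = [(link, one_packet_wait_us(link)) for link in PATH]
total_us = sum(wait for _, wait in per_hop)
edge_share = per_hop[0][1] / total_us

for link, wait in per_hop:
    print(f"{link:7s} {wait:8.2f} us")
print(f"total   {total_us:8.2f} us, FE hop share {edge_share:.0%}")
```

In this model the slow FE hop accounts for well over half of the path total, while the OC-192 hop contributes about one percent of what the FE hop does.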

Gotchas…
• Already no guarantees from simultaneous arrival…
  … but hope the total worst case is < 10ms?
• And what if your router wasn’t perfect?

What is Queue Sharing?
[Diagram: physical ports feeding shared HI and LO queues]
Queue Sharing is when multiple physical or switch fabric connections must share queues.
Example: Each input linecard has two queues for each output linecard. All packets in a shared queue are treated equally.

What is Head of Line Blocking?
[Diagram: physical ports feeding shared HI and LO queues]
When an output linecard becomes congested, traffic backs up on the input linecard.
Traffic control (W/RED) must then be performed at the input virtual output queue (VOQ).

What is Head of Line Blocking?
[Diagram: physical ports feeding shared HI and LO queues]
The output linecard cannot process all of the output traffic.
Because all traffic in a shared queue (VOQ) is treated equally, traffic destined for the uncongested port is affected as well.
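The effect can be reproduced with a toy simulation. This is a minimal sketch under stated assumptions, not a model of any real linecard: two output ports X and Y each send one packet per tick, port X is offered three packets per tick and port Y exactly one, queues tail-drop at a fixed limit, and all names and constants are hypothetical.

```python
import random
from collections import deque

TICKS = 1000
ARRIVALS = ["X", "X", "X", "Y"]  # per tick: X offered 3x its rate, Y exactly its rate
PORT_RATE = 1                    # packets each output port can send per tick
QUEUE_LIMIT = 16                 # packets per queue before tail drop


def simulate(shared):
    """Return packets delivered per port with one shared FIFO or one queue per port."""
    rng = random.Random(0)
    queues = {"shared": deque()} if shared else {"X": deque(), "Y": deque()}
    delivered = {"X": 0, "Y": 0}
    for _ in range(TICKS):
        batch = ARRIVALS[:]
        rng.shuffle(batch)  # arrival order within a tick is arbitrary
        for dst in batch:
            queue = queues["shared" if shared else dst]
            if len(queue) < QUEUE_LIMIT:  # tail drop when the queue is full
                queue.append(dst)
        budget = {"X": PORT_RATE, "Y": PORT_RATE}
        for queue in queues.values():
            # FIFO service: only the head may leave. If its output port is
            # busy, every packet behind it waits, whatever its destination.
            while queue and budget[queue[0]] > 0:
                dst = queue.popleft()
                budget[dst] -= 1
                delivered[dst] += 1
    return delivered


shared_result = simulate(shared=True)
separate_result = simulate(shared=False)
print("shared queue:   ", shared_result)
print("per-port queues:", separate_result)
```

With per-port queues, port Y delivers every packet it is offered. With the shared queue, packets for Y are dropped or stuck behind packets for X even though Y itself is never oversubscribed.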

Queue Sharing Test Results
• Congested port (Flows C, D, E) remained at 100% throughput
• Uncongested (Flows A, B) were penalized because of Queue Sharing

The effects of Queue Sharing
With the presence of Queue Sharing, congestion can severely affect the performance of non-congested ports.
Congestion is caused by:
• Topology changes
• Routing instability
• Denial of service attacks
• High service demand
• Misconfiguration of systems or devices

Output Queued Architectures - PRO/8000
[Diagram: eight physical ports around a Centralized Shared Memory Switch Fabric]
• Only one queuing location exists in the entire system
• Over 36,000 unique hardware queues
• Bandwidth is protected on a per-queue basis
• Incoming packets are immediately placed into a unique output queue

Output Queued Architectures - PRO/8000
[Diagram: eight physical ports around a Centralized Shared Memory Switch Fabric]
• Traffic control (W/RED) is performed on each output queue individually
• Protected bandwidth for every single queue
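For readers unfamiliar with W/RED, the following is a simplified sketch of the drop decision applied to a single output queue. It uses the basic RED drop curve with per-class thresholds and omits refinements such as the packet-count correction. The threshold values, class names, and function names are hypothetical and do not describe the PRO/8000 implementation.

```python
def ewma_depth(avg, sample, weight=0.002):
    """Smooth the instantaneous queue depth so short bursts are tolerated."""
    return (1 - weight) * avg + weight * sample


def drop_probability(avg_depth, min_th, max_th, max_p):
    """Basic RED curve: no drops below min_th, a linear ramp up to max_p
    between the thresholds, and forced drop at or above max_th."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# "Weighted": each traffic class gets its own (min_th, max_th, max_p) profile.
# These numbers are illustrative only.
PROFILES = {
    "HI": (40, 80, 0.02),
    "LO": (20, 60, 0.10),
}


def wred_drop_probability(avg_depth, traffic_class):
    """Drop probability for one class at the given average depth of one queue."""
    return drop_probability(avg_depth, *PROFILES[traffic_class])


for depth in (10, 50, 90):
    hi = wred_drop_probability(depth, "HI")
    lo = wred_drop_probability(depth, "LO")
    print(f"avg depth {depth}: HI drop p={hi:.3f}, LO drop p={lo:.3f}")
```

Because each output queue keeps its own average depth, congestion on one queue raises drop probabilities only for that queue.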

Pro/8812 Test Results
• Congested port (Flows C, D, E) remained at 100% throughput
• Uncongested (Flows A, B) remained at 100% throughput

Network QoS architectures

QoS Deployment Issues
Political
• Peers
Equipment
• QoS is end to end
• Many queues/port
• Many shapers/port
• Fast diffserv/remarking
• Computation expense
Operational
• Must deploy everywhere
• Must police at the edge
Commercial
• Easier short term solutions to problems
• Cheaper alternatives
Applications
• Not tuned or aware
• QoS not ‘required’ for the application
Geographical
• Last mile technologies
• Single provider network
• Green field deployments
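As an illustration of what "police at the edge" with remarking involves, here is a minimal single-rate token-bucket sketch. The class and function names, the rates, and the choice to remark out-of-profile packets to best effort rather than drop them are assumptions made for the example. The DSCP values 46 (EF) and 0 (default) are the standard code points.

```python
class TokenBucketPolicer:
    """Single-rate, two-color policer: in profile or out of profile."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last = 0.0

    def conforms(self, now_s, size_bytes):
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now_s - self.last
        self.last = now_s
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


DSCP_EF = 46           # expedited forwarding
DSCP_BEST_EFFORT = 0   # default per-hop behavior


def police_and_remark(policer, now_s, size_bytes, dscp):
    """Keep the marking for in-profile packets; remark the rest to best effort."""
    if policer.conforms(now_s, size_bytes):
        return dscp
    return DSCP_BEST_EFFORT


# Hypothetical subscriber contract: 8 kb/s (1000 bytes/s), 1000-byte burst.
policer = TokenBucketPolicer(rate_bps=8000, burst_bytes=1000)
first = police_and_remark(policer, 0.0, 600, DSCP_EF)   # fits in the burst
second = police_and_remark(policer, 0.0, 600, DSCP_EF)  # exceeds remaining tokens
third = police_and_remark(policer, 1.0, 600, DSCP_EF)   # bucket has refilled
print(first, second, third)
```

Each policed subscriber or class needs its own bucket state, which is part of why the slide lists computation expense among the equipment issues.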

Summary
• Triple play requires QoS
• Services drive quality
• Most routers aren’t perfect
• Shared queues mean you can’t provision a port independently
• Political and deployment problems remain
• Some geographic areas better suited

Architecture
Chip     Die area                        Transistors  Contacts  Memory
SC       297 sq mm (17.26mm x 17.26mm)   30.5M        47M       50KBytes
LCU      425 sq mm (20.17mm x 21.07mm)   137M         188M      950KBytes
NPU      429 sq mm (20.17mm x 21.29mm)   156M         265M      1.2MBytes
Striper  429 sq mm (20.17mm x 21.29mm)   214M         400M      2.6MBytes
GA       389 sq mm (19.05mm x 20.4mm)    106M         188M      1.2MBytes
MCU      225 sq mm (15.02mm x 15.02mm)   83M          136M      900KBytes
Never underestimate the power of Moore’s Law

NPU – 40G QoS lookups
[Diagram blocks: FTSRAM, LxU, PxU, IPA, PBU, pacman, QxU]
• VLIW systolic array
• Packet advances every cycle
• Named bypassing
• > 200 processors
• 4 ops/cycle/processor
• 12 loads every cycle (1Tb memory BW)
• 36 loads/packet

NPU
[Diagram blocks: FTSRAM, LxU, PxU, IPA, PBU, pacman, QxU]
• VLIW systolic array
• Normal instruction set
  • Arithmetic
  • Logical
  • Branch
  • Load
• Simple programming model
• Deterministic performance

Memory Controller – Service Level Queueing
• High BW
• 16 DRAM chips
• Independent memory banks
• BW dist. across banks
• 36K queues
• Memory management
• Write-once multicast
• Preserve ordering
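One common way to read "write-once multicast" with preserved ordering is reference counting: the packet is stored once, each destination queue holds only a handle, and the buffer is freed after the last copy is read. The sketch below illustrates that idea only; it is an assumption about the technique, not a description of the actual memory controller, and all names are hypothetical.

```python
from collections import deque


class SharedPacketMemory:
    """Toy shared packet memory with per-queue FIFOs of buffer handles."""

    def __init__(self):
        self.buffers = {}    # handle -> [payload, remaining readers]
        self.queues = {}     # queue id -> deque of handles
        self.next_handle = 0

    def enqueue(self, payload, queue_ids):
        """Write the payload once and link it onto every listed queue."""
        if not queue_ids:
            raise ValueError("packet must be destined to at least one queue")
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = [payload, len(queue_ids)]
        for qid in queue_ids:
            self.queues.setdefault(qid, deque()).append(handle)
        return handle

    def dequeue(self, qid):
        """Read the next packet of one queue, in arrival order."""
        handle = self.queues[qid].popleft()
        entry = self.buffers[handle]
        payload = entry[0]
        entry[1] -= 1
        if entry[1] == 0:  # last reader: release the buffer
            del self.buffers[handle]
        return payload


mem = SharedPacketMemory()
mem.enqueue(b"unicast", ["q1"])
mem.enqueue(b"video", ["q1", "q2", "q3"])  # multicast: stored once
print(len(mem.buffers))    # two stored payloads, although four queue entries exist
print(mem.dequeue("q1"))   # b'unicast' leaves first: per-queue order is preserved
```

Replication then costs one memory write regardless of fan-out; only the small per-queue handles are duplicated.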

Basic Router Architecture Elements
[Diagram: Linecard, Switch Fabric, Linecard]
Three Classes of Switch Fabric Architecture
- Input Queued (IQ)
- Output Queued (OQ)
- Combined Input/Output Queued (CIOQ)

Input Queued (IQ) Fabrics
[Diagram: Input Linecard, Switch Fabric, Output Linecard]
Input Queued Switch Fabrics:
• Inefficient use of memory
• Require complex scheduling

Combined Input/Output Queued (CIOQ) Fabrics
[Diagram: Input Linecard, Switch Fabric, Output Linecard]
CIOQ Switch Fabrics:
• Generally with point-to-point fabric in the middle (crossbar, multi-stage (Clos), torus)
• Require complex scheduling
• Queues shared to reduce complexity

Output Queued Fabrics
[Diagram: Input Linecard, Switch Fabric, Output Linecard]
OQ Switch Fabrics:
• Require extremely high speed memory access
• Do not share queues
• Efficient multicast replication
• Protected bandwidth per queue

Terabit Centralized Shared Memory Routers
April 20, 2004
Bill Lynch, CTO

Whither QoS?
April 20, 2004
Bill Lynch, CTO

Concurrent Services
[Diagram: headends (including a centralized headend), VPN A and VPN B CE sites, research, education, grid, and supercomputing users, and broadband homes, attached through edge PE and distribution PE routers to an IP, MPLS, or λ network. Services shown: VOD, CONF, data services.]
• Interface content mirroring for security requirements
• High-speed Ethernet edge
• Assured QoS
• DoS prevention
• Video, voice, data over Ethernet
• QoS across thousands of subscribers
• SLAs and differential pricing

(More Bill’s Slides Here)
• (As much detail on the switch fabric and chips as you are comfortable saying in a multi-vendor environment!)
• No scheduling
• 36K service level queues
• NPU for fast lookup, policing, shaping
• SW abstraction based on service performed, not provided knobs
• Many, many, many DRAM banks. However, ½ as many as CIOQ architectures.
• 40G NPU for line rate
  • Policing
  • Remarking
  • DA, AS, other lookup
• SW interface focus on service, not knobs.

(Insert Bill’s Slides Here)
• Self Introduction
• Problem Statement (Bill)
  • "Layer 3 QoS at the right scale price is elusive"
    Throwing more bandwidth at lower layers only makes networking researchers commodity bandwidth brokers. That is fine for R&E, but commercially it is too expensive, so there appears to be a growing disconnect between R&E and commercial. It will be important not to slam the current L2/L1 vogue lest we upset the locals :)
  • Numerous commercial implementations starting now
    • Single network country
    • High BW to home
    • Triple play
• Assertion (Bill)
  • "System Architecture greatly contributes to the proper operation of network wide QoS"
    Current system architectures are completely unfocused on network wide QoS and focused on per-hop behaviors. This forces networkers to tweak 100 knobs to get the desired behavior. Why not architect the system to protect a flow through the router, so that behaviors are predictable in every circumstance?
  • End 2 end. Any problem exacerbated by TCP.

Abilene Network Map
Source: http://abilene.internet2.edu/new/upgrade.html

Internet Growth Predictions
• “117% YEARLY GROWTH THROUGH 2006”
• “VIDEO WILL DRIVE TRAFFIC GROWTH OVER THE NEXT 10 YEARS”
Source: Yankee Group, April 2004

Network Reference Design
[Diagram labels: Single Element Core (Cluster), Interdomain QoS, Peers, Concurrent Services Edge, Intradomain QoS]

PRO/8000™ Concurrent Services Routers
• Highest performance and density
• 960Gbps, 2 per rack
• Ultra-compact 80Gbps, 8 per rack

PRO/8000 Series Logical Architecture
[Diagram: control plane with Route Processors (1+1); Procket VLSI forwarding plane with Line Cards, Media Adapters, and Switch Cards (2+1)]
• Fully redundant Switch Cards and Route Processors
• All components hot-swappable in-service
• No single point of failure
• Strictly non-blocking

Queue Sharing Test Results
• Congested port (Flows C, D, E) remained at 100% throughput
• Uncongested (Flows A, B) were penalized because of Queue Sharing
• Traffic on adjacent ports was dropped!
