Spring 2006 EE 5304/EETS 7304 Internet Protocols Lecture 9: Routers, Switches Tom Oh, Dept. of Electrical Engineering, taehwan@engr.smu.edu
First Generation Routers All packets go over shared bus into main memory for centralized processing Fabric = shared bus
First Generation Routers (cont) • Packets are transferred over the bus to main memory and the CPU for processing, then to output queues • Packet processing in software • Bottlenecks • Mainly centralized packet processing by the CPU • Shared bus is inefficient (each packet crosses the bus twice, so a 1 Gb/s bus yields at most 500 Mb/s of forwarding throughput), but it is not the bottleneck compared to the CPU
Second Generation Routers • Distribute a portion of the routing table to a cache at each input line card • Cache hit: packet goes over the bus directly to the output queue • Cache miss: packet goes to the CPU for processing as before • Shared bus is still a bottleneck (a sketch of this hit-or-punt fast path follows below)
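A minimal sketch of the second-generation fast path, assuming a hypothetical direct-mapped route cache on the line card; the names route_cache and punt_to_cpu, the table size, and the placeholder slow-path lookup are illustrative, not from any particular router.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define CACHE_SLOTS 256

/* One route-cache entry on the line card: destination -> output port. */
struct cache_entry {
    uint32_t dest;      /* destination IP address */
    int      out_port;  /* output line card */
    bool     valid;
};

static struct cache_entry route_cache[CACHE_SLOTS];

/* Hypothetical slow path: hand the packet to the central CPU. */
static int punt_to_cpu(uint32_t dest) {
    int port = (int)(dest % 4);   /* stands in for a full software lookup */
    /* Install the result so later packets to this destination hit. */
    struct cache_entry *e = &route_cache[dest % CACHE_SLOTS];
    e->dest = dest; e->out_port = port; e->valid = true;
    return port;
}

/* Fast path on the input line card. */
int forward(uint32_t dest) {
    struct cache_entry *e = &route_cache[dest % CACHE_SLOTS];
    if (e->valid && e->dest == dest)
        return e->out_port;       /* hit: straight over the bus to output */
    return punt_to_cpu(dest);     /* miss: CPU processes as in gen 1 */
}

int main(void) {
    printf("port %d\n", forward(0x0A000001)); /* miss, then cached */
    printf("port %d\n", forward(0x0A000001)); /* hit */
    return 0;
}
```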
Third Generation Routers Forwarding path works at line speed and is separate from the control plane Switch fabric is highly parallel, transferring multiple packets simultaneously
Third Generation Routers (cont) • Shared bus replaced by a space-division switch fabric for higher throughput • Borrows switching techniques from ATM switching • Complete separation of the CPU into a routing engine (builds the routing table, runs routing protocols) and a packet forwarding engine (packet processing and forwarding) • Full forwarding information in the line cards • Faster address lookup algorithms (a longest-prefix-match sketch follows below) • Application-specific integrated circuits (ASICs) for faster packet processing, to work at “line speed”
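IP address lookup is a longest-prefix match. A minimal sketch using a toy linear-scan table; real forwarding engines use tries or TCAMs, and the prefixes and port numbers here are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* One forwarding-table entry: network/len -> output port. */
struct route {
    uint32_t net;   /* network address (host byte order) */
    int      len;   /* prefix length in bits */
    int      port;  /* output line card */
};

static const struct route table[] = {
    { 0x0A000000,  8, 1 },   /* 10.0.0.0/8    -> port 1 */
    { 0x0A010000, 16, 2 },   /* 10.1.0.0/16   -> port 2 */
    { 0x00000000,  0, 0 },   /* default route -> port 0 */
};

int lookup(uint32_t dest) {
    int best_len = -1, best_port = 0;
    for (unsigned i = 0; i < sizeof table / sizeof table[0]; i++) {
        uint32_t mask = table[i].len ? ~0u << (32 - table[i].len) : 0;
        if ((dest & mask) == table[i].net && table[i].len > best_len) {
            best_len  = table[i].len;     /* keep the longest match */
            best_port = table[i].port;
        }
    }
    return best_port;
}

int main(void) {
    printf("%d\n", lookup(0x0A010203));  /* 10.1.2.3    -> port 2 */
    printf("%d\n", lookup(0x0A0A0A0A)); /* 10.10.10.10 -> port 1 */
    return 0;
}
```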
Fourth Generation Routers • Data plane is optical • Control plane is electronic • Packet headers must be processed electronically • Ultimate goal is all-optical network
ATM Switching Origins • Packet speech research (late 1970s to early 1980s) showed that packet switching could carry voice, but only if protocols and switches were modified • X.25 and IP were designed for data and are unsuited for packet speech • Fast packet switching: a streamlined or "lightweight" protocol for high-speed packet processing in hardware • Assumes reliable, high-rate digital transmission facilities, so only minimal error control is necessary • Should be suitable for both real-time traffic (e.g., speech) and non-real-time data
ATM Switching Origins (cont) • Fast packet switching is not a particular protocol, but a set of principles designed to minimize packet delays and maximize switch throughput • Minimal error control (no ACKs or retransmissions) • Error control can be done at higher layers if needed • Connection oriented (virtual circuits) • Main function of the packet header is to identify the VC • Fast packet switches use highly parallel hardware processing to minimize delay and maximize throughput
ATM Switching Origins (cont) • Packets should be much shorter than IP or X.25 packets (e.g., max. 144 bytes of information) • Shorter than normal data packets to minimize packetization delay (the time to fill a packet with speech data) and queueing delays • Packets can be variable length or fixed length, but fixed-length packets simplify switch design and achieve better pipelining performance (a worked delay calculation follows below)
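To see why short packets matter for speech, a quick worked calculation, assuming standard 64 kb/s PCM voice (8000 bytes of samples per second): a 48-byte payload fills in 6 ms, a 144-byte payload in 18 ms, and a 1500-byte data-sized packet would take 187.5 ms.

```c
#include <stdio.h>

/* Packetization delay = time to fill a payload with speech samples.
   64 kb/s PCM produces 8000 bytes of speech per second. */
int main(void) {
    const double pcm_bytes_per_s = 8000.0;   /* 64 kb/s / 8 bits per byte */
    int payloads[] = { 48, 144, 1500 };      /* ATM cell, 144-byte max, IP-size */
    for (int i = 0; i < 3; i++) {
        double ms = payloads[i] / pcm_bytes_per_s * 1000.0;
        printf("%4d-byte payload: %6.1f ms packetization delay\n",
               payloads[i], ms);
    }
    return 0;
}
```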
Fast Packet Switch - Example Control processor (software): handles connection setups, connection admission control, operations and network management
Fast Packet Switch - Example (cont) Input port processors (hardware): process incoming packets at line speed, discard packets with header bit errors, look up virtual circuit numbers in the routing table, pass data packets to the fabric and control packets to the CP
Fast Packet Switch - Example (cont) Switch fabric (hardware): transfers multiple incoming packets in parallel to output ports, may contain buffering to resolve contention, may handle packets with priorities
Fast Packet Switch - Example (cont) Output port processors (hardware): recompute packet header fields as needed
Fast Packet Switch - Example (cont) Software handles control and management functions on a slower timescale The packet forwarding path is hardware and parallelized
ATM Switching • ATM is the result of fast packet switching research, standardized by the ITU in 1988 • Short, fixed-length cells (packets): 5-byte header + 48-byte payload • ATM switches are based on fast packet switching principles • Most switch architectures differ in the design of the switch fabric
ATM Switching (cont) • Cell header primarily identifies the virtual circuit number (VPI/VCI) • Header format (5 bytes, followed by 48-byte data):
At UNI: GFC (4 bits) | VPI (8 bits) | VCI (16 bits) | PT (3 bits) | CLP (1 bit) | HEC (8 bits)
At NNI: VPI (12 bits) | VCI (16 bits) | PT (3 bits) | CLP (1 bit) | HEC (8 bits)
(At the NNI, the 4-bit GFC field is replaced by 4 additional VPI bits; a parsing sketch follows below)
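A minimal sketch of extracting the UNI header fields from the five header bytes, following the bit layout above; the sample header values are made up.

```c
#include <stdint.h>
#include <stdio.h>

/* Extract UNI cell-header fields from the 5 header bytes.
   Layout: GFC(4) VPI(8) VCI(16) PT(3) CLP(1) HEC(8). */
struct atm_hdr { unsigned gfc, vpi, vci, pt, clp, hec; };

struct atm_hdr parse_uni(const uint8_t h[5]) {
    struct atm_hdr a;
    a.gfc = h[0] >> 4;                                    /* top 4 bits */
    a.vpi = ((h[0] & 0x0F) << 4) | (h[1] >> 4);           /* 8 bits */
    a.vci = ((h[1] & 0x0F) << 12) | (h[2] << 4) | (h[3] >> 4); /* 16 bits */
    a.pt  = (h[3] >> 1) & 0x07;                           /* payload type */
    a.clp = h[3] & 0x01;                                  /* loss priority */
    a.hec = h[4];                                         /* header checksum */
    return a;
}

int main(void) {
    uint8_t h[5] = { 0x00, 0x80, 0x00, 0x50, 0x00 };      /* VPI=8, VCI=5 */
    struct atm_hdr a = parse_uni(h);
    printf("VPI=%u VCI=%u PT=%u CLP=%u\n", a.vpi, a.vci, a.pt, a.clp);
    return 0;
}
```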
ATM Switching (cont) Input port processors: receive physical layer signal, extract ATM cells
ATM Switching (cont) Cell processing: discard cells with header bit errors, look up VPI/VCI in the routing table, and possibly add a "routing tag" to the cell (a table-lookup sketch follows below)
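A minimal sketch of the VPI/VCI lookup, assuming a toy linear-scan VC table; real switches use direct-indexed or hashed tables, and the labels, ports, and the choice of using the output port as the routing tag are illustrative.

```c
#include <stdio.h>

/* One VC-table entry: incoming (VPI,VCI) -> outgoing labels, output port.
   The output port doubles here as the internal "routing tag" the fabric
   uses to self-route the cell. */
struct vc_entry {
    unsigned in_vpi, in_vci;
    unsigned out_vpi, out_vci;
    int      out_port;
};

static const struct vc_entry vc_table[] = {
    { 8, 5, 3, 42, 2 },
    { 8, 6, 3, 43, 1 },
};

/* Return the routing tag for a cell's label, or -1 to discard unknown VCs. */
int classify(unsigned vpi, unsigned vci, unsigned *new_vpi, unsigned *new_vci) {
    for (unsigned i = 0; i < sizeof vc_table / sizeof vc_table[0]; i++) {
        if (vc_table[i].in_vpi == vpi && vc_table[i].in_vci == vci) {
            *new_vpi = vc_table[i].out_vpi;   /* label swapping */
            *new_vci = vc_table[i].out_vci;
            return vc_table[i].out_port;
        }
    }
    return -1;
}

int main(void) {
    unsigned nvpi, nvci;
    int tag = classify(8, 5, &nvpi, &nvci);
    printf("tag=%d new VPI/VCI=%u/%u\n", tag, nvpi, nvci);
    return 0;
}
```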
ATM Switching (cont) Switch fabric: routes cells to output queues
ATM Switching (cont) Output queues: buffer cells waiting for transmission; on overflow, cells with CLP=1 (lower priority) are discarded first
ATM Switching (cont) Output port processing: physical transmission
ATM Switch Fabrics - Typical Designs • Space division, e.g., banyan network, Batcher-banyan, Starlite • Shared medium, e.g., TDM bus • Shared memory • Fully interconnected, e.g., bus matrix, knockout switch
Space Division Switches • Banyan networks • A 4x4 banyan is constructed by interconnecting 2x2 modules • An 8x8 banyan is constructed by interconnecting 4x4 banyans • 8 incoming cells are pipelined together through the fabric stages
Space Division Switches (cont) • Class of multistage interconnection networks (MINs) with the self-routing property • Each cell carries the control information for its route to the output • Simple, regular structure with simple 2x2 switching elements → easy hardware implementation • An NxN fabric needs only (N/2) log2 N switching elements • All hardware runs at the port speed • Modular: easy to construct larger fabrics
Space Division Switches (cont) • Self-routing (a stage-by-stage sketch follows below)
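A minimal sketch of the self-routing rule for an 8x8 banyan, assuming the common convention that stage s examines destination-address bit s (most significant first): 0 selects the element's upper output, 1 the lower. Exact wiring varies between banyan constructions, but the one-bit-per-stage decision, with no routing table, is the point.

```c
#include <stdio.h>

#define STAGES 3   /* log2(8) stages in an 8x8 banyan */

/* Route one cell: at each stage the 2x2 element looks at one
   destination bit (MSB first): 0 -> upper output, 1 -> lower. */
void route(int dest) {
    printf("cell to output %d:", dest);
    for (int s = STAGES - 1; s >= 0; s--) {
        int bit = (dest >> s) & 1;
        printf(" stage %d -> %s", STAGES - 1 - s, bit ? "lower" : "upper");
    }
    printf("\n");
}

int main(void) {
    route(5);  /* binary 101: lower, upper, lower */
    route(2);  /* binary 010: upper, lower, upper */
    return 0;
}
```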
Space Division Switches (cont) • Internally blocking: two cells going to different outputs may collide on the same internal link • For uniform random traffic, throughput is low (about 0.4 for N=32) • Approaches: internal buffering (does not improve throughput much) and internal speedup (roughly a factor of 4 to achieve near-full utilization)
Space Division Switches (cont) • Add Batcher sorting network: sorts cells according to destination addresses
Space Division Switches (cont) • Batcher-banyan network is internally nonblocking but does not solve output contention: two cells arriving at the same time may want the same output port • Approach 1: input buffers let one cell through to each output at a time and queue the others • For uniform random traffic, throughput saturates at 0.586 = 2 − √2 (a general result for input buffering) • Cause is HOL (head-of-line) blocking: when output contention lets one cell go and forces another to wait in its input buffer, the waiting cell also blocks the cells queued behind it from reaching free output ports (a simulation sketch follows below)
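A minimal Monte Carlo sketch of HOL blocking under the classic saturated model: every input always has a head-of-line cell with a uniformly random destination, each output serves one contender per slot, and losers keep their destination for the next slot. The measured per-port throughput approaches 2 − √2 ≈ 0.586 as N grows.

```c
#include <stdio.h>
#include <stdlib.h>

#define N      32       /* switch ports */
#define SLOTS  200000   /* simulated cell slots */

int main(void) {
    int hol[N];          /* destination of each input's head-of-line cell */
    long sent = 0;
    srand(1);
    for (int i = 0; i < N; i++) hol[i] = rand() % N;

    for (long t = 0; t < SLOTS; t++) {
        int winner[N];
        for (int o = 0; o < N; o++) winner[o] = -1;
        int off = rand() % N;               /* rotate priority for fairness */
        for (int k = 0; k < N; k++) {
            int i = (off + k) % N;
            if (winner[hol[i]] < 0) winner[hol[i]] = i;
        }
        for (int o = 0; o < N; o++)
            if (winner[o] >= 0) {
                sent++;                      /* one cell crosses per output */
                hol[winner[o]] = rand() % N; /* winner's queue reveals next cell */
            }
    }
    printf("throughput = %.3f per port (limit 2 - sqrt(2) = 0.586)\n",
           (double)sent / ((double)SLOTS * N));
    return 0;
}
```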
Space Division Switches (cont) • Approach 2 (Starlite switch): trap conflicting cells (those going to the same output port) after the Batcher sorter and recirculate them through a shared buffer to try again through the Batcher
Space Division Switches (cont) • Sharing the buffer reduces the total amount of buffering required • Need to expand the Batcher fabric and add a trap network • Need to track the number of reattempts of each cell to maintain proper cell sequence
Shared Medium Switches • High speed TDM bus
Shared Medium Switches (cont) • Cells at the N inputs take round-robin turns to broadcast on the bus • An address filter at each output detects cells addressed to that output port • Output buffer at each output • To be nonblocking, the bus must be N times faster than the input/output port rates (speedup factor of N), e.g., a 16x16 switch with 155 Mb/s ports needs a bus of about 2.5 Gb/s • Address filters and buffers work at bus speed • Simple, modular design with moderate buffer requirements (a broadcast-and-filter sketch follows below)
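A minimal sketch of one TDM frame on the shared bus: each input broadcasts its cell in turn, and each output's address filter keeps only cells addressed to it. Port counts and cell contents are made up for illustration.

```c
#include <stdio.h>

#define N 4

/* A cell on the bus: its destination port and a payload id. */
struct cell { int dest; int id; };

/* One TDM frame: each input broadcasts once; every output's
   address filter inspects the cell and keeps only its own. */
void tdm_frame(struct cell inputs[N], int queued[N]) {
    for (int i = 0; i < N; i++) {            /* input i's bus slot */
        struct cell c = inputs[i];           /* broadcast to all outputs */
        for (int o = 0; o < N; o++)
            if (o == c.dest)                 /* address filter match */
                queued[o]++;                 /* enqueue in output buffer */
    }
}

int main(void) {
    struct cell in[N] = { {2,10}, {0,11}, {2,12}, {3,13} };
    int q[N] = {0};
    tdm_frame(in, q);
    for (int o = 0; o < N; o++)              /* output 2 absorbs both cells */
        printf("output %d queued %d cell(s)\n", o, q[o]);
    return 0;
}
```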
Shared Memory Switches • Virtual operation
Shared Memory Switches (cont) • Cells at the N inputs take round-robin turns to be written into a central shared memory • Cells are then read out to the appropriate output port; per-output read addresses are kept as linked lists (a sketch follows below) • Buffer sharing → minimal amount of buffering required • N is limited by memory access speed (speedup factor of N again)
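A minimal sketch of the shared-memory organization, assuming a pool of cell slots chained through a next[] array: a free list supplies empty slots on write, and each output port keeps a linked list of the slots holding its cells.

```c
#include <stdio.h>

#define SLOTS 8    /* shared cell memory */
#define PORTS 2

static int next_slot[SLOTS];                 /* chains slots into lists */
static int free_head;                        /* free list of empty slots */
static int q_head[PORTS], q_tail[PORTS];     /* per-output linked lists */
static int payload[SLOTS];                   /* stands in for cell data */

void init(void) {
    for (int i = 0; i < SLOTS - 1; i++) next_slot[i] = i + 1;
    next_slot[SLOTS - 1] = -1;
    free_head = 0;
    for (int p = 0; p < PORTS; p++) q_head[p] = q_tail[p] = -1;
}

/* Write phase: take a slot from the free list, store the cell,
   append the slot to the destination output's linked list. */
int enqueue(int port, int data) {
    if (free_head < 0) return -1;            /* shared memory full */
    int s = free_head;
    free_head = next_slot[s];
    payload[s] = data; next_slot[s] = -1;
    if (q_tail[port] < 0) q_head[port] = s;
    else next_slot[q_tail[port]] = s;
    q_tail[port] = s;
    return 0;
}

/* Read phase: pop the head of the output's list and return the
   slot to the free list; -1 means the output queue is empty. */
int dequeue(int port) {
    int s = q_head[port];
    if (s < 0) return -1;
    q_head[port] = next_slot[s];
    if (q_head[port] < 0) q_tail[port] = -1;
    next_slot[s] = free_head; free_head = s;
    return payload[s];
}

int main(void) {
    init();
    enqueue(0, 100); enqueue(1, 200); enqueue(0, 101);
    printf("%d %d %d\n", dequeue(0), dequeue(0), dequeue(1)); /* 100 101 200 */
    return 0;
}
```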
Fully Interconnected Switches • Bus matrix switch
Fully Interconnected Switches (cont) • Each input cell is broadcast to every output • Each output multiplexes N buffers (one per input port), each with an address filter • No speedup factor (everything works at the input/output port rate) and no blocking • Expensive in buffers and address filters: both grow as N² (e.g., N=32 needs 1024 of each), and each bus has N crosspoints → limits N