Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Router Architecture
By: Dr. Alireza Abdollahpouri

What Is Routing and Forwarding?
[Figure: example network topology with routers R1 to R5 and nodes A to F]

Introduction: History …

Introduction: History … and future trends!

What a Router Looks Like
• Cisco GSR 12416: 6ft high, 19” wide, 2ft deep; capacity 160Gb/s; power 4.2kW
• Juniper M160: 3ft high, 19” wide, 2.5ft deep; capacity 80Gb/s; power 2.6kW

Packet Processing Functions
Basic network system functionality:
• Address lookup
• Packet forwarding and routing
• Fragmentation and re-assembly
• Security
• Queuing
• Scheduling
• Packet classification
• Traffic measurement
• …

Per-packet Processing in a Router
1. Accept the packet arriving on an ingress line.
2. Look up the packet's destination address in the forwarding table to identify the outgoing interface(s).
3. Manipulate the packet header: e.g., decrement TTL, update header checksum.
4. Send the packet to the outgoing interface(s).
5. Queue until the line is free.
6. Transmit the packet onto the outgoing line.
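The six steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the forwarding table entries, the interface names (eth0, eth1, eth2), and the helper names (make_header, process, transmit) are all hypothetical, and the lookup is a plain linear longest-prefix match rather than the tries or TCAMs a real data plane would use. The header layout is the standard 20-byte IPv4 header without options.

```python
import ipaddress
import struct
from collections import defaultdict, deque

# Hypothetical forwarding table: prefix -> outgoing interface.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
}
output_queues = defaultdict(deque)  # one FIFO queue per outgoing interface


def checksum(header):
    """IPv4 header checksum: one's-complement sum of the 16-bit words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), bytes(header)))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_header(src, dst, ttl):
    """Build a 20-byte IPv4 header (no options) with a valid checksum."""
    header = bytearray(struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, ttl, 17, 0,
        ipaddress.ip_address(src).packed, ipaddress.ip_address(dst).packed))
    struct.pack_into("!H", header, 10, checksum(header))
    return header


def lookup(dst):
    """Step 2: longest-prefix match by linear scan."""
    matches = [net for net in FORWARDING_TABLE if dst in net]
    return FORWARDING_TABLE[max(matches, key=lambda net: net.prefixlen)]


def process(header, payload):
    """Steps 1 to 5 for one packet; returns the chosen interface or None."""
    dst = ipaddress.ip_address(bytes(header[16:20]))
    interface = lookup(dst)
    if header[8] <= 1:
        return None  # TTL would reach zero: drop the packet
    header[8] -= 1                      # step 3: decrement TTL
    header[10:12] = b"\x00\x00"         # clear the old checksum ...
    struct.pack_into("!H", header, 10, checksum(header))  # ... and recompute
    output_queues[interface].append(bytes(header) + payload)  # steps 4 and 5
    return interface


def transmit(interface):
    """Step 6: send the packet at the head of the interface's queue."""
    queue = output_queues[interface]
    return queue.popleft() if queue else None
```

For example, process(make_header("192.0.2.1", "10.1.2.3", 64), b"data") selects eth2 because 10.1.0.0/16 is the longest matching prefix, and the rewritten header again sums to a checksum of zero.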

Basic Architecture of a Router
Control plane (routing): how routing protocols establish routes, etc. May be slow; “typically in software”.
• Routing table update (OSPF, RIP, IS-IS)
• Admission control
• Congestion control
• Reservation
Data plane (switching, per-packet processing): how packets get forwarded. Must be fast; “typically in hardware”.
• Routing lookup
• Packet classifier
• Switching
• Arbitration
• Scheduling

Generic Router Architecture
[Figure: three parallel line-card pipelines. On each one, an arriving packet (header + data) passes through header processing (look up the IP address in the address table, update the header) and is then queued by a buffer manager in buffer memory.]

Functions in a Packet Switch
[Figure: ingress linecard (framing, route lookup, TTL processing, buffering), interconnect (interconnect scheduling), and egress linecard (buffering, QoS scheduling, framing), with the control plane above them. The control path, data path, and scheduling path are marked.]
Memory is usually used in several ways (DRAM for the packet buffer, SRAM for queues and tables).

Line Card Picture

Major Components of Routers: Interconnect
[Figure: bus, shared memory, and crossbar interconnects]
The interconnect carries data from input ports to output ports. It has 3 modes:
• Bus
  – All input ports transfer data over the shared bus.
  – Problem: often causes congestion in the data flow.
• Shared memory
  – Input ports write data into the shared memory. After the destination lookup is performed, the output port reads the data from memory.
  – Problem: requires fast memory read/write and memory-management technology.
• Crossbar
  – Each of the N input ports has a dedicated data path to each of the N output ports, resulting in an N×N switching matrix.
  – Problem: blocking (input, output, head-of-line (HOL)). With FIFO input queues, the maximum switch load for random traffic is about 59%.

Interconnects: Two Basic Techniques
[Figure: input queueing and output queueing. Label: usually a non-blocking switch fabric (e.g. crossbar).]

How an OQ Switch Works
[Figure: Output Queued (OQ) switch]

Input Queueing: Head of Line Blocking
[Plot: delay vs. load. The load axis marks 58.6% and 100%.]
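The 58.6% figure can be reproduced approximately with a small simulation. The sketch below assumes a saturated switch (every input always has a packet waiting), FIFO input queues, uniformly random destinations, and a random choice among inputs contending for the same output; the function name hol_throughput and its parameters are illustrative only.

```python
import random


def hol_throughput(n_ports, n_slots, seed=0):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Under saturation only the head-of-line (HOL) packet of each input matters,
    # so each input is represented by the destination of its HOL packet.
    heads = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL packet wants.
        contenders = {}
        for inp, out in enumerate(heads):
            contenders.setdefault(out, []).append(inp)
        # Each requested output accepts exactly one packet per time slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            heads[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
        # Losing inputs keep the same HOL packet and block everything behind it.
    return delivered / (n_ports * n_slots)


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, round(hol_throughput(n, 20000), 3))
```

For two ports the analytical saturation throughput is 0.75; as the number of ports grows it falls towards 2 − √2 ≈ 0.586, which is the 58.6% limit on the slide.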

Head of Line Blocking

Virtual Output Queues (VOQ)
• At each input port there are N queues, each associated with one output port
• Only one packet can leave an input port at a time
• Only one packet can be received by an output port at a time
• It retains the scalability of FIFO input-queued switches
• It eliminates the HOL blocking problem of FIFO input queues
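A minimal sketch of the VOQ idea, assuming a simple greedy scheduler that scans inputs and outputs in a fixed order. The class and method names are hypothetical, and the fixed scan order is unfair to higher-numbered ports; practical schedulers such as iSLIP rotate their starting points to avoid that.

```python
from collections import deque


class VOQSwitch:
    """Input-queued switch with one virtual output queue per (input, output) pair."""

    def __init__(self, n_ports):
        self.n = n_ports
        # voq[i][j] holds the packets waiting at input i for output j.
        self.voq = [[deque() for _ in range(n_ports)] for _ in range(n_ports)]

    def enqueue(self, inp, out, packet):
        self.voq[inp][out].append(packet)

    def schedule(self):
        """One time slot: pick a matching with at most one packet per input and per output."""
        free_outputs = set(range(self.n))
        transfers = []
        for inp in range(self.n):
            for out in range(self.n):
                if out in free_outputs and self.voq[inp][out]:
                    transfers.append((inp, out, self.voq[inp][out].popleft()))
                    free_outputs.remove(out)
                    break  # this input has sent its one packet for the slot
        return transfers


if __name__ == "__main__":
    switch = VOQSwitch(2)
    switch.enqueue(0, 0, "A")  # input 0 -> output 0
    switch.enqueue(1, 0, "C")  # input 1 -> output 0 (contends with A)
    switch.enqueue(1, 1, "D")  # input 1 -> output 1, arrived after C
    print(switch.schedule())
    print(switch.schedule())
```

In the first slot A and D are both transferred: D is not held up behind C even though C arrived first at the same input, which is exactly the blocking a single FIFO per input would cause. C goes out in the second slot.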

Input Queueing: Virtual Output Queues

Input Queueing: Virtual Output Queues
[Plot: delay vs. load. The load axis marks 100%.]

The Evolution of Router Architecture
From first generation routers to modern routers

First Generation Routers
Bus-based Router Architectures with Single Processor
[Figure: a CPU with a route table and buffer memory, connected over a shared backplane to several line interfaces, each with a MAC.]

First Generation Routers
• Based on software implementations on a single CPU.
• Limitations:
  – Serious processing bottleneck in the central processor
  – Memory-intensive operations (e.g. table lookup and data movement) limit the effectiveness of the processor's power

Second Generation Routers
Bus-based Router Architectures with Multiple Processors
[Figure: a CPU with a route table and buffer memory, connected over a bus to several line cards, each with its own buffer memory, forwarding cache, and MAC.]

Second Generation Routers
• Architectures with route caching
• Distribute packet forwarding operations across:
  – Network interface cards
  – Processors
  – Route caches
• Packets are transmitted only once over the shared bus
• Limitations:
  – The central routing table is a bottleneck at high speeds
  – Throughput is traffic dependent (cache)
  – The shared bus is still a bottleneck
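Route caching and its traffic-dependent throughput can be illustrated with a small sketch. Everything here is an assumption made for the example: the class names, the exact-match cache keyed on destination address, the LRU eviction policy, and the sample routes. The lookups counter stands in for the slow path to the central routing table.

```python
import ipaddress
from collections import OrderedDict


class CentralRouteTable:
    """Slow path: longest-prefix match on the central CPU (linear scan for brevity)."""

    def __init__(self, routes):
        self.routes = [(ipaddress.ip_network(p), port) for p, port in routes.items()]
        self.lookups = 0  # how often the central table had to be consulted

    def lookup(self, dst):
        self.lookups += 1
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, port) for net, port in self.routes if addr in net]
        return max(matches)[1] if matches else None


class LineCard:
    """Fast path: small exact-match forwarding cache with LRU eviction."""

    def __init__(self, central, cache_size):
        self.central = central
        self.cache_size = cache_size
        self.cache = OrderedDict()  # destination address -> output port

    def forward(self, dst):
        if dst in self.cache:                 # cache hit: no central lookup
            self.cache.move_to_end(dst)
            return self.cache[dst]
        port = self.central.lookup(dst)       # cache miss: go to the central table
        self.cache[dst] = port
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)    # evict the least recently used entry
        return port


if __name__ == "__main__":
    central = CentralRouteTable(
        {"0.0.0.0/0": "port0", "10.0.0.0/8": "port1", "10.1.0.0/16": "port2"})
    card = LineCard(central, cache_size=2)
    for dst in ["10.1.2.3", "10.1.2.3", "10.9.9.9", "8.8.8.8", "10.1.2.3"]:
        print(dst, card.forward(dst), "central lookups:", central.lookups)
```

Repeated destinations are served from the cache, while traffic spread over more destinations than the cache can hold falls back to the central table on almost every packet, which is why throughput depends on the traffic mix.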

Third Generation Routers
Switch-based Architectures with Fully Distributed Processors
[Figure: a switched backplane connecting line cards (each with local buffer memory, forwarding table, and MAC) and a CPU card (CPU, routing table, memory, line interface).]

Third Generation Routers
• To avoid bottlenecks in:
  – Processing power
  – Memory bandwidth
  – Internal bus bandwidth
• Each network interface is equipped with appropriate processing power and buffer space.
• Data vs. control plane
  – Data plane: line cards
  – Control plane: processor

Fourth Generation Routers/Switches
• Optics inside a router for the first time
• Optical links, 100s of metres long, between the linecards and the switch core
• 0.3 - 10Tb/s routers in development

Demand for More Powerful Routers
Do we still need higher processing power in networking devices? Of course, YES.
But why? And how?

Demands for Faster Routers (why?)
Beyond Moore’s law

Demands for Faster Routers (why?)
• Future applications will demand TIPS

Demands for Faster Routers (why?)
• Future applications will demand TIPS
• Power? Heat?

Demands for Faster Routers (summary)
• Technology push:
  – Link bandwidth is scaling much faster than CPU and memory technology
  – Transistor scaling and VLSI technology help, but not enough
• Application pull:
  – More complex applications are required
  – Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet
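To make the technology push concrete, the per-packet budget at line rate can be worked out as below. The numbers are assumptions chosen for illustration, not figures from the slides: 40-byte minimum-size packets, a 3 GHz core retiring one instruction per cycle, and a 50 ns memory access time.

```python
def packet_time_ns(link_gbps, packet_bytes):
    """Time between back-to-back packets at line rate, in nanoseconds."""
    return packet_bytes * 8 / link_gbps  # bits / (Gbit/s) = ns


def instruction_budget(link_gbps, packet_bytes, cpu_ghz, ipc=1.0):
    """Instructions one core can execute in the time of one packet."""
    return packet_time_ns(link_gbps, packet_bytes) * cpu_ghz * ipc


def memory_access_budget(link_gbps, packet_bytes, access_ns):
    """Sequential memory accesses that fit in the time of one packet."""
    return packet_time_ns(link_gbps, packet_bytes) / access_ns


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        print(f"{gbps:>3} Gb/s: "
              f"{packet_time_ns(gbps, 40):6.1f} ns per packet, "
              f"{instruction_budget(gbps, 40, cpu_ghz=3.0):7.1f} instructions, "
              f"{memory_access_budget(gbps, 40, access_ns=50.0):5.2f} memory accesses")
```

Under these assumptions a 10 Gb/s link leaves 32 ns per packet, which is 96 instructions on the assumed core and less than one 50 ns memory access. Raising the link rate tenfold cuts both budgets tenfold, while the processing complexity per packet stays the same or grows.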

Demands for Faster Routers (how?)
• “Future applications will demand TIPS”
• “Think platform beyond a single processor”
• “Exploit concurrency at multiple levels”
• “Power will be the limiter due to complexity and leakage”
Distribute the workload on multiple cores.

Multi-Core Processors
• Symmetric multi-processors allow multi-threaded applications to achieve higher performance at less die area and power consumption than single-core processors.
• Asymmetric multi-processors consume power and provide increased computational power only on demand.

Performance Bottlenecks
• Memory
  – Bandwidth available, but access time too slow
  – Increasing delay for off-chip memory
• I/O
  – High-speed interfaces available
  – Cost problem with optical interfaces
• Internal bus
  – Can be solved with an effective switch, allowing simultaneous transfers between network interfaces
• Processing power
  – Individual cores are getting more complex
  – Problems with access to shared resources
  – Control processor can become a bottleneck

Different Solutions
• ASIC
• FPGA
• NP
• GPP
[Figure: flexibility vs. performance, with the solutions placed in the order GPP, NP, FPGA, ASIC from most flexible to highest performance.]

Different Solutions
[Figure. By: Niraj Shah]

“It is always something (corollary). Good, Fast, Cheap: Pick any two (you can’t have all three).”
RFC1925 “The Twelve Networking Truths”

Why not ASIC?
• High cost to develop (network processing is a moderate-quantity market)
• Long time to market (network processing services change quickly)
• Difficult to simulate (complex protocols)
• Expensive and time-consuming to change
  – Little reuse across products
  – Limited reuse across versions
• No consensus on framework or supporting chips
• Requires expertise

Network Processors
• Introduced several years ago (1999+)
• A way to introduce flexibility and programmability in network processing
• Many players were in the market (Intel, Motorola, IBM)
• Only a few players remain

Intel IXP 2800
Initial release: August 2003

What Was Correct With NPs?
• CPU-level flexibility
  – A giant step forward compared to ASICs
• How?
  – Hardware coprocessors
  – Memory hierarchies
  – Multiple hardware threads (zero context-switching overhead)
  – Narrow (and multiple) memory buses
  – Some other ad-hoc solutions for network processing, e.g., fast switching fabric, memory accesses, etc.

What Was Wrong With NPs?
• Programmability issues
  – Completely new programming paradigm
  – Developers are not familiar with the unprecedented parallelism of the NPU and do not know how best to exploit it
  – New (proprietary) languages
  – Portability among different network processor families

What Happened in the NP Market?
• Intel went out of the market in 2007
• Many other small players disappeared
• High risk when selecting an NP maker that may disappear

“Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.”
RFC1925 “The Twelve Networking Truths”

Software Routers
• Processing in general-purpose CPUs
• CPUs optimized for few threads, high performance per thread
  – High CPU frequencies
  – Maximize instruction-level parallelism
    • Pipeline
    • Superscalar
    • Out-of-order execution
    • Branch prediction
    • Speculative loads

Software Routers
• Aim: low cost, flexibility, and extensibility
• Linux on a PC with a bunch of NICs
• Changing functionality is as simple as a software upgrade

Software Routers (examples)
• RouteBricks [SOSP’09]: uses the Intel Nehalem architecture
• PacketShader [SIGCOMM’10]: GPU-accelerated; developed at KAIST, Korea
