250 likes | 387 Views
ECE 526 – Network Processing Systems Design. IXP XScale and Microengines Chapter 18 & 19: D. E. Comer. Overview. Recalled Packet processing functions (forwarding, queuing…) Traditional network processing systems (CPU + NICs) General network processor architecture and tradeoffs
E N D
ECE 526 – Network Processing Systems Design IXP XScale and Microengines Chapter 18 & 19: D. E. Comer
Overview • Recalled • Packet processing functions (forwarding, queuing…) • Traditional network processing systems (CPU + NICs) • General network processor architecture and tradeoffs • Intel IXP network processors overall architecture • Focus on individual components of Intel IXP chip • Control processor (slow path): XScale core • Overall architecture • Typical functions • Processor features • Packet processing processor (fast path): Microengines • Architecture and features • Differences to conventional processors • Pipelining and multi-threading ECE 526
Purpose of Control Processor • Functions typically executed by embedded control proc: • Bootstrapping • Exception handling • Higher-layer protocol processing • Interactive debugging • Diagnostics and logging • Memory allocation • Application programs (if needed) • User interface and/or interface to the GPP • Control of packet processors • Other administrative functions ECE 526
XScale Memory Architecture • Memory architecture • Uses 32-bit linear address space • configurable endian mode • Byte addressable • Memory Mapping • Allocation of address space (2^32) to different system components • Accesses to memory is translated into access to component • Needs to be carefully crafted • XScale assumes byte addressable memory • Underlying memory uses different size (SDRAM) • How does this work? • Support for Virtual Memory • For demand paging to secondary storage ECE 526
Shared Memory Address Issues • Memory is shared between XScale and Microengines • Same data, but different addresses • What impact does this have? • Pointers need to be translated • Data structures with pointers can not be shared ECE 526
Microengines • Microengines are data-path packet processors IXP • IXP 2400 have 8 Microengines • Simpler than XScale • Low level device as a micro-sequencer • Optimized for packet processing • More complex to use • Often abbreviated as uE ECE 526
uE Functions • uEs handle ingress and egress packet processing: • Packet ingress from physical layer hardware • Checksum verification • Header processing and classification • Packet buffering in memory • Table lookup and forwarding • Header modification • Checksum computation • Packet egress to physical layer hardware ECE 526
uE Architecture • uE characteristics: • Programmable microcontroller • RISC design • 256 general-purpose registers • 512 transfer registers • 128 next neighbor registers • Hardware support for 8 threads and context switching • 640 words of local memory • Control of an Arithmetic and Logic Unit • Direct access to various functional units • A unit to compute a Cyclic Redundancy Check (CRC) ECE 526
uE as Micro-sequencer • Micro-sequencer does not contain native instructions for possible operations • Instead of using instructions, uE invokes functional units to perform operations • Control unit is much “simpler” • Example 1: • uE does not have ADD R2,R3 instruction • Instead: ALU ADD R2, R3 • “ALU” indicates that ALU should be used • “ADD” is a parameter to ALU • Example 2: • Memory access not by simple LOAD R2, 0xdeadbeef • Instead: SRAM LOAD R2, 0xdeadbeef • Altogether similar to normal processor, but more basic ECE 526
uE Instruction Set • General • ALU and etc • Brach and Jump • BR: branch unconditionally • CAM • CAM_CLEAR: clear all entries in local memories • I/O and context swap • SCRATCH (read and write) • For detail see Figure 19.1, 19.2, Comer. ECE 526
uE Memories • uEs: viewing memories differently than XScale does • Does not map memories and I/O devices into a liner address space • Does not view memories as a seamless, uniform repository • uE ISA: requiring a separate instruction for each type of memory and I/O device • SRAM[read, $$x, address1, address2…] • Programmer: required binding of data items to specific type of memory permanently. ECE 526
Execution Pipeline • What is pipeline? • Why pipeline is employed? • One instruction is executed per cycle if pipeline is proper designed • uEs use five-stage or six-stage pipeline: ECE 526
Pipelining ECE 526
Pipelining Problems • Possible sources of pipelining problems • Data dependencies • Control dependencies • Resource dependencies • Memory accesses • How pipelining problem impact system performance • How these impact can be removed or reduced • Remove the sources so that no stall happened • Hide the impact of pipelining stall ECE 526
Pipeline Stalls • K: ALU ADD R2, R1, R2 • K+1 ALU ADD R3, R2, R3 • Control dependencies, memory have even bigger impact ECE 526
Threading Illustration ECE 526
Hardware Threads • uEs support 8 hardware thread contexts • One thread can execute at any given time • When stall occurs, uE can switch to other thread (if not stalled) • Very low overhead for context switch • “Zero-cycle context switch” • Effectively can take around three cycles due to pipeline flush • Switching rules • If thread stalls, check if next is ready for processing • Keep trying until ready thread is found • If none is available, stall uE and wait for any thread to unblock • Improves overall throughput • Questions: • Why not 16, 32 threads • why not have 48 uEs with 1 thread? ECE 526
Summary • Control processor (slow path): XScale core • Overall architecture • Typical functions • Processor features • Packet processing processor (fast path): Microengines • Architecture and features • Differences to conventional processors • Pipelining and multi-threading ECE 526
Lab3 Brief • Intel Reference Systems • SDK Tutorial • Lab 3 ECE 526
Intel Reference Systems • Hardware Testbed • IXP2400 network processors • QDRM-SRAM, Flash ROM and other memories • 1G optical ethernet ports • 100M ethernet management port • Serial interface • PCI interfaces • SDK (software development kit) • Compiler • Assembler, linker • Simulator • Reference codes ECE 526
Lab3: Forwarding, Counting & Classification • Goal: to explore the basic functionalities of the IXP2400 software development kit and Microengines. • 3 parts: • Part I: collecting a number of workload statistics from the IXP SDK simulator. Follow steps of lab instruction. • Part II: adding one counting block to count the number of packets. • Part III: implementing a simple packet classification mechanism. • Tools: All three parts require access to a machine that has the Intel SDK installed. If you want, you can also request an installation CD for your own machine, check with TA. ECE 526
Part I: Forwarding Simulation • run an implementation of IP forwarding on the IXP2400 simulator. All the code is provided to you. • collect a set of workload statistics that are reported by the simulator. ECE 526
Part II: Forwarding and Counting • modify above applications by adding counter block • store how many packets are received. ECE 526
Part III: Classification and Counting • classifying packets based on the packet header information. There are four types of traffic that are considered in this lab: • Web traffic over TCP over IPv4 • Non-Web traffic over TCP over IPv4 • UDP over IPv4 • IPv6 • modifying the code to report the number of packets in each type. ECE 526
How to do Lab3 • Windows machine with SDK installed • Download lab instructions and source code from blackboard • Start early. • Very exciting lab. • Due day • Part I and Part II 10/13 • Part III 10/20 ECE 526