ECE 526 – Network Processing Systems Design
Network Processor Tradeoffs and Examples
Chapter: D. E. Comer
Outline
• Network Processor design tradeoffs
• Sample Network Processor
NP Architecture
• Numerous different design goals
  • Performance
  • Cost
  • Functionality
  • Programmability
• Numerous different system choices
  • Use of parallelism
  • Types of memories
  • Types of interfaces
  • Etc.
• We consider
  • Design tradeoffs on high level (qualitative tradeoffs)
  • Commercial Network Processors
Processor Topologies
• How can processors be arranged on an NP?
  • Consider heterogeneity of processing resources and workload
• Multiprocessor
  • Parallel processors with shared interconnect
  • Problems?
• Pipeline
  • Multiple processors per data path
  • Problems?
• Data Flow Architecture
  • Extreme form of pipelining
  • Problems?
• Heterogeneous Architectures
Design Tradeoffs (1)
• Low development cost vs. performance
  • ASICs give higher performance, but take time to develop
  • NPs allow faster development, but might give lower performance
• Programmability vs. processing speed
  • Similar to tradeoff between ASIC and NP
  • Coprocessors pose the same tradeoffs
  • Complexity of instruction set
• Performance: packet rate, data rate, and bursts
  • Difficult to assess the performance of a system
  • Even more difficult to compare different systems
• Per-interface rate vs. aggregate data rate
  • NP usually limited to one port
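
The difference between packet rate and data rate can be made concrete with a short calculation. The sketch below is illustrative only: the function name `packets_per_second`, the 1 Gbps example link, and the 20 bytes of per-frame Ethernet overhead (preamble, start-of-frame delimiter, and inter-frame gap) are assumptions that do not come from the slides.

```python
def packets_per_second(link_bps: float, packet_bytes: int, overhead_bytes: int = 20) -> float:
    """Packets per second carried by a saturated link for one fixed packet size.

    overhead_bytes is the assumed per-frame cost on the wire that is not
    part of the packet itself (20 bytes for Ethernet).
    """
    bits_on_wire = (packet_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


# Same data rate, very different packet rates (hypothetical 1 Gbps link).
for size in (64, 512, 1518):
    print(f"{size:5d}-byte packets: {packets_per_second(1e9, size):,.0f} packets/s")
```

Because per-packet work such as header parsing and lookups scales with packet rate rather than data rate, small packets are usually the harder case for an NP at a given link speed.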
Design Tradeoffs (2)
• NP speed vs. bandwidth
  • How much processing power per bandwidth is necessary?
  • Depends on application complexity
• Coprocessor design: look-aside vs. flow-through
  • Look-aside: "called" from main processor, needs state transfer
  • Flow-through: all traffic streams through coprocessor
• Pipelining: uniform vs. synchronized
  • Pipeline stages can take different times
  • Tradeoff between slowing down all stages and synchronizing them
• Explicit parallelism vs. cost and programmability
  • Hidden parallelism is easier to program
  • Explicit parallelism is cheaper to implement
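
The pipelining tradeoff can be sketched numerically. The model below is one possible reading of "uniform vs. synchronized", not the chapter's definition: in the lock-step case every stage is slowed to the pace of the slowest stage, and in the handshake case each stage hands a packet on as soon as it is done. The function names and the stage times are hypothetical.

```python
def lockstep_pipeline(stage_times):
    """All stages advance together on a period set by the slowest stage."""
    period = max(stage_times)
    return {"throughput": 1.0 / period, "latency": float(period * len(stage_times))}


def handshake_pipeline(stage_times):
    """Stages pass packets on when ready; latency is for an otherwise empty pipeline."""
    return {"throughput": 1.0 / max(stage_times), "latency": float(sum(stage_times))}


stages = [2, 5, 3]  # hypothetical per-stage processing times in microseconds
print("lock-step:", lockstep_pipeline(stages))
print("handshake:", handshake_pipeline(stages))
```

In both cases the slowest stage bounds steady-state throughput. Under these assumptions the handshake design lowers per-packet latency, at the cost of synchronization logic between stages.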
Design Tradeoffs (3)
• Parallelism: scale vs. packet ordering
  • Why is packet order important?
  • Giving up packet order constraint gives better throughput
• Parallelism: speed vs. stateful classification
  • Shared state requires synchronization
  • Limits parallelism
• Memory: speed vs. programmability
  • Different types of memories give better performance
  • But they increase programming difficulty
• I/O performance vs. pin count
  • Packaging can be major cost factor
  • More pins give higher performance
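
Packet order matters because, for example, reordering within a TCP flow can look like loss to the receiver and trigger unnecessary retransmissions. One common way to keep order while still processing packets in parallel is a resequencing buffer at the output. The class below is a minimal sketch of that idea; the name `Resequencer` and its interface are hypothetical, and it assumes every packet carries a unique, consecutive sequence number assigned on arrival.

```python
import heapq


class Resequencer:
    """Holds packets that finished out of order until their turn comes."""

    def __init__(self, first_seq=0):
        self._next = first_seq  # sequence number allowed to leave next
        self._pending = []      # min-heap of (seq, packet)

    def push(self, seq, packet):
        """Accept one processed packet; return the packets that may now leave, in order."""
        heapq.heappush(self._pending, (seq, packet))
        released = []
        while self._pending and self._pending[0][0] == self._next:
            released.append(heapq.heappop(self._pending)[1])
            self._next += 1
        return released


r = Resequencer()
print(r.push(1, "pkt-1"))  # pkt-1 must wait for pkt-0
print(r.push(0, "pkt-0"))  # both can leave now
```

The buffer shows the cost side of the tradeoff: a packet that finishes early still waits for its slower predecessors, and the buffer itself is shared state.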
Design Tradeoffs (4)
• Programming languages
  • Ease of programming vs. functionality vs. speed
• Multithreading: throughput vs. programmability
  • Threads improve performance
  • Threads require more complex programs and synchronization
• Traffic management vs. blind forwarding at low cost
  • Traffic management is desirable but requires processing
• Generality vs. specific architecture role
  • NPs can be specialized for access, edge, core
  • NPs can be specialized towards certain protocols
• Memory type: special-purpose vs. general-purpose
  • SRAM and DRAM vs. CAM
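
Why threads improve performance can be illustrated with a simple latency-hiding model: while one thread waits for memory, another thread uses the core. The sketch below assumes zero-cost context switches and identical threads that alternate a fixed amount of computation with a fixed memory wait. The function name and the cycle counts are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: int, memory_wait_cycles: int) -> float:
    """Fraction of time the core does useful work under the simple model above."""
    if threads < 1:
        raise ValueError("need at least one thread")
    busy_share = threads * compute_cycles / (compute_cycles + memory_wait_cycles)
    return min(1.0, busy_share)


# Hypothetical workload: 10 cycles of work per 70-cycle memory access.
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): {core_utilization(n, 10, 70):.3f}")
```

In this model utilization grows linearly with the thread count until the core saturates. The model leaves out the programmability cost, since every added thread is another party that must synchronize on shared state.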
Design Tradeoffs (5)
• Backward compatibility vs. architectural advances
  • On component level: e.g., memories (DDR DRAM)
  • On system level: NP needs to fit into overall router system
• Parallelism vs. pipelining
  • Depends on usage of NP
• Summary:
  • Lots of choices
  • Most decisions require some insight into expected NP usage
  • Tradeoffs are all qualitative
• Let's look at a commercial design
Novel Areas of NP Use
• TCP/IP offloading on high-performance servers
• Security processing: SSL offloading
• Storage area networks
• Many others: IDSs, etc.
Performance Bottlenecks
• Memory
  • Bandwidth available, but access time too slow
  • Increasing delay for off-chip memory
• I/O
  • High-speed interfaces available
  • Cost problem with optical interfaces
  • Otherwise no problem
• Processing power
  • Individual cores are getting more complex
  • Problems with access to shared resources
  • Control processor can become bottleneck
Limitations on Scalability
• What are the limitations on how fast NPs need to get?
  • Link rates (optical bandwidth limits)
  • Application complexity (core vs. edge)
• What are the limitations on how fast NPs can get?
  • Parallelism in networks
  • Power consumption
  • Chip area
Commercial Network Processors
• Commercial NPs
  • Large variety of architectures
  • Different applications and performance spaces
  • Lots of implementation details and practical issues
• General Themes
  • Type and number of processors
  • Homogeneous vs. heterogeneous
  • Type and size of memories
  • Internal and external communication channels
  • Mechanisms of scalability: parallelism and pipelining
  • Generality vs. specialization
Cisco PXF
IXP2400
• XScale (ARM-compliant) embedded control processor
  • Instruction and data caches
• 8 microengines
  • 400 or 600 MHz
  • 8 threads per microengine
  • Multiple instruction stores with 4k instructions
  • 256 general purpose registers
  • 512 transfer registers
• 2 GB addressable DDR-DRAM memory (19.2 Gbps)
• 32 MB addressable QDR-SRAM memory (12 Gbps r+w)
• 16 words of Next Neighbor Registers
• 16 kB scratchpad
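
The figures on this slide allow a rough per-packet cycle budget. The sketch below takes the microengine count, thread count, and clock rates from the slide. The packet rate of 5 million packets per second is a hypothetical load, and the calculation assumes perfect parallel speedup with no stalls, so it is an upper bound rather than a prediction.

```python
MICROENGINES = 8         # from the slide
THREADS_PER_ENGINE = 8   # from the slide
CLOCK_HZ = 600e6         # faster of the two clock options on the slide


def cycles_per_packet(packets_per_second: float,
                      engines: int = MICROENGINES,
                      clock_hz: float = CLOCK_HZ) -> float:
    """Aggregate microengine cycles available per packet (ideal case)."""
    return engines * clock_hz / packets_per_second


total_threads = MICROENGINES * THREADS_PER_ENGINE
print("hardware threads:", total_threads)
print("cycles per packet at 5 Mpps (hypothetical):", cycles_per_packet(5e6))
```

Memory accesses, synchronization, and uneven load across microengines all reduce the budget that is actually usable.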
IXP2400
• Interconnects
  • Coprocessor bus added (incl. access to T-CAM)
  • Flow control bus for two-chip configurations (e.g., ingress and egress)
• Switch Fabrics
  • No IX bus
  • Utopia 1, 2, 3
  • CSIX-L1
  • SPI-3 (POS-PHY 2/3)
Two-Chip Configurations
• Flow control needed between ingress and egress
• 1 Gbps over flow control bus (not shown)
IXP2400 Internal Architecture
IXP2400 Microengine
• Enhancements over IXP1200 microengines:
  • Multiplier unit
  • Pseudo-random number generator
  • CRC calculator
  • Four 32-bit timers and timer signaling
  • 16-entry CAM for inter-thread communication
  • Time stamping unit
  • Generalized thread signaling
  • 640 words of local memory
  • Simultaneous access to packet queues without mutual exclusion
  • Functional units for ATM segmentation and reassembly
  • Automated byte-alignment
  • Microengines divided into two clusters with independent command and SRAM buses
Software
• Support for software pipelining
  • "Reflector Mode Pathways" for communication
  • Next Neighbor Registers as programming abstraction
• SDK 4.0
  • Simulator, debugger, profiler, traffic generator
  • Portable modules
  • Provides better infrastructure support
  • C compiler
Summary
• Network Processor design space is big due to
  • Varying design goals
  • Varying implementation choices
• Qualitative tradeoffs
• Survey of commercial NPs
  • Network processors are getting more features
  • Main architectural characteristic is still parallelism
  • Software support is becoming more important