ECE 526 – Network Processing Systems Design
Network Processor Architecture and Scalability
Chapter 13, 14: D. E. Comer
NP Architectures
• Last class:
  • Key requirement of network processor: flexibility and scalability
  • Optimized instruction set and parallel processing using multiprocessors
• This class:
  • Internal organization of NP:
    • Computation, storage and communication
    • Operating support
  • Content addressable memory (CAM)
  • NP scaling issues
NP Architectures
• NP architecture characteristics
  • Computation
    • Processor hierarchy
    • Special-purpose functional units
  • Storage
    • Memory hierarchy
    • Content addressable memory (CAM)
  • Communication
    • Internal buses
    • External interfaces
  • Operation support
    • Concurrent/parallel execution support
    • Programming models
    • Dispatch mechanisms
Processor Functionality
Processor Pyramid
Packet Flow through Hierarchy
• Accommodating tasks of different complexity and frequency
  • Low level: simple and frequent processing
  • High level: occasional and complex processing
• Computation scaling
  • Faster processor
  • More concurrent threads
  • More processors
  • More processor types
Memory Hierarchy
• Different memory technologies used for performance, cost and area
• Conventional approach:
  • Register + cache + off-chip DRAM
  • Exploiting locality: temporal and spatial
  • Optimized for average case
  • Transparent to programmer
• Network processors:
  • Register, scratch pad, control store, onboard RAM, CAM/TCAM, SRAM and SDRAM
  • Specialized for network processing applications
  • Little temporal locality
  • Explicit to application developer
    • Difficult to program
    • More control
  • Memory hierarchy is not “cached” but used explicitly
Memory Technology
• Characterized by access latency and area
  • SRAM: 2-10 ns, 4-6 transistors per bit
  • DRAM: 50-70 ns, 1 or 3 transistors per bit
• What data should be stored where?
  • Instruction data
  • Packet data: header, payload and meta-data
  • Temporary data: data structures allocated on the stack
  • Application data: persistent data, e.g., routing table, rule file
Memory Size Example
Consider a network system that processes IP datagrams. Assume that the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions access a 4-byte value in memory, each datagram consists of 1500 bytes, a lookup examines on average ten 4-byte values in an IP routing table, and a datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process one datagram. Assume no memory caching.
• Instruction memory:
• Packet memory:
• Application memory:
• Temporary memory:
Total:
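One possible way to work the count above, sketched in Python. The following are assumptions, not stated on the slide: accesses are counted in 4-byte words; the Ethernet frame adds a 14-byte header (preamble and CRC ignored); the frame is written to packet memory once on arrival and read once on departure; and the 10% of instructions that touch memory are counted as temporary-memory accesses, separate from the routing lookup. Other readings of the problem give different totals.

```python
import math

WORD_BYTES = 4              # size of one memory access (from the problem)
INSTRUCTIONS = 5000         # instructions executed per packet (from the problem)
DATA_ACCESS_PERCENT = 10    # share of instructions that access memory (from the problem)
DATAGRAM_BYTES = 1500       # IP datagram size (from the problem)
LOOKUP_ACCESSES = 10        # 4-byte values examined per route lookup (from the problem)
ETH_HEADER_BYTES = 14       # ASSUMPTION: Ethernet header only, no preamble or CRC

# Instruction memory: one 4-byte fetch per executed instruction.
instruction_accesses = INSTRUCTIONS

# Packet memory: ASSUMPTION that the frame is written once and read once.
frame_words = math.ceil((DATAGRAM_BYTES + ETH_HEADER_BYTES) / WORD_BYTES)
packet_accesses = 2 * frame_words

# Application memory: the routing-table lookup.
application_accesses = LOOKUP_ACCESSES

# Temporary memory: ASSUMPTION that the 10% data accesses all land here.
temporary_accesses = INSTRUCTIONS * DATA_ACCESS_PERCENT // 100

total = (instruction_accesses + packet_accesses
         + application_accesses + temporary_accesses)

print("instruction:", instruction_accesses)
print("packet:     ", packet_accesses)
print("application:", application_accesses)
print("temporary:  ", temporary_accesses)
print("total:      ", total)
```

Under these assumptions instruction fetches dominate the count, which is why instruction storage is usually placed in the fastest memory.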
Memory Scaling
• Memory access time: raw access speed
  • Technology dependent
  • Important for random access
• Memory bandwidth
  • Important for overall system performance
  • Scales with
    • Multiple ports
    • Multiple banks
    • Wider bus
  • Limited by
    • Pins and package cost
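The scaling factors listed above can be illustrated with a rough peak-bandwidth estimate. This is a minimal sketch that assumes ideal conditions (every bank or port transfers on every cycle, with no conflicts or refresh overhead); the function name and the example figures are hypothetical, not taken from the course.

```python
def peak_bandwidth_bps(bus_width_bits, transfers_per_second, parallel_units=1):
    """Idealized peak bandwidth in bits per second.

    parallel_units is the number of independent banks or ports that can
    transfer at the same time (ASSUMPTION: perfect interleaving).
    """
    return bus_width_bits * transfers_per_second * parallel_units


# Hypothetical numbers, for illustration only.
narrow = peak_bandwidth_bps(32, 100_000_000)        # one 32-bit bank at 100 MT/s
wide = peak_bandwidth_bps(64, 100_000_000)          # doubling the bus width
banked = peak_bandwidth_bps(64, 100_000_000, 4)     # four interleaved banks

print(narrow, wide, banked)
```

Each factor multiplies the estimate, but a wider bus and more ports both consume pins, which is the limit named on the slide.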
Content Addressable Memory
• Does not use an address to locate content
• CAM takes content as input, in a query-style format
• Organized as an array of slots
• Combination of mechanisms
  • Random access storage
  • Exact-match pattern search
• Rapid search enabled by parallel hardware
Lookup using Conventional CAM
• Given
  • Pattern for which to search
  • Known as key
• CAM returns
  • First slot that matches key or
  • All slots that match key
• Algorithm

    for each slot do {
        if (key == slot) {
            declare key matches slot;
        } else {
            declare key does not match slot;
        }
    }
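The exact-match algorithm above can be sketched in software as follows. This is an illustration only: real CAM hardware compares all slots in parallel, while this loop visits them one after another. The function name `cam_lookup` and its `first_only` parameter are hypothetical.

```python
def cam_lookup(slots, key, first_only=True):
    """Exact-match search over a list of slot values.

    Returns the index of the first matching slot (or None if no slot
    matches) when first_only is True, otherwise a list of all matching
    slot indices.
    """
    matches = [index for index, slot in enumerate(slots) if slot == key]
    if first_only:
        return matches[0] if matches else None
    return matches


# Hypothetical slot contents (32-bit values written in hex).
slots = [0x0A000001, 0xC0A80001, 0x0A000001]
print(cam_lookup(slots, 0x0A000001))                     # first match
print(cam_lookup(slots, 0x0A000001, first_only=False))   # all matches
print(cam_lookup(slots, 0x08080808))                     # no match
```

The two return modes correspond to the two behaviors listed on the slide: first matching slot, or all matching slots.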
Ternary CAM (TCAM)
• Regular CAM
  • Binary value: 0 and 1
  • Requiring key to match all the content in one slot
  • Not flexible
• TCAM
  • Ternary value: 0, 1 and don’t care
  • Implemented using masking of entries
  • Good for network processor flow classification
TCAM Lookup
• Each slot has a bit mask
• Hardware uses mask to decide which bits to test
• Algorithm

    for each slot do {
        if ((key & mask) == (slot & mask)) {
            declare key matches slot;
        } else {
            declare key does not match slot;
        }
    }
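A minimal software sketch of the masked comparison above. It assumes that a 1 bit in the mask means "compare this bit" and a 0 bit means "don't care", and that the first matching slot wins; the `TcamSlot` type and the example entries are hypothetical. As with a regular CAM, hardware tests all slots in parallel rather than in a loop.

```python
from typing import NamedTuple, Optional, Sequence


class TcamSlot(NamedTuple):
    value: int  # stored pattern
    mask: int   # 1 bits are compared, 0 bits are "don't care" (ASSUMPTION)


def tcam_lookup(slots: Sequence[TcamSlot], key: int) -> Optional[int]:
    """Return the index of the first slot whose masked value equals the
    masked key, or None if no slot matches."""
    for index, slot in enumerate(slots):
        if (key & slot.mask) == (slot.value & slot.mask):
            return index
    return None


# Hypothetical entries over 32-bit keys.
slots = [
    TcamSlot(0x0A000000, 0xFF000000),  # top 8 bits must equal 0x0A
    TcamSlot(0xC0A80100, 0xFFFFFF00),  # top 24 bits must equal 0xC0A801
    TcamSlot(0x00000000, 0x00000000),  # all bits "don't care": matches anything
]
print(tcam_lookup(slots, 0x0A010203))
print(tcam_lookup(slots, 0xC0A80105))
print(tcam_lookup(slots, 0x08080808))
```

Because the first match wins in this sketch, more specific entries need to be placed before more general ones.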
Partial Matching using TCAM
• Key matches slot 1
• Packet belongs to flow ID: 00.02
• Here, “additional information” is stored in each slot
Classification using TCAM
• Flexibility: “additional information” stored in separate memory
• Extracting values from fields in headers
• Forming values in contiguous string
• Using a key for TCAM lookup
• Storing classification in slot
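The classification steps above can be sketched end to end. This illustration follows the variant in which the classification result lives in a separate memory indexed by slot number. The key layout (8-bit protocol followed by 16-bit destination port), the table contents and the class names are all assumptions made for the example.

```python
def make_key(protocol, dst_port):
    """Pack header fields into one contiguous 24-bit key.
    ASSUMED layout: protocol in bits 23-16, destination port in bits 15-0."""
    return ((protocol & 0xFF) << 16) | (dst_port & 0xFFFF)


# Hypothetical TCAM contents as (value, mask) pairs; 1 bits in the mask
# are compared, 0 bits are "don't care".
TCAM = [
    (make_key(6, 80), 0xFFFFFF),   # TCP (protocol 6) to port 80
    (make_key(17, 0), 0xFF0000),   # any UDP (protocol 17) packet
    (0x000000, 0x000000),          # default entry, matches everything
]

# Separate memory holding the "additional information", one entry per slot.
CLASS_MEMORY = ["web", "udp", "default"]


def classify(header):
    """Extract fields, form the key, search the TCAM and return the
    classification stored for the first matching slot."""
    key = make_key(header["protocol"], header["dst_port"])
    for index, (value, mask) in enumerate(TCAM):
        if (key & mask) == (value & mask):
            return CLASS_MEMORY[index]
    return None


print(classify({"protocol": 6, "dst_port": 80}))
print(classify({"protocol": 17, "dst_port": 53}))
print(classify({"protocol": 6, "dst_port": 22}))
```

Keeping the result in a separate memory means the classification can change without rewriting the TCAM pattern itself.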
Communication
• Internal interfaces: channels between processing elements, memories
  • Internal bus
  • Hardware FIFO: sequential access
  • Transfer register: random access
  • Onboard shared memory: shared random access
• External interfaces
  • Memory interfaces: accesses to larger off-chip memory
  • Direct I/O interfaces: e.g., access to link interfaces
  • Bus interfaces: accesses to other devices, e.g., control CPU
• Switching fabric interface
  • Access to switching fabric
  • Several standards (e.g., CSIX by NP Forum)
Communication Cost Example
• Consider a second-generation network system that forwards IP datagrams. The system has 16 interfaces, each connected to an OC-192 line (data rate: 10 Gbps). The 16 interfaces are interconnected by a shared communication channel. Packet sizes range from 40 bytes to 1500 bytes. What aggregate bandwidth is needed on the communication channel in each of the two design scenarios?
  • Every bit of a packet is transferred through the shared communication channel.
  • Only a 4-byte packet memory address is transferred through the shared communication channel.
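One way to work the two scenarios above, sketched in Python. The assumptions are not stated on the slide: every interface receives at the full 10 Gbps, each packet crosses the shared channel exactly once, and framing overhead is ignored. The function names are hypothetical.

```python
INTERFACES = 16
LINE_RATE_BPS = 10_000_000_000   # 10 Gbps per interface, as given on the slide
ADDRESS_BYTES = 4


def full_packet_bandwidth_bps():
    """Scenario 1: every bit of every packet crosses the shared channel."""
    return INTERFACES * LINE_RATE_BPS


def address_only_bandwidth_bps(packet_bytes):
    """Scenario 2: only a 4-byte address crosses the channel per packet."""
    packets_per_second = LINE_RATE_BPS / (packet_bytes * 8)
    return INTERFACES * packets_per_second * ADDRESS_BYTES * 8


print(full_packet_bandwidth_bps() / 1e9, "Gbps")
print(address_only_bandwidth_bps(40) / 1e9, "Gbps (40-byte packets)")
print(address_only_bandwidth_bps(1500) / 1e9, "Gbps (1500-byte packets)")
```

Under these assumptions the first scenario needs 160 Gbps. The second needs 16 Gbps in the worst case of back-to-back 40-byte packets, and roughly 0.43 Gbps when all packets are 1500 bytes, so the channel has to be sized for the smallest packets.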
NP Operating Support
• Programming model: interrupt, event vs. thread based
• Parallel and concurrent execution support
• Dispatch mechanism: how threads are initiated
Summary
• NP scaling by
  • Heterogeneous multiprocessors structured hierarchically
  • Mixed memory technologies explicitly available to programmer
  • Different communication mechanisms
  • Operating support important to achieve high system performance
• NP scaling limited by
  • Physical space: chip area (less than 400 mm²)
  • Pin limits and packaging technology
  • Power consumption and heat dissipation
For Next Class and Reminder
• Read Comer: chapter 15 and 16
• Homework solution on-line by Friday
• Midterm: 10/6
• Project
  • topic finalized 10/5 (group leader email me)
  • proposal presentation 10/22