1 / 21

ECE 526 – Network Processing Systems Design

ECE 526 – Network Processing Systems Design. Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer. NP Architectures. Last class: Key requirement of network processor: flexibility and scalability Optimized instruction set and parallel processing using multiprocessors

neila
Download Presentation

ECE 526 – Network Processing Systems Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer

  2. NP Architectures • Last class: • Key requirement of network processor: flexibility and scalability • Optimized instruction set and parallel processing using multiprocessors • This class: • Internal organization of NP: • Computation, storage and communication • Operating support • Content addressable memory (CAM) • NP scaling issues ECE 526

  3. NP Architectures • NP architecture characteristics • Computation • Processor hierarchy • Special-purpose functional units • Storage • Memory hierarchy • Content addressable memory (CAM) • Communication • Internal buses • External interfaces • Operation support • Concurrent/parallel execution support • Programming models • Dispatch mechanisms ECE 526

  4. Processor Functionality ECE 526

  5. Processor Pyramid ECE 526

  6. Packet Flow through Hierarchy • Accommodating tasks of different complexity and frequency • Low level: simple and frequent processing • High level: occasional and complex processing • Computation scaling • Faster processor • More concurrent threads • More processors • More processor types ECE 526

  7. Memory Hierarchy • Different memory technologies used for performance, cost and area • Conventional Approach: • Register + cache + off-chip DRAM • Exploiting locality: temporal and spatial • Optimized for average case • Transparent to programmer • Network Processors: • Register, scratch pad, control store, onboard RAM, CAM/TCAM, SRAM and SDRAM • Specialized for network processing application • Little temporal locality • Explicit to application developer • Different to programming • More control • Memory hierarchy is not “cached” but used explicitly ECE 526

  8. Memory Technology • Characterized by access latency, area • SRAM: 2-10 ns, 4-6 transistors • DRAM: 50-70 ns, 1 or 3 transistors • What data should be store where? • Instruction data • Packets data: header, payload and meta-data • Temporal data: data structure allocated on the stack • Application data: persistent data, e.g., routing table, rule file ECE 526

  9. Memory Size Example Consider a network system that processes IP datagram. Assume the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions need to access 4-byte value memory, each datagram consists of 1500 bytes, a lookup examines 10 4-byte values on average in an IP routing table, and a datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process on datagram. Assume no memory caching. • Instruction Memory: • Packet Memory: • Application Memory: • Temporary Memory: Total: ECE 526

  10. Memory Scaling • Memory access time: raw access speed • Technology dependent • Important for random access • Memory bandwidth • Important for overall system performance • Scale with • Multiple ports • Multiple banks • Wider bus • Limits by • Pins and package cost ECE 526

  11. Content Addressable Memory • Not using address to locate content • CAM using content as input in a query-style format • Organized as array of slots • Combination of mechanisms • Random access storage • Exact-match pattern search • Rapid search enabled with parallel hardware ECE 526

  12. Lookup using Conventional CAM • Given • Pattern for which to search • Known as key • CAM returns • First slot that match key or • All slots that match key • Algorithm for each slot do { if (key == slot) { declare key matches slot; } else { declare key does not match slot; } } ECE 526

  13. Ternary CAM (TCAM) • Regular CAM • Binary value: 0 and 1 • Requiring key to match all the content in one slot • Not flexible • TCAM • Ternary value: 0, 1 and don’t care • Implemented using masking of entries • Good for network processor flow classification ECE 526

  14. TCAM Lookup • Each slot has bit mask • Hardware uses mask to decide which bits to test • Algorithm for each slot do { if (key & mask ) == (slot & mask)) { declare key matches slot; } else { declare key does not match slot; } } ECE 526

  15. Partial Matching using TCAM • Key matched slot 1 • Packet belonging to flow ID: 00.02 • Here “additional information” stored in each slot ECE 526

  16. Classification using TCAM • Flexibility: “additional information” stored in separate memory • Extracting values from fields in headers • Forming values in contiguous string • Using a key for TCAM lookup • Storing classification in slot ECE 526

  17. Communication • Internal interfaces: channels between processing elements, memories • Internal bus • Hardware FIFO: sequential access • Transfer register: random access • Onboard shared memory: shared random access • External interfaces • Memory interfaces: accesses to larger off-chip memory • Direct I/O interfaces: e.g., access to link interfaces • Bus interfaces: accesses to other devices, e.g., control CPU • Switching fabric interface • Access to switching fabric • Several standards (e.g., CSIX by NP Forum) ECE 526

  18. Communication Cost Example • Consider a second generation network system that forwards IP datagram. If the system has 16 interfaces that each connect to an OC-192 line (data rate is 10 Gbps). These 16 interfaces are interconnected with a shared communication channel. The packet size is in the range of 40 bytes to 1500 bytes. What aggregate bandwidth is needed on the communication channel for the two design scenarios: • Every bit of a packet transfers through the shared communication channels. • Only a 4-byte packet memory address transfers through the shared communication channels. ECE 526

  19. NP Operating Support • Programming model: interrupt, event vs. thread based • Parallel and concurrent execution support • Dispatch mechanism: how threads are initiated ECE 526

  20. Summary • NP scaling by • Heterogeneous multiprocessors structured hierarchically • Mixed memory technologies explicitly available to programmer • Different communication mechanisms • Operating support important to achieve high system performance • NP scaling limited by • Physical space: chip area (less than 400 mm2) • Pin limits and packaging technology • Power consumption and heat dissipation ECE 526

  21. For Next Class and Reminder • Read Comer: chapter 15 and 16 • Homework solution on-line by Friday • Midterm: 10/6 • Project • topic finalized 10/5 (group leader email me) • proposal presentation 10/22 ECE 526

More Related