150 likes | 272 Views
ECE 526 – Network Processing Systems Design. Hardware Architecture for Protocol Processing Chapter 8: D. E. Comer. Goal. Understand hardware architecture of protocol processing Learn the key metric of protocol processing system: aggregated packet rate
E N D
ECE 526 – Network Processing Systems Design Hardware Architecture for Protocol Processing Chapter 8: D. E. Comer
Goal • Understand hardware architecture of protocol processing • Learn the key metric of protocol processing system: • aggregated packet rate • Learn the key requirements of protocol processing system • High throughput • Scalability • Survey mechanisms to design scalable protocol processing systems ECE 526
Outline • First generation of network processing system architecture • Figure of metric of network processing system • Possible ways to improve performance of network processing systems ECE 526
1st Generation Network System • Traditional software-based router • Using conventional hardware • Single general-purpose processor handles most tasks • Single shared memory • I/O over a shared bus • NIC use same design as other I/O devices Cheap but performance is poor! ECE 526
Figure of Metric of Network Systems • Interface data rate • Rate at which data enters /leaves • Aggregate data rate • Sum of interface rates • Measure of total data rate system can handle • Note: aggregate rate crucial if CPU handles traffic from all interfaces • Could be misleading if packet size varying and processing cost constant • Aggregate packet rate • Sum of the number of packets enters / leaves system • More important for protocol processing (no touch payload) • Why? • Packet rate vs. data rate • CPU metric: per-packet rate • Interface hardware metric: per-bit (data) rate ECE 526
Data Rate vs. Packet Rate • Packet size: small 64 byte; large 1518 byte • For protocol processing, with same data rate, which is more difficult for network processing system? • Smallest packet or • Biggest packet • How to calculate the packet rate? ECE 526
Aggregate Packet Rate ECE 526
Time per Packet • Aggregate packet rate determines time per packet • Each packet processing requires in the order of 100s to 1000s instruction per packet ECE 526
Feasibility Analysis • Design a software router • data rate 10Gbps • Assuming small packets (64B) • Assuming each packet need 10,000 instruction to process • Can Intel 80986@2009 do the job? • CPU:24Ghz • 1 billion transistors • Address bus bit: 64 • CPU is a RISC machine which can execute an instruction per clock cycle • Hint: • What is the packet rate? • What is the processing requirement in MIPS? • Single CPU router lacks scalability. How multi-core? ECE 526
Scalability • The capability of a system that can be easily extended in “size” and performance • E.g., CPU with more memory slots and disk slots; router can add more ports or faster links • Why we care scalability? • Design a new system is timing consuming and expensive • Performance requirement increase fast • Others • How can we make a network system more scalable? • Optimized processing engines • Intelligent NICs • Parallel processing by duplicating processing engines + NICs ECE 526
Processing Power • Overcoming processing bottlenecks: • Specialized hardware (ASICs) • Fine-grained parallelism • Symmetric coarse-grain parallelism • Asymmetric coarse-grain parallelism • Special-purpose coprocessors • Other improvements • NICs with onboard processing • Smart NICs • => basically same as per-port processing engines ECE 526
Parallelism in Processors • Fine-grained parallelism • Exploits instruction-level parallelism • Examples: VLIW, SMT, etc. • Limited due to workload • Symmetric coarse-grain parallelism • Multiple parallel identical CPUs • Inter-processor communication can limit performance • Asymmetric coarse-grain parallelism • Multiple parallel different CPUs • E.g., one processor for layer 2, one for layer 3 • Special-purpose coprocessors • Custom logic for lookups, checksums, etc. • High-performance but not (fully) programmable • Key question: how such a system can be programmed • Duplicate processing engines --- Advance router architecture ECE 526
Advanced Router Architecture • Port: point of attachment for a physical link • Switching Fabric (SF): interconnect input & output ports • Line Card: the device connecting between link and SF • Routing Processor: create forwarding tables using routing protocol • Queues: buffers between input port and SF or SF and output port S.Keshav etc. IEEE Communication 1998 ECE 526
Advanced Router Architecture • Changing requirements • Increasing link speed • Increasing number of ports • Increasing routing tables • Increasing processing complexity • Scalable system design: • Exploit parallelism wherever possible • Per-port, per-flow, per-packet, instruction-level • One Processing engine per port (instead of single CPU) • Multiple processors per port • “Better” processors ECE 526
Reminder • Read Comer: chapter 11 & 12 • Sep. 19: project group leader email me your group members for the project • Sep. 24: homework 1 due ECE 526