This chapter outlines implementation principles in network processing system design, including examples such as TCAM updating and cautionary questions to consider. It covers principles related to avoiding waste, shifting computation, leveraging off-system components, and more.
ECE 526 – Network Processing Systems Design • System Implementation Principles II • Varghese, Chapter 3
Outline • Review of Principles 1–7 • Implementation principles 8–15 • Reflect on what we learned • Example: TCAM updating • Cautionary questions
Review • P1: Avoid Obvious Waste • Example: copy a packet pointer instead of the packet • P2: Shift Computation in Time • Precompute (table lookup) • Evaluate lazily (network forensics) • Share expenses (batch processing) • P3: Relax Subsystem Requirements • Trade certainty for time (random sampling) • Trade accuracy for time (hashing, Bloom filters; see the sketch below) • Shift computation in space (fast path/slow path)
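As a concrete illustration of trading accuracy for time (P3), here is a minimal Bloom filter sketch in C. The filter size, the two FNV-1a-seeded hash functions, and the test strings are illustrative choices, not taken from Varghese's text; the filter can return a false positive but never a false negative.

```c
/* Minimal Bloom filter sketch (P3: trade accuracy for time).
 * Filter size and hash choices are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS 1024

static uint8_t filter[FILTER_BITS / 8];

/* FNV-1a run with two different seeds stands in for two independent hashes. */
static uint32_t hash(const char *key, uint32_t seed)
{
    uint32_t h = seed;
    while (*key) {
        h ^= (uint8_t)*key++;
        h *= 16777619u;
    }
    return h % FILTER_BITS;
}

static void bloom_add(const char *key)
{
    uint32_t a = hash(key, 2166136261u), b = hash(key, 0x9e3779b9u);
    filter[a / 8] |= 1u << (a % 8);
    filter[b / 8] |= 1u << (b % 8);
}

/* May return a false positive, never a false negative. */
static int bloom_maybe_contains(const char *key)
{
    uint32_t a = hash(key, 2166136261u), b = hash(key, 0x9e3779b9u);
    return (filter[a / 8] >> (a % 8) & 1) && (filter[b / 8] >> (b % 8) & 1);
}

int main(void)
{
    bloom_add("10.0.0.1");
    printf("10.0.0.1: %d\n", bloom_maybe_contains("10.0.0.1")); /* 1 */
    printf("10.0.0.2: %d\n", bloom_maybe_contains("10.0.0.2")); /* almost surely 0 */
    return 0;
}
```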
Review (continued) • P4: Leverage Off-System Components • Examples: onboard address recognition and filtering, caches • P5: Add Hardware to Improve Performance • Use memory interleaving and pipelining (forms of parallelism) • Use wide-word parallelism (saves memory accesses) • Combine SRAM and DRAM (keep only the low-order bits of each counter in SRAM when maintaining a large number of counters; see the sketch below) • P6: Replace inefficient general routines with efficient specialized ones • Example: NAT using forward and reverse translation tables • P7: Avoid Unnecessary Generality • Examples: RISC, microengines
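The SRAM/DRAM counter split under P5 can be sketched as below, with two arrays standing in for the fast and slow memories. The counter widths and the flush-on-overflow policy are assumptions made for illustration; real designs typically flush counters through a background management process.

```c
/* Sketch of P5's SRAM/DRAM counter split: only the low-order bits of each
 * counter live in (simulated) fast SRAM; when a small counter is about to
 * overflow it is flushed into the full-width (simulated) DRAM counter. */
#include <stdint.h>
#include <stdio.h>

#define NUM_COUNTERS 4

static uint8_t  sram_low[NUM_COUNTERS];   /* small, fast counters */
static uint64_t dram_full[NUM_COUNTERS];  /* wide, slow counters  */

static void count_packet(int i)
{
    if (sram_low[i] == UINT8_MAX) {       /* about to overflow: flush */
        dram_full[i] += sram_low[i];
        sram_low[i] = 0;
    }
    sram_low[i]++;                        /* common case: SRAM access only */
}

static uint64_t read_counter(int i)
{
    return dram_full[i] + sram_low[i];
}

int main(void)
{
    for (int k = 0; k < 1000; k++)
        count_packet(0);
    printf("counter 0 = %llu\n", (unsigned long long)read_counter(0)); /* 1000 */
    return 0;
}
```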
P8: Don't be tied to reference implementations • Key Concept: • Implementations are sometimes given (e.g., by manufacturers) to make the specification of an interface precise or to show how to use a device • These do not necessarily show the best way to solve the problem; they are chosen for conceptual clarity • Example: • Using parallel packet classification instead of sequential demultiplexing through the TCP/IP protocol layers
P9: Pass hints across interfaces • Key Concept: if the caller knows something the callee will have to compute, pass it (or something that makes it easier to compute) as an argument • "hint" = something that makes the recipient's life easier, but may not be correct • "tip" = a hint that is guaranteed to be correct • Caveat: the callee must either trust the caller or verify the hint (and probably should do both) • Example: • Active messages, where the message carries the address of its handler for fast dispatching (sketch below)
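A minimal sketch of hint passing in the spirit of active messages: the sender fills in a handler index so the receiver can dispatch directly instead of demultiplexing on message type, and the receiver verifies the hint before trusting it. The message layout, handler table, and handler names are hypothetical.

```c
/* Sketch of P9: the sender passes a handler index as a hint for direct dispatch. */
#include <stdio.h>

typedef void (*handler_fn)(const char *payload);

static void handle_data(const char *p) { printf("data: %s\n", p); }
static void handle_ctrl(const char *p) { printf("ctrl: %s\n", p); }

static handler_fn handler_table[] = { handle_data, handle_ctrl };
#define NUM_HANDLERS (sizeof handler_table / sizeof handler_table[0])

struct message {
    unsigned    handler_hint;   /* filled in by the sender */
    const char *payload;
};

static void dispatch(const struct message *m)
{
    /* A hint may be wrong, so verify it before using it. */
    if (m->handler_hint < NUM_HANDLERS)
        handler_table[m->handler_hint](m->payload);
    else
        printf("bad hint %u, falling back to slow path\n", m->handler_hint);
}

int main(void)
{
    struct message m = { 0, "hello" };
    dispatch(&m);
    return 0;
}
```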
P10: Pass hints in protocol headers • Key Concept: if the sender knows something the receiver will have to compute, pass it in the packet header • Example: • Tag switching, where the packet carries a tag (label) in addition to the destination address so the receiver can do a fast, direct lookup (sketch below)
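A small sketch of the tag-switching idea: the packet carries a label that indexes the forwarding table directly, so the switch avoids a longest-prefix lookup on the destination address. The table layout, label sizes, and label-swap behavior here are illustrative assumptions, not a description of any particular router.

```c
/* Sketch of P10: a label in the packet header is used as a direct table index. */
#include <stdint.h>
#include <stdio.h>

#define LABEL_SPACE 16

struct label_entry { int out_port; uint16_t out_label; };

static struct label_entry label_table[LABEL_SPACE] = {
    [3] = { .out_port = 7, .out_label = 12 },
};

struct packet { uint16_t label; /* plus the usual headers and payload */ };

static void forward(struct packet *p)
{
    if (p->label < LABEL_SPACE) {
        struct label_entry *e = &label_table[p->label];
        p->label = e->out_label;          /* label swap */
        printf("forward on port %d with label %u\n",
               e->out_port, (unsigned)p->label);
    }
}

int main(void)
{
    struct packet p = { .label = 3 };
    forward(&p);
    return 0;
}
```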
P11: Optimize the Expected Case • Key Concept: if 80% of the cases can be handled in the same simple way, optimize for those cases • P11a: Use caches • Caching is a form of using state to improve performance • Example: • TCP input "header prediction" • If an incoming segment is in order and does what is expected, it can be processed in a small number of instructions (sketch below)
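A highly simplified sketch of the header-prediction idea: if the incoming segment is exactly what the connection expects, a few instructions suffice; anything else falls through to full protocol processing. The connection state and the checks are pared down far below what a real TCP implementation performs.

```c
/* Simplified sketch of P11 / TCP header prediction: fast path for the
 * expected case, slow path for everything else. */
#include <stdint.h>
#include <stdio.h>

struct tcp_conn { uint32_t rcv_nxt; /* next sequence number expected */ };

struct tcp_seg  { uint32_t seq; uint32_t len; int has_flags_other_than_ack; };

static void tcp_input(struct tcp_conn *c, const struct tcp_seg *s)
{
    /* Expected case: in-order segment, plain ACK, nothing exotic. */
    if (s->seq == c->rcv_nxt && !s->has_flags_other_than_ack) {
        c->rcv_nxt += s->len;             /* fast path: a few instructions */
        printf("fast path: accepted %u bytes\n", (unsigned)s->len);
        return;
    }
    printf("slow path: full protocol processing\n");
}

int main(void)
{
    struct tcp_conn c = { .rcv_nxt = 1000 };
    struct tcp_seg in_order     = { .seq = 1000, .len = 512 };
    struct tcp_seg out_of_order = { .seq = 3000, .len = 512 };
    tcp_input(&c, &in_order);       /* fast path */
    tcp_input(&c, &out_of_order);   /* slow path */
    return 0;
}
```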
P12: Add or Exploit State to Gain Speed • Key Concept: remember things to make them easier to compute later • P12a: Compute incrementally • The idea is to "accumulate" as you go, rather than computing everything at the end • Example: • Incremental update of the IP checksum (sketch below)
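A sketch of incremental checksum update using the identity HC' = ~(~HC + ~m + m') from RFC 1624, where HC is the old checksum, m is the old value of a 16-bit header field, and m' is its new value. The checksum value used in main() is arbitrary and only serves to exercise the function.

```c
/* Sketch of P12a: update the IP header checksum incrementally when one
 * 16-bit header field changes, instead of resumming the whole header.
 * Follows HC' = ~(~HC + ~m + m') from RFC 1624. */
#include <stdint.h>
#include <stdio.h>

/* Fold a 32-bit sum back into 16 bits of one's-complement arithmetic. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* old_check, old_field, new_field are in host byte order for simplicity. */
static uint16_t checksum_update(uint16_t old_check, uint16_t old_field,
                                uint16_t new_field)
{
    uint32_t sum = (uint16_t)~old_check;  /* ~HC  */
    sum += (uint16_t)~old_field;          /* + ~m */
    sum += new_field;                     /* + m' */
    return (uint16_t)~fold(sum);          /* HC'  */
}

int main(void)
{
    /* Example: a 16-bit header field changes from 0x1234 to 0x1233. */
    uint16_t hc = 0xdd2f;                 /* arbitrary existing checksum */
    printf("new checksum: 0x%04x\n", checksum_update(hc, 0x1234, 0x1233));
    return 0;
}
```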
P13: Optimize Degrees of Freedom • Key Concept: be aware of the variables under one's control and the evaluation criteria used to determine good performance • Example: memory-based string matching • A conventional state machine has 256 possible transitions per state per character (2^8, with 8-bit ASCII characters) • The bit-split algorithm uses 8 machines, each checking only one bit of the character, so the total is 16 transitions per character (2^1 × 8); see the sketch below
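The bit-split idea can be illustrated, in a much-reduced form, with a single-pattern matcher in which eight tiny automata each see only one bit of every input character (two possible inputs per step instead of 256), and a match is reported only where all eight agree. This shift-and sketch is a stand-in for the real multi-pattern bit-split Aho-Corasick construction, not a reproduction of it; the pattern and text strings are arbitrary.

```c
/* Single-pattern illustration of the bit-split idea: eight shift-and
 * automata, one per bit plane, each with only two possible inputs per step.
 * A match exists only where all eight planes agree. Pattern length <= 64. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PLANES 8

int main(void)
{
    const char *pattern = "gateway";
    const char *text    = "default gateway is down";
    size_t len = strlen(pattern);

    /* mask[b][v]: bit j set iff bit b of pattern[j] equals v. */
    uint64_t mask[PLANES][2] = { 0 };
    for (size_t j = 0; j < len; j++)
        for (int b = 0; b < PLANES; b++)
            mask[b][(pattern[j] >> b) & 1] |= (uint64_t)1 << j;

    uint64_t state[PLANES] = { 0 };
    for (size_t i = 0; text[i]; i++) {
        uint64_t all = ~(uint64_t)0;
        for (int b = 0; b < PLANES; b++) {
            int v = (text[i] >> b) & 1;      /* the one bit machine b sees */
            state[b] = ((state[b] << 1) | 1) & mask[b][v];
            all &= state[b];                 /* agreement of all machines */
        }
        if (all & ((uint64_t)1 << (len - 1)))
            printf("match ending at offset %zu\n", i);
    }
    return 0;
}
```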
P14: Use special techniques for finite universes (e.g., small integers) • Key Concept: when the domain of a function is small, techniques such as bucket sorting and bitmaps become feasible • Example: • Bucketed lookup for the NAT table • The NAT table is very sparse • Each bucket is reached by hashing (sketch below) • Bucket sort: partition an array into a finite number of buckets, then sort each bucket individually
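A sketch of the hashed-bucket NAT lookup: because the key space is finite and the table is sparse, a small bucket array with chaining gives near-constant-time lookups. The key fields, toy hash function, and table size are illustrative assumptions.

```c
/* Sketch of P14: sparse NAT mappings stored in a hashed bucket array. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_BUCKETS 256

struct nat_entry {
    uint32_t inside_ip;
    uint16_t inside_port;
    uint16_t outside_port;
    struct nat_entry *next;                /* chain within a bucket */
};

static struct nat_entry *buckets[NUM_BUCKETS];

static unsigned bucket_of(uint32_t ip, uint16_t port)
{
    return (ip ^ port) % NUM_BUCKETS;      /* toy hash */
}

static void nat_add(uint32_t ip, uint16_t port, uint16_t outside_port)
{
    struct nat_entry *e = malloc(sizeof *e);
    e->inside_ip = ip; e->inside_port = port; e->outside_port = outside_port;
    unsigned b = bucket_of(ip, port);
    e->next = buckets[b];
    buckets[b] = e;
}

static struct nat_entry *nat_lookup(uint32_t ip, uint16_t port)
{
    for (struct nat_entry *e = buckets[bucket_of(ip, port)]; e; e = e->next)
        if (e->inside_ip == ip && e->inside_port == port)
            return e;
    return NULL;
}

int main(void)
{
    nat_add(0x0A000001, 40000, 52001);     /* 10.0.0.1:40000 -> :52001 */
    struct nat_entry *e = nat_lookup(0x0A000001, 40000);
    if (e)
        printf("translate to outside port %u\n", e->outside_port);
    return 0;
}
```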
P15: Use algorithmic techniques to create efficient data structures • Key Concept: once P1–P14 have been applied, think about how to build an ingenious data structure that exploits what you know • Examples: • IP forwarding lookups • PATRICIA trees were the first data structure used: a special trie in which each edge is labeled with a sequence of characters • Many more efficient approaches followed (a plain binary-trie sketch appears below)
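For concreteness, here is a plain binary-trie longest-prefix-match sketch; a PATRICIA tree additionally compresses one-way branches, a refinement omitted here to keep the code short. The example prefixes and next hops are made up.

```c
/* Sketch of P15: a binary trie for longest-prefix match on 32-bit addresses. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct trie_node {
    struct trie_node *child[2];
    int has_entry;
    int next_hop;
};

static struct trie_node *new_node(void)
{
    return calloc(1, sizeof(struct trie_node));
}

static void insert(struct trie_node *root, uint32_t prefix, int len, int hop)
{
    struct trie_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = new_node();
        n = n->child[bit];
    }
    n->has_entry = 1;
    n->next_hop = hop;
}

/* Returns the next hop of the longest matching prefix, or -1 if none. */
static int lookup(struct trie_node *root, uint32_t addr)
{
    int best = -1;
    struct trie_node *n = root;
    for (int i = 0; i < 32 && n; i++) {
        if (n->has_entry)
            best = n->next_hop;
        n = n->child[(addr >> (31 - i)) & 1];
    }
    if (n && n->has_entry)
        best = n->next_hop;
    return best;
}

int main(void)
{
    struct trie_node *root = new_node();
    insert(root, 0x0A000000, 8, 1);        /* 10.0.0.0/8  -> hop 1 */
    insert(root, 0x0A010000, 16, 2);       /* 10.1.0.0/16 -> hop 2 */
    printf("10.1.2.3 -> hop %d\n", lookup(root, 0x0A010203));   /* 2 */
    printf("10.9.9.9 -> hop %d\n", lookup(root, 0x0A090909));   /* 1 */
    return 0;
}
```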
TCAM • Ternary: 0, 1, and * (wildcard) • A TCAM stores fixed-length keys and their associated actions • TCAM lookup: the query is compared with all keys in parallel, and the lowest memory location whose key matches the input is output in one cycle • IP forwarding uses longest-prefix matching • DIP 010001 matches both 010001* and 01* • Using a TCAM for IP forwarding therefore requires that all longer prefixes appear before any shorter ones (software sketch below)
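A software sketch of the TCAM matching rule: each entry is a value/mask pair in which mask bits of 0 play the role of '*', every entry is compared against the key, and the lowest-index match wins (real hardware does the comparison in parallel, in one cycle). The two entries encode the 010001* and 01* example, left-aligned in 32-bit words; the next-hop numbers are illustrative.

```c
/* Software sketch of a TCAM lookup with lowest-index-wins semantics. */
#include <stdint.h>
#include <stdio.h>

struct tcam_entry { uint32_t value; uint32_t mask; int next_hop; };

/* Longer prefixes are stored at lower indices, as longest-prefix match
 * with a lowest-index-wins TCAM requires. */
static struct tcam_entry tcam[] = {
    { 0x44000000, 0xFC000000, 5 },   /* 010001* */
    { 0x40000000, 0xC0000000, 3 },   /* 01*     */
};
#define TCAM_SIZE (int)(sizeof tcam / sizeof tcam[0])

static int tcam_lookup(uint32_t key)
{
    for (int i = 0; i < TCAM_SIZE; i++)        /* hardware checks all in parallel */
        if ((key & tcam[i].mask) == tcam[i].value)
            return i;                          /* lowest matching index */
    return -1;
}

int main(void)
{
    uint32_t dip = 0x45000000;                 /* address beginning 010001... */
    int i = tcam_lookup(dip);
    if (i >= 0)
        printf("matched entry %d, next hop %d\n", i, tcam[i].next_hop);
    return 0;
}
```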
IP Lookup • All prefixes of the same length are grouped together • The shortest prefix, 0*, sits at the highest memory address • A packet with DIP 110001 matches the prefixes of both P3 and P5 • P5 is chosen because it is the longest-prefix match
Routing Table Update • Suppose 11* with next hop P1 must be inserted into the routing table • Naïve approach: create space in the group of length-2 prefixes by pushing up one position every prefix of length 2 and longer • A core routing table with 100,000 entries would then need about 100,000 memory accesses per update
Routing Table Update • P13: understand and exploit the degrees of freedom: 11* can be placed at any position within the length-2 group; it is not required to sit after 10* • In particular, it can be inserted at the boundary between group 2 and group 3
Clever Routing Table Updating • Keep the free space just above the group of longest prefixes; to insert a prefix of length i, move only one entry per prefix-length group between the free space and group i, each move filling the current hole and opening a new one • The maximum number of memory accesses is 32 - i (sketch below)
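A sketch of this update scheme under the layout described above: the free space sits just above the longest-prefix group, and inserting a prefix of length len moves at most one entry per group between the free space and group len, i.e. at most 32 - len entries. The bookkeeping array, table size, and example prefixes are illustrative assumptions, not taken from the text.

```c
/* Sketch of the clever TCAM update: the hole starts in the free space above
 * group 32 and is pushed down one group at a time by moving each group's
 * bottom entry into it, for at most 32 - len moves per insertion. */
#include <stdio.h>

#define TCAM_SLOTS 16

struct entry { int valid; unsigned prefix; int len; };

static struct entry tcam[TCAM_SLOTS];
/* group_end[g]: index one past the last entry of the length-g group.
 * Layout (low to high index): free space, group 32, group 31, ..., group 1.
 * group_end[33] is the end of the free region. */
static int group_end[34];

static void tcam_insert(unsigned prefix, int len)
{
    int hole = group_end[33] - 1;          /* bottommost free slot        */
    group_end[33]--;                       /* free region shrinks by one  */
    for (int g = 32; g > len; g--) {
        int bottom = group_end[g] - 1;     /* bottom entry of group g     */
        if (bottom > hole) {               /* group non-empty: one move   */
            tcam[hole] = tcam[bottom];
            hole = bottom;
        }
        group_end[g]--;                    /* group g slides up by one    */
    }
    tcam[hole].valid = 1;                  /* at most 32 - len moves done */
    tcam[hole].prefix = prefix;
    tcam[hole].len = len;
}

int main(void)
{
    /* Initially the whole table is free space; groups grow from the bottom. */
    for (int g = 1; g <= 33; g++)
        group_end[g] = TCAM_SLOTS;

    tcam_insert(0x11, 6);                  /* 010001*  (bits right-aligned) */
    tcam_insert(0x01, 2);                  /* 01*                           */
    tcam_insert(0x45, 8);                  /* 01000101*                     */

    for (int i = 0; i < TCAM_SLOTS; i++)
        if (tcam[i].valid)
            printf("slot %2d: prefix length %2d\n", i, tcam[i].len);
    return 0;   /* prints slots in decreasing prefix-length order */
}
```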
Cautionary Questions • Q1: Is improvement really needed? • Q2: Is this really the bottleneck? • Q3: What impact will the change have on the rest of the system? • Q4: Does back-of-the-envelope analysis indicate a significant improvement? • Q5: Is it worth adding custom hardware? • Q6: Can a protocol change be avoided? • Q7: Do prototypes confirm the initial promise? • Q8: Will the performance gains be lost if the environment changes?
Summary • P1–P5: System-oriented principles • These recognize and leverage the fact that a system is made up of components • Basic idea: move the problem to another subsystem • P6–P10: Improve efficiency without destroying modularity • "Pushing the envelope" of module specifications • Basic engineering: a system should satisfy its spec but need not do more • P11–P15: Local optimization techniques • Speed up a key routine • Apply these only after you have looked at the big picture