Packet Classification for Core Routers: Is there an alternative to CAMs?

Paper by: Florin Baboescu, Sumeet Singh, George Varghese. Presentation by: Edward W. Spitznagel

Outline • Introduction • Packet Classification Problem • Extended Grid-of-Tries (EGT) • Grid-of-Tries • Extending Grid-of-Tries into EGT • Path Compression • Results • Summary

Packet Classification Problem
Filter  Source Address  Destination Address  Source Port  Destination Port  Protocol  Action  Cost
a       11*             01*                  2-4          0-15              TCP       fwd 7   2
b       01*             0010                 3-15         3-15              UDP       fwd 2   10
c       0101            *                    3            *                 *         deny    5
d       1101            101*                 *            *                 ICMP      fwd 5   7
• Suppose you are a firewall, or QoS router, or network monitor ... • You are given a list of rules (filters) to determine how to process incoming packets, based on the packet header fields • Goal: when a packet arrives, find the least-cost rule that matches the packet’s header fields

Packet Classification Problem • Example: packet arrives with header (0101, 0010, 3, 5, UDP) • Classification result: filter c • Filter b also matches, but c has lower cost • Easy when we have only a few rules; very hard with 100,000 rules and packets arriving at 40 Gb/s
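The example above can be reproduced with a brute-force linear search, which is the baseline the faster schemes are measured against. This is a minimal sketch, not the paper's method. Assumptions: addresses are 4-bit strings, a wildcard port means 0-15, and filter a's cost is read as 2 from the table. The names `Filter`, `matches`, and `classify` are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    name: str
    src: str       # address pattern as a bit string; trailing "*" = prefix
    dst: str
    sport: tuple   # inclusive (low, high) source-port range
    dport: tuple   # inclusive (low, high) destination-port range
    proto: str     # protocol name, or "*" for any
    cost: int

ANY_PORT = (0, 15)  # assumption: a "*" port field covers the 4-bit range 0-15

FILTERS = [
    Filter("a", "11*", "01*", (2, 4), (0, 15), "TCP", 2),   # cost 2 is an assumption
    Filter("b", "01*", "0010", (3, 15), (3, 15), "UDP", 10),
    Filter("c", "0101", "*", (3, 3), ANY_PORT, "*", 5),
    Filter("d", "1101", "101*", ANY_PORT, ANY_PORT, "ICMP", 7),
]

def prefix_matches(pattern, bits):
    """True if the bit string matches an exact pattern or a "*"-terminated prefix."""
    if pattern.endswith("*"):
        return bits.startswith(pattern[:-1])
    return bits == pattern

def matches(f, pkt):
    """Check every header field of the packet against one filter."""
    src, dst, sport, dport, proto = pkt
    return (prefix_matches(f.src, src)
            and prefix_matches(f.dst, dst)
            and f.sport[0] <= sport <= f.sport[1]
            and f.dport[0] <= dport <= f.dport[1]
            and f.proto in ("*", proto))

def classify(pkt, filters=FILTERS):
    """Return the least-cost matching filter, or None if nothing matches."""
    hits = [f for f in filters if matches(f, pkt)]
    return min(hits, key=lambda f: f.cost) if hits else None

packet = ("0101", "0010", 3, 5, "UDP")
print([f.name for f in FILTERS if matches(f, packet)])
print(classify(packet).name)
```

Every filter is examined for every packet, so the cost grows linearly with the number of rules; that is what makes this approach impractical at 100,000 rules.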

Packet Classification - Metrics • Metrics for evaluating classification algorithms: • Time complexity of classifying a packet • often expressed as the number of memory accesses required • Storage requirements of data structures • Number of fields that can be handled

Packet Classification in Core Routers • Many core routers have “fairly large” (e.g. 2000-rule) databases • Expected to grow; in fact, may be limited by current technology • Classification in core routers must be done quickly • Emerging core routers operate at 40 Gb/s. With 40-byte packets, that means one packet every 8 nsec • Hence the general belief that brute-force hardware (TCAMs) will be necessary to support packet classification in core routers
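The 8 nsec figure follows directly from the line rate and the packet size: 40 bytes is 320 bits, and 320 bits at 40 Gb/s take 8 nsec. A small sketch of the arithmetic (the function name `packet_time_ns` is illustrative):

```python
def packet_time_ns(line_rate_bps, packet_bytes):
    """Time on the wire for one packet, in nanoseconds."""
    return packet_bytes * 8 * 1e9 / line_rate_bps

# 40-byte packets on a 40 Gb/s link
print(packet_time_ns(40e9, 40))
```

Any classification scheme that has to keep up with back-to-back minimum-size packets must finish, or pipeline, its memory accesses within this per-packet budget.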

Packet Classification - TCAM disadvantages • Ternary CAMs (TCAMs) have disadvantages • Density Scaling: 10-12 transistors per bit of TCAM (vs. 4-6 transistors per bit of SRAM) • Power Scaling: high power consumption, due to performing all comparisons in parallel • Time Scaling: 5-10 nsec for a TCAM operation • Extra Chips: require TCAM chip(s) and a bridge ASIC • Rule Multiplication for ranges: arbitrary ranges are represented by sets of prefixes; very inefficient • Thus, we consider an algorithmic solution...

Packet Classification trends • Packet classification in 2D: several good methods • Grid of Tries, Area-based QuadTrees, FIS-trees, Tuple-space search, range trees and fractional cascading • Classification in K dimensions, where K > 2, is hard • O(log^(K-1) N) time and linear space, or O(log N) time and O(N^K) space, for N filters in K dimensions • Modern algorithms: use heuristics to exploit the structure and properties that real-world filter databases tend to have • Example: RFC and HiCuts algorithms

Extended Grid of Tries (EGT)
[Figure: filters a, b, c, and d plotted as regions in the 2D space, with Source Address and Dest. Address axes each running from 0 to 0xFFFF]
• Observation: Core router tables studied have a low maximum filter depth in the 2D space defined by <Source IP Address, Destination IP Address> • in this case, “low” means 20 or less • i.e. no point in this 2D plot of filters is covered by more than 20 filters

Extended Grid of Tries (EGT) • The Basic Idea: • Use an existing 2D scheme to classify with respect to Source IP and Dest. IP • Then, do linear search over a small list of possible matches (at most 20, but typically around 5) • EGT: use Grid-of-Tries as the 2D scheme
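The two-stage idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `search_2d` is a naive stand-in for the 2D scheme (EGT uses Grid-of-Tries there), prefixes are written as bit strings without the trailing "*", wildcard ports are taken to mean 0-15, and filter a's cost is read as 2.

```python
# (name, src prefix, dst prefix, src-port range, dst-port range, protocol, cost)
FILTERS = [
    ("a", "11",   "01",   (2, 4),  (0, 15), "TCP",  2),   # cost 2 is an assumption
    ("b", "01",   "0010", (3, 15), (3, 15), "UDP",  10),
    ("c", "0101", "",     (3, 3),  (0, 15), "*",    5),
    ("d", "1101", "101",  (0, 15), (0, 15), "ICMP", 7),
]

def search_2d(filters, src, dst):
    """Stand-in 2D scheme: return ALL filters whose two address prefixes match."""
    return [f for f in filters if src.startswith(f[1]) and dst.startswith(f[2])]

def egt_style_classify(filters, pkt):
    """Stage 1: 2D address search. Stage 2: linear search over the candidates."""
    src, dst, sport, dport, proto = pkt
    best = None
    for f in search_2d(filters, src, dst):
        _, _, _, (slo, shi), (dlo, dhi), fproto, cost = f
        if slo <= sport <= shi and dlo <= dport <= dhi and fproto in ("*", proto):
            if best is None or cost < best[6]:
                best = f
    return best

pkt = ("0101", "0010", 3, 5, "UDP")
print(len(search_2d(FILTERS, *pkt[:2])))      # size of the candidate list
print(egt_style_classify(FILTERS, pkt)[0])    # name of the least-cost match
```

The linear search in stage 2 only touches the candidate list, so its cost is bounded by the maximum 2D filter depth rather than by the total number of filters.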

Grid of Tries - Intuition • Imagine a search trie containing Dest. Address prefixes • Now add a Source Address trie under each Dest. prefix • Filters are stored in these tries, perhaps multiple times

Grid of Tries - Intuition • Reduce storage by storing each filter only once • But we now need to backtrack to ancestors’ source tries during a search...
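A minimal sketch of this store-once layout: a destination trie whose nodes each own a source trie, searched by visiting the source trie of every destination prefix that matches. It has no switch or jump pointers, so it shows the backtracking cost rather than the Grid-of-Tries optimization. Class and method names are illustrative, and prefixes are bit strings without the trailing "*".

```python
class Node:
    def __init__(self):
        self.child = {}       # "0" or "1" -> Node
        self.filters = []     # filter names stored at this source-trie node
        self.src_trie = None  # root of the source trie (destination nodes only)

def descend(root, prefix):
    """Walk (and create) the path for a prefix, returning its final node."""
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, Node())
    return node

class HierarchicalTrie:
    def __init__(self):
        self.dst_root = Node()

    def insert(self, name, dst_prefix, src_prefix):
        """Store the filter exactly once, under its own destination prefix."""
        dnode = descend(self.dst_root, dst_prefix)
        if dnode.src_trie is None:
            dnode.src_trie = Node()
        descend(dnode.src_trie, src_prefix).filters.append(name)

    def search_all(self, dst_bits, src_bits):
        """Return every filter whose destination and source prefixes both match."""
        found = []
        dnode, depth = self.dst_root, 0
        while dnode is not None:
            # Each matching destination prefix costs a full source-trie walk.
            snode, i = dnode.src_trie, 0
            while snode is not None:
                found.extend(snode.filters)
                snode = snode.child.get(src_bits[i]) if i < len(src_bits) else None
                i += 1
            dnode = dnode.child.get(dst_bits[depth]) if depth < len(dst_bits) else None
            depth += 1
        return found

trie = HierarchicalTrie()
for name, dst, src in [("a", "01", "11"), ("b", "0010", "01"),
                       ("c", "", "0101"), ("d", "101", "1101")]:
    trie.insert(name, dst, src)
print(trie.search_all("0010", "0101"))
```

The nested loops make the cost visible: one source-trie walk per matching destination prefix, which is the work that switch pointers are meant to cut down.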

Grid of Tries • Use switch pointers to improve search efficiency • allows us to jump to the next source trie among ancestors, instead of backtracking

Extended Grid of Tries • EGT uses jump pointers instead of switch pointers • EGT requires the 2D search to return all filters matching in those dimensions • Thus, some of the nodes skipped by a switch pointer cannot be skipped in an EGT search • So, search complexity is a bit higher than in ordinary Grid-of-Tries • worst-case search takes W + (H+1)*W = (H+2)*W time, where W = time to find the best prefix in a single trie, and H = max trie height (H = 32 for IPv4) • but the authors expect that it typically takes L*W, with L being a small value (reflecting the low maximum prefix containment seen in most filter databases)

EGT with Path Compression (EGT-PC) • EGT-PC adds Path Compression whereby single branching paths are removed • Improves search time and storage requirements, particularly for small filter sets
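Path compression on a single binary trie can be sketched as below: every chain of nodes that has exactly one child and stores no filter is collapsed into one edge labelled with several bits. This is a generic illustration of the technique, not the EGT-PC data structure itself; all names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.child = {}    # edge label (one or more bits) -> TrieNode
        self.filters = []  # filters stored at this node

def insert(root, prefix, name):
    """Insert into the uncompressed trie, one bit per edge."""
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, TrieNode())
    node.filters.append(name)

def compress(node):
    """Collapse single-branching, filter-less paths into multi-bit edges."""
    new_children = {}
    for label, kid in node.child.items():
        while len(kid.child) == 1 and not kid.filters:
            [(next_label, next_kid)] = kid.child.items()
            label, kid = label + next_label, next_kid
        compress(kid)
        new_children[label] = kid
    node.child = new_children

def count_nodes(node):
    return 1 + sum(count_nodes(k) for k in node.child.values())

def lookup(root, bits):
    """Collect all filters on the matching path (works before or after compress)."""
    found, node, pos = list(root.filters), root, 0
    while True:
        for label, kid in node.child.items():
            if bits.startswith(label, pos):
                node, pos = kid, pos + len(label)
                found.extend(node.filters)
                break
        else:
            return found

root = TrieNode()
for prefix, name in [("11", "a"), ("01", "b"), ("0101", "c"), ("1101", "d")]:
    insert(root, prefix, name)
before = count_nodes(root)
compress(root)
print(before, count_nodes(root), lookup(root, "0101"))
```

Fewer nodes means less storage and fewer memory accesses per lookup, which is consistent with the gains being largest for small, sparse filter sets.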

EGT-PC: Results • Storage requirements: impressively low (almost as low as TCAM!) • since we store each filter only once • Storage, in terms of number of 32-bit words • Classification time is good, but not as impressive • also a result of storing each filter once: we therefore may need to traverse multiple Source tries • Memory accesses, in terms of 32-bit word accesses

EGT-PC: Results • Memory usage by component: • Storage for list is proportional to number of filters • Storage for trie is roughly proportional to number of filters • Path compression reduces storage by a factor of 3, roughly

EGT-PC: Results with larger databases • Larger databases are generated using smaller ones as a core • randomly generated prefixes for Source Address and Destination Address, using the prefix length distributions from the original databases • Other fields are randomly derived from the distributions in the original databases • Memory Accesses: still not bad, even for large databases • Storage Requirements: still appear to be linear

EGT-PC: Remarks • May only work well with core routers • Lookups: • faster than HiCuts; not as fast or as deterministic as RFC. • can easily be characterized by maximum 2D filter depth • Storage requirements: quite good • using Grid-of-Tries for the 2D scheme is a wise choice (storage efficiency) • Very nice to have results comparing several different algorithms (unlike nearly all previous papers) • It is possible to apply the basic EGT idea, but with a different 2D scheme • Tuple Space, FIS-trees, RFC in 2D, and perhaps Area-based QuadTrees • The trick is that the 2D scheme must be modified to return all filters matching those 2 dimensions (rather than just the least-cost filter matching those 2 dimensions)

Comparison of different algorithms
[Figure: two best-to-worst scales, one for Lookup Speed and one for Storage Requirements, each placing TCAM, RFC, HiCuts-1, HiCuts-4, EGT, EGT-PC, and Linear Search along the scale]

Summary • Packet Classification: Given packet P and list of filters F, find least cost filter in F that matches P • Important metrics: Lookup time, data structure size • Extended Grid of Tries • Core routers have a low maximum filter depth in the 2D space defined by <Src. Addr, Dest. Addr> • Thus, we can perform a 2D search via Grid of Tries, and then a linear search over the small list of filters it returns • and we can add path compression to the trie • Lookup time is fairly good; storage requirements are very good.

Thanks -- Questions?

Backup slides to follow...

Geometric Representation
[Figure: filters a, b, and c, defined on the fields Source Address (patterns 010, xx1, xxx) and Source Port (ranges 2-3, 0-7, 7), drawn as regions in the Source Address vs. Source Port plane]
• Filters with K fields can be represented geometrically in K dimensions

Ternary CAMs • Most popular practical approach to high-performance packet classification • Hardware compares query word (packet header) to all stored words (filters) in parallel • each bit of a stored word can be 0, 1, or X (don’t care) • Very fast, but not without drawbacks: • High power consumption limits scalability • Inefficient representation of ranges

Ternary CAM - Example
Packet: Src. Addr. 1110, Dest. Addr. 0110
Query: 11100110
TCAM contents (one entry per filter a, b, c; Source Address pattern followed by Destination Address pattern):
Address  Contents  Result for query 11100110
0        11xxxxxx  Match!
1        0xxx01xx  Doesn’t match
2        xxxx0110  Match!
(Now perform priority resolution...)
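The match step in this example can be mimicked in software. The hardware compares all entries in parallel; the sketch below simply loops over them. Choosing the lowest matching address as the winner is an assumption about the priority-resolution step, which the slide does not spell out.

```python
def ternary_match(stored, query):
    """A stored bit matches if it is 'x' (don't care) or equals the query bit."""
    return len(stored) == len(query) and all(
        s in ("x", q) for s, q in zip(stored, query))

def tcam_lookup(entries, query):
    """Return the addresses of all entries that match the query word."""
    return [addr for addr, word in enumerate(entries) if ternary_match(word, query)]

ENTRIES = ["11xxxxxx", "0xxx01xx", "xxxx0110"]   # addresses 0, 1, 2
query = "1110" + "0110"                          # Src. Addr. followed by Dest. Addr.

hits = tcam_lookup(ENTRIES, query)
print(hits)
print(min(hits))  # assumption: priority resolution picks the lowest address
```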

Range Matching in TCAMs
Filter F: Source Port 1-4, Destination Port 3-5
[Figure: filter F drawn as a rectangle in the Source Port vs. Destination Port plane, with both axes ticked from 0 to 6]
• Convert ranges into sets of prefixes • 1-4 becomes 001, 01*, and 100 • 3-5 becomes 011 and 10*
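The range-to-prefix conversion can be sketched with a standard greedy split: repeatedly take the largest aligned power-of-two block that starts at the low end of the range and still fits inside it. The function name `range_to_prefixes` and the 3-bit port width used in the calls are assumptions chosen to match the slide's examples.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the inclusive range [lo, hi] with prefixes of a width-bit field."""
    prefixes = []
    while lo <= hi:
        # Largest block aligned at lo (lo == 0 is aligned to the whole field).
        size = lo & -lo if lo else 1 << width
        # Shrink the block until it fits inside what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1      # number of wildcarded low bits
        fixed_len = width - free_bits
        fixed = format(lo >> free_bits, "b").zfill(fixed_len) if fixed_len else ""
        prefixes.append(fixed + ("*" if free_bits else ""))
        lo += size
    return prefixes

print(range_to_prefixes(1, 4, 3))
print(range_to_prefixes(3, 5, 3))
```

Each prefix becomes one ternary pattern, so a range that is not aligned to power-of-two boundaries costs several TCAM entries instead of one.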

Range Matching in TCAMs
Filter F expands into six TCAM entries, a through f: every combination of a Source Port prefix (001, 01*, 100) with a Destination Port prefix (011, 10*)
[Figure: entries a through f drawn as six rectangles in the Source Port vs. Destination Port plane, with both axes ticked from 0 to 6]
• With two 16-bit range fields, a single rule could require up to 900 TCAM entries! • Typical case: entire filter set expands by a factor of 2 to 6
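The 900-entry figure can be checked by counting prefixes: a rule needs one TCAM entry per combination of prefixes across its range fields. The sketch below counts prefixes with a greedy aligned-block split; the range 1-65534 is used here as an assumed worst case for a 16-bit field, and the function names are illustrative.

```python
from math import prod

def count_prefixes(lo, hi, width):
    """Number of prefixes needed to cover [lo, hi] in a width-bit field."""
    count = 0
    while lo <= hi:
        size = lo & -lo if lo else 1 << width   # largest block aligned at lo
        while size > hi - lo + 1:               # shrink until it fits
            size >>= 1
        lo += size
        count += 1
    return count

def tcam_entries(ranges, width):
    """Entries for one rule: the product of the prefix counts of its ranges."""
    return prod(count_prefixes(lo, hi, width) for lo, hi in ranges)

print(count_prefixes(1, 65534, 16))                   # one 16-bit range field
print(tcam_entries([(1, 65534), (1, 65534)], 16))     # two such fields
print(tcam_entries([(1, 4), (3, 5)], 3))              # filter F from the example
```

With 30 prefixes per field in the worst case, two range fields multiply to 30 * 30 = 900 entries for a single rule.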
