Packet Classification for Core Routers: Is there an alternative to CAMs?
Paper by: Florin Baboescu, Sumeet Singh, George Varghese
Presentation by: Edward W. Spitznagel
Outline
• Introduction
• Packet Classification Problem
• Extended Grid-of-Tries (EGT)
  • Grid-of-Tries
  • Extending Grid-of-Tries into EGT
  • Path Compression
• Results
• Summary
Packet Classification Problem
Filter  Source Address  Destination Address  Source Port  Destination Port  Protocol  Action  Cost
a       11*             01*                  2-4          0-15              TCP       fwd 7   2
b       01*             0010                 3-15         3-15              UDP       fwd 2   10
c       0101            *                    3            *                 *         deny    5
d       1101            101*                 *            *                 ICMP      fwd 5   7
• Suppose you are a firewall, or QoS router, or network monitor ...
• You are given a list of rules (filters) to determine how to process incoming packets, based on the packet header fields
• Goal: when a packet arrives, find the least-cost rule that matches the packet’s header fields
Packet Classification Problem
• Example: packet arrives with header (0101, 0010, 3, 5, UDP)
  • classification result: filter c
  • filter b also matches, but c has lower cost
• Easy when we have only a few rules; very hard with 100,000 rules and packets arriving at 40 Gb/s
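
The example can be checked with a brute-force linear search. This is only a minimal sketch of the problem statement, not a method from the paper: the `Filter` class and helper names are illustrative, the Action and Cost columns are read as (fwd 7, 2), (fwd 2, 10), (deny, 5), (fwd 5, 7), and a port wildcard is assumed to mean the 4-bit range 0-15.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    name: str
    src: str        # bit-string prefix; a trailing "*" is a wildcard tail
    dst: str
    sport: tuple    # inclusive (low, high) range
    dport: tuple
    proto: str      # protocol name, or "*" for any
    action: str
    cost: int

def prefix_match(prefix, addr):
    """True if the bit string addr falls under prefix (e.g. '01*' or '0010')."""
    if prefix.endswith("*"):
        return addr.startswith(prefix[:-1])
    return addr == prefix

def matches(f, pkt):
    src, dst, sport, dport, proto = pkt
    return (prefix_match(f.src, src)
            and prefix_match(f.dst, dst)
            and f.sport[0] <= sport <= f.sport[1]
            and f.dport[0] <= dport <= f.dport[1]
            and f.proto in ("*", proto))

def classify(filters, pkt):
    """Return the least-cost matching filter, or None if nothing matches."""
    hits = [f for f in filters if matches(f, pkt)]
    return min(hits, key=lambda f: f.cost) if hits else None

ANY_PORT = (0, 15)  # assumption: 4-bit port numbers, so "*" means 0-15
FILTERS = [
    Filter("a", "11*", "01*", (2, 4), (0, 15), "TCP", "fwd 7", 2),
    Filter("b", "01*", "0010", (3, 15), (3, 15), "UDP", "fwd 2", 10),
    Filter("c", "0101", "*", (3, 3), ANY_PORT, "*", "deny", 5),
    Filter("d", "1101", "101*", ANY_PORT, ANY_PORT, "ICMP", "fwd 5", 7),
]

packet = ("0101", "0010", 3, 5, "UDP")
best = classify(FILTERS, packet)
print(best.name, best.action)  # c deny
```

Linear search touches every rule for every packet, which is why it stops being viable at the rule counts and line rates mentioned above.
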
Packet Classification - Metrics
• Metrics for evaluating classification algorithms:
  • Time complexity of classifying a packet
    • often expressed as the number of memory accesses required
  • Storage requirements of data structures
  • Number of fields that can be handled
Packet Classification in Core Routers
• Many core routers have “fairly large” (e.g. 2000 rule) databases
  • Expected to grow; in fact, may be limited by current technology
• Classification in core routers must be done quickly
  • Emerging core routers operate at 40 Gb/s. With 40-byte packets, that means one packet every 8 nsec
• Thus the general belief that brute-force hardware (TCAMs) will be necessary to support packet classification in core routers
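
The 8 nsec figure follows directly from the two numbers on the slide (back-to-back 40-byte packets on a 40 Gb/s link); the symbol t_pkt is introduced here only for illustration:

```latex
t_{\text{pkt}}
  = \frac{40\ \text{bytes} \times 8\ \text{bits/byte}}{40 \times 10^{9}\ \text{bits/s}}
  = \frac{320}{40 \times 10^{9}}\ \text{s}
  = 8 \times 10^{-9}\ \text{s}
  = 8\ \text{nsec}
```
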
Packet Classification - TCAM disadvantages
• Ternary CAMs (TCAMs) have disadvantages
  • Density Scaling: 10-12 transistors per bit of TCAM (vs. 4-6 transistors per bit of SRAM)
  • Power Scaling: due to performing all comparisons in parallel
  • Time Scaling: 5-10 nsec for a TCAM operation
  • Extra Chips: requires TCAM chip(s) and bridge ASIC
  • Rule Multiplication for ranges: arbitrary ranges are represented by sets of prefixes; very inefficient
• Thus, we consider an algorithmic solution...
Packet Classification trends
• Packet classification in 2D: several good methods
  • Grid of Tries, Area-based QuadTrees, FIS-trees, Tuple-space search, range trees and fractional cascading
• Classification in K dimensions, where K > 2, is hard
  • O(log^{K-1} N) time and linear space, or O(log N) time and O(N^K) space, for N filters in K dimensions
• Modern algorithms: use heuristics to exploit the structure and properties that real-world filter databases tend to have
  • Example: RFC and HiCuts algorithms
Extended Grid of Tries (EGT)
[Figure: filters a, b, c, and d drawn as rectangles in the 2D space with Source Address (0 to 0xFFFF) on the horizontal axis and Dest. Address (0 to 0xFFFF) on the vertical axis]
• Observation: Core router tables studied have a low maximum filter depth in the 2D space defined by <Source IP Address, Destination IP Address>
  • in this case, “low” means 20 or less
  • i.e. no point in this 2D plot of filters is covered by more than 20 filters
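
To make "maximum filter depth" concrete, the sketch below counts, for every (source, destination) point, how many filters cover it and reports the largest count. It is a brute-force illustration only: the 4-bit addresses and the four prefix pairs are taken from the example filter table earlier in the deck, and `covers` and `max_filter_depth` are hypothetical helper names.

```python
from itertools import product

def covers(prefix, addr):
    """True if addr (a bit string) falls under prefix such as '01*', '0010' or '*'."""
    return addr.startswith(prefix.rstrip("*"))

def max_filter_depth(filters, width):
    """Largest number of (src, dst) prefix pairs covering any single 2D point."""
    addrs = ["".join(bits) for bits in product("01", repeat=width)]
    return max(
        sum(covers(s, src) and covers(d, dst) for s, d in filters)
        for src in addrs
        for dst in addrs
    )

# (source prefix, destination prefix) of filters a, b, c, d from the example table
filters_2d = [("11*", "01*"), ("01*", "0010"), ("0101", "*"), ("1101", "101*")]
depth = max_filter_depth(filters_2d, 4)
print(depth)  # 2: the point (0101, 0010) is covered by filters b and c
```

Enumerating every point only works for toy address widths; it is meant to pin down the definition, not to suggest how the depth would be measured on real 32-bit databases.
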
Extended Grid of Tries (EGT)
• The Basic Idea:
  • Use an existing 2D scheme to classify with respect to Source IP and Dest. IP
  • Then, do linear search over a small list of possible matches (at most 20, but typically around 5)
• EGT: use Grid-of-Tries as the 2D scheme
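
The two-stage idea can be sketched as follows. The 2D stage here is a brute-force stand-in (`candidates_2d`, a hypothetical name), not Grid-of-Tries; the point is only that it returns every filter matching on the two address fields, after which the remaining fields are checked by linear search. Filter values come from the example table earlier in the deck, with port wildcards assumed to mean 0-15.

```python
def covers(prefix, addr):
    """True if addr (a bit string) falls under prefix such as '01*', '0010' or '*'."""
    return addr.startswith(prefix.rstrip("*"))

# (name, src prefix, dst prefix, src-port range, dst-port range, protocol, cost)
FILTERS = [
    ("a", "11*", "01*", (2, 4), (0, 15), "TCP", 2),
    ("b", "01*", "0010", (3, 15), (3, 15), "UDP", 10),
    ("c", "0101", "*", (3, 3), (0, 15), "*", 5),
    ("d", "1101", "101*", (0, 15), (0, 15), "ICMP", 7),
]

def candidates_2d(filters, src, dst):
    """Stage 1 stand-in: all filters that match on (source, destination)."""
    return [f for f in filters if covers(f[1], src) and covers(f[2], dst)]

def classify(filters, pkt):
    """Stage 2: linear search over the short candidate list, keeping the least cost."""
    src, dst, sport, dport, proto = pkt
    best = None
    for f in candidates_2d(filters, src, dst):
        _, _, _, sp, dp, pr, cost = f
        if sp[0] <= sport <= sp[1] and dp[0] <= dport <= dp[1] and pr in ("*", proto):
            if best is None or cost < best[6]:
                best = f
    return best

result = classify(FILTERS, ("0101", "0010", 3, 5, "UDP"))
print(result[0])  # c
```

The cost of stage 2 is bounded by the maximum 2D filter depth, which is why the low-depth observation matters.
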
Grid of Tries - Intuition
• Imagine a search trie containing Dest. Address prefixes
• Now add a Source Address trie under each Dest. prefix
• Filters are stored in these tries, perhaps multiple times
Grid of Tries - Intuition
• Reduce storage by storing each filter only once
• But we now need to backtrack to ancestors’ source tries during a search...
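
A minimal sketch of the store-once structure, assuming binary tries over bit strings: one destination trie whose nodes may each own a source trie, with every filter stored exactly once. The search visits the source trie of every destination prefix on the path, which is the repeated work that the backtracking remark refers to. Class and function names are illustrative, and no switch pointers are implemented here.

```python
class Node:
    def __init__(self):
        self.child = {}        # '0' / '1' -> Node
        self.filters = []      # filter names (used on source-trie nodes)
        self.src_trie = None   # root of a source trie (used on destination-trie nodes)

def insert(root, bits):
    """Walk or create the path for bits and return the final node."""
    node = root
    for b in bits:
        node = node.child.setdefault(b, Node())
    return node

def build(filters):
    """filters: iterable of (name, source prefix, destination prefix)."""
    dst_root = Node()
    for name, src, dst in filters:
        dnode = insert(dst_root, dst.rstrip("*"))
        if dnode.src_trie is None:
            dnode.src_trie = Node()
        insert(dnode.src_trie, src.rstrip("*")).filters.append(name)
    return dst_root

def source_matches(src_root, src):
    """All filters stored on the path that src follows through one source trie."""
    found = list(src_root.filters)
    node = src_root
    for b in src:
        node = node.child.get(b)
        if node is None:
            break
        found.extend(node.filters)
    return found

def search_all(dst_root, src, dst):
    """Visit the source trie of every destination prefix that matches dst."""
    found, node, depth = [], dst_root, 0
    while node is not None:
        if node.src_trie is not None:
            found.extend(source_matches(node.src_trie, src))
        if depth == len(dst):
            break
        node = node.child.get(dst[depth])
        depth += 1
    return found

grid = build([("a", "11*", "01*"), ("b", "01*", "0010"),
              ("c", "0101", "*"), ("d", "1101", "101*")])
print(search_all(grid, "0101", "0010"))  # ['c', 'b']
```

Because this version returns all filters matching in the two address dimensions rather than a single best one, it is also the kind of result a second-stage linear search would need.
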
Grid of Tries
• Use switch pointers to improve search efficiency
  • allows us to jump to the next source trie among ancestors, instead of backtracking
Extended Grid of Tries
• EGT uses jump pointers instead of switch pointers
  • EGT requires the 2D search to return all filters matching in those dimensions
  • Thus, some of the nodes skipped by a switch pointer cannot be skipped in an EGT search
• So, search complexity is a bit higher than in ordinary Grid-of-Tries
  • worst-case search takes W + (H+1)*W = (H+2)*W time, where W = time to find the best prefix in a single trie, and H = max trie height (H = 32 for IPv4)
  • but the authors expect a typical search to take L*W, with L a small value (reflecting the low maximum prefix containment seen in most filter databases)
EGT with Path Compression (EGT-PC)
• EGT-PC adds Path Compression, whereby single-branching paths are removed
• Improves search time and storage requirements, particularly for small filter sets
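
As a rough illustration of path compression on a single binary trie (not the paper's exact node layout, which the slides do not give), the sketch below merges chains of single-child nodes that hold no filter into one edge labelled with the skipped bits. All names are hypothetical, and storing the full skipped bit string on the edge is an assumption made for simplicity.

```python
class Node:
    def __init__(self):
        self.child = {}      # edge label (bit string) -> Node
        self.filters = []    # filter names stored at this prefix

def build_trie(prefixes):
    """prefixes: iterable of (name, bit string). One node per bit before compression."""
    root = Node()
    for name, bits in prefixes:
        node = root
        for b in bits:
            node = node.child.setdefault(b, Node())
        node.filters.append(name)
    return root

def compress(node):
    """Merge chains of single-child, filter-free nodes into one labelled edge."""
    new_children = {}
    for label, child in node.child.items():
        while len(child.child) == 1 and not child.filters:
            (next_label, next_child), = child.child.items()
            label += next_label
            child = next_child
        compress(child)
        new_children[label] = child
    node.child = new_children
    return node

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.child.values())

def lookup(root, addr):
    """Collect the filters on the path that addr follows; works before or after compression."""
    found, node, pos = list(root.filters), root, 0
    while True:
        for label, child in node.child.items():
            if addr.startswith(label, pos):
                node, pos = child, pos + len(label)
                found.extend(child.filters)
                break
        else:
            return found

trie = build_trie([("b", "01"), ("c", "0101"), ("a", "11"), ("d", "1101")])
before = count_nodes(trie)
compress(trie)
after = count_nodes(trie)
print(before, after)          # 9 5
print(lookup(trie, "0101"))   # ['b', 'c']
```

Fewer nodes means both less storage and fewer memory accesses per lookup, which is the effect the slide describes; the exact savings depend on the filter set.
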
EGT-PC: Results
• Storage requirements: impressively low (almost as low as TCAM!)
  • since we store each filter only once
  • Storage, in terms of number of 32-bit words
• Classification time is good, but not as impressive
  • also a result of storing each filter once: we therefore may need to traverse multiple Source tries
  • Memory accesses, in terms of 32-bit word accesses
EGT-PC: Results
• Memory usage by component:
  • Storage for list is proportional to number of filters
  • Storage for trie is roughly proportional to number of filters
• Path compression reduces storage by a factor of roughly 3
EGT-PC: Results with larger databases
• Larger databases are generated using smaller ones as a core
  • randomly generated prefixes for Source Address and Destination Address, using the prefix length distributions from the original databases
  • Other fields are randomly derived from the distributions in the original databases
• Memory Accesses: still not bad, even for large databases
• Storage Requirements: still appear to be linear
EGT-PC: Remarks
• May only work well with core routers
• Lookups:
  • faster than HiCuts; not as fast or as deterministic as RFC
  • can easily be characterized by maximum 2D filter depth
• Storage requirements: quite good
  • using Grid-of-Tries for the 2D scheme is a wise choice (storage efficiency)
• Very nice to have results comparing several different algorithms (unlike nearly all previous papers)
• It is possible to apply the basic EGT idea, but with a different 2D scheme
  • Tuple Space, FIS-trees, RFC in 2D, and perhaps Area-based QuadTrees
  • The trick is that the 2D scheme must be modified to return all filters matching those 2 dimensions (rather than just the least-cost filter matching those 2 dimensions)
Comparison of different algorithms
[Figure: two Best-to-Worst scales, one for Lookup Speed and one for Storage Requirements, each placing TCAM, RFC, HiCuts-1, HiCuts-4, EGT, EGT-PC, and Linear Search]
Summary
• Packet Classification: Given packet P and list of filters F, find least-cost filter in F that matches P
  • Important metrics: Lookup time, data structure size
• Extended Grid of Tries
  • Core routers have a low maximum filter depth in the 2D space defined by <Src. Addr, Dest. Addr>
  • Thus, we can perform a 2D search via Grid of Tries, and then a linear search over the small list of filters it returns
  • and we can add path compression to the trie
  • Lookup time is fairly good; storage requirements are very good.
Geometric Representation
• Filters with K fields can be represented geometrically in K dimensions
• Example:
Filter  Source Address  Source Port
a       010             2-3
b       xxx             7
c       xx1             0-7
[Figure: filters a, b, and c drawn as regions in the plane with Source Address (0-7) on the horizontal axis and Source Port (0-7) on the vertical axis]
Ternary CAMs
• Most popular practical approach to high-performance packet classification
• Hardware compares query word (packet header) to all stored words (filters) in parallel
  • each bit of a stored word can be 0, 1, or X (don’t care)
• Very fast, but not without drawbacks:
  • High power consumption limits scalability
  • Inefficient representation of ranges
Ternary CAM - Example
Packet: Src. Addr. 1110, Dest. Addr. 0110
Query: 11100110
Filter  Source Address  Destination Address
a       11xx            xxxx
b       0xxx            01xx
c       xxxx            0110
TCAM:
Address  Contents   Query     Result
0        11xxxxxx   11100110  Match!
1        0xxx01xx   11100110  Doesn’t Match
2        xxxx0110   11100110  Match!
(Now perform priority resolution...)
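
The example can be reproduced in software with a short sketch. A real TCAM compares all entries in parallel in hardware; the loop below only models the result. The priority rule used here (lowest matching address wins) is an assumption for illustration, since the slide does not say how priority resolution is done, and the function names are hypothetical.

```python
def ternary_match(entry, query):
    """True if every stored bit is 'x' (don't care) or equals the query bit."""
    return len(entry) == len(query) and all(
        e in ("x", q) for e, q in zip(entry, query)
    )

def tcam_lookup(entries, query):
    """Return (all matching addresses, winning address or None)."""
    hits = [addr for addr, entry in enumerate(entries) if ternary_match(entry, query)]
    # Assumption: priority resolution picks the lowest matching address.
    return hits, (hits[0] if hits else None)

TCAM = ["11xxxxxx", "0xxx01xx", "xxxx0110"]   # addresses 0, 1, 2
query = "1110" + "0110"                       # source address, then destination address
hits, winner = tcam_lookup(TCAM, query)
print(hits, winner)  # [0, 2] 0
```
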
Range Matching in TCAMs
Filter  Source Port  Destination Port
F       1-4          3-5
[Figure: filter F drawn as a rectangle in the Source Port / Destination Port plane, both axes 0-7]
• Convert ranges into sets of prefixes
  • 1-4 becomes 001, 01*, and 100
  • 3-5 becomes 011 and 10*
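
The conversion can be sketched with a greedy split: repeatedly take the largest aligned power-of-two block that starts at the low end of the range and still fits inside it. This is one common way to do it, not necessarily what any particular TCAM toolchain does; `range_to_prefixes` is a hypothetical helper name.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the inclusive integer range [lo, hi] with prefixes over width-bit values."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (the whole space if lo is 0) ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        free_bits = size.bit_length() - 1      # bits covered by the wildcard
        fixed = width - free_bits              # bits spelled out in the prefix
        bits = format(lo >> free_bits, "b").zfill(fixed) if fixed else ""
        prefixes.append(bits + ("*" if free_bits else ""))
        lo += size
    return prefixes

print(range_to_prefixes(1, 4, 3))  # ['001', '01*', '100']
print(range_to_prefixes(3, 5, 3))  # ['011', '10*']
```

A filter with ranges in two fields needs one TCAM entry per combination of prefixes, so filter F above becomes 3 x 2 = 6 entries.
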
Range Matching in TCAMs
[Figure: filter F drawn as six prefix rectangles, labeled a-f, in the Source Port / Destination Port plane]
• Filter F expands into six TCAM entries (a-f): every combination of a source-port prefix (001, 01*, 100) with a destination-port prefix (011, 10*)
• With two 16-bit range fields, a single rule could require up to 900 TCAM entries!
• Typical case: entire filter set expands by a factor of 2 to 6
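
The figure of 900 is consistent with the standard worst case for prefix expansion, in which a range over a W-bit field needs up to 2W - 2 prefixes. The slide does not spell out this derivation, so treat the bound as an assumption supplied here; the symbol E_max is introduced only for illustration:

```latex
E_{\max}(W) = 2W - 2
\quad\Rightarrow\quad
E_{\max}(16) = 2 \cdot 16 - 2 = 30,
\qquad
E_{\max}(16)^{2} = 30 \times 30 = 900
```
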