Efficient Packet Classification for Internet Technology

This presentation surveys packet classification algorithms that are efficient in both memory and time and can handle large numbers of rules at wire speed. It covers the role of packet classification in differentiated services, the challenges of real-time operation, and algorithms that optimize lookup performance at the cost of update performance.

Presentation Transcript


  1. Packet Classification # 3 Ozgur Ozturk CSE 581: Internet Technology Winter 2002 Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  2. Introduction • Importance • Identify the context of packets → apply the necessary actions • Differentiated services • Memory and time efficiency • Must handle thousands of rules • Must operate at wire speed (no queuing) Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  3. Packet Classification # 3 Paper List • T. Lakshman, D. Stiliadis, "High-Speed Policy-based Packet Forwarding Using Efficient Multi-dimensional Range Matching" [Bit-Parallelism] • http://www.bell-labs.com/user/stiliadi/filter/paper.html • F. Baboescu, G. Varghese, "Scalable Packet Classification" [ABV: Aggregated Bit Vector] • M. Buddhikot, S. Suri, M. Waldvogel, "Space Decomposition Techniques for Fast Layer-4 Switching" [Space Decomposition] • V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel, "Fast and Scalable Layer Four Switching" [Paper4] Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  4. Bit-Parallelism Paper / Introduction • Presents packet classification schemes with a traffic-independent, worst-case performance metric • A few thousand rules, at rates of millions of packets per second, using range matches on more than 4 packet header fields Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  5. Bit-Parallelism Paper / Requirement for Real-Time Operation • Traditional router architectures • flow-cache architectures to classify packets • packets of identified flows are expected to arrive in the near future • Current backbone routers • number of active flows is extremely high • OC-3 links, 256K flows • Caches implemented as hash tables • scale well to that size Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  6. Bit-Parallelism Paper / Requirement for Real-Time Operation 2 – Hash-Table Problems • A good hash function is non-trivial • 100 to 200 bits of header must be randomly distributed over no more than 20 to 24 bits of hash index • header value distribution is unknown • Performance of cache-based schemes is heavily traffic dependent • Malicious users • can exploit limitations of the hashing algorithm and caching techniques • Packet queuing delays are acceptable only after classification Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
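
The two slides above describe the flow-cache approach this paper argues against. As a point of reference, here is a minimal sketch of such a cache, assuming a 5-tuple key, a 20-bit index, and Python's built-in hash() standing in for the router's hardware hash; all names are illustrative and not taken from the paper.

INDEX_BITS = 20                               # "20 to 24 bits" of hash index (slide 6)

def flow_index(src, dst, proto, sport, dport):
    # Python's hash() stands in for the router's hardware hash of the 100-200
    # header bits; how evenly it spreads real traffic is exactly the concern
    # raised on slide 6.
    return hash((src, dst, proto, sport, dport)) & ((1 << INDEX_BITS) - 1)

class FlowCache:
    def __init__(self):
        self.buckets = {}                     # hash index -> {5-tuple: action}

    def lookup(self, *flow):
        return self.buckets.get(flow_index(*flow), {}).get(flow)

    def insert(self, *flow, action):
        self.buckets.setdefault(flow_index(*flow), {})[flow] = action

# On a miss, the packet would go through the full (slow) classifier and the
# result would be cached for subsequent packets of the same flow.
cache = FlowCache()
cache.insert("10.0.0.1", "10.0.0.2", "TCP", 1234, 80, action="queue-1")
print(cache.lookup("10.0.0.1", "10.0.0.2", "TCP", 1234, 80))   # -> queue-1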

  7. Bit-Parallelism Paper / Packet Classification Constraints • Scale to large routers with Gigabit links • Process at wire speed • 75% of packets are smaller than the typical TCP packet size (552 bytes) • Nearly half are 40 to 44 bytes (TCP ACKs) • Rules on several fields, specifying ranges, exact matches and prefixes • Two prefix fields in some cases • Allow arbitrary priorities for policies, to disambiguate multiple matches • Optimize for lookups, sacrifice update performance • lookup rate / update rate ≈ 10^7 Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  8. Bit-Parallelism Paper / Packet Classification Constraints 2 • Memory access time is the dominant factor in worst-case lookup execution time • Amenable to hardware implementation • Time vs. space trade-off Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  9. Bit-Parallelism Paper / General Packet Classification • Decomposable search to perform multi-dimensional search for packet filtering • k-dimensional query → a set of 1-dimensional queries on 1-dimensional intervals • Exploit parallelism where possible • Seek poly-logarithmic solution • Packet header fields → k dimensions • Filters → overlapping regions in the k-dimensional space Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  10. Bit-Parallelism Paper / Efficiency of Proposed Algorithms • 1st Algorithm • Memory: k·n², i.e., O(n²) bits per dimension • Time: log(2n+1)+1 comparisons per dimension • Memory accesses: n/w per dimension (w = memory word width) • 2nd Algorithm • Memory reduced to O(n log n) bits • Time increases by a constant • Can be optimized for a given time and memory budget • Exploit on-chip memory in a traffic-independent manner to speed up the worst case Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  11. Notation • Rule r_m in k dimensions • r_m = (e_1,m, e_2,m, …, e_k,m) • each e_i,m is a range in dimension i Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  12. Bit-Parallelism Paper / Algorithm demo on 2-D / Preprocessing 1 Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  13. Bit-Parallelism Paper / Algorithm demo on 2-D / Preprocessing 2 Max 2n+1 intervals for n rules Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  14. Bit-Parallelism Paper / Algorithm demo on 2-D / Preprocessing 3 Sets of rules formed corresponding to each region Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
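
The preprocessing described on slides 12–14 can be made concrete with a small sketch. This is not the paper's implementation: it assumes each rule's projection onto a dimension is given as an integer range (low, high), and the function name is illustrative.

def preprocess_dimension(ranges):
    """ranges: one (low, high) per rule, for a single dimension.
    Returns (boundaries, bitvectors): the sorted end points of the elementary
    intervals and, for each interval, an integer whose bit i is set iff rule i
    covers that interval (at most 2n+1 intervals for n rules)."""
    points = sorted({p for lo, hi in ranges for p in (lo, hi + 1)})
    # Elementary intervals: (-inf, points[0]), [points[0], points[1]), ...,
    # [points[-1], +inf). Probe one value inside each to build its bit vector.
    probes = [points[0] - 1] + points
    bitvectors = []
    for probe in probes:
        bv = 0
        for i, (lo, hi) in enumerate(ranges):
            if lo <= probe <= hi:
                bv |= 1 << i            # rule i covers this elementary interval
        bitvectors.append(bv)
    return points, bitvectors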

  15. Bit-Parallelism Paper / Algorithm demo on 2-D / Online 1 • Packet P1 = (x*, y*) to be classified • Find the intervals x* and y* belong to • binary search → log(2n+1)+1 comparisons per dimension • Intersect all the sets • conjunction of the corresponding bit vectors • Return the highest-priority entry in the resulting bit vector Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  16. Bit-Parallelism Paper / Algorithm demo on 2-D / Online 2 • Max set cardinality = O(n) • Intersection step examines all rules at least once → time complexity = O(n) • With bit-level parallelism • The bitmaps representing the sets are stored in a (2n+1)×n array B_j[i, 1..n] per dimension (one R_i,j set per interval) • k·n/w memory accesses • Different processing elements for each dimension in a hardware implementation • Prototype Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
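
Continuing the sketch after slide 14, the online step from slides 15–16 is one binary search per dimension, a bitwise AND of the per-interval bit vectors, and a priority selection. Taking the lowest set bit as the highest-priority match assumes the rules are stored in priority order, which the slides imply but do not state.

from bisect import bisect_right

def classify(packet_fields, tables):
    """packet_fields: one value per dimension; tables: per-dimension
    (boundaries, bitvectors) pairs from preprocess_dimension."""
    result = None
    for value, (boundaries, bitvectors) in zip(packet_fields, tables):
        idx = bisect_right(boundaries, value)            # log(2n+1)+1 comparisons
        bv = bitvectors[idx]
        result = bv if result is None else result & bv   # bitwise AND = set intersection
        if result == 0:
            return None                                  # no rule matches
    return (result & -result).bit_length() - 1           # index of the lowest set bit

# Example: two 2-D rules; both match packet (6, 15), rule 0 wins on priority.
rules = [((0, 10), (5, 20)), ((3, 7), (0, 100))]
tables = [preprocess_dimension([r[d] for r in rules]) for d in range(2)]
print(classify((6, 15), tables))                         # -> 0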

  17. Different processing elements for each dimension in hardware implementation Prototype Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  18. Bit-Parallelism Paper – Algorithm 2 / Packet Classification based on Incremental Reads • The algorithm uses incremental reads to reduce the required memory • Allows time-space optimization and increases locality for off-chip SDRAM and wide on-chip memory implementations • Consider a specific dimension j • Assume a maximum of 2n+1 non-overlapping intervals • Each interval corresponds to an n-bit bitmap in which the positions of the 1s indicate the filter rules that overlap this interval • Adjacent intervals' corresponding bitmaps differ in only one bit • A single bitmap and 2n pointers of size log n to the differing bits can be used to reconstruct any bitmap Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  19. Bit-Parallelism Paper – Algorithm 2 / Packet Classification based on Incremental Reads 2 • Reduces the space requirement from O(n²) to O(n log n) • Further generalization • (2n+1)/l bitmaps instead of 1 • (2n+1)/2l pointers needed • Choose l as needed • l = 2n+1 → memory reduces to O(n log n) • memory accesses increase from n/w to 2n·log n/w • Trade-off decision according to the on-chip/off-chip memory ratio Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
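
A minimal sketch of the incremental-read idea from slides 18–19: per dimension, keep one full bitmap plus, for each step to the next interval, the position of the single bit that flips, and reconstruct a bitmap by replaying the flips. Function names are illustrative, and the single-bit-difference property is taken from the slide as an assumption.

def build_incremental(bitvectors):
    """bitvectors: one integer per elementary interval; adjacent entries are
    assumed to differ in exactly one bit, as stated on slide 18."""
    base = bitvectors[0]
    flips = []
    for prev, cur in zip(bitvectors, bitvectors[1:]):
        diff = prev ^ cur
        assert diff and diff & (diff - 1) == 0      # exactly one bit flips
        flips.append(diff.bit_length() - 1)         # position of the flipped bit
    return base, flips

def reconstruct(base, flips, interval_index):
    bv = base
    for pos in flips[:interval_index]:              # replay flips up to the interval
        bv ^= 1 << pos
    return bv

# Slide 19's generalization keeps one full bitmap every l intervals instead of
# a single one, which bounds the length of the reconstruction walk above.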

  20. Bit-Parallelism Paper – Algorithm 2 / Special Case: 2-D Classification • Necessary for best-effort traffic aggregation in the Internet backbone • Determine next hop and resource allocations based on destination and source addresses only • Longest prefix match lookups • Restrict source prefix ranges to powers of 2 in order to reduce space • space requirement O(n) with trie implementation • Virtual intervals • Map intervals of prefix lengths to both dimensions, sorted by length • "Virtual intervals" allow worst-case lookup time of O(ls + log n), where ls is the number of possible prefix lengths • Multicast group identification requires only two additional memory accesses Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  21. Bit-Parallelism Paper – Algorithm 2 / Conclusions • Packet classification, or filtering, is a useful primitive in connectionless networks to provide differentiated service and policy-based routing • More recently, security and active processing • Two multi-dimensional range matching algorithms allow millions of packets per second to be processed on a set of thousands of filter rules • Robust and predictable worst-case performance • Efficient 2-D algorithm for backbone routers with hundreds of thousands of routing entries • Algorithms demonstrate that there may be no need to restrict filtering to edge routers Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  22. Paper4 / Layer Four Switching • A traditional router performs lookups based on the destination address only • Layer four switching provides increased flexibility: it gives a router the capability to distinguish different kinds of traffic and treat them differently: • Block traffic from dangerous sites • Provide QoS for certain traffic • Give preferential treatment to certain traffic (say, database flows) • Difficulties: needs layer-four header information, which may not always be available • any modification of the layer-four header may cause problems • no way to get the header info when it is encrypted • Some variants of L4S: • Firewalls • Reservation protocols such as RSVP • Routing based on traffic type, say web traffic

  23. Paper4 / The Best Matching Filter Problem • A packet P has k distinct header fields used for lookup: H[1], …, H[k] • The filter database of a layer-4 router consists of a finite set of filters F1, F2, …, FN; each filter Fi has an associated directive act_i • Match: each field of P matches the corresponding field of F • Cost: used to determine an unambiguous match (e.g., the order of the filters) • An address range can always be converted into a set of prefixes, so prefix matching can be used • A filter database (fields: Dest, Src, DP, SP, Flags): F1 = (M, *, 25, *, *), F2 = (M, *, 53, *, UDP), F3 = (M, S, 53, *, *), F4 = (M, *, 23, *, *), F5 = (T1, T0, 123, 123, UDP), F6 = (*, Net, *, *, *), F7 = (Net, *, *, *, TCP-ACK), F8 = (*, *, *, *, *) • A packet example: (M, S, UDP, 53, 125)
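
To make the definition above concrete, here is a reference linear scan over a filter database. This is only the naive baseline that the schemes in this paper improve on; the filter encoding ('*' wildcards, (low, high) tuples for ranges, plain values otherwise) and the costs and actions are assumptions for illustration.

def field_matches(filter_field, packet_field):
    if filter_field == "*":                         # wildcard matches anything
        return True
    if isinstance(filter_field, tuple):             # (low, high) range, e.g. a port range
        return filter_field[0] <= packet_field <= filter_field[1]
    return filter_field == packet_field             # exact value (or pre-expanded prefix)

def best_matching_filter(filters, packet):
    """filters: list of (fields, cost, action); packet: tuple of header values.
    Returns the action of the least-cost filter matching every field, or None."""
    best = None
    for fields, cost, action in filters:
        if all(field_matches(f, p) for f, p in zip(fields, packet)):
            if best is None or cost < best[0]:
                best = (cost, action)
    return best[1] if best else None

# Toy 2-filter example (hypothetical costs and actions, not from the slide):
filters = [(("M", "*", "UDP", 53, "*"), 1, "allow-dns"),
           (("*", "*", "*", "*", "*"), 100, "deny")]
print(best_matching_filter(filters, ("M", "S", "UDP", 53, 123)))   # -> allow-dns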

  24. Paper4 / Set Pruning Trees (1) • Build a trie on the destination prefixes in the database • Each valid prefix in the destination trie points to a trie containing some source prefixes • A single filter may apply under multiple destination prefixes and is therefore copied into multiple source tries (see the sketch after the next slide) • Memory space: O(N²) • Time complexity: O(N)

  25. Set Pruning Trees (2) [Figure: a destination trie whose valid prefixes each point to a source trie; filters F1–F7 are copied into several source tries; example lookup for (001, 001).]
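
A compact sketch of a set-pruning tree, assuming prefixes are given as '0'/'1' bit strings (with '' as the default prefix); the filter tuples, costs, and names are illustrative. The copying loop is exactly the duplication behind the O(N²) memory figure on slide 24.

class TrieNode:
    def __init__(self):
        self.children = {}     # '0'/'1' -> TrieNode
        self.payload = None    # dest trie: its dest prefix; src trie: (cost, filter id)

def insert(root, prefix):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    return node

def walk(root, key):
    """Yield every node on key's path whose payload is set, shortest prefix first."""
    node = root
    if node.payload is not None:
        yield node
    for bit in key:
        node = node.children.get(bit)
        if node is None:
            return
        if node.payload is not None:
            yield node

def build_set_pruning(filters):
    """filters: list of (dest_prefix, src_prefix, cost, fid)."""
    dest_root, src_tries = TrieNode(), {}
    for dest in {f[0] for f in filters}:
        src_tries[dest] = TrieNode()
        insert(dest_root, dest).payload = dest                 # mark valid dest prefix
    # Copy each filter into the source trie of every dest prefix it applies to.
    for dest, src, cost, fid in filters:
        for d, trie in src_tries.items():
            if d.startswith(dest):
                node = insert(trie, src)
                if node.payload is None or cost < node.payload[0]:
                    node.payload = (cost, fid)
    return dest_root, src_tries

def sp_lookup(dest_root, src_tries, dest_bits, src_bits):
    dest_matches = list(walk(dest_root, dest_bits))
    if not dest_matches:
        return None
    src_trie = src_tries[dest_matches[-1].payload]             # longest matching dest prefix
    best = None
    for node in walk(src_trie, src_bits):                      # a single downward pass
        if best is None or node.payload[0] < best[0]:
            best = node.payload
    return best[1] if best else None

filters = [("0", "1", 1, "F1"), ("", "0", 2, "F2")]            # toy 2-filter database
dest_root, src_tries = build_set_pruning(filters)
print(sp_lookup(dest_root, src_tries, "001", "001"))           # -> F2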

  26. Avoid the Memory Blowup (1) • Avoid the copying by having each destination prefix D point to a source trie that stores only the filters whose destination field is exactly D • When searching, we may need to go back to the destination trie multiple times • Time complexity: O(W²) • Space complexity: O(NW)

  27. Avoid the Memory Blowup (2) [Figure: destination trie in which each valid prefix points to its own source trie (filters F1–F7); example lookup for (001, 001).] Memory requirement = O(NW), worst-case lookup = O(W²)

  28. Improving Search Time: Basic Grid-of-Tries (1) • Basic idea: • Use precomputation and switch pointers (in the lower-level tries) to speed up the search in a later source trie based on the search in an earlier source trie (remember the previous search result) • Role of the switch pointer • Allows us to increase the length of the matching source prefix without having to restart at the root of the next ancestor source trie • Stored filter: node (D, S) stores the least-cost filter whose dest field is a prefix of D and whose src field is a prefix of S • Time complexity: 2W • Space complexity: O(NW)

  29. Improving Search Time: Basic Grid-of-Tries (2) [Figure: grid-of-tries with switch pointers (marked x and y) jumping between source tries; filters F1–F7; example lookup for (001, 001).]
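
A sketch of only the grid-of-tries traversal described on slide 28, assuming the destination trie, the per-dest-prefix source tries, the switch pointers, and the precomputed stored filters have already been built; the construction is the involved part of the scheme and is omitted here, and the names are illustrative.

class SrcNode:
    def __init__(self):
        self.children = {}   # '0'/'1' -> next node in the same source trie
        self.switch = {}     # '0'/'1' -> same-position node in an ancestor dest
                             # prefix's source trie (the "switch pointer")
        self.stored = None   # (cost, fid): precomputed least-cost filter whose dest
                             # is a prefix of this trie's dest prefix and whose src
                             # is a prefix of this node's source prefix

def got_source_walk(start, src_bits):
    """Single pass over the source bits: on a miss, follow the switch pointer
    instead of restarting in the next ancestor's source trie, and track the
    best precomputed stored filter seen along the way."""
    best, node = start.stored, start
    for bit in src_bits:
        node = node.children.get(bit) or node.switch.get(bit)
        if node is None:
            break
        if node.stored is not None and (best is None or node.stored[0] < best[0]):
            best = node.stored
    return best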

  30. Further Improvement & Extension • Use some faster scheme for destination address matching • Time complexity O(W) → O(log W) • Use multi-bit tries for source address matching • Time complexity O(W) → O(W/k) • Extend grid-of-tries to handle the protocol and port fields • 3 GOT copies, for TCP, UDP and OTHER respectively • 4 hash tables for the 4 port combinations: • both unspecified, destination only, source only, both specified

  31. Cross-Producting (1) • How-to • Slice the filter database into columns, the i-th column storing all distinct prefixes in field i • Build a cross-product table over all k columns • Pre-compute the least-cost filter that matches each cross-product entry • When a packet comes in, do a best-matching-prefix lookup on each field independently • Use the per-field results to locate the corresponding entry in the cross-product table • Discussion • Very fast (for matching) • Problem: memory explosion, N^k entries • Solution: on-demand cross-producting

  32. Cross-Producting (2) • Column prefixes from the slide-23 database — Dest: M, T1, Net, default; Src: S, T0, Net, default; DestPort: 25, 53, 23, 123, default; SrcPort: 123, default; Flags: UDP, TCP-ACK, default • Cross-product table (4 × 4 × 5 × 2 × 3 = 480 entries): 1 (M, S, 25, 123, UDP) → F1; 2 (M, S, 25, 123, TCP-ACK) → F1; 3 (M, S, 25, 123, default) → F1; 4 (M, S, 25, default, UDP) → F1; 5 (M, S, 25, default, TCP-ACK) → F1; 6 (M, S, 25, default, default) → F1; … 479 (default, default, default, default, TCP-ACK) → F8; 480 (default, default, default, default, default) → F8 • E.g. looking for: (M, S, UDP, 25, 57)
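
A sketch of cross-producting as described on the two slides above, reusing the best_matching_filter linear scan from the slide-23 sketch only at build time. Fields are modeled as plain values with '*' as the default; a real implementation would run a best-matching-prefix lookup per field instead of the exact-match fallback used below, so this is an illustration, not the paper's algorithm.

from itertools import product

def build_cross_product(filters):
    """filters: list of (fields, cost, action). Precompute, for every
    combination of per-field values, the least-cost matching filter."""
    k = len(filters[0][0])
    columns = [sorted({f[0][i] for f in filters}, key=str) for i in range(k)]
    table = {entry: best_matching_filter(filters, entry)   # N^k entries: the
             for entry in product(*columns)}               # memory explosion
    return columns, table

def cp_classify(columns, table, packet):
    # One independent match per field (here: exact value, else the '*' default),
    # then a single table lookup -- no backtracking across fields.
    key = tuple(v if v in col else "*" for v, col in zip(packet, columns))
    return table.get(key)

# Toy example with two 2-field filters.
filters = [(("M", "*"), 1, "F1"), (("*", "Net"), 2, "F6")]
cols, table = build_cross_product(filters)
print(cp_classify(cols, table, ("M", "S")))                 # -> F1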

  33. Conclusions • The GOT solution gives scalable (linear) storage and fast lookups for destination-source filters • More general filters → high lookup cost • The cross-producting solution has higher variance, but is faster on average (for lookups) because of its reliance on caching (on-demand cross-producting) • A hybrid scheme combines flexibility with efficiency. Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  34. ABV: "Scalable Packet Classification", F. Baboescu, G. Varghese • GOAL • Packet classification • scalable (in rules, up to 100,000) • wire speed • Past work • Linear time search • Linear amount of TCAMs • Lucent scheme • worst case doesn't scale Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  35. SOLUTION • Aggregated Bit Vector • an improvement on the Lucent bit vector scheme • rule aggregation • rule rearrangement • Rule Aggregation • bit vectors are sparse • i.e., few rules match a given packet • apply some compression scheme (aggregation) Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  36. SOLUTION continued • Rule Rearrangement • overlap is rare • place rules w/ common values together • sort out rule ordering later Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
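
A minimal sketch of the aggregation idea from the last two slides, ignoring rule rearrangement: each dimension's n-bit match vector is summarized by one aggregate bit per block, the short aggregate vectors are ANDed first, and the full words are read only where the aggregate AND is 1. The block size A and all names are illustrative assumptions, not the paper's code.

A = 32   # aggregation block size: bits summarized per aggregate bit

def aggregate(bitvector, nbits):
    agg = 0
    for block in range((nbits + A - 1) // A):
        if (bitvector >> (block * A)) & ((1 << A) - 1):
            agg |= 1 << block                       # block contains at least one 1
    return agg

def abv_intersect(vectors, nbits):
    """vectors: one per dimension; returns the indices of rules matching in all
    dimensions (the lowest index would then be chosen by priority)."""
    aggs = [aggregate(v, nbits) for v in vectors]
    agg_and = aggs[0]
    for a in aggs[1:]:
        agg_and &= a
    matches, block = [], 0
    while agg_and >> block:
        if (agg_and >> block) & 1:                  # only here read the full words;
            mask = ((1 << A) - 1) << (block * A)    # note the aggregate AND can be a
            word = vectors[0] & mask                # false positive, so the word AND
            for v in vectors[1:]:                   # may still come out empty
                word &= v
            while word:
                low = word & -word
                matches.append(low.bit_length() - 1)
                word &= word - 1
        block += 1
    return matches

dim1 = (1 << 0) | (1 << 70)                 # rules 0 and 70 match in dimension 1
dim2 = (1 << 70)                            # only rule 70 matches in dimension 2
print(abv_intersect([dim1, dim2], nbits=100))   # -> [70]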

  37. Comparing ABV w/ BV of Lucent Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  38. Results • At least an order of magnitude faster than BV • Scales well in terms of memory accesses Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02

  39. Paper # 3“Space Decomposition Techniques for Fast Layer-4 Switching" M. Buddhikot, S. Suri, M. Waldvogel • new scheme, based on space decomposition, whose search time is comparable to the best existing schemes, but which also offers fast worst-case filter update time. • three key ideas • innovative data-structure based on quadtrees for a hierarchical representation of the recursively decomposed search space • fractional cascading and precomputation to improve packet classification time • prefix partitioning to improve update time Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
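
To illustrate only the first of the three key ideas (recursive decomposition of the 2-D search space), here is a generic quadtree over the (dest, src) plane. The paper's area-based quadtree additionally uses crossing-filter sets, fractional cascading, and prefix partitioning, none of which are reproduced here; the toy address width W, the rectangle encoding, and all names are assumptions.

W = 8                          # toy address width (bits)

class QNode:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower corner + side length
        self.filters = []                        # (cost, fid, rect) stored at this node
        self.children = None                     # sub-quadrants, created lazily

def insert(node, rect, cost, fid):
    """rect = (x_lo, x_hi, y_lo, y_hi), inclusive. Store the filter at the deepest
    node whose region contains it and that cannot push it into one quadrant."""
    x_lo, x_hi, y_lo, y_hi = rect
    half = node.size // 2
    if half >= 1:
        for qx, qy in ((node.x, node.y), (node.x + half, node.y),
                       (node.x, node.y + half), (node.x + half, node.y + half)):
            if qx <= x_lo and x_hi < qx + half and qy <= y_lo and y_hi < qy + half:
                if node.children is None:
                    node.children = {}
                child = node.children.setdefault((qx, qy), QNode(qx, qy, half))
                return insert(child, rect, cost, fid)
    node.filters.append((cost, fid, rect))

def search(node, x, y, best=None):
    """Walk the quadrants containing point (x, y), keeping the least-cost filter."""
    for cost, fid, (x_lo, x_hi, y_lo, y_hi) in node.filters:
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi and (best is None or cost < best[0]):
            best = (cost, fid)
    if node.children:
        half = node.size // 2
        qx = node.x + half if x >= node.x + half else node.x
        qy = node.y + half if y >= node.y + half else node.y
        child = node.children.get((qx, qy))
        if child:
            best = search(child, x, y, best)
    return best

root = QNode(0, 0, 1 << W)
insert(root, (0, 63, 0, 63), cost=2, fid="F2")        # a coarse filter
insert(root, (16, 31, 16, 31), cost=1, fid="F1")      # a more specific filter inside it
print(search(root, 20, 20))                           # -> (1, 'F1')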

  40. Space Decomposition Evaluation • Depending on the actual requirements of the system this algorithm is deployed in, a single parameter can be used to trade off search time for update time • Amenable to fast software and hardware implementation • For N two-dimensional filters specified using prefixes of up to W bits in length, the Area-based Quadtree (AQT) data structure requires O(N) space, O(W) search time, and O(α·N^(1/α)) update time for a tunable parameter α • Both the average and worst-case search times and memory consumption are comparable to or better than other schemes known in the literature Packet Classification # 3 CSE 581: Internet Technology (Winter 2002) Ozgur Ozturk 02/11/02
