Packet Classification # 3
Ozgur Ozturk
CSE 581: Internet Technology (Winter 2002)
02/11/02

Introduction
• Importance
  • Identify the context of packets
  • Apply necessary actions
  • Differentiated services
• Memory and time efficiency
  • Must handle thousands of rules
  • Must run at wire speed (no queuing)

Paper List
• T. Lakshman, D. Stiliadis, "High-Speed Policy-based Packet Forwarding Using Efficient Multi-dimensional Range Matching" [Bit-Parallelism]
  • http://www.bell-labs.com/user/stiliadi/filter/paper.html
• F. Baboescu, G. Varghese, "Scalable Packet Classification" [ABV: Aggregated Bit Vector]
• M. Buddhikot, S. Suri, M. Waldvogel, "Space Decomposition Techniques for Fast Layer-4 Switching" [Space Decomposition]
• V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel, "Fast and Scalable Layer Four Switching" [Paper4]

Bit-Parallelism Paper: Intro
• Presents packet classification schemes
• Performance metric is worst-case and traffic-independent
• Handles a few thousand rules at rates of millions of packets per second, using range matches on more than 4 packet header fields

Bit-Parallelism Paper: Requirement for Real-Time Operation
• Traditional router architectures
  • Flow-cache architectures to classify packets
  • Identified flows are expected to arrive again in the near future
• Current backbone routers
  • Number of active flows is extremely high
  • OC-3 links: 256K flows
  • Caches are implemented as hash tables, which scale well to that size

Bit-Parallelism Paper: Requirement for Real-Time Operation 2 (Hash-Table Problems)
• A good hash function is non-trivial
  • 100 to 200 bits of header must be randomly distributed over no more than 20 to 24 bits of hash index
  • Header value distribution is unknown
• Performance of cache-based schemes is heavily traffic dependent
• Malicious users
  • Can exploit limitations of hashing algorithms and caching techniques
• Packet queuing delays are acceptable after classification, not before

Bit-Parallelism Paper: Packet Classification Constraints
• Scale to large routers with Gigabit links
• Process at wire speed
  • 75% of packets are smaller than the typical TCP packet size (552 bytes)
  • Nearly half are 40 to 44 bytes (TCP ACK)
• Rules on several fields, specifying ranges, exact matches and prefixes
  • Two prefix fields in some cases
• Allow arbitrary priorities for policies, to resolve multiple matches
• Optimize for lookups, sacrifice update performance
  • lookup rate / update rate: 10^7

Bit-Parallelism Paper: Packet Classification Constraints 2
• Memory access time is the dominant factor in worst-case lookup execution time
• Amenable to hardware implementation
• Time vs. space trade-off

Bit-Parallelism Paper: General Packet Classification
• Decomposable search to perform multi-dimensional search for packet filtering
  • A k-dimensional query becomes a set of 1-dimensional queries on 1-dimensional intervals
• Exploit parallelism where possible
• Seek a poly-logarithmic solution
• Packet header fields correspond to the k dimensions
• Filters correspond to overlapping regions in the k-dimensional space

Bit-Parallelism Paper: Efficiency of Proposed Algorithms
• 1st algorithm
  • Memory: k*n^2 + O(n) bits per dimension
  • Time: log(2n)+1
  • Memory accesses: n/w
• 2nd algorithm
  • Memory reduced to O(n log n) bits
  • Time increases by only a constant
  • Can be optimized for a given time and memory budget
• Exploit on-chip memory in a traffic-independent manner to speed up the worst case

Notation
• Rule r_m in k dimensions
• r_m = (e_1,m, e_2,m, ..., e_k,m)
• Each e_i,m is a range

Bit-Parallelism Paper: Algorithm demo on 2-D / Preprocessing 1
[Figure]

Bit-Parallelism Paper: Algorithm demo on 2-D / Preprocessing 2
• At most 2n+1 intervals for n rules

Bit-Parallelism Paper: Algorithm demo on 2-D / Preprocessing 3
• Sets of rules are formed corresponding to each region

Bit-Parallelism Paper: Algorithm demo on 2-D / Online 1
• Packet P1 (x*, y*) is to be classified
• Find the intervals that x* and y* belong to
  • Binary search: log(2n+1)+1 comparisons per dimension
• Create the intersection of all sets
  • Conjunction (bitwise AND) of the corresponding bit vectors
• Take the highest-priority entry in the resulting bit vector

Bit-Parallelism Paper: Algorithm demo on 2-D / Online 2
• Max set cardinality = O(n)
• The intersection step examines all rules at least once, so time complexity = O(n)
• With bit-level parallelism
  • The bitmaps representing the sets are stored in a (2n+1)*n array Bj[i,1..n] (set Ri,j stored for each dimension)
  • k*n/w memory accesses
• Different processing elements for each dimension in hardware implementation
• Prototype

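The preprocessing and online steps of the demo can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `build_dimension` and `classify`, the inclusive integer ranges, and the three sample rules are all assumptions made for the example. Python integers stand in for the n-bit vectors, so the n/w word reads of a hardware design collapse into a single `&`.

```python
from bisect import bisect_right

def build_dimension(rules, dim):
    """Preprocess one dimension (hypothetical helper).

    rules: list of rules in priority order (index 0 = highest priority);
           each rule is a tuple of inclusive (lo, hi) ranges, one per dimension.
    Returns (starts, bitmaps): starts[i] is the left end of interval i, and
    bit m of bitmaps[i] is set iff rule m covers that interval.
    """
    points = {0}
    for rule in rules:
        lo, hi = rule[dim]
        points.add(lo)        # a rule starts covering here
        points.add(hi + 1)    # and stops covering here
    starts = sorted(points)   # at most 2n+1 interval starts
    bitmaps = []
    for s in starts:
        bits = 0
        for m, rule in enumerate(rules):
            lo, hi = rule[dim]
            if lo <= s <= hi:
                bits |= 1 << m
        bitmaps.append(bits)
    return starts, bitmaps

def classify(packet, tables, n_rules):
    """Return the index of the highest-priority matching rule, or None."""
    result = (1 << n_rules) - 1
    for value, (starts, bitmaps) in zip(packet, tables):
        i = bisect_right(starts, value) - 1   # binary search for the interval
        result &= bitmaps[i]                  # intersect the rule sets
    if result == 0:
        return None
    return (result & -result).bit_length() - 1  # lowest set bit = best priority

# Assumed sample rules over two dimensions (x range, y range).
rules = [((0, 3), (2, 5)), ((2, 7), (0, 3)), ((0, 7), (0, 7))]
tables = [build_dimension(rules, d) for d in range(2)]
print(classify((5, 1), tables, len(rules)))
```

Because every rule contributes at most two breakpoints per dimension, the rule set is constant inside each interval, which is why one precomputed bitmap per interval is enough.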
Bit-Parallelism Paper: Algorithm 2 / Packet Classification Based on Incremental Reads
• The algorithm uses incremental reads to reduce the required memory
• Allows time-space optimization and increases locality for off-chip SDRAM and wide on-chip memory implementations
• Consider a specific dimension j
  • Assume a maximum of 2n+1 non-overlapping intervals
  • Each interval corresponds to an n-bit bitmap, with the positions of the 1s indicating the filter rules that overlap the interval
  • The bitmaps of adjacent intervals differ in only one bit
  • A single bitmap and 2n pointers of size log n to the differing bits can be used to reconstruct any bitmap

Bit-Parallelism Paper: Algorithm 2 / Packet Classification Based on Incremental Reads 2
• Reduces the space requirement from O(n^2) to O(n log n)
• Further generalization
  • Store (2n+1)/l bitmaps instead of 1
  • (2n+1)/2l pointers needed
  • Choose l as needed
  • At l = 2n+1, memory reduces to O(n log n)
  • Memory accesses increase from n/w to 2n log n / w
• The trade-off is decided according to the on-chip/off-chip memory ratio

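The incremental-read idea can be sketched as follows. This is a simplified illustration under stated assumptions: `compress` and `reconstruct` are hypothetical names, every l-th bitmap is stored in full, and reconstruction only walks forward from the preceding stored bitmap (a real design could also walk backward from the following one to halve the number of pointer reads).

```python
def compress(bitmaps, l):
    """Keep every l-th bitmap; for every interval, record which single bit
    differs from the previous interval's bitmap (hypothetical helper)."""
    bases = bitmaps[::l]
    flips = [None]  # interval 0 has no predecessor
    for prev, cur in zip(bitmaps, bitmaps[1:]):
        diff = prev ^ cur
        if diff == 0 or diff & (diff - 1):
            raise ValueError("adjacent bitmaps must differ in exactly one bit")
        flips.append(diff.bit_length() - 1)  # a log n sized pointer
    return bases, flips

def reconstruct(bases, flips, l, i):
    """Rebuild the bitmap of interval i from the nearest stored bitmap."""
    start = (i // l) * l
    bitmap = bases[i // l]
    for j in range(start + 1, i + 1):
        bitmap ^= 1 << flips[j]  # apply one single-bit difference
    return bitmap

# Assumed sample: 7 intervals over 3 rules, each step flips one bit.
bitmaps = [0b000, 0b001, 0b011, 0b010, 0b110, 0b100, 0b000]
bases, flips = compress(bitmaps, l=3)
print(bin(reconstruct(bases, flips, 3, 4)))
```

A larger l stores fewer full bitmaps but requires more pointer reads per lookup, which is the time-space trade-off the slide describes.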
Bit-Parallelism Paper: Special Case: 2-D Classification
• Necessary for best-effort traffic aggregation in the Internet backbone
  • Determine next hop and resource allocations based on destination and source addresses only
• Longest prefix match lookups
  • Restrict source prefix ranges to powers of 2 in order to reduce space
  • Space requirement O(n) with a trie implementation
• Virtual intervals
  • Map intervals of prefix lengths to both dimensions, sorted by length
  • "Virtual intervals" allow a worst-case lookup time of O(l_s + log n), where l_s is the number of possible prefix lengths
• Multicast group identification requires only two additional memory accesses

Bit-Parallelism Paper: Conclusions
• Packet classification, or filtering, is a useful primitive in connectionless networks to provide differentiated service and policy-based routing
  • More recently, also security and active processing
• Two multi-dimensional range matching algorithms allow millions of packets per second to be processed against a set of thousands of filter rules
  • Robust and predictable worst-case performance
• Efficient 2-D algorithm for backbone routers with hundreds of thousands of routing entries
• The algorithms demonstrate that there may be no need to restrict filtering to edge routers

Paper4: Layer Four Switching
• A traditional router performs lookup based on the destination address
• Layer four switching provides increased flexibility: it lets a router distinguish traffic classes and treat them differently:
  • Block traffic from a dangerous site
  • Provide QoS for certain traffic
  • Give preferential treatment to certain traffic (say, database flows)
• Difficulties:
  • Needs layer four header information, which may not always be available
  • Any modification of the layer four header may cause problems
  • Unclear how to obtain header information when packets are encrypted
• Some variants of L4S:
  • Firewalls
  • Reservation protocols such as RSVP
  • Routing based on traffic type, say web traffic

Paper4: The Best Matching Filter Problem
• A packet P has k distinct header fields for lookup: H[1], ..., H[k]
• The filter database of a Layer 4 router consists of a finite set of filters F1, F2, ..., FN; each filter Fi has an associated directive act_i
• Match: each field of P matches the corresponding field of F
• Cost: used to determine an unambiguous match (say, the order of the filters)
• An address range can always be converted into a sequence of prefixes, so prefix matching can be used

A filter database:

Filter  Dest  Src  DP   SP   Flags
F1      M     *    25   *    *
F2      M     *    53   *    UDP
F3      M     S    53   *    *
F4      M     *    23   *    *
F5      T1    T0   123  123  UDP
F6      *     Net  *    *    *
F7      Net   *    *    *    TCP-ACK
F8      *     *    *    *    *

A packet example: (M, S, UDP, 53, 125)

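The claim that a range can always be turned into a sequence of prefixes can be made concrete with the usual greedy split into aligned power-of-two blocks. The sketch below is illustrative only; `range_to_prefixes` and `as_pattern` are names invented for this example, and fields are assumed to be unsigned integers of a fixed bit width.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the inclusive range [lo, hi] of a width-bit field with prefixes.

    Returns a list of (value, prefix_len) pairs, in increasing order.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, width - (size.bit_length() - 1)))
        lo += size
    return prefixes

def as_pattern(value, prefix_len, width):
    """Render a prefix as a bit string padded with '*' wildcards."""
    bits = format(value, f"0{width}b")[:prefix_len]
    return bits + "*" * (width - prefix_len)

# The range 1..14 in a 4-bit field needs six prefixes.
for value, plen in range_to_prefixes(1, 14, 4):
    print(as_pattern(value, plen, 4))
```

In the worst case a range over a W-bit field expands to 2W-2 prefixes (the 4-bit range 1..14 above is such a case), so converting ranges to prefixes can multiply the number of filters.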
Paper4: Set Pruning Trees (1)
• Build a trie on the destination prefixes in the database
• Each valid prefix in the destination trie points to a trie containing some source prefixes
• A single filter may apply under multiple destination prefixes, so it is copied into multiple source tries
• Memory space: O(N^2)
• Time complexity: O(W)

Set Pruning Trees (2)
[Figure: Dest-Trie whose valid prefixes point to Src-Tries holding copies of filters F1 to F7. E.g.: looking for (001, 001)]

Avoid the Memory Blowup (1)
• Avoid the copying by having each destination prefix D point to a source trie that stores only the filters whose destination field is exactly D
• A search may need to return to the destination trie multiple times
• Time complexity: O(W^2)
• Space complexity: O(NW)

Avoid the Memory Blowup (2)
[Figure: Dest-Trie with one Src-Trie per destination prefix, holding filters F1 to F7 without copies. E.g.: looking for (001, 001)]
• Memory requirement = O(NW)
• Lookup worst case = O(W^2)

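The backtracking search described on these two slides can be sketched with a small binary trie. Everything here is an assumption made for illustration: the `Node` class, the `build` and `lookup` helpers, and the seven-filter sample database (filters are given as bit-string prefixes, with the empty string as the wildcard, and list order is taken as cost).

```python
class Node:
    def __init__(self):
        self.child = {}       # '0' or '1' -> Node
        self.src_trie = None  # set on destination nodes that are valid prefixes
        self.filter = None    # (cost, name), set on source nodes

def insert(root, bits):
    node = root
    for b in bits:
        node = node.child.setdefault(b, Node())
    return node

def build(filters):
    """Each destination prefix D points to a source trie that holds only the
    filters whose destination field is exactly D (no copying)."""
    dest_root = Node()
    for cost, (name, dst, src) in enumerate(filters):
        d = insert(dest_root, dst)
        if d.src_trie is None:
            d.src_trie = Node()
        s = insert(d.src_trie, src)
        if s.filter is None:
            s.filter = (cost, name)
    return dest_root

def lookup(dest_root, dst, src):
    """Visit every matching destination prefix and search its source trie:
    up to W source walks of up to W steps each, hence O(W^2)."""
    best = None
    node, depth = dest_root, 0
    while node is not None:
        s, i = node.src_trie, 0
        while s is not None:
            if s.filter is not None and (best is None or s.filter < best):
                best = s.filter
            s = s.child.get(src[i]) if i < len(src) else None
            i += 1
        node = node.child.get(dst[depth]) if depth < len(dst) else None
        depth += 1
    return best[1] if best else None

# Assumed sample database: (name, destination prefix, source prefix).
filters = [("F1", "0", "10"), ("F2", "0", "01"), ("F3", "0", "1"),
           ("F4", "00", "1"), ("F5", "00", "11"), ("F6", "10", "1"),
           ("F7", "", "00")]
root = build(filters)
print(lookup(root, "001", "001"))
```

Each filter is stored exactly once, which keeps the structure small, but the search has to restart a source-trie walk for every ancestor destination prefix.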
Improving Search Time: Basic Grid-of-Tries (1)
• Basic idea:
  • Use pre-computation and switch pointers (in the lower-level tries) to speed up the search in a later source trie based on the search in an earlier source trie (remember the previous search result)
• Role of the switch pointer
  • Allows us to increase the length of the matching source prefix without having to restart at the root of the next ancestor source trie
• Stored filter: node (D, S) stores the least-cost filter whose dest field is a prefix of D and whose src field is a prefix of S
• Time complexity: 2W
• Space complexity: O(NW)

Improving Search Time: Basic Grid-of-Tries (2)
[Figure: Dest-Trie and Src-Tries holding filters F1 to F7, with switch pointers between source tries at nodes x and y. E.g.: looking for (001, 001)]

Further Improvement & Extension
• Use a faster scheme for destination address matching
  • Time complexity O(W) → O(log W)
• Use multi-bit tries for source address matching
  • Time complexity O(W) → O(W/k)
• Extend grid-of-tries to handle protocol and port fields
  • 3 GOT copies, for TCP, UDP and OTHER respectively
  • 4 hash tables for the 4 port combinations: both unspecified, destination only, source only, both specified

Cross-Producting (1)
• How-to
  • Slice the filter database into columns, the i-th column storing all distinct prefixes in field i
  • Make a cross-product table of all k columns
  • Pre-compute the least-cost filter that matches each cross-product entry
  • When a packet comes in, do best prefix matching for each field separately
  • With the matching results, find the corresponding entry in the cross-product table
• Discussion
  • Very fast (for matching)
  • Problem: memory explosion: N^k
  • Solution: on-demand cross-producting

Cross-Producting (2)

Filter database:

Filter  Dest  Src  DP   SP   Flags
F1      M     *    25   *    *
F2      M     *    53   *    UDP
F3      M     S    53   *    *
F4      M     *    23   *    *
F5      T1    T0   123  123  UDP
F6      *     Net  *    *    *
F7      Net   *    *    *    TCP-ACK
F8      *     *    *    *    *

Distinct prefixes per column:

Dest Prefix  Src Prefix  DestPort Prefix  SrcPort Prefix  Flags Prefixes
M            S           25               123             UDP
T1           T0          53               Default         TCP-ACK
Net          Net         23                               Default
Default      Default     123
                         Default

Cross-product table:

Num  CrossProduct                                   Matching Filter
1    M, S, 25, 123, UDP                             F1
2    M, S, 25, 123, TCP-ACK                         F1
3    M, S, 25, 123, default                         F1
4    M, S, 25, default, UDP                         F1
5    M, S, 25, default, TCP-ACK                     F1
6    M, S, 25, default, default                     F1
…    …                                              …
479  default, default, default, default, TCP-ACK    F8
480  default, default, default, default, default    F8

E.g. looking for: (M, S, UDP, 25, 57)

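The cross-producting steps can be sketched for two prefix fields. This is a minimal illustration rather than the paper's implementation: the helper names, the linear-scan best-prefix search, and the sample filters (bit-string prefixes, empty string as wildcard, list order as cost) are assumptions for the example, and the table is fully precomputed rather than built on demand.

```python
from itertools import product

def best_prefix(prefixes, bits):
    """Longest prefix in `prefixes` that matches `bits` (linear scan for
    brevity; a real system would use a trie or similar)."""
    matches = [p for p in prefixes if bits.startswith(p)]
    return max(matches, key=len) if matches else None

def build_table(filters):
    """Pre-compute the least-cost matching filter for every cross-product."""
    k = len(filters[0][1])
    columns = [sorted({fields[i] for _, fields in filters}) for i in range(k)]
    table = {}
    for combo in product(*columns):
        for name, fields in filters:  # filters are listed in increasing cost
            # A filter matches a cross-product entry when each of its fields
            # is a prefix of the corresponding entry component.
            if all(c.startswith(f) for c, f in zip(combo, fields)):
                table[combo] = name
                break
    return columns, table

def classify(columns, table, packet):
    """One best-prefix match per field, then a single table probe."""
    key = tuple(best_prefix(col, h) for col, h in zip(columns, packet))
    return table.get(key)

# Assumed sample database: (name, (destination prefix, source prefix)).
filters = [("F1", ("0", "10")), ("F2", ("0", "01")), ("F3", ("0", "1")),
           ("F4", ("00", "1")), ("F5", ("00", "11")), ("F6", ("10", "1")),
           ("F7", ("", "00"))]
columns, table = build_table(filters)
print(classify(columns, table, ("001", "001")))
```

The table has one candidate entry per combination of column values, so its size is the product of the column sizes; this is the N^k blowup the slide mentions and the motivation for building entries on demand.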
Conclusions
• GOT solution: scalable (linear) storage and fast lookups for D-S filters
• More general filters: high lookup cost
• Cross-producting solution: higher variance, but faster on average (for lookup), because it relies on caching
• A hybrid scheme combines flexibility with efficiency

ABV: "Scalable Packet Classification", F. Baboescu, G. Varghese
• Goal
  • Packet classification that is scalable (in rules, up to 100,000) and runs at wire speed
• Past work
  • Linear time search
  • Linear amount of TCAMs
  • Lucent scheme: worst case doesn't scale

SOLUTION
• Aggregated Bit Vector
  • Improvement on the Lucent bit vector
  • Rule aggregation
  • Rule rearrangement
• Rule aggregation
  • Bit vectors are sparse, i.e., few rules match
  • Some compression scheme

SOLUTION continued
• Rule rearrangement
  • Overlap is rare
  • Place rules with common values together
  • Sort out rule ordering later

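Rule aggregation can be sketched as a one-level summary over each bit vector. The code below is an assumption-laden illustration, not the ABV authors' implementation: `aggregate` and `first_match` are invented names, the aggregate size `a` is a free parameter, and Python integers stand in for the bit vectors. It also counts how many chunk positions had to be fetched, to show a false match.

```python
def aggregate(bitmap, n_rules, a):
    """Aggregate bit j is set iff any bit in chunk [j*a, (j+1)*a) is set."""
    agg = 0
    mask = (1 << a) - 1
    for j in range((n_rules + a - 1) // a):
        if (bitmap >> (j * a)) & mask:
            agg |= 1 << j
    return agg

def first_match(bitmaps, n_rules, a):
    """Lowest rule index set in all bitmaps, reading only the chunks that the
    ANDed aggregates flag. Returns (rule index or None, chunk positions read)."""
    mask = (1 << a) - 1
    # In a real system the aggregates are precomputed next to each bit vector.
    common = (1 << ((n_rules + a - 1) // a)) - 1
    for b in bitmaps:
        common &= aggregate(b, n_rules, a)
    chunks_read = 0
    j = 0
    while common >> j:
        if (common >> j) & 1:
            chunks_read += 1            # fetch chunk j of every bit vector
            word = mask
            for b in bitmaps:
                word &= (b >> (j * a)) & mask
            if word:
                return j * a + (word & -word).bit_length() - 1, chunks_read
            # Otherwise this was a false match: the aggregates agreed, but no
            # single rule inside the chunk matches in every dimension.
        j += 1
    return None, chunks_read

# Assumed sample: 16 rules, aggregate size 4, two dimensions.
x = (1 << 1) | (1 << 13)   # rules 1 and 13 match in dimension x
y = (1 << 2) | (1 << 13)   # rules 2 and 13 match in dimension y
print(first_match([x, y], n_rules=16, a=4))
```

In this sample only two of the four chunk positions are read, and the first of them is a false match. Placing rules with common values next to each other, as the rearrangement bullet suggests, is aimed at making such false matches rarer.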
Comparing ABV with BV of Lucent
[Figure]

Results
• At least an order of magnitude faster than BV
• Scales well in memory accesses

Paper # 3: "Space Decomposition Techniques for Fast Layer-4 Switching", M. Buddhikot, S. Suri, M. Waldvogel
• New scheme, based on space decomposition, whose search time is comparable to the best existing schemes, but which also offers fast worst-case filter update time
• Three key ideas
  • Innovative data structure based on quadtrees for a hierarchical representation of the recursively decomposed search space
  • Fractional cascading and precomputation to improve packet classification time
  • Prefix partitioning to improve update time

Space Decomposition Evaluation
• Depending on the actual requirements of the system in which the algorithm is deployed, a single parameter α can be used to trade off search time for update time
• Amenable to fast software and hardware implementation
• For N two-dimensional filters specified using prefixes of up to W bits in length, the Area-based Quadtree (AQT) data structure requires O(N) space, O(αW) search time, and O(α N^(1/α)) update time
• Both the average and worst-case search times and memory consumption are comparable to or better than other schemes known in the literature