190 likes | 317 Views
Parallel tree search: An algorithmic approach for multi-field packet classification. Authors: Derek Pao and Cutson Liu. Publisher: Computer communications 2007 Present: Chen-Yu Lin ( 林呈俞 ) Date: Oct, 25, 2007. Outline. Introduction Packet classification algorithm
E N D
Parallel tree search: An algorithmic approach for multi-field packet classification Authors:Derek Pao and Cutson Liu. Publisher: Computer communications 2007 Present:Chen-Yu Lin (林呈俞) Date: Oct, 25, 2007
Outline • Introduction • Packet classification algorithm • Filter decomposition algorithm • Architecture of the search engine • Performance of the search engine
Introduction • Adopt the two-stage classification approach: • 1. Determine the best matching source and destination prefix pair. • 2. Determine the highest priority matching filter by comparing the remaining field against a short list of candidate rules in parallel. • Filter decomposition • Reduce the 2-D search problem to a pseudo 1-D problem. • The decomposed filters are organized as a height-balanced search tree. • This method is scalable to large filter databases and IPv6.
Packet classification algorithm • 2-stage classification approach • 1. Determine the best matching source-destination prefix pair. <ps, pd> • 2. Search the list of filters associated with the selected prefix pair for the highest priority filter • Convert the 2-D search problem to 1-D problem. • Source axis is partitioned into disjointsegments, called bands. • The decomposed filters are organized as a height-balanced search tree.
Packet classification algorithm • A decomposed filter is represented by a pair of end points. • Lower-left corner (LC) • Padding 0 to both source & destination prefixes. • Lower-right corner (RC) • Padding 0 to source prefix and padding 1 to destination prefix. • End point definition • <ps, ls, pd, ld, type> 5 – tuple • ps (pd): extended source (destination) prefix. • ls (ld): original length of source (destination) prefix. • type: type of end point. • Given a source-destination address pair, the search algorithm will • Determine the unique matching source band. • Find the best matching filter.
Packet classification algorithm Conflicting filters = <0010*, 0110*> = <00110*, 0110*> = <00111*, 0110*> 01100000 01101111
Packet classification algorithm If the LC- and RC-points of a filter are adjacent to each other in the sorted list, then the pair of points is replaced by a “merge” point, whose value is equal to LC- point.
Packet classification algorithm • Composite key store in a node is <ps, pd>, where the length of two prefixes be ls and ld. • An input address pair (S, D) is compared with the composed key.
Packet classification algorithm • External node • An external node provides the reference to the best matching filter, if any, when the search operation falls off the tree. • The role of the external node is similar to the pre-computed best matching prefix in the range search approach.
Filter decomposition algorithm • Filter decomposition algorithm • Extract all non-wild card source prefixes from the filter table and put them into a list. • A prefix whose length is less than the full address length is then replaced by its lower and higher end-point.
Filter decomposition algorithm • Let List of end-points regions source bands 00100000 L 00100000 ~ 00101111 0010* 00110000 L 00110000 ~ 00110111 00110* 00110111 H 00111000 ~ 00111111 00111* 00111111 H
Architecture of the search engine • To speed up the search operation is by pipelining • Nodes of the search tree are stored in separate memory modules according to their level number. • Linear pipeline can achieve optimal throughput • Drawback: Incremental updates cannot be supported. • Rebalanced a search tree may need to shift up to 75% nodes. IPv4 134 bits IPv6 330 bits
Architecture of the search engine • Flag • If left (right) flag is set left (right) pointer = left (right) external node. • Otherwise, left (right) pointer = left (right) subtree. • The amount of memory to support a 256K nodes search tree is around 4.2MB and 10.3MB for IPv4 and IPv6. • The depth of a height-balanced search tree with 256K nodes is no more than 19. • There are only 255 nodes in the first eight levels. • A tag is assigned to a task when it’s submitted to the search engine.
Architecture of the search engine Comparator + fast buffer (SRAM) 255 entries DRAM 32K entries
Architecture of the search engine • Output buffer of PUs • L1 and L2 are equipped with double output buffer. • Input buffer of Pus • L1 have a single slot buffer. • L2 equipped with a multiple slots dual-write port buffer (FIFO). • Assume the size of the L2 PUs input buffer is B slots. • If the number of concurrent tasks is larger than 2B+3 Deadlock happen. • To resolve the deadlock problem • Limiting the number of tags no more than 2B+3. • Implement a deflection routing scheme.
Performance of the search engine • Simulation • k and M = 8 • ACLX-5 with about 46K rules • Depth of search tree = 17 • L2 PU input buffer size = (B)
Performance of the search engine • Best performance • Select buffer size of B = 32. • Limiting the number of tags to 2B+3