A FPGA-based Parallel Architecture for Scalable High-Speed Packet Classification. Authors: Weirong Jiang, Viktor K. Prasanna. Publisher: 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors. Presenter: Chin-Chung Pan. Date: 2009/12/30.
Outline • Introduction • Architecture and Algorithms: Motivations, Architecture Overview, Quadtree Search on Single Fields, Partitioning Algorithm • Performance Evaluation: Algorithm Evaluation, Implementation Results
Introduction • Most existing packet classification algorithms fall into three categories: • decision-tree-based (e.g. HyperCuts) • decomposition-based (e.g. BV, cross-producting) • partitioning-based (partition the original rule set into multiple subsets) • Building on the idea of Independent Sets, we propose a coarse-grained independent sets algorithm to reduce the number of partitions at the cost of more linear search. This extra cost is alleviated by pipelining the search process in hardware.
Coarse-Grained Independent Sets • The original Independent Sets algorithm requires that all rules within an independent set be mutually disjoint on the same field. • We propose a coarse-grained independent sets algorithm to effectively reduce the number of independent sets. • B is a design-time parameter controlling the granularity of the independent sets.
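The disjointness requirement of the original Independent Sets algorithm can be illustrated with a small sketch. This is not the authors' partitioning code: the inclusive `(lo, hi)` range encoding, the function names, and the greedy first-fit strategy are all assumptions made for illustration, and the coarse-grained variant controlled by B is deliberately not modeled because the slides do not define it.

```python
from typing import List, Tuple

# Hypothetical encoding: a rule's range on one field as inclusive (lo, hi).
Range = Tuple[int, int]


def is_independent_set(ranges: List[Range]) -> bool:
    """True if all ranges are mutually disjoint on this field."""
    ordered = sorted(ranges)
    # After sorting by start, checking each range against its
    # predecessor is enough to detect any overlap.
    return all(prev[1] < cur[0] for prev, cur in zip(ordered, ordered[1:]))


def partition_into_independent_sets(ranges: List[Range]) -> List[List[Range]]:
    """Greedy first-fit split into sets of mutually disjoint ranges.

    Illustrative only; it shows the fine-grained (original) requirement,
    not the coarse-grained algorithm proposed in the slides.
    """
    sets: List[List[Range]] = []
    for r in sorted(ranges):
        for s in sets:
            # Ranges arrive in start order, so the last range in each
            # set has the largest end point of that set.
            if s[-1][1] < r[0]:
                s.append(r)
                break
        else:
            sets.append([r])
    return sets


example = [(0, 3), (2, 5), (4, 7), (8, 9)]
result = partition_into_independent_sets(example)
```

With the example ranges above, `(2, 5)` overlaps both `(0, 3)` and `(4, 7)`, so it ends up in a set of its own. Overlapping rules forcing extra sets is the effect the coarse-grained variant aims to reduce.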
Architecture Overview(2/5) • Each single-field search returns the information associated with the primitive range that matches the value of the corresponding field of the input packet. • The outputs of the first stage include all the information needed by the second stage. The search result from each field contains: • the IDs of the tables to look up • the indices that are used for the table lookup
Architecture Overview(3/5) • For instance, an input packet with SA = 10001000 and DA = 01110111 will match the primitive ranges SA 011 and DA 010 on SA and DA fields, respectively.
Architecture Overview(4/5) • The information associated with SA 011 will include two sets of {table ID, index} tuples: {00, 01} and {10, null}. This is because SA 011 is within the “01”th independent interval of the “00”th coarse-grained independent set, as well as within the only primitive range on the SA field of the cross-product table.
Architecture Overview(5/5) • Similarly, the information associated with DA 010 is: {01, 00} and {10, 01}, since DA 010 is within the “00”th independent interval of the “01”th coarse-grained independent set as well as within the “01”th primitive range on the DA field of the cross-product table.
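The worked example above can be restated as data. The sketch below records the first-stage output for SA 011 and DA 010 exactly as given on the slides, then groups the indices by table ID. The grouping function, its name, and the use of `None` for the slide's "null" are assumptions for illustration; the slides do not spell out how the second stage combines the per-field indices.

```python
from typing import Dict, List, Optional, Tuple

# One {table ID, index} tuple; None stands in for the slide's "null".
Entry = Tuple[int, Optional[int]]

# First-stage results taken from the example on the slides.
SA_011: List[Entry] = [(0b00, 0b01), (0b10, None)]
DA_010: List[Entry] = [(0b01, 0b00), (0b10, 0b01)]


def group_by_table(
    sa_entries: List[Entry], da_entries: List[Entry]
) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Collect, per table ID, the (SA index, DA index) pair to look up.

    Hypothetical helper: a table referenced by only one field keeps
    None in the other position.
    """
    grouped: Dict[int, List[Optional[int]]] = {}
    for table_id, index in sa_entries:
        grouped.setdefault(table_id, [None, None])[0] = index
    for table_id, index in da_entries:
        grouped.setdefault(table_id, [None, None])[1] = index
    return {t: (pair[0], pair[1]) for t, pair in grouped.items()}


lookups = group_by_table(SA_011, DA_010)
```

In this reading, tables `00` and `01` (the two coarse-grained independent sets) are each indexed by a single field, while table `10` (the cross-product table) receives an index from both fields, with the SA side being "null" because that table has only one primitive range on SA.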
Partitioning Algorithm(1/2) [Figure: rules R1 to R10 drawn in the SA × DA plane; both axes are divided into primitive ranges 000 to 101, with Prev/Curr markers along the SA axis.]
Partitioning Algorithm(2/2) [Figure: rules R1, R2, R3, R4 and R9 in the SA × DA plane; the DA axis is labeled from 000 to 101 and the SA axis from 000 to 010.]
Implementation Results(1/2) • We implemented our design (P = 4, B = 2), which supports the large rule set ACL_10K, using the Xilinx ISE 10.1 development tools.
Implementation Results(2/2) • Post-place-and-route results show that the design sustains 90 Gbps throughput for minimum-size (40-byte) packets, which is more than twice the backbone network link rate at the time of publication (2009).
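To put the throughput figure in per-packet terms, the short calculation below converts 90 Gbps at 40 bytes per packet into a packet rate. It is plain arithmetic on the two numbers quoted above; the function name is hypothetical, and framing overhead, clock frequency, and packets per cycle are not given on the slides and are not modeled.

```python
def packets_per_second(throughput_bps: float, packet_bytes: int) -> float:
    """Packet rate implied by a bit rate and a fixed packet size."""
    return throughput_bps / (packet_bytes * 8)


# Figures quoted on the slide: 90 Gbps, 40-byte minimum-size packets.
rate = packets_per_second(90e9, 40)
print(f"{rate / 1e6:.2f} Mpps")  # prints "281.25 Mpps"
```

That is, the reported throughput corresponds to roughly 281 million minimum-size packets classified per second.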