Performance Analysis of Packet Classification Algorithms on Network Processors

Performance Analysis of Packet Classification Algorithms on Network Processors. Deepa Srinivasan, IBM Corporation; Wu-chang Feng, Portland State University. November 18, 2004, IEEE Local Computer Networks. Network Processors: emerging platform for high-speed packet processing.


Presentation Transcript


  1. Performance Analysis of Packet Classification Algorithms on Network Processors • Deepa Srinivasan, IBM Corporation • Wu-chang Feng, Portland State University • November 18, 2004 • IEEE Local Computer Networks

  2. Network Processors • Emerging platform for high-speed packet processing • Splice in a statistic here? • Provide device programmability while keeping performance • Architectures differ, but common features include… • Multiple processing units executing in parallel • Instruction set customized for network applications • Binary image pre-determined at compile time

  3. Example: Intel’s IXP

  4. IXP Architecture • Multi-processor • StrongARM core for slow-path processing • 6 microengines for fast-path processing • Hardware support for multi threading • Each microengine has 4 thread contexts • Zero or minimal overhead context switch

  5. Motivation for study • NPs offer a programmable, parallel alternative, but current packet processing algorithms are • Written for sequential execution or • Designed using custom, invariant ASICs • To use them on NPs • Algorithms must be mapped onto NPs in different ways with each mapping having varying performance

  6. Our study • Examine several mappings of a packet classification algorithm onto NP hardware • Identify general problems in performing such mappings

  7. Why packet classification? • Fundamental function performed by all network devices • Routers, switches, bridges, firewalls, IDS • Increasing complexity makes packet classification the bottleneck • Increase in size of rulesets • Increase in dimension of rulesets • Algorithms must perform at high-speed on the fast-path

  8. Picking an algorithm • Many algorithms sequential • Do not leverage inherent parallelism in NPs • Several parallel algorithms • BitVector [Lakshman98] • Parallel lookup implemented via FPGA • Maps well onto NP platform

  9. Bit Vector algorithm • T.V. Lakshman, D. Stiliadis, “High-speed policy-based packet forwarding using efficient multi-dimensional range matching”, SIGCOMM 1998. • Parallel search algorithm • Preprocessing phase • Two-stage classification phase • Perform lookup for each dimension in parallel • Combine results to determine matching rule
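The two phases listed on this slide can be sketched in Python. This is a minimal illustration, not the authors' IXP microcode: the names `preprocess`, `classify`, and `RULES` are hypothetical, the ruleset is invented for the example, and the sketch assumes that a lower rule index means higher priority. It also simplifies the lookup by indexing a table with the raw field value, whereas the published algorithm stores one bit vector per interval and searches the intervals.

```python
from typing import List, Tuple

# One inclusive (low, high) range per dimension. Hypothetical rules, not from the slides.
Rule = List[Tuple[int, int]]

RULES: List[Rule] = [
    [(0, 3), (4, 7), (0, 15)],    # rule 0
    [(4, 7), (8, 11), (0, 3)],    # rule 1
    [(8, 15), (0, 15), (0, 15)],  # rule 2
    [(5, 6), (10, 10), (2, 2)],   # rule 3
]


def preprocess(rules: List[Rule], width: int) -> List[List[int]]:
    """Preprocessing phase: for each dimension, map every possible field
    value to a bit vector in which bit i is set if rule i covers the value."""
    num_dims = len(rules[0])
    tables = []
    for d in range(num_dims):
        table = []
        for value in range(1 << width):
            bits = 0
            for i, rule in enumerate(rules):
                low, high = rule[d]
                if low <= value <= high:
                    bits |= 1 << i
            table.append(bits)
        tables.append(table)
    return tables


def classify(tables: List[List[int]], packet: Tuple[int, ...]) -> int:
    """Classification phase. Stage 1: one independent lookup per dimension
    (these are the lookups that can run in parallel). Stage 2: AND the bit
    vectors; the lowest set bit is the matching rule. Returns -1 on no match."""
    combined = -1  # all bits set
    for table, value in zip(tables, packet):
        combined &= table[value]
    if combined == 0:
        return -1
    return (combined & -combined).bit_length() - 1


TABLES = preprocess(RULES, width=4)
```

With these assumed rules, `classify(TABLES, (2, 5, 9))` selects rule 0, and a packet that no rule covers in every dimension, such as `(4, 0, 0)`, yields -1.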

  10. Example ruleset Number of rules (N) = 4 Number of dimensions (d) = 3 Width of dimension (W) = 4 (bits)

  11. BitVector example Packet = {6, 10, 2} Matching rule = r2

  12. Two design mappings (recall: IXP has 6 μEngines) • Consider multiple mappings of BitVector onto Intel’s IXP1200 microengines • Option 1 (Parallel): all processing for a single packet is handled by one microengine (μEngine) • Option 2 (Pipelined): processing for a single packet is split across μEngines
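The difference between the two options can be made concrete with a small scheduling model. This is only a sketch under stated assumptions: the function names are hypothetical, packets are assigned round-robin in the parallel case, each pipeline stage handles exactly one dimension, and the model ignores any engines reserved for receive, transmit, or combining results. The transcript does not say how the authors actually allocated engines.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Work = Tuple[int, int]  # (packet index, dimension index)


def parallel_schedule(num_packets: int, num_dims: int,
                      num_engines: int) -> Dict[int, List[Work]]:
    """Option 1: every dimension lookup for a packet runs on the same engine.
    Packets are spread round-robin across engines (an assumption)."""
    schedule: Dict[int, List[Work]] = defaultdict(list)
    for p in range(num_packets):
        engine = p % num_engines
        for d in range(num_dims):
            schedule[engine].append((p, d))
    return dict(schedule)


def pipelined_schedule(num_packets: int, num_dims: int,
                       num_engines: int) -> Dict[int, List[Work]]:
    """Option 2: engine d handles dimension d for every packet, so each
    packet passes through num_dims engines (one stage per dimension)."""
    if num_dims > num_engines:
        raise ValueError("this sketch needs at least one engine per dimension")
    schedule: Dict[int, List[Work]] = defaultdict(list)
    for p in range(num_packets):
        for d in range(num_dims):
            schedule[d].append((p, d))
    return dict(schedule)


par = parallel_schedule(num_packets=12, num_dims=3, num_engines=6)
pipe = pipelined_schedule(num_packets=12, num_dims=3, num_engines=6)
```

In this model `par` uses all six engines, each seeing whole packets, while `pipe` uses three engines, each seeing every packet but only one dimension of it.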

  13. Parallel Mapping

  14. Pipelined Mapping

  15. Memory allocation

  16. Evaluation platform • Intel IXP1200 Developer Workbench • Graphical IDE • Cycle-accurate simulator • Performance statistics • All experiments run within simulator • Configurable • Logging facility

  17. Simulator configuration • IXP1200 chip • 1K microstore • Core frequency (~ 165 MHz) • 4 ports receive data • Simulations run until 75000 packets received by IXP • Simulator sends packets as fast as possible • Rulesets used • Experiments use a small, fixed set of rules • Availability of real-world firewall rulesets limited

  18. Performance metrics

  19. Results and Analysis

  20. Throughput

  21. Packets sent/received ratio

  22. Analysis • Overall, Parallel performs better than Pipelined • Pipelined : A single packet header in SDRAM is read multiple (3) times
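The cost of the repeated header reads can be estimated with back-of-the-envelope arithmetic. The sketch below uses the 75000-packet run length from the simulator configuration slide and the three reads per packet stated above; the single read per packet for Parallel is an assumption, since the transcript does not state it, and the function name is hypothetical.

```python
def sdram_header_reads(num_packets: int, reads_per_packet: int) -> int:
    """Total SDRAM reads of packet headers over one simulation run."""
    return num_packets * reads_per_packet


PACKETS = 75_000  # packets received per run, from the simulator configuration

# Assumption: in the Parallel mapping one engine reads each header once.
parallel_reads = sdram_header_reads(PACKETS, 1)

# From the slide: in the Pipelined mapping each header is read 3 times.
pipelined_reads = sdram_header_reads(PACKETS, 3)

extra_reads = pipelined_reads - parallel_reads
```

Under these assumptions the pipelined mapping issues 150000 more header reads per run than the parallel one, each of which is subject to SDRAM latency.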

  23. Microengine utilization

  24. Microengine aborted time

  25. Analysis • Aborted time is typically caused by branch instructions • Algorithms must reduce branch instructions to maximize throughput

  26. Microengine idle time

  27. Distribution of microengine time (panels: Parallel, Pipelined)

  28. Analysis • High microengine idle time in Pipelined due to memory latency • Lower microengine aborted time in Pipelined due to what?

  29. Discussion • Pipelined mappings can bottleneck through memory • Repeated memory reads to send work from μEngine to μEngine • Direct hardware support for pipelining required • IXP2xxx = next-neighbor registers • Currently re-examining our results on IXP2400 • Algorithms with fewer branch instructions result in better microengine utilization (lower aborted time)

  30. Conclusion • Packet classification is a fundamental function • Parallel nature of NPs well-suited for parallel search algorithms

  31. Conclusion • Network processors offer high packet processing speed and programmability • Performance of an algorithm depends on the design mapping chosen • Contributions • Demonstrated that mapping has considerable impact on performance • Pipelined mappings benefit from hardware support • Algorithms with fewer branch instructions result in better processor utilization

  32. Future work • Analyze other mappings • Split work across different hardware threads in a single microengine • Placement of data structures in different memory banks • IXP2400 • Examine how hardware features change trade-offs in algorithm mapping • Algorithms designed specifically for network processors

  33. Backup Slides

  34. Definitions • Packet classification: process of categorizing packets according to pre-defined rules • Classifier or ruleset: collection of rules • Dimension or field: packet header used • Rule: range of field values and action

  35. Packet classification algorithms • N: number of rules • d: number of dimensions • W: maximum number of bits • l: number of levels occupied by a FIS-tree
