Survey of Packet Classification Algorithms

Outline • Background and problem definition • Classification schemes • One dimensional classification • Two dimensional classification

Background

Flow-aware vs. Flow-unaware Routers • Flow-aware router • Keep track of flows and perform similar processing on packets in a flow • Flow-unaware router • Packet-by-packet router • Treat each incoming packet individually

Why Flow-aware Router? • Additional mechanisms required • Admission control, resource reservation, per-flow queueing, fair scheduling etc. • Provision of DiffService in ISPs • Capability to distinguish and isolate traffic belonging to different flows based on negotiated service agreements (classification rules or policies)

Need for DiffService • Service • Traffic shaping • Traffic filtering • Policy routing [Figure: network topology with ISP1, ISP2, ISP3 and a NAP; further nodes labelled E1, E2, X, Y and Z]

More Value-added Services • DiffService • Regard traffic from Autonomous System #33 as "platinum grade" • Accounting and billing • Treat all video traffic as highest priority and perform accounting for this type of traffic • Committed access rate (rate limiting) • Rate limit WWW traffic from subinterface #739 to 10 Mbps

Flow-aware Router: Basic Architectural Components • Control: routing, resource reservation, admission control • Datapath (per-packet processing): routing lookup, packet classification, special processing, switching, scheduling

Flow Classification [Figure: the forwarding engine passes the header of an incoming packet to flow classification, which consults the classifier (policy database), a table of predicate/action entries, and returns a flow index]

Classful Addresses • Class A: leading bit 0, 7-bit network, 24-bit host • Class B: leading bits 10, 14-bit network, 16-bit host • Class C: leading bits 110, 21-bit network, 8-bit host • Every address was class A or B or C, easily determined by the first three bits of the address
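The leading-bit test on the slide can be sketched in a few lines of Python. The function name `address_class` and the "other" result for addresses outside classes A to C are assumptions made for this illustration.

```python
def address_class(addr: str) -> str:
    """Return the classful category of a dotted-quad IPv4 address."""
    first_octet = int(addr.split(".")[0])
    if first_octet >> 7 == 0b0:      # leading bit 0
        return "A"
    if first_octet >> 6 == 0b10:     # leading bits 10
        return "B"
    if first_octet >> 5 == 0b110:    # leading bits 110
        return "C"
    return "other"                   # not covered by the slide (assumption)


for example in ("10.1.2.3", "172.16.0.1", "208.12.21.7"):
    print(example, address_class(example))
```

Only the first octet is needed, because the class is fixed by at most the top three bits of the address.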

Classless Inter-Domain Routing (CIDR) • Prefix can be of arbitrary length • Prefix ranges: 208.12.16/24, 208.12.21/24 and 208.12.31/24 shown within the total IPv4 address space, 0 to 2^32-1 • An exception prefix: 208.12.21/24 lies inside 208.12.16/20; these addresses match both prefixes
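When an address matches both prefixes, the usual convention is that the longer (more specific) prefix wins. A minimal sketch using Python's standard `ipaddress` module follows; the helper name `longest_prefix_match` is an assumption, and the slide's shorthand 208.12.21/24 is written out as 208.12.21.0/24 because the module expects full dotted quads.

```python
import ipaddress

PREFIXES = [
    ipaddress.ip_network("208.12.16.0/20"),
    ipaddress.ip_network("208.12.21.0/24"),  # exception prefix inside the /20
]


def longest_prefix_match(addr, prefixes=PREFIXES):
    """Return the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in prefixes if ip in p]
    return max(matches, key=lambda p: p.prefixlen, default=None)


print(longest_prefix_match("208.12.21.5"))  # matches both; the /24 is chosen
print(longest_prefix_match("208.12.30.1"))  # matches only the /20
```

This scans every prefix, which is fine for illustration; the lookup algorithms later in the deck exist precisely to avoid that scan.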

Table Growth of a Backbone Router From http://www.telstra.net/ops/bgptable.html

Prefix Length Distribution

Problem Definition-Packet Classification

Given a classifier C with N rules, Rj, 1 ≤ j ≤ N, where Rj consists of three entities • A regular expression Rj[i], 1 ≤ i ≤ d, on each of the d header fields, • A number, pri(Rj), indicating the priority of the rule in the classifier, and • An action, referred to as action(Rj)
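As a baseline, the definition can be sketched as a linear scan over the rules. Several things here are assumptions for illustration: each per-field expression is simplified to an inclusive integer range, a smaller `priority` number means higher priority, and the names `Rule`, `matches` and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    fields: Tuple[Tuple[int, int], ...]  # one inclusive (lo, hi) range per header field
    priority: int                        # smaller number = higher priority (assumption)
    action: str


def matches(rule: Rule, header: Sequence[int]) -> bool:
    """A header matches a rule when every field falls in the rule's range."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rule.fields, header))


def classify(rules: Sequence[Rule], header: Sequence[int]) -> Optional[str]:
    """Return the action of the highest-priority matching rule, or None."""
    best = None
    for rule in rules:
        if matches(rule, header) and (best is None or rule.priority < best.priority):
            best = rule
    return best.action if best else None


# Two-field toy classifier: (source id, destination port).
RULES = [
    Rule(((0, 255), (80, 80)), 2, "rate-limit"),
    Rule(((10, 20), (0, 65535)), 1, "deny"),
    Rule(((0, 255), (0, 65535)), 3, "permit"),
]

print(classify(RULES, (15, 80)))   # deny: matches all three, priority 1 wins
print(classify(RULES, (100, 80)))  # rate-limit
```

The scan costs O(N·d) per packet, which is the cost the schemes in the rest of the deck try to reduce.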

Classification is a Generalization of Lookup • Classifier = routing table • One-dimension (destination address) • Rule = routing table entry • Regular expression = prefix • Action = (next-hop-address, port) • Priority = prefix-length

Metrics for Classification Algorithms • Speed • Storage requirements • Low update time • Ability to handle large classifiers • Flexibility in implementation • Low preprocessing time • Scalability in the number of header fields • Flexibility in rule specification

One Dimensional Packet Classification – IP Address Lookup Algorithms

Binary Tries • Prefixes: a = 0*, b = 01000*, c = 011*, d = 1*, e = 100*, f = 1100*, g = 1101*, h = 1110*, i = 1111* [Figure: binary trie storing prefixes a to i; each edge is labelled 0 or 1]
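A binary trie over the prefix table on this slide can be sketched as follows. Addresses and prefixes are written as bit strings for readability, and the names `TrieNode`, `insert` and `lookup` are assumptions for this illustration.

```python
class TrieNode:
    __slots__ = ("children", "label")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.label = None             # set when a prefix ends at this node


def insert(root, prefix, label):
    """Add a prefix such as "01000" to the trie, one node per bit."""
    node = root
    for bit in prefix:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.label = label


def lookup(root, address_bits):
    """Walk the address bit by bit and remember the last prefix seen."""
    node, best = root, None
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.label is not None:
            best = node.label
    return best


PREFIXES = {"a": "0", "b": "01000", "c": "011", "d": "1", "e": "100",
            "f": "1100", "g": "1101", "h": "1110", "i": "1111"}
root = TrieNode()
for name, bits in PREFIXES.items():
    insert(root, bits, name)

print(lookup(root, "01000111"))  # b
print(lookup(root, "01011111"))  # a (falls off the trie after 010)
```

Each step inspects one bit, so a lookup takes up to one memory access per address bit in the worst case.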

Path-Compressed Trie • Prefixes: a = 0*, b = 01000*, c = 011*, d = 1*, e = 100*, f = 1100*, g = 1101*, h = 1110*, i = 1111* • Legend: the number at a node indicates which bit to inspect [Figure: path-compressed trie storing prefixes a to i, with bit numbers 1 to 4 at the internal nodes]

Disjoint-prefix Binary Trie • Leaf pushing • Disjoint prefixes do not overlap • No prefix is itself a prefix of another • Prefixes: a = 0*, b = 01000*, c = 011*, d = 1*, e = 100*, f = 1100*, g = 1101*, h = 1110*, i = 1111* [Figure: leaf-pushed binary trie in which a appears as leaves a1, a2, a3 and d as leaf d1]

Variable-stride Multibit Trie • Reduced number of memory accesses • Greater wasted space • Prefixes: a = 0*, b = 01000*, c = 011*, d = 1*, e = 100*, f = 1100*, g = 1101*, h = 1110*, i = 1111* [Figure: multibit trie mixing stride-2 nodes (entries 00, 01, 10, 11) and stride-1 nodes (entries 0, 1)]
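The trade-off between fewer memory accesses and wasted space can be seen in a small sketch. Note that this is a simplification: it uses a fixed stride of 2 bits at every level rather than the variable strides of the slide, and it expands prefixes whose length is not a multiple of the stride into several entries. The names `Node`, `insert` and `lookup` are assumptions, and prefixes are assumed to be non-empty bit strings.

```python
STRIDE = 2  # bits consumed per node (fixed here; the slide varies it per node)


class Node:
    def __init__(self):
        size = 1 << STRIDE
        self.label = [None] * size   # best prefix covering each entry
        self.length = [0] * size     # remaining length of the prefix that set it
        self.child = [None] * size


def insert(root, prefix, label):
    node = root
    # Descend through whole strides while more than STRIDE bits remain.
    while len(prefix) > STRIDE:
        idx = int(prefix[:STRIDE], 2)
        if node.child[idx] is None:
            node.child[idx] = Node()
        node = node.child[idx]
        prefix = prefix[STRIDE:]
    # Expand the remaining 1..STRIDE bits to every entry they cover.
    pad = STRIDE - len(prefix)
    base = int(prefix, 2) << pad
    for idx in range(base, base + (1 << pad)):
        if len(prefix) >= node.length[idx]:  # longer prefixes take precedence
            node.label[idx] = label
            node.length[idx] = len(prefix)


def lookup(root, bits):
    node, best = root, None
    for i in range(0, len(bits) - STRIDE + 1, STRIDE):
        idx = int(bits[i:i + STRIDE], 2)
        if node.label[idx] is not None:
            best = node.label[idx]
        node = node.child[idx]
        if node is None:
            break
    return best


PREFIXES = {"a": "0", "b": "01000", "c": "011", "d": "1", "e": "100",
            "f": "1100", "g": "1101", "h": "1110", "i": "1111"}
root = Node()
for name, bits in PREFIXES.items():
    insert(root, bits, name)

print(lookup(root, "01000111"))  # b, found after 3 node visits instead of 5
```

The expansion step is where the wasted space comes from: prefix a = 0* occupies both the 00 and 01 entries of the root node.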

Caching Addresses • Advantages • Increased average lookup performance • Disadvantages • Decreased locality in backbone traffic • Cache size • Cache management overhead • Hardware implementation difficult [Figure: fast path through line cards, each with MAC, DMA and local buffer memory; slow path through CPU and buffer memory]

Hash-based Scheme • Store a hash table for each prefix length • Hash key is the prefix value and prefix length • Search scheme • Linear search on prefix lengths • Binary search on prefix lengths • Need to provide intermediate markers • Guide to more specific prefix • Need precomputation per marker • Avoid backtracking

Linear Search on Prefix Lengths • Prefixes: a = 0*, b = 01000*, c = 011*, d = 1*, e = 100*, f = 1100*, g = 1101*, h = 1110*, i = 1111*, j = 01*, k = 1100001*, p = 101* [Figure: binary trie with levels labelled by prefix length 1 to 7, searched linearly on length]
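With one hash table per prefix length, linear search probes the tables one length at a time. The sketch below starts at the longest length so that the first hit is the longest match; the names `build_tables` and `lookup_linear` are assumptions, and Python dictionaries stand in for the hash tables.

```python
PREFIXES = {"a": "0", "b": "01000", "c": "011", "d": "1", "e": "100",
            "f": "1100", "g": "1101", "h": "1110", "i": "1111",
            "j": "01", "k": "1100001", "p": "101"}


def build_tables(prefixes):
    """One dictionary per prefix length, keyed by the prefix bits."""
    tables = {}
    for label, bits in prefixes.items():
        tables.setdefault(len(bits), {})[bits] = label
    return tables


def lookup_linear(tables, address_bits):
    """Probe the tables from the longest length to the shortest."""
    for length in sorted(tables, reverse=True):
        label = tables[length].get(address_bits[:length])
        if label is not None:
            return label
    return None


tables = build_tables(PREFIXES)
print(lookup_linear(tables, "11000010"))  # k
print(lookup_linear(tables, "01011111"))  # j
```

The worst case costs one hash probe per distinct prefix length, six for this table.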

Binary Search on Prefix Lengths • Prefixes: a = 0*, b = 01000*, c = 011*, d = 1*, e = 100*, f = 1100*, g = 1101*, h = 1110*, i = 1111*, j = 01*, k = 1100001*, p = 101* [Figure: binary trie with levels labelled by prefix length 1 to 7, searched by binary search on length]
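Binary search over the lengths needs markers so that a hit at a shorter length can point the search towards longer prefixes, and a precomputed best match per marker so that a failed search in the longer half never has to backtrack. The following is a minimal sketch of that idea, not a tuned implementation; the function names are assumptions and dictionaries stand in for the hash tables.

```python
PREFIXES = {"a": "0", "b": "01000", "c": "011", "d": "1", "e": "100",
            "f": "1100", "g": "1101", "h": "1110", "i": "1111",
            "j": "01", "k": "1100001", "p": "101"}


def best_real_match(key, real):
    """Longest real prefix that is a prefix of key (precomputation step)."""
    for n in range(len(key), 0, -1):
        if key[:n] in real:
            return real[key[:n]]
    return None


def build(prefixes):
    real = {bits: label for label, bits in prefixes.items()}
    lengths = sorted({len(bits) for bits in real})
    tables = {n: {} for n in lengths}
    for p in real:
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:                      # follow the search path to len(p)
            mid = (lo + hi) // 2
            n = lengths[mid]
            if n < len(p):                   # leave a marker pointing right
                key = p[:n]
                tables[n].setdefault(key, best_real_match(key, real))
                lo = mid + 1
            elif n > len(p):
                hi = mid - 1
            else:
                tables[n][p] = real[p]       # the prefix itself
                break
    return lengths, tables


def lookup_binary(lengths, tables, address_bits):
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        n = lengths[mid]
        key = address_bits[:n]
        if key in tables[n]:                 # prefix or marker: try longer
            if tables[n][key] is not None:
                best = tables[n][key]
            lo = mid + 1
        else:                                # miss: try shorter
            hi = mid - 1
    return best


lengths, tables = build(PREFIXES)
print(lookup_binary(lengths, tables, "11000000"))  # f, via markers 110 and 11000
```

For the address 11000000 the search hits the marker 110 (best match d), then the marker 11000 (best match f), then misses at length 7 and returns f without revisiting shorter lengths.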

Lookups with Ternary-CAM [Figure: a TCAM memory array holding destination addresses in priority order (entries 0 to M); the match lines feed a priority encoder whose output indexes the next-hop memory in RAM]

Lookups with Ternary-CAM – cont. • Advantages • Suitable for multiple fields • Fast: 16-20 ns (50-66 Mpps) • Simple to understand • Disadvantages • Inflexible: range-to-prefix blowup • Density: largest available in 2000 is 32K x 128 (but can be cascaded) • Management software and on-chip logic: non-trivial complexity • Power: 5-8 W • Incremental updates: slow • DRAM-based CAMs: higher density, but soft errors are a problem • Cost: $30-$160 for 1 Mb
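A TCAM entry can be modelled as a value plus a mask of the bits that must match; the remaining bits are "don't care". The sketch below emulates the lookup in software with a loop, whereas real hardware compares all entries in parallel and a priority encoder picks the first match. The 8-bit width and the names `prefix_to_entry` and `tcam_lookup` are assumptions for illustration.

```python
WIDTH = 8  # key width in bits (assumed for this toy example)


def prefix_to_entry(prefix, next_hop, width=WIDTH):
    """Turn a bit-string prefix into a (value, mask, next_hop) entry."""
    shift = width - len(prefix)
    value = int(prefix, 2) << shift
    mask = ((1 << len(prefix)) - 1) << shift  # 1 = compare, 0 = don't care
    return value, mask, next_hop


def tcam_lookup(entries, key):
    """Return the next hop of the first (highest-priority) matching entry."""
    for value, mask, next_hop in entries:
        if key & mask == value:
            return next_hop
    return None


PREFIXES = {"a": "0", "b": "01000", "c": "011", "d": "1", "e": "100",
            "f": "1100", "g": "1101", "h": "1110", "i": "1111"}

# Store longer prefixes first so that the first match is the longest match.
ENTRIES = [prefix_to_entry(bits, name)
           for name, bits in sorted(PREFIXES.items(),
                                    key=lambda item: -len(item[1]))]

print(tcam_lookup(ENTRIES, 0b01000111))  # b
print(tcam_lookup(ENTRIES, 0b10111111))  # d
```

Keeping the entries sorted by prefix length is one reason incremental updates are slow: inserting a prefix may require moving existing entries.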

Two Dimensional Packet Classification

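Set-pruning Tries [Figure: a trie on dimension DA whose nodes point to tries on dimension SA; rules R1 to R7 are stored in the SA tries, with R1, R2 and R7 appearing in more than one of them]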

Hierarchical Tries [Figure: a trie on dimension DA whose nodes point to tries on dimension SA; each of the rules R1 to R7 appears once]

Grid-of-Tries [Figure: a trie on dimension DA linked to tries on dimension SA that hold rules R1 to R7]

Grid-of-Tries – cont. 20K entries: 2MB, 9 memory accesses (with expansion) • Disadvantages • Static solution • Not easily extensible to more than two dimensions • Advantages • Good solution for two dimensions

Bitmap-intersection [Figure: rules R1 to R4 projected onto each dimension, with a bit vector of 0s and 1s per interval]

Bitmap-intersection – cont. • 512 rules: 1 Mpps with single FPGA (33 MHz) and five 1 Mb SRAM chips • Advantages • Good solution for multiple dimensions, for small classifiers • Disadvantages • Static solution • Large memory bandwidth (scales linearly in N) • Large amount of memory (scales quadratically in N) • Hardware-optimized
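A software sketch of bitmap intersection follows. Each dimension is cut into elementary intervals, every interval stores an N-bit bitmap of the rules covering it, and a lookup ANDs one bitmap per dimension. The rule set, the use of inclusive integer ranges, the convention that rule 0 has the highest priority, and all function names are assumptions for illustration.

```python
from bisect import bisect_right

# Four two-dimensional rules as inclusive (lo, hi) ranges; index 0 = highest priority.
RULES = [((1, 4), (3, 6)), ((3, 8), (1, 4)), ((5, 9), (2, 5)), ((1, 9), (1, 6))]


def build_dimension(ranges):
    """Return interval start points and one rule bitmap per interval."""
    points = sorted({lo for lo, _ in ranges} | {hi + 1 for _, hi in ranges})
    bitmaps = []
    for start in points:
        bitmap = 0
        for i, (lo, hi) in enumerate(ranges):
            if lo <= start <= hi:
                bitmap |= 1 << i   # bit i set: rule i covers this interval
        bitmaps.append(bitmap)
    return points, bitmaps


def lookup_dimension(dim, value):
    points, bitmaps = dim
    k = bisect_right(points, value) - 1   # interval containing value
    return bitmaps[k] if k >= 0 else 0


def classify(dims, header):
    """AND the per-dimension bitmaps; the lowest set bit is the best rule."""
    result = -1                            # all ones
    for dim, value in zip(dims, header):
        result &= lookup_dimension(dim, value)
    if result == 0:
        return None
    return (result & -result).bit_length() - 1


DIMS = [build_dimension([rule[i] for rule in RULES]) for i in range(2)]
print(classify(DIMS, (8, 5)))  # 2: rules 2 and 3 match, rule 2 has priority
```

Each dimension holds up to 2N+1 intervals of N bits each, which is where the quadratic memory growth comes from, and each lookup reads N bits per dimension, which is the linear memory bandwidth.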

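Cross-producting [Figure: rules R1 to R4 drawn as rectangles on a grid with a horizontal axis from 1 to 9 and a vertical axis from 1 to 6; the points (8,4) and (1,3) are marked, with labels P1 and P2]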

Cross-producting – cont. • Need: d 1-D lookups + 1 memory access, O(N^d) space • 50 rules: 1.5 MB; need caching (on-demand cross-producting) for bigger classifiers • Advantages • Fast accesses • Suitable for multiple fields • Disadvantages • Large amount of memory • Need caching for bigger classifiers (> 50 rules)
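The "d 1-D lookups + 1 memory access" pattern can be sketched as follows: each field is first mapped to an interval index by its own lookup, and the tuple of indices then selects a precomputed entry in the cross-product table. The rule set, the inclusive integer ranges, the convention that rule 0 has the highest priority, and the names `build` and `classify` are assumptions for illustration.

```python
from bisect import bisect_right
from itertools import product

# Four two-dimensional rules as inclusive (lo, hi) ranges; index 0 = highest priority.
RULES = [((1, 4), (3, 6)), ((3, 8), (1, 4)), ((5, 9), (2, 5)), ((1, 9), (1, 6))]


def build(rules):
    """Precompute the best rule for every combination of per-field intervals."""
    d = len(rules[0])
    points = [sorted({r[i][0] for r in rules} | {r[i][1] + 1 for r in rules})
              for i in range(d)]
    table = {}
    for combo in product(*(range(len(p)) for p in points)):
        representative = [points[i][k] for i, k in enumerate(combo)]
        for index, rule in enumerate(rules):
            if all(lo <= v <= hi for (lo, hi), v in zip(rule, representative)):
                table[combo] = index   # first match = highest priority
                break
    return points, table


def classify(points, table, header):
    """d one-dimensional lookups, then a single table access."""
    combo = tuple(bisect_right(p, v) - 1 for p, v in zip(points, header))
    return table.get(combo)


POINTS, TABLE = build(RULES)
print(classify(POINTS, TABLE, (8, 4)))  # 1
```

The table has one slot per combination of intervals, so its size grows as the product of the per-field interval counts, matching the O(N^d) space bound on the slide.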
