Memory-Efficient 5D Packet Classification At 40 Gbps

Memory-Efficient 5D Packet Classification At 40 Gbps Authors: Ioannis Papaefstathiou, and Vassilis Papaefstathiou Publisher: IEEE INFOCOM 2007 Presenter:Yu-Tso Chen Date:July, 09, 2008

Outline • Introduction • BSPC Design • Simulation Results

Introduction • Should support the current state-of-art networking speeds • (i.e. OC-768 at 40Gpbs) • Should not be prohibitively expensive • Decomposing multi-field classification rules into internal single-field rules • Combined using multi-level Bloom filters • B2PC uses the BOS single field searchingscheme[11]

BSPC Design • Observations • Filter sets’ sizes are small • Ranging from tens of filters to less than 5000 • Protocol field is restricted to a small set of values • TCP, UDP, and commonly used wildcards (covering more 95%) • Filter specify a limited number of unique transport port range

BSPC Design • Observations (cont.) • The number of single field filters matching a given packet is typically five or less • The number of single field values is significantly less than the number of overall filters

Overall Architecture of B2PC 256-entry directly index TreeBitmap IP_BLH1-2 PR_BLH1-2 HSH_TBLindex BLH1-4 12bit flowID TableII type

Hash Function IP_BLH1 = { SIP(6:11) xor DIP(0:5) , SIP(0:5) xor DIP(6:11) } IP_BLH2 = { SIP(0:5) xor DIP(6:11) , SIP(6:11) xor DIP(0:5) } PR_BLH1 = SPO xor (DPO<<2) PR_BLH2 = (SPO<<2) xor REV(DPO) BLH1 = (SIP>>4) xor REV(DIP>>2) xor (SPO<<4) xor (DPO>>3) xor (PRO<<3) BLH2 = SIP xor (DIP<<6) xor (SPO>>2) xor REV(DPO) xor PRO BLH3 = (SIP<<3) xor REV(DIP) xor REV(SPO) xor DPO xor (PRO<<6) BLH4 = REV(SIP) xor (DIP<<3) xor (SPO>>3) xor (DPO<<1) xor (PRO>>2) HSH_TBLindex = (BLH1,00) xor (00,BLH2>>4) xor (00,BLH3) xor (00,REV(BLH4))

Internally Represented Filters • In order to reduce the memory we take advantage of the fact that many rules share the same field values. • Side-effect of ID sharing scheme is that a certain internal-ID cannot be deleted unless all the rules employing it are deleted. • Keep a reference count for each internal ID • We need 4K 12-bit counters for each field • Total we need 5 x 4096 12-bit counters

Example Filter Set

Internal Representation of The Example Filter Set

Combining Results • Number of matching prefixes and the associated IDs • Prefix-based at most 33 matches each • Port fields may provide at most 17 matches • Protocol at most 2 matches • Match on the value itself or the wildcard

Incoming Packet Header Fields Decreasing length order Real-world less than five Totalperm= 3*2*1*3*2 = 36

Set Membership Queries with Bloom Filters • How to identify that a permutation belongs to the given set of rules • Sequential access to the rule table is slow • Efficiently represent a given ruleset and support quick set membership queries is need • The main advantage of Bloom Filters • Easily be implemented in hardware while supporting set-membership queries at high rates • Disadvantage of bloom filters • False-positive error

Set Membership Queries with Bloom Filters (cont.) • Parameters of bloom filters • Bit vector is 214 bits wide • The optimal number of hashing functions that set this vector is 4 • Theoretical false positive probability of 6.2% • Efficiently support incremental updates • Counting Bloom Filters

Flow ID Resolving • A match in a set-membership query we should first determine whether it is a false positive • To locate the FlowID we use HSH_TBL of 16k entries • Have the FlowID we access the RULE_TBL, and compare the stored IDs with the IDs of the current permutation • All IDs match, we have found the final result

Improving the Efficiency of Set Membership Query • To avoid these useless queries we have used two additional Bloom Filters • If they both provide a match then we query the “Main” Bloom Filter

Parallel Bloom Filter Queries

Number of References in Bloom Filters

Observed False Positives Rate

Hash Table Collisions (HSH_TBLindex)

B2PC Components Memory Requirements 36-bit wide memory word

Total Number of Average Memory Accesses in B2PC

Worst Case Network Performance of B2PC (40 Bytes packets)

Summary of Classification Schemes

Bloom Filter

Mask

Number of Memory Access in P2BC Data Structures

B2PC Silicon Cost

Memory-Efficient 5D Packet Classification At 40 Gbps

Memory-Efficient 5D Packet Classification At 40 Gbps

Presentation Transcript

Packet Classification

Packet Classification

EffiCuts: Optimizing Packet Classification for Memory and Throughput

Radiology Packet 40

SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification

Efficient Packet Classification with Digest Caches

Packet Classification # 3

Rule Hashing for Efficient Packet Classification in Network Intrusion Detection

A Memory-Efficient Hashing by Multi-Predicate Bloom Filters for Packet Classification

Efficient Multi-match Packet Classification with TCAM

Efficient Memory Utilization on Network Processors for Deep Packet Inspection

Efficient TCAM Encoding Schemes for Packet Classification using Gray Code

Efficient Multi-Match Packet Classification with TCAM

Multi-dimensional Packet Classification on FPGA 100 Gbps and Beyond

SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification

Efficient packet classification using TCAMs

Energy Efficient Packet Classification Hardware Accelerator

Efficient Memory Utilization on Network Processors for Deep Packet Inspection

Network Packet Broker Market by Bandwidth (1 and 10 Gbps, 40 Gbps, 100 Gbps) - 2023 | MarketsandMarkets