Memory Efficient Regular Expression Search Using State Merging

Michela Becchi, Washington University in St. Louis
Srihari Cadambi, NEC Laboratories America

Context
[Figure: incoming packets are checked by a matching engine against a regex set (e.g., FTP.OPEN.*, www.spyware, Host=.*HTTP) and classified as safe or malicious packets.]
• Regular expression matching is a critical operation in networking:
  • Intrusion detection
  • Context-based billing
  • Peer-to-peer traffic detection and prioritization
  • Application-level filtering
• Challenge: perform regular expression matching at line rate
  • Processing time
  • Memory requirement (occupancy and bandwidth)

Background
• Two algorithmic solutions:
  • Non-deterministic finite automata (NFAs): high time complexity, compact representation
  • Deterministic finite automata (DFAs): low time complexity, but a potentially exponential number of states with respect to NFAs
• Multiple implementation approaches:
  • FPGA [Sidhu FCCM 2001, Clark 2003]
  • Software [Paxson 1998, Roesch 1999, Tuck 2004]
  • Custom hardware [Kumar 2006]
• Problem: given a DFA, how can it be represented compactly without violating the processing time bound?
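To make the time-complexity contrast concrete, here is a toy sketch (not taken from the paper) for an unanchored search of the pattern "ab". The NFA simulation must track a set of active states for every input character, while the equivalent DFA, built here by hand via subset construction, performs exactly one table lookup per character. The state numbering and the function names are hypothetical.

```python
# Toy unanchored search for "ab" (hypothetical example, not from the paper).

# NFA: state -> list of (character predicate, next state)
NFA = {
    0: [(lambda c: True, 0), (lambda c: c == "a", 1)],  # self-loop = unanchored search
    1: [(lambda c: c == "b", 2)],
    2: [],
}
NFA_ACCEPT = {2}


def nfa_match(text):
    active = {0}
    for c in text:
        # work per character grows with the number of active states
        active = {nxt for s in active for pred, nxt in NFA[s] if pred(c)}
        if active & NFA_ACCEPT:
            return True
    return False


# DFA from subset construction: 0 = {0}, 1 = {0, 1}, 2 = {0, 2}
DFA = {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"a": 1}}
DFA_ACCEPT = {2}


def dfa_match(text):
    state = 0
    for c in text:
        state = DFA[state].get(c, 0)  # exactly one lookup per character
        if state in DFA_ACCEPT:
            return True
    return False


print(nfa_match("xxab"), dfa_match("xxab"))  # True True
```

Both functions give the same answers; the DFA buys constant per-character work with states that stand for whole sets of NFA states, which is where the potential state blow-up comes from.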

In this paper
• A new method to compact a DFA, called state merging
• A data structure to support state merging
• An algorithm to perform state merging
• Evaluation on real security rule-sets (from the Bro and Snort NIDS)

Outline
• The idea
• The algorithm
• The data structure
• Experimental evaluation

State Merging: the idea
Pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: the DFA for the pattern (states 0 to 6) and a version in which states 1, 2 and 3, 4 are merged into 1_2 and 3_4. Callout: "Non-equivalent automata!" (input text: acjk). In the labeled version, transitions carry annotations such as a.0, f.1, [b-e].0, [g-h].1, [g-i]/0, j/1.]
• Common outgoing transitions are compressed
• Input labels keep 1-step history information
• Outgoing conditional transitions ensure functional equivalence
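A minimal sketch of why labels are needed, assuming a reconstruction of the example DFA for ((a[b-e][g-i])|(f[g-h]j))k+ as an unanchored search: every state restarts on a or f, and every other missing transition returns to state 0. Merging states 1, 2 and 3, 4 by simply taking the union of their transitions accepts acjk, which the original DFA rejects. The helper names (build_dfa, accepts, naive_merge) are hypothetical.

```python
def build_dfa():
    """Assumed reconstruction of the example DFA (state 6 is accepting)."""
    rows = {0: {}, 1: dict.fromkeys("bcde", 3), 2: dict.fromkeys("gh", 4),
            3: dict.fromkeys("ghi", 5), 4: {"j": 5}, 5: {"k": 6}, 6: {"k": 6}}
    for row in rows.values():
        row.update({"a": 1, "f": 2})  # every state can restart the pattern
    return rows, {6}


def accepts(rows, accept, text, start):
    """Run the DFA; report a match as soon as an accepting state is entered."""
    state = start
    for c in text:
        state = rows[state].get(c, start)  # missing transitions fall back to the start state
        if state in accept:
            return True
    return False


def naive_merge(rows, accept, groups):
    """Merge each group of states by taking the union of their transitions (no labels)."""
    name = {s: "_".join(map(str, g)) for g in groups for s in g}

    def rename(s):
        return name.get(s, str(s))

    merged = {}
    for s, row in rows.items():
        merged.setdefault(rename(s), {}).update({c: rename(t) for c, t in row.items()})
    return merged, {rename(s) for s in accept}


rows, accept = build_dfa()
merged, merged_accept = naive_merge(rows, accept, [(1, 2), (3, 4)])
print(accepts(rows, accept, "acjk", 0))             # False: j is not in [g-i]
print(accepts(merged, merged_accept, "acjk", "0"))  # True: not equivalent any more
```

The merged automaton forgets whether it entered 1_2 through a or through f; the 1-step history labels are exactly the information needed to keep the two cases apart.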

State Merging – selecting the states
Pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: the DFA (states 0 to 6) and its space reduction graph over the same states.]
• The bold edge has weight 3
• The remaining edges have weight 2

State Merging – selecting the states (cont'd)
[Figure: the DFA after states 3 and 4 have been merged into 3_4, with labeled transitions such as [b-e].0, [g-h].1, [g-i]/0, j/1, a/0,1, f/0,1, and the updated space reduction graph over states 0, 1, 2, 3_4, 5, 6.]
States 1 and 2 now have one more target in common: the merged state 3_4. State merging can create new merging opportunities.
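The selection step can be sketched as follows, under assumptions that are not stated on the slides: the weight of an edge in the space reduction graph is the number of distinct explicit (non-default) targets two states share, only states with the same accepting status are paired, and the DFA is a reconstruction of the example for ((a[b-e][g-i])|(f[g-h]j))k+. Under these assumptions the pair (3, 4) is the only edge of weight 3, every other edge has weight 2, and after merging 3 and 4 the pair (1, 2) gains a third common target. All names are hypothetical.

```python
from itertools import combinations


def build_dfa():
    """Assumed reconstruction of the example DFA (state 6 is accepting)."""
    rows = {0: {}, 1: dict.fromkeys("bcde", 3), 2: dict.fromkeys("gh", 4),
            3: dict.fromkeys("ghi", 5), 4: {"j": 5}, 5: {"k": 6}, 6: {"k": 6}}
    for row in rows.values():
        row.update({"a": 1, "f": 2})
    return rows, {6}


def weight(rows, s, t):
    """Hypothetical edge weight: distinct explicit targets the two states share."""
    return len(set(rows[s].values()) & set(rows[t].values()))


def space_reduction_graph(rows, accept):
    # assumption: only states with the same accepting status may be merged
    return {(s, t): weight(rows, s, t)
            for s, t in combinations(rows, 2)
            if (s in accept) == (t in accept)}


def merge(rows, accept, s, t, new):
    """Replace states s and t by `new`, redirecting transitions that entered either one.
    Transitions are unioned; s = 3 and t = 4 never disagree on a character here
    (in general, labels would keep conflicting transitions apart)."""
    def fix(x):
        return new if x in (s, t) else x

    out = {}
    for state, row in rows.items():
        out.setdefault(fix(state), {}).update({c: fix(x) for c, x in row.items()})
    return out, {fix(x) for x in accept}


rows, accept = build_dfa()
graph = space_reduction_graph(rows, accept)
print(max(graph, key=graph.get), graph[(3, 4)])  # (3, 4) 3
rows, accept = merge(rows, accept, 3, 4, "3_4")
print(weight(rows, 1, 2))                        # 3: a new merging opportunity
```

A greedy selector in this spirit would repeatedly pick the heaviest edge, merge, and recompute the affected weights, since each merge can raise the weight of other pairs.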

State Merging – selecting the states (cont'd)
[Figure: the final DFA with states 0, 1_2, 3_4, 5, 6. Transitions entering 1_2 set a label (a.0, f.1); conditional transitions fire only for a given label ([b-e].0/0, [g-h].1/1, [g-i]/0, j/1).]
• Key point: labels can be reused
• State merging stops when the label overhead exceeds the potential saving
• The old and new DFAs are functionally equivalent
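The labeled DFA on this slide can be simulated directly. This sketch rests on our reading of the slide notation (an assumption): c.L means the transition on c writes label L, and c/L means the transition may only be taken when the current label is L. The same two label values distinguish 1 from 2 inside 1_2 and 3 from 4 inside 3_4, which is the label reuse the slide points out. The final assertion compares it exhaustively, on short strings, with an assumed reconstruction of the unmerged DFA; all names are hypothetical.

```python
from itertools import product

ANY = None  # the transition is valid whatever the current label is

# state -> [(characters, labels the transition is valid for, target, label it writes)]
MERGED = {
    "0": [],
    "1_2": [("bcde", {0}, "3_4", 0), ("gh", {1}, "3_4", 1)],
    "3_4": [("ghi", {0}, "5", None), ("j", {1}, "5", None)],
    "5": [("k", ANY, "6", None)],
    "6": [("k", ANY, "6", None)],
}
RESTART = [("a", ANY, "1_2", 0), ("f", ANY, "1_2", 1)]  # shared by every state
ACCEPT = {"6"}


def merged_accepts(text):
    state, label = "0", None
    for c in text:
        for chars, valid, target, new_label in MERGED[state] + RESTART:
            if c in chars and (valid is ANY or label in valid):
                state, label = target, new_label
                break
        else:
            state, label = "0", None  # default transition back to the start state
        if state in ACCEPT:
            return True
    return False


def original_accepts(text):
    """Assumed reconstruction of the unmerged DFA (states 0 to 6)."""
    rows = {0: {}, 1: dict.fromkeys("bcde", 3), 2: dict.fromkeys("gh", 4),
            3: dict.fromkeys("ghi", 5), 4: {"j": 5}, 5: {"k": 6}, 6: {"k": 6}}
    for row in rows.values():
        row.update({"a": 1, "f": 2})
    state = 0
    for c in text:
        state = rows[state].get(c, 0)
        if state == 6:
            return True
    return False


# Functional equivalence on every string of length <= 4 over a small alphabet
assert all(merged_accepts(s) == original_accepts(s)
           for n in range(5)
           for s in map("".join, product("abcfghijkx", repeat=n)))
print(merged_accepts("acjk"), merged_accepts("fgjk"))  # False True
```

The pair (merged state, label) plays the role of the original state: (1_2, 0) behaves like 1, (1_2, 1) like 2, (3_4, 0) like 3 and (3_4, 1) like 4.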

A data structure to support state merging
Pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: encoding of state 1, whose transitions are a → 1, [b-e] → 3, f → 2. A 256-bit bitmap has a 1 for each of a, b, c, d, e, f. A pointer-indirection array has one entry per 1 in the bitmap (0 1 1 1 1 2), each log2(distinct targets) bits wide, or log2(distinct targets) + log2(labels) bits once labels are added. A transition table has one 32-bit entry per distinct target (1 3 2); this table is the potential saving through state merging.]
• Bitmap: no replication of frequent transitions
• Pointer indirection: no pointer replication within a state; character-transition target decoupling
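The sizes in the figure can be turned into a small per-state cost model. This sketch assumes that characters whose bitmap bit is 0 take a default transition stored elsewhere and that "log2" means rounding up to whole bits; it uses the slide's 256-bit bitmap and 32-bit transition-table entries. For state 1 it compares a bitmap-only encoding, which replicates the target 3 four times, with pointer indirection. Function names are hypothetical.

```python
CHARS = 256     # one bitmap bit per possible input byte
PTR_BITS = 32   # width of a transition-table entry, as on the slide


def ceil_log2(n):
    """Bits needed to index n items (0 when there is a single item)."""
    return (n - 1).bit_length()


def bitmap_only_bits(row):
    # bitmap + one 32-bit target per set bit (targets may repeat: 1 3 3 3 3 2)
    return CHARS + len(row) * PTR_BITS


def indirection_bits(row, labels=1):
    # bitmap + one small index per set bit + one 32-bit entry per distinct target
    targets = len(set(row.values()))
    index_bits = ceil_log2(targets) + ceil_log2(labels)
    return CHARS + len(row) * index_bits + targets * PTR_BITS


state1 = {"a": 1, **dict.fromkeys("bcde", 3), "f": 2}
print(bitmap_only_bits(state1))            # 448
print(indirection_bits(state1))            # 364
print(indirection_bits(state1, labels=2))  # 370
print(CHARS * PTR_BITS)                    # 8192, assuming 256 uncompressed 32-bit pointers
```

The indirection array stores only 2-bit indices instead of repeated 32-bit pointers, and adding labels costs 1 extra bit per entry in this example.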

Data structure after state merging
[Figure: the final DFA (states 0, 1_2, 3_4, 5, 6) and the encoding of merged state 1_2: one bitmap per label (Bitmap 0, Bitmap 1), each with a pointer-indirection array whose entries also carry a label, and a single combined transition table shared by both.]
• Saving: combined transition table
• Overhead: labels

State reduction: 20x

Transition reduction: 1000x

Memory requirement: 25x reduction

Summary
• Regular expression matching is a critical operation in many networking applications
• Two classical solutions: NFAs and DFAs
  • NFAs are slow; DFAs are fast but can be impractically large
• In this paper, we present a new method to compact a DFA, called state merging
  • A data structure and a fast algorithm to support state merging
  • Evaluation on real security rule-sets (from the Bro and Snort NIDS)
  • 1000x reduction in the number of transitions
  • 20x reduction in the number of states
  • 25x memory reduction

Questions?

Experimental evaluation

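State Merging: the idea
[Figure: state S1 (entered on c1) has transitions ci → Sx, cj → Sy, ck → Sw; state S2 (entered on c2) has cl → Sx, cm → Sy, cn → Sz. In the merged state S1,2, incoming transitions set a label (c1.0, c2.1) and outgoing transitions are conditional on it: ci/0, cl/1 → Sx; cj/0, cm/1 → Sy; ck/0 → Sw; cn/1 → Sz.]
• Common outgoing transitions are compressed
• Input labels keep 1-step history information
• Outgoing conditional transitions ensure functional equivalence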

A data structure to support state merging
Pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: two encodings of state 1 (a → 1, [b-e] → 3, f → 2) with a 256-bit bitmap. Bitmap only: a transition table with one 32-bit entry per 1 in the bitmap (1 3 3 3 3 2). Bitmap plus pointer indirection: pointer entries (0 1 1 1 1 2) of log2(distinct targets) bits, or log2(distinct targets) + log2(labels) bits with labels, and a transition table with one 32-bit entry per distinct target (1 3 2), which is the potential saving through state merging.]
• Bitmap: no replication of frequent transitions
• Pointer indirection: no pointer replication within a state; character-transition target decoupling
