Memory Efficient Regular Expression Search Using State Merging
Michela Becchi, Washington University in St. Louis
Srihari Cadambi, NEC Laboratories America
Context
[Figure: incoming packets pass through a matching engine loaded with a RegEx set (FTP.OPEN.*, www.spyware, Host=.*HTTP), which separates safe packets from malicious packets]
• Regular expression matching is a critical operation in networking
  • Intrusion detection
  • Context-based billing
  • Peer-to-peer traffic detection and prioritization
  • Application-level filtering
• Challenge: perform regular expression matching at line rate
  • Processing time
  • Memory requirement (occupancy and bandwidth)
Background
• Two algorithmic solutions
  • Non-deterministic finite automata (NFAs)
    • High time complexity
    • Compact representation
  • Deterministic finite automata (DFAs)
    • Low time complexity
    • Potentially exponential number of states with respect to NFAs
• Multiple implementation approaches
  • FPGA [Sidhu FCCM 2001, Clark 2003]
  • Software [Paxson 1998, Roesch 1999, Tuck 2004]
  • Custom hardware [Kumar 2006]
• Problem: given a DFA, how to represent it compactly without violating the processing time bound
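The exponential gap between NFA and DFA size can be illustrated on a textbook pattern that is not taken from the slides: (a|b)*a(a|b){n}. The sketch below is a minimal illustration under that assumption. It runs the subset construction on the (n+2)-state NFA of that pattern and counts the reachable DFA states; the function name is hypothetical.

```python
def dfa_state_count(n):
    """Count the DFA states produced by subset construction for (a|b)*a(a|b){n}.

    NFA states are 0..n+1: state 0 loops on 'a' and 'b' and also moves to 1
    on 'a'; state i (1 <= i <= n) moves to i+1 on any character; n+1 accepts.
    """
    def step(states, ch):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)
                if ch == "a":
                    nxt.add(1)
            elif s <= n:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for ch in "ab":
            nxt = step(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


for n in range(5):
    print(n, "NFA states:", n + 2, "DFA states:", dfa_state_count(n))
```

For this pattern the DFA has to remember which of the last n+1 characters were 'a', so the count doubles each time n grows by one while the NFA gains a single state.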
In this paper
• New method to compact a DFA, called state merging
• Data structure to support state merging
• Algorithm to perform state merging
• Evaluation on real security rule-sets (from Bro and Snort NIDS)
Outline
• The idea
• The algorithm
• The data structure
• Experimental evaluation
State Merging: the idea
pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: the DFA for the pattern (states 0 to 6) and the automaton obtained by merging states 3 and 4 into state 3_4. A merge without labels is flagged "Non-equivalent Automata!". In the labeled version, transitions into 3_4 carry a label ([b-e].0 from state 1, [g-h].1 from state 2) and transitions out of 3_4 are conditioned on it ([g-i]/0, j/1, a/0,1, f/0,1)]
Input text: acjk
• Common outgoing transitions are compressed
• Input labels keep 1-step history information
• Outgoing conditional transitions ensure functional equivalence
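The effect of the labels on the input text acjk can be checked with a small simulation. The following is a minimal sketch, not the authors' implementation: it rebuilds the 7-state DFA as drawn on the slide (assuming a lowercase alphabet and search semantics), merges states 3 and 4 once without labels and once with labels, and runs all three automata. All function names, and the rule used by the unlabeled merge to resolve disagreements, are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_lowercase
ACCEPT = 6


def build_dfa():
    """7-state search DFA for ((a[b-e][g-i])|(f[g-h]j))k+."""
    # Default moves: 'a' -> 1, 'f' -> 2, anything else -> 0.
    delta = {(s, c): (1 if c == "a" else 2 if c == "f" else 0)
             for s in range(7) for c in ALPHABET}
    for c in "bcde":
        delta[1, c] = 3
    for c in "gh":
        delta[2, c] = 4
    for c in "ghi":
        delta[3, c] = 5
    delta[4, "j"] = 5
    delta[5, "k"] = 6
    delta[6, "k"] = 6
    return delta


def naive_merge(delta, s1, s2):
    """Fold s2 into s1 without labels (the non-equivalent automaton)."""
    merged = {}
    for (s, c), t in delta.items():
        if s == s2:
            continue
        if s == s1 and t == 0:
            # Assumption: where s1 falls back to state 0, take s2's move.
            t = delta[s2, c]
        merged[s, c] = s1 if t == s2 else t
    return merged


def labeled_merge(delta, s1, s2, name):
    """Merge s1 and s2 into `name`; entering moves are tagged 0 (was s1) or 1 (was s2)."""
    tag = {s1: (name, 0), s2: (name, 1)}
    merged = {}
    for (s, c), t in delta.items():
        src, label_in = tag.get(s, (s, None))
        merged[src, label_in, c] = tag.get(t, (t, None))
    return merged


def run(delta, text, labeled=False):
    """Return True if the automaton reaches the accepting state on `text`."""
    state, label = 0, None
    for c in text:
        if labeled:
            state, label = delta[state, label, c]
        else:
            state = delta[state, c]
        if state == ACCEPT:
            return True
    return False


dfa = build_dfa()
print(run(dfa, "acjk"))                                         # False
print(run(naive_merge(dfa, 3, 4), "acjk"))                      # True (false match)
print(run(labeled_merge(dfa, 3, 4, "3_4"), "acjk", labeled=True))  # False
```

The dictionary used here stores one entry per label, so it shows only the functional equivalence. The memory saving comes from the encoding, where a transition such as a/0,1 is stored once for both labels.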
State Merging – selecting the states
pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: the DFA for the pattern (states 0 to 6) and its space reduction graph, whose nodes are the DFA states]
• Bold edge has weight 3
• Remaining edges have weight 2
State Merging – selecting the states (cont'd)
[Figure: the DFA after merging states 3 and 4 into 3_4 (transitions [b-e].0 and [g-h].1 into 3_4; [g-i]/0, j/1, a/0,1 and f/0,1 out of it), with the updated space reduction graph over states 0, 1, 2, 3_4, 5 and 6]
• States 1 and 2 now have one more target in common: the merged state 3_4
• State merging can create new merging opportunities
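How a merge creates a new opportunity can be reproduced with a few lines of code. This sketch assumes that the weight of an edge in the space reduction graph is the number of targets two states have in common, ignoring transitions back to state 0; that definition is inferred from the example, and the slide's graph may apply further rules (for instance for the accepting state 6) that are not modeled here. The names `TARGETS`, `weight` and `merge` are hypothetical.

```python
# Non-default targets of each state in the example DFA (moves back to
# state 0 are left out, as in the drawing).
TARGETS = {
    0: {1, 2},
    1: {1, 2, 3},
    2: {1, 2, 4},
    3: {1, 2, 5},
    4: {1, 2, 5},
    5: {1, 2, 6},
    6: {1, 2, 6},
}


def weight(targets, s, t):
    """Assumed edge weight: number of targets the two states share."""
    return len(targets[s] & targets[t])


def merge(targets, s, t):
    """Return a new target map in which s and t are replaced by one merged state."""
    name = f"{s}_{t}"

    def rename(x):
        return name if x in (s, t) else x

    merged = {}
    for state, outs in targets.items():
        merged.setdefault(rename(state), set()).update(rename(x) for x in outs)
    return merged


print(weight(TARGETS, 3, 4))   # 3
print(weight(TARGETS, 1, 2))   # 2
after = merge(TARGETS, 3, 4)
print(weight(after, 1, 2))     # 3
```

Before the merge, states 1 and 2 share only targets 1 and 2. Afterwards their former targets 3 and 4 are the same state 3_4, so the shared set grows by one.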
State Merging – selecting the states (cont'd)
[Figure: the DFA after also merging states 1 and 2 into 1_2. The states are 0, 1_2, 3_4, 5 and 6, with labeled transitions such as a.0, f.1, [b-e].0/0, [g-h].1/1, [g-i]/0, j/1 and k]
• Key point: labels can be reused
• State merging stops when label overhead exceeds potential saving
• Old and new DFA are functionally equivalent
Outline
• The idea
• The algorithm
• The data structure
• Experimental evaluation
A data structure to support state merging
pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: encoding of a DFA state. Bitmap (256 bits): 0 … 0 1 1 1 1 1 1 0 … 0. Pointer indirection (one entry per 1 in the bitmap, log2(distinct targets) bits each): 0 1 1 1 1 2. Transition table (one 32-bit entry per distinct target): 1 3 2. With labels, each pointer entry takes log2(distinct targets) + log2(labels) bits. The transition table is marked as the potential saving through state merging]
• Bitmap:
  • No replication of frequent transitions
• Pointer indirection:
  • No pointer replication within a state
  • Character-transition target decoupling
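The three-level layout (bitmap, pointer indirection, transition table) can be sketched for state 1 of the example DFA. This is a minimal illustration, not the authors' data structure: it assumes that a 0 bit in the bitmap means "fall back to state 0", and it uses a Python integer and lists in place of packed bit fields. The names `encode_state` and `lookup` are hypothetical.

```python
DEFAULT_STATE = 0  # assumption: characters with a 0 bit go back to state 0


def encode_state(transitions):
    """Encode the non-default transitions {char: target} of one state."""
    bitmap = 0     # one bit per character code (256 bits in the slide)
    pointers = []  # pointer indirection: one small index per set bit
    targets = []   # transition table: one entry per distinct target
    for code in sorted(ord(c) for c in transitions):
        bitmap |= 1 << code
        target = transitions[chr(code)]
        if target not in targets:
            targets.append(target)
        pointers.append(targets.index(target))
    return bitmap, pointers, targets


def lookup(encoded, char):
    """Follow bitmap -> pointer indirection -> transition table for one character."""
    bitmap, pointers, targets = encoded
    code = ord(char)
    if not (bitmap >> code) & 1:
        return DEFAULT_STATE
    # Rank of this character among the set bits selects the pointer entry.
    rank = bin(bitmap & ((1 << code) - 1)).count("1")
    return targets[pointers[rank]]


# State 1 of the example DFA: a -> 1, [b-e] -> 3, f -> 2.
state1 = encode_state({"a": 1, "b": 3, "c": 3, "d": 3, "e": 3, "f": 2})
print(state1[1])            # [0, 1, 1, 1, 1, 2]
print(state1[2])            # [1, 3, 2]
print(lookup(state1, "c"))  # 3
print(lookup(state1, "z"))  # 0
```

The four characters b to e share one transition-table entry through the pointer array, which is the "no pointer replication within a state" property listed on the slide.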
Data structure after state merging
[Figure: the merged DFA (states 0, 1_2, 3_4, 5, 6) and the encoding of merged state 1_2: Bitmap 0 and Bitmap 1, each with its own "pointer indirection + label" array, sharing one combined transition table]
• Saving: combined transition table
• Overhead: labels
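The trade-off between the shared transition table and the label bits can be estimated with the field widths shown on the data-structure slide (256-bit bitmap, 32-bit table entries, log2-sized pointers and labels). The sketch below is a back-of-the-envelope estimate under assumptions that are not stated in the slides: widths are rounded up to whole bits, and the label field is charged to the pointer entries of the merged states themselves. The function names are hypothetical.

```python
from math import ceil, log2

BITMAP_BITS = 256       # one bit per input character (from the slide)
TABLE_ENTRY_BITS = 32   # width of one transition-table entry (from the slide)


def index_bits(n):
    """Bits needed to select one of n alternatives (assumption: ceil(log2 n))."""
    return ceil(log2(n)) if n > 1 else 0


def unmerged_bits(num_ones, num_targets):
    """Bitmap + pointer indirection + private transition table for one state."""
    return (BITMAP_BITS
            + num_ones * index_bits(num_targets)
            + num_targets * TABLE_ENTRY_BITS)


def merged_bits(ones_per_state, num_combined_targets, num_labels):
    """One bitmap and pointer+label array per original state, one shared table."""
    entry = index_bits(num_combined_targets) + index_bits(num_labels)
    per_state = sum(BITMAP_BITS + ones * entry for ones in ones_per_state)
    return per_state + num_combined_targets * TABLE_ENTRY_BITS


# States 3 and 4 of the example DFA: 5 and 3 non-default characters,
# both with the same 3 distinct targets (1, 2 and 5).
before = unmerged_bits(5, 3) + unmerged_bits(3, 3)
after = merged_bits([5, 3], 3, 2)
print(before, after, before - after)  # 720 632 88
```

In this toy case one 96-bit transition table is saved and 8 label bits are added. With many merged states sharing a table the same arithmetic shows where the label overhead starts to outweigh the saving.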
Outline
• The idea
• The algorithm
• The data structure
• Experimental evaluation
State reduction: 20x
Transition reduction: 1000x
Memory requirement: 25x
Summary
• Regular expression matching: critical operation in many networking applications
• Two classical solutions: NFAs and DFAs
  • NFAs slow, DFAs fast but impractical
• In this paper, we present a new method to compact a DFA called state merging
  • Data structure and fast algorithm to support state merging
  • Evaluation on real security rule-sets (from Bro and Snort NIDS)
    • 1000x reduction in number of transitions
    • 20x reduction in number of states
    • 25x memory reduction
Questions?
Experimental evaluation
State Merging: the Idea
[Figure: states S1 and S2, entered on c1 and c2, are merged into S1,2. The incoming transitions become c1.0 and c2.1. The outgoing transitions become ci/0, cl/1 to Sx; cj/0, cm/1 to Sy; ck/0 to Sw; and cn/1 to Sz]
• Common outgoing transitions are compressed
• Input labels keep 1-step history information
• Outgoing conditional transitions ensure functional equivalence
A data structure to support state merging
pattern: ((a[b-e][g-i])|(f[g-h]j))k+
[Figure: three encodings of a DFA state. Bitmap (256 bits) with a transition table holding one 32-bit entry per 1 in the bitmap: 1 3 3 3 3 2. Bitmap with pointer indirection (0 1 1 1 1 2, log2(distinct targets) bits per entry) and a transition table with one 32-bit entry per distinct target: 1 3 2. Bitmap with pointer indirection + label, log2(distinct targets) + log2(labels) bits per entry. The transition table is marked as the potential saving through state merging]
• Bitmap:
  • No replication of frequent transitions
• Pointer indirection:
  • No pointer replication within a state
  • Character-transition target decoupling