
A Hybrid Finite Automaton for Practical Deep Packet Inspection

A Hybrid Finite Automaton for Practical Deep Packet Inspection. CoNEXT 2007. Michela Becchi and Patrick Crowley.


Presentation Transcript


  1. A Hybrid Finite Automaton for Practical Deep Packet Inspection. CoNEXT 2007. Michela Becchi and Patrick Crowley

  2. Context
     • Deep packet inspection
     • Challenge: perform regular expression matching at line rate, given data-sets of hundreds (or thousands) of patterns
       • Processing time
       • Memory requirement
     [Figure: incoming packets pass through a matching engine loaded with a RegEx set (FTP.OPEN.*, www.spyware, Host=, Server.*HTTP). Payloads such as Hosxyz, blaBLAbla and Safe_payload leave as safe packets; payloads such as xHost= and ServerxHTTP are flagged as malicious packets.]

  3. Deterministic vs. Non-Deterministic FA
     • RegEx: (1) .*a+bc (2) .*bcd+ (3) .*cde
     • Text: d a b c d
     [Figure: NFA with states 0-9 (accepting states 3/1, 6/2, 9/3) next to the equivalent DFA with states 0-10 (accepting states 3/1, 6/2, 7/2, 10/3). Edge-label legend in the DFA: a: 1-10, b: 2-10, c: 1,3,5-10.]

  4. Memory-time tradeoff
     • NFA
       • limited size
       • potentially N_NFA states active in parallel
     • DFA
       • one state traversal/char
       • size: potentially 2^N states, where N = N_NFA
     • In practical cases a single DFA is infeasible!
     • Idea
       • Hybrid automaton
       • Size comparable to NFA by preventing “state explosion”
       • Predictable and small memory bandwidth/processing time
       • Limit to classes of RegEx in Intrusion Detection Systems
       • Analyze state explosion scenarios
     [Figure: plot with axes time and memory, showing the NFA and the DFA as the two ends of the tradeoff.]
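
The tradeoff on this slide can be made concrete with the three expressions from slide 3. The following is a minimal sketch, not the authors' implementation: the NFA is built by hand, its state numbering and the names `NFA`, `ACCEPT`, `step`, `run_nfa` and `subset_construction` are assumptions made for illustration, and the character "x" stands in for every character outside a-e.

```python
from collections import deque

ANY = None  # hypothetical wildcard label: matches every input character

# Hand-built NFA for (1) .*a+bc  (2) .*bcd+  (3) .*cde.
# State numbering is an assumption; it need not match the slide's figure.
NFA = {
    0: {ANY: {0}, "a": {1}, "b": {4}, "c": {7}},
    1: {"a": {1}, "b": {2}},
    2: {"c": {3}},
    3: {},
    4: {"c": {5}},
    5: {"d": {6}},
    6: {"d": {6}},
    7: {"d": {8}},
    8: {"e": {9}},
    9: {},
}
ACCEPT = {3: 1, 6: 2, 9: 3}   # accepting NFA state -> rule number
ALPHABET = "abcdex"           # "x" stands in for every other character


def step(active, ch):
    """Advance every active NFA state on one character."""
    nxt = set()
    for s in active:
        nxt |= NFA[s].get(ch, set())
        nxt |= NFA[s].get(ANY, set())
    return frozenset(nxt)


def run_nfa(text):
    """Return (active-set size per character, list of (index, rule) matches)."""
    active = frozenset({0})
    sizes, matches = [], []
    for i, ch in enumerate(text):
        active = step(active, ch)
        sizes.append(len(active))
        matches += [(i, ACCEPT[s]) for s in sorted(active) if s in ACCEPT]
    return sizes, matches


def subset_construction():
    """Build the DFA whose states are the reachable sets of NFA states."""
    start = frozenset({0})
    dfa, queue = {}, deque([start])
    while queue:
        cur = queue.popleft()
        if cur in dfa:
            continue
        dfa[cur] = {ch: step(cur, ch) for ch in ALPHABET}
        queue.extend(dfa[cur].values())
    return dfa


sizes, matches = run_nfa("dabcd")
print(sizes, matches)              # per-character NFA work and rule hits
print(len(subset_construction()))  # number of DFA states
```

On the text "d a b c d" the NFA's active set peaks at four states, while the DFA built from the same NFA has 11 states but visits exactly one of them per character. That is the memory-versus-time split the slide describes.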

  5. SNORT Regular expressions
     • General form: pattern1.*pattern2.{n,m}[…]patternk[^cxcy]*[…]patternn
     • Examples
       • Server\s+Guptachar\s+\d+\x2E\d+
       • User-Agent [^\r\n]*A-311\s+Server
       • Host[^\r\n]*wwp\.mirabilis\.com.*from=[^\r\n]*fromemail=[^\r\n]*subject=[^\r\n]*to=24962844
       • \sPARTIAL.*BODY\.PEEK[^\n]{1024}
     • SNORT RegExs DO consist of:
       • Sequences of sub-patterns
       • Possibly containing (repetitions of) character ranges
       • Separated by dot-star terms and counting constraints
     • SNORT RegExs DON’T normally contain:
       • Nested repetitions
       • Disjunctions of complex sub-expressions

  6. Dot-star terms
     • Definition
       • Unconstrained repetitions of wildcards (.*) or large ranges [^c1c2..ck]*
     • Examples
       • User-Agent[^\r\n]*ZC-Bridge
     • On single regular expressions (from practical data-sets)
       • NO state blowup
     [Figure: NFA and DFA for RegEx: ab.*cd; the DFA states are labelled with NFA subsets such as 0,1 and 0,2.]

  7. Dot-star conditions (cont’d)
     • Compiling together several RegEx
       • Duplication of “sub-DFAs” at “.*” states
       • NO exponential blow-up
     [Figure: combined DFA for the expressions ab.*cd and efgh, with states 0-12 and accepting states 8/1, 10/2 and 12/2.]

  8. Counting constraints
     • Definition
       • Constrained repetition of wildcard .{n,m} or large ranges [^c1c2..ck]{n,m}
     • Examples
       • AUTH\s[^\n]{100} (buffer overflow)
     • Exponential state explosion:
       • Single regular expressions: all possible occurrences of the prefix in the counting constraint
       • Multiple regular expressions: additionally, all the possible occurrences of other RegEx in the counting constraint

  9. Counting constraints (cont’d)
     • Ex: ab.{3}cd
     [Figure: NFA and DFA for ab.{3}cd. The NFA is a short chain (a, b, three wildcards, c, d); the DFA has states numbered up to 18.]
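
The growth behind this slide can be reproduced with a small experiment. The sketch below uses a deliberately simplified pattern, `.*a.{n}` (a one-character prefix and no suffix), rather than the slide's ab.{3}cd, because its DFA size is easy to check by hand. The function name `dfa_size` and the two-symbol alphabet ("a" and a stand-in "x" for everything else) are assumptions for illustration.

```python
from collections import deque


def dfa_size(n):
    """Count DFA states for the unanchored pattern .*a.{n} (simplified example)."""
    # Assumed NFA: state 0 loops on any character, 'a' moves 0 -> 1,
    # states 1..n advance on any character, state n+1 accepts.
    def step(active, ch):
        nxt = {0}                      # the .* self-loop keeps state 0 active
        if ch == "a":
            nxt.add(1)
        nxt |= {s + 1 for s in active if 1 <= s <= n}
        return frozenset(nxt)

    start = frozenset({0})
    seen, queue = {start}, deque([start])
    while queue:
        cur = queue.popleft()
        for ch in "ax":                # "x" stands in for any non-'a' character
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


for n in range(1, 6):
    print(n, n + 2, dfa_size(n))       # n, NFA states, DFA states
```

The NFA grows by one state per unit of n, while the DFA doubles: it has to remember, for each of the last n+1 positions, whether the prefix occurred there. This is the "all possible occurrences of the prefix in the counting constraint" effect from slide 8.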

  10. First step: hybrid-FA
     • Idea: Stop subset construction at the state where state blowup would occur
     • Implication: hybrid-FA with a head-DFA, one or more tail-NFAs and one or more border-states
     [Figure: an NFA and the corresponding hybrid-FA, with accepting states 4/1, 8/2, 10/3 and 13/4.]

  11. Hybrid-FA traversal
     • Functional equivalence (commonly reached accepting states)
     • Hybrid-FA:
       • The active vector stays limited in size until a border state is reached
       • No back activation from tail-NFAs to head-DFA
     [Figure: side-by-side traversal of the NFA and the hybrid-FA.]
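
A traversal of this kind can be sketched for the expression ab.{3}cd from slide 9. This is an illustrative reading of the slides, not the authors' code: the split into a head-DFA for the prefix `.*ab` and a tail-NFA for `.{3}cd`, the state numbering, and the names `head_step`, `tail_step` and `run_hybrid` are all assumptions.

```python
BORDER = 2        # head-DFA state reached right after the prefix "ab"
TAIL_ACCEPT = 5   # accepting state of the tail-NFA


def head_step(state, ch):
    """Head-DFA for the unanchored prefix .*ab (states 0, 1, 2)."""
    if ch == "a":
        return 1
    if ch == "b" and state == 1:
        return BORDER
    return 0


def tail_step(active, ch):
    """Tail-NFA for .{3}cd: states 0..5, state 5 accepts."""
    nxt = set()
    for s in active:
        if s < 3:                       # three wildcard (counting) states
            nxt.add(s + 1)
        elif s == 3 and ch == "c":
            nxt.add(4)
        elif s == 4 and ch == "d":
            nxt.add(5)
    return nxt


def run_hybrid(text):
    """Return (match end indices, largest number of simultaneously active states)."""
    head, tail = 0, set()
    matches, max_active = [], 1
    for i, ch in enumerate(text):
        tail = tail_step(tail, ch)      # the tail never activates the head
        head = head_step(head, ch)      # exactly one head state is active
        if head == BORDER:
            tail.add(0)                 # border state activates the tail entry
        if TAIL_ACCEPT in tail:
            matches.append(i)
        max_active = max(max_active, 1 + len(tail))
    return matches, max_active


print(run_hybrid("abxyzcd"))
print(run_hybrid("ababxcd"))
```

Only the head state is active until the prefix is seen. After that, every further occurrence of the prefix adds one more active tail state, which is why the worst case still depends on the size of the tail-NFAs.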

  12. Improving the worst case
     • Size: Hybrid-FA ≈ size of NFA
     • Bandwidth:
       • Average case improved (in DFA)
       • Worst case dependent on tail-NFAs size
     • Can we do better?

  13. Dot-star terms: Tail-DFAs
     • Idea:
       [Figure: each tail-NFA attached to the head-DFA is replaced by a tail-DFA.]
     • Problem:
       • Multiple border state traversals => Multiple tail-DFA activations
     • Fact:
       • In case of
         • sub_pattern1.*sub_pattern2
         • sub_pattern1[^c1…ck]*sub_pattern2 w/ c1,..,ck ∉ sub_pattern2
         subsequent activations of a tail-DFA can be safely ignored
     • Implication
       • Each tail-DFA adds only 1 to the worst case bound
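
The "safely ignored" observation can be illustrated for ab.*cd. In this sketch (an assumed decomposition, with hypothetical names `head_step`, `tail_step` and `run_with_tail_dfa`), the tail is a DFA for `.*cd`. Once it has been activated it already tracks every later occurrence of "cd", so later visits to the border state change nothing.

```python
def head_step(state, ch):
    """Head-DFA for the unanchored prefix .*ab (states 0, 1, 2; 2 is the border)."""
    if ch == "a":
        return 1
    if ch == "b" and state == 1:
        return 2
    return 0


def tail_step(state, ch):
    """Tail-DFA for .*cd (states 0, 1, 2; state 2 accepts)."""
    if ch == "c":
        return 1
    if ch == "d" and state == 1:
        return 2
    return 0


def run_with_tail_dfa(text):
    """Return the end indices of matches of ab.*cd."""
    head, tail = 0, None                # None: tail-DFA not activated yet
    matches = []
    for i, ch in enumerate(text):
        if tail is not None:
            tail = tail_step(tail, ch)  # at most one tail state is ever active
            if tail == 2:
                matches.append(i)
        head = head_step(head, ch)
        if head == 2 and tail is None:  # later activations are ignored
            tail = 0
    return matches


print(run_with_tail_dfa("abxxabcdcd"))
```

However many times "ab" appears, the traversal holds one head state and at most one tail state, which matches the slide's claim that each tail-DFA adds only 1 to the worst-case bound.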

  14. Counting Constraints: counter trick
     • Observation:
       • n “counting states” do not carry real next state information
     • Idea:
       • Replace n counting states w/ auto-decrementing counter
       • At most 2 memory accesses per counter sufficient
     • Optimization
       • Counting constraint at the end of the regular expression (no suffix) => ONE counter is enough
     [Figure: NFA for .{n}suffix, with the n counting states b, b+1, …, b+n-1, b+n linked by wildcard transitions and followed by the suffix.]
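
For the "no suffix" optimization, a single counter can be sketched as follows. The example is modelled loosely on AUTH\s[^\n]{100} from slide 8, simplified to a literal prefix followed by `[^\n]{n}`. The function name `match_with_counter`, its parameters, and the use of `str.endswith` as a stand-in for the head-DFA are assumptions for illustration; the sketch does not model the memory accesses mentioned on the slide.

```python
def match_with_counter(text, prefix, n, excluded="\n"):
    """Return the index where prefix[^excluded]{n} first completes, or -1."""
    counter = None                      # None: no counting in progress
    for i, ch in enumerate(text):
        if counter is not None:
            if ch in excluded:
                counter = None          # an excluded character aborts the count
            else:
                counter -= 1            # one counter replaces n counting states
                if counter == 0:
                    return i
        # Stand-in for the head-DFA reaching its border state: the prefix
        # has just been completed. Arm the counter only if it is idle,
        # since an earlier activation always finishes first.
        if counter is None and text.endswith(prefix, 0, i + 1):
            counter = n
    return -1


print(match_with_counter("AUTH abcde", "AUTH ", 5))
print(match_with_counter("AUTH ab\nAUTH abcde", "AUTH ", 5))
```

One counter is enough here because an excluded character cancels every pending activation at once, and otherwise the earliest activation is the first to reach zero. With a suffix after the counting constraint this shortcut no longer holds, which is why the slide restricts it to constraints at the end of the expression.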

  15. Rule-sets
     • Distinct PCREs: 982
       • 25% w/ long counting constraints (generally at the end of the RegEx, n=100-1024)
       • 11.4% containing .* terms
       • 54.89% containing [^c1c2..ck]* terms
     • Header-based grouping

  16. Memory storage requirements
     • Tail-DFAs and counter trick used (counters at end)

  17. Memory bandwidth requirements
     • Simulations on 12 packet traces
       • From 17 MB to 264 MB
       • 1-6 rules matched per trace
     • Observations:
       • active set size: # of parallel active states

  18. Conclusion
     • Contributions:
       • Analysis of practical rule-sets
       • Proposal of hybrid-FA to
         • reduce memory storage requirement
         • limit average case memory bandwidth
       • Refinements: tail-DFAs and counter tricks
         • bound worst case memory bandwidth
     • Experimental results:
       • Memory size: comparable to the corresponding NFA
       • Memory bandwidth:
         • Average case ≈ single (infeasible) DFA
         • Worst case dependent upon number of “problematic” RegEx
     • Deployment observation
       • Head and tail-FAs independent
       • Hybrid-FA suitable for deployment on parallel architectures and FPGAs

  19. Thanks. Questions?

  20. A SNORT rule
     • HEADER MATCHING (protocol, source addr, source port, dest. addr, dest. port)
     • PAYLOAD INSPECTION
       • Keywords (“content”)
       • Regular expression (PCRE)
     alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"BACKDOOR a-311 death user-agent string detected"; flow:to_server,established; content:"User-Agent|3A|"; nocase; content:"A-311"; distance:0; nocase; content:"Server"; distance:0; nocase; pcre:"/^User-Agent\x3A[^\r\n]*A-311\s+Server/smi"; reference:url,www3.ca.com/securityadvisor/pest/pest.aspx?id=453076778; classtype:trojan-activity; sid:6396; rev:1;)
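
The payload-inspection part of this rule can be tried out with an ordinary regex engine. The sketch below uses Python's `re` module as a stand-in for PCRE, mapping the `/smi` modifiers to `re.S`, `re.M` and `re.I`. The names `RULE_PCRE` and `payload_matches` and the two sample payloads are invented for illustration, and the header and "content" keyword checks of the rule are not modelled.

```python
import re

# The PCRE from the rule above; Python's re is assumed to behave like PCRE
# for this particular pattern (/s -> DOTALL, /m -> MULTILINE, /i -> IGNORECASE).
RULE_PCRE = re.compile(
    r"^User-Agent\x3A[^\r\n]*A-311\s+Server",
    re.S | re.M | re.I,
)


def payload_matches(payload):
    """Return True if the payload contains a line matching the rule's PCRE."""
    return RULE_PCRE.search(payload) is not None


# Invented sample payloads, not taken from any trace.
hit = "GET / HTTP/1.1\r\nUser-Agent: Mozilla A-311 Server\r\n\r\n"
miss = "GET / HTTP/1.1\r\nUser-Agent: Mozilla\r\nX-Note: A-311 Server\r\n\r\n"

print(payload_matches(hit), payload_matches(miss))
```

The `[^\r\n]*` term is one of the large character ranges discussed on slide 6: it lets the match span the rest of the User-Agent header line but not cross into the next line, which is why the second payload does not match.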

  21. Problem
     • Network Intrusion Detection Systems use regular expression matching for payload inspection
     • Regular expression matching is performed in linear time through deterministic finite automata (DFAs)
     • Several compression techniques have been put in place to reduce the memory requirement of given DFAs
     • BUT the complexity of RegEx may make DFAs infeasible because of “state explosion”
     • How can state explosion be prevented while preserving a worst-case bound on memory bandwidth?

  22. Deterministic vs. Non-Deterministic FA
     • RegEx: (1) .*abc; (2) .*bcd; (3) .*cde
     [Figure: NFA with states 0-9 (accepting states 3/1, 6/2, 9/3) and part of the DFA, whose states are labelled with NFA subsets such as 0,1, 0,4 and 0,7.]
