Space-Time Tradeoffs in Software-Based Deep Packet Inspection

Anat Bremler-Barr, Yotam Harchol⋆, David Hay
IDC Herzliya, Israel; Hebrew University, Israel
OWASP Israel 2011 (also presented at IEEE HPSR 2011)
Parts of this work were supported by European Research Council (ERC) Starting Grant no. 259085
⋆ Supported by the Check Point Institute for Information Security

Outline
• Motivation
• Background
• New Compression Techniques
• Experimental Results
• Conclusions

Network Intrusion Detection Systems
• Classify packets according to:
  • Header fields: source IP & port, destination IP & port, protocol, etc.
  • Packet payload (data)
[Figure labels: Internet, IP packet, Deep Packet Inspection]

Deep Packet Inspection
The environment:
• Cache memory: low capacity, fast, locality-based
• (D)RAM: high capacity, slow

Our Contributions
• Literature assumption: try to fit the data structure in cache, hence the efforts to compress the data structures
• Our paper: is it beneficial?
  • In reality, even in a non-compressed implementation, most memory accesses are served from the cache
  • BUT one can attack the non-compressed implementation by reducing its locality, pushing it out of the cache and making it much slower
• How to mitigate this attack?
  • Compress even further. Our new techniques use 60% less memory than the best prior-art compression

Complexity DoS Attack
• Find a gap between average case and worst case
• Engineer input that exploits this gap
• Launch a Denial of Service attack on the system
[Figure labels: Real-Life Traffic, Internet, Throughput]

Aho-Corasick Algorithm [Aho, Corasick; 1975]
• Build a Deterministic Finite Automaton
• Traverse the DFA, byte by byte
• Accepting state: pattern found
• Example: {E, BE, BD, BCD, CDBCAB, BCAA}
[Figure: DFA for the example pattern set (states s0 to s14), traversed on the input BCDBCAB]
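
The build-and-traverse steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the names `build_dfa` and `scan` are hypothetical, state numbers follow insertion order rather than the slide's s0 to s14 labels, and any symbol that appears in no pattern is assumed to send the automaton back to the root.

```python
from collections import deque

def build_dfa(patterns):
    """Build a full Aho-Corasick DFA: one next state per (state, symbol)."""
    alphabet = sorted({ch for p in patterns for ch in p})
    goto = [{}]          # trie edges of each state
    accepting = [set()]  # patterns reported when a state is entered
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                accepting.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accepting[s].add(p)

    fail = [0] * len(goto)             # longest-proper-suffix state
    delta = [dict() for _ in goto]     # the complete transition function
    queue = deque()
    for ch in alphabet:                # root: missing edges loop to the root
        if ch in goto[0]:
            delta[0][ch] = goto[0][ch]
            queue.append(goto[0][ch])
        else:
            delta[0][ch] = 0
    while queue:                       # breadth-first, shallow states first
        s = queue.popleft()
        accepting[s] |= accepting[fail[s]]
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, accepting

def scan(delta, accepting, data):
    """Traverse the DFA symbol by symbol; return (end_index, pattern) pairs."""
    s, matches = 0, []
    for i, ch in enumerate(data):
        s = delta[s].get(ch, 0)        # exactly one lookup per input symbol
        matches.extend((i, p) for p in sorted(accepting[s]))
    return matches

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
delta, accepting = build_dfa(PATTERNS)
print(scan(delta, accepting, "BCDBCAB"))  # [(2, 'BCD'), (6, 'CDBCAB')]
```

For the example pattern set this yields 15 states, matching the s0 to s14 count in the figure.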

Aho-Corasick Algorithm [Aho, Corasick; 1975]
• Naïve implementation: represent the transition function in a table of |Σ|×|S| entries
  • Σ: alphabet
  • S: set of states
• Lookup time: one memory access per input symbol
• Space: in reality, 70MB to gigabytes
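
To see how a |Σ|×|S| table reaches this range, here is a back-of-the-envelope sketch. The function name `full_table_bytes`, the 4-byte entry size, and the state count of 100,000 are illustrative assumptions, not figures from the talk.

```python
def full_table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Size of a full transition table: one entry per (state, symbol) pair."""
    return num_states * alphabet_size * entry_bytes

# Hypothetical automaton with 100,000 states over a byte alphabet.
size = full_table_bytes(100_000)
print(size, "bytes =", round(size / 2**20, 1), "MiB")  # 102400000 bytes = 97.7 MiB
```

The size grows linearly with the number of states, so larger pattern sets move quickly from tens of megabytes toward gigabytes.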

Potential Complexity DoS Attack
• Exhaustive Traversal Adversarial Traffic
  • Traverses as many states of the automaton as possible
  • Bad locality: bad for the naïve implementation (it will not utilize the cache)
[Figure: example automaton, states s0 to s14]

Alternative Implementation [Aho, Corasick; 1975]
• Failure transition goes to the state that matches the longest suffix of the input so far
• Lookup time: at most two memory accesses per input symbol (via amortized analysis)
• Space: at most the number of symbols in the pattern set; depends on implementation
[Figure: example automaton with forward transitions and failure transitions]
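
A minimal sketch of the failure-transition variant follows, under the same caveats as any illustration: the names `build_failure_automaton` and `scan_with_failures` are hypothetical, and the counter tallies automaton transitions as a stand-in for memory accesses.

```python
from collections import deque

def build_failure_automaton(patterns):
    """Store only trie edges plus one failure pointer per state."""
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())    # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]            # climb to shorter suffixes
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out

def scan_with_failures(goto, fail, out, data):
    """Return (matches, number of forward and failure transitions taken)."""
    s, matches, transitions = 0, [], 0
    for i, ch in enumerate(data):
        while s and ch not in goto[s]:
            s = fail[s]                # failure transition
            transitions += 1
        s = goto[s].get(ch, 0)         # forward transition (or stay at root)
        transitions += 1
        matches.extend((i, p) for p in sorted(out[s]))
    return matches, transitions

goto, fail, out = build_failure_automaton(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])
matches, transitions = scan_with_failures(goto, fail, out, "BCDBCAB")
print(matches, transitions)
```

Each failure transition moves to a strictly shallower state and each forward transition descends by at most one level, which is why the total stays within two transitions per input symbol on average.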

Potential Complexity DoS Attack
• Exhaustive Traversal Adversarial Traffic
  • Traverses as many states of the automaton as possible
  • Bad locality: bad for the naïve implementation (it will not utilize the cache)
• Failure-path Traversal Adversarial Traffic
  • Traverses as many failure transitions as possible
  • Bad for the failure-path based automaton (as many memory accesses per input symbol as possible)
[Figure: example automaton, states s0 to s14]

Prior Work: Compress the State Representation
• State encodings shown: Lookup Table, Linear Encoded, Bitmap Encoded
• Fields shown in the state layouts: failure, match, size
• Bitmap: length = |Σ|; bits can be counted using the popcnt instruction
[Figure: example automaton (states s0 to s14) alongside the three state layouts]
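
The bitmap encoding can be illustrated with a short sketch. The class `BitmapState` and its field names are hypothetical, and `bin(...).count("1")` stands in for the hardware popcnt instruction mentioned on the slide.

```python
class BitmapState:
    """One state: a |Σ|-bit bitmap plus a dense array of existing transitions."""

    def __init__(self, transitions, fail):
        # transitions: dict mapping a symbol value (0..255) to a next-state id
        self.bitmap = 0
        self.targets = []
        for sym in sorted(transitions):
            self.bitmap |= 1 << sym            # mark that this symbol has an edge
            self.targets.append(transitions[sym])
        self.fail = fail

    def next_state(self, sym):
        bit = 1 << sym
        if not self.bitmap & bit:
            return None                        # no forward edge: caller follows self.fail
        # Count the set bits below `sym` to find its slot in the dense array.
        index = bin(self.bitmap & (bit - 1)).count("1")
        return self.targets[index]

# Hypothetical root state with edges on 'B', 'C' and 'E' (state ids are made up).
root = BitmapState({ord("B"): 1, ord("C"): 7, ord("E"): 2}, fail=0)
print(root.next_state(ord("C")), root.next_state(ord("A")))  # 7 None
```

Only existing transitions are stored, so a state costs |Σ| bits plus one entry per outgoing edge instead of |Σ| full entries.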

Path Compression
• One-way branches can be represented using a single state
• Similar to PATRICIA tries
• Problem: incoming failure transitions
• Solution: compress only states with no incoming failure transitions
[Figure: example automaton before and after path compression; the chain below s5 collapses into a single state s9' labeled BCAB]
[Chart: Tuck et al. (2004) 100%; our path compression 75%]

Leaves Compression
• By definition, leaves have no forward transitions
• Their single purpose is to indicate a match
• We can push this indication up by adding a bit to each pointer
• Then, leaves can be eliminated from the automaton by copying their failure transition up
• 3% more space reduction
• Reduces the number of transitions taken
[Figure: example automaton before and after leaves compression; pointers that indicate a match are marked with *]

Pointer Compression
• In the Snort IDS pattern-set, 79% of the fail pointers point to states at depths 0, 1, 2
• Add two bits to encode the depth of the pointer:
  • 00: Depth 0
  • 01: Depth 1
  • 10: Depth 2
  • 11: Depth 3 and deeper
[Figure: for depth ≤ 2, the 16-bit pointer is replaced by the 2 depth bits; for depth > 2, the bits 11 are followed by the 16-bit pointer]

Pointer Compression
Determine the next state from the pointer depth:
• 0: Go to root
• 1: Use a lookup table using the last symbol
• 2: Use a hash table using the last two symbols
• ≥ 3: Use the stored pointer
[Figure: Depth 1 Lookup Table and Depth 2 Hash Table, mapping the last symbols to the next state]
[Chart: Tuck et al. (2004) 100%; our path compression 75%; with pointer compression 41%]
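
The depth-based resolution can be sketched as below. A state at depth d is labeled by the last d input symbols, so shallow failure targets can be recomputed from the input instead of stored. All names and state ids here are hypothetical, and Python dicts stand in for both the lookup table and the hash table.

```python
ROOT = 0

def encode_fail_pointer(target_state, target_depth):
    """Return (depth_code, stored_pointer); the pointer is dropped for depth <= 2."""
    if target_depth <= 2:
        return target_depth, None
    return 3, target_state             # code 3 (binary 11): depth 3 and deeper

def resolve_fail_pointer(depth_code, stored_pointer, recent, depth1_table, depth2_table):
    """Find the failure target; `recent` is the input consumed so far (at least
    as many symbols as the target depth)."""
    if depth_code == 0:
        return ROOT
    if depth_code == 1:
        return depth1_table[recent[-1]]    # indexed by the last symbol
    if depth_code == 2:
        return depth2_table[recent[-2:]]   # keyed by the last two symbols
    return stored_pointer

# Made-up state ids for the depth-1 and depth-2 states of the example pattern set.
depth1_table = {"E": 1, "B": 2, "C": 7}
depth2_table = {"BE": 3, "BD": 4, "BC": 5, "CD": 8}

code, ptr = encode_fail_pointer(target_state=8, target_depth=2)
print(code, ptr, resolve_fail_pointer(code, ptr, "BCD", depth1_table, depth2_table))  # 2 None 8
```

With most fail pointers targeting depths 0 to 2, the common case costs two bits instead of a full pointer, at the price of one extra table lookup when the failure transition is taken.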

Function Inlining
• The compressed implementation makes more memory accesses
• The initial implementation was based on a few functions calling each other
• Avoiding function calls (by inlining their code) reduced the total number of memory reads by 36%

Experimental Setup
[Tables: Test Systems; Pattern-Sets]
• Real-life traffic logs taken from MIT DARPA
* We used only half of the ClamAV signatures for our tests

Space Requirement
[Chart: Memory Footprint [MB]; values shown: 722.14, 2.59, 1.5]

Memory Accesses per Input Symbol
[Chart: memory accesses per input symbol]

L1 Data Cache Miss Rate
• Test system: Intel Core 2 Duo (2 cores), 16KB L1 data cache, 3MB L2 cache
[Chart: L1 data cache miss rate]

L2 Cache Miss Rate
• Test system: Intel Core 2 Duo (2 cores), 16KB L1 data cache, 3MB L2 cache
• Real-life traffic: 0.7% L2 cache miss rate
• Adversarial traffic: 23% L2 cache miss rate
• Maximal L2 miss rate: 0.06%
[Chart: L2 cache miss rate]

Space vs. Time
[Chart labels: Naïve Implementation, -86%; Our Implementation]

Conclusions
It is crucial to model the cache in software-based Deep Packet Inspection:
• The naïve Aho-Corasick implementation has a huge memory footprint, but works well on real-life traffic due to locality of reference
• The naïve implementation can be easily attacked, making it 7 times slower, even though it makes a constant number of memory accesses per input symbol
We also show new compression techniques:
• 60% less memory than the best prior-art compression
• Stable throughput, better performance under attacks

Questions? Thank you!
