A High Throughput String Matching Architecture for Intrusion Detection and Prevention. Lin Tan U of Illinois, Urbana Champaign Tim Sherwood UC, Santa Barbara. Outline. Why String Matching Matching against multiple strings The Aho-Corasick Algorithm The Devil in the Constants
A High Throughput String Matching Architecture for Intrusion Detection and Prevention. Lin Tan (U of Illinois, Urbana Champaign), Tim Sherwood (UC, Santa Barbara)
Outline • Why String Matching • Matching against multiple strings • The Aho-Corasick Algorithm • The Devil in the Constants • A Bit-Split Algorithm • Hardware Design and Analysis • Conclusions
To Protect and Serve • Our machines are constantly under attack. • We cannot rely on end users; we need networks which actively defend themselves. • IDS/IPS are promising ways of providing protection. • Market for such systems: $918.9 million by the end of 2007. • Snort: a widely accepted open source IDS. • Defending at the network level requires the protection system to operate at 10 to 40 Gb/s. (We aim at current and next generation networks.)
Our Contributions • String Matching Architecture: 0.4 MB and 10 Gbps for the Snort rule set (>10,000 characters) • Bit-Split String Matching Algorithm: reduces out edges per state from 256 to 2 • Performance/area beats the best techniques we examined by a factor of 10 or more.
Scanning for Intrusions • Most IDS define a set of rules. A string defines a suspicious transmission. • Example rule (CodeRed worm): web flow established, uricontent with “/root.exe” • [Figure: Traffic In → Software IDS (Scan) → Traffic Out] • We are not building a full IDS; rather, we are building the primitives from which full systems can be built.
Multiple String Matching • A string can be anywhere in the payload of a packet. • The multiple string matching algorithm: • Input: A set of strings/patterns S, and a buffer b • Output: Every occurrence of an element of S in b • Extra constraint: b is really a stream • How to implement: Option 1) search for each string independently; Option 2) combine strings together and search all at once • [Figure: an example input buffer and a set of pattern strings over the letters A, B, C, D, F]
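Option 1 above can be sketched in a few lines of Python. This is only an illustrative baseline, not code from the talk: the function name `find_all_naive` and the example patterns are assumptions. It rescans the buffer once per pattern, which is the cost that Option 2 is meant to avoid.

```python
def find_all_naive(patterns, buffer):
    """Option 1: search for each pattern independently.

    Returns a sorted list of (start_index, pattern) pairs. The buffer is
    rescanned once per pattern, so the work grows with len(patterns).
    """
    hits = []
    for p in patterns:
        start = buffer.find(p)
        while start != -1:
            hits.append((start, p))
            # Advance by one so overlapping occurrences are also reported.
            start = buffer.find(p, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical example data.
    print(find_all_naive(["he", "she", "his", "hers"], "ushers"))
```

Because each pattern needs its own pass, this approach also sits poorly with the streaming constraint: the whole buffer has to be kept around for every pass.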
Why Hardware • Snort: >1,000 rules, growing at 1 rule/day or more • Active research into automated rule building • Strings are not limited to just [a-z]+ • We need a high speed string matching technique with stringent worst case performance. • Many algorithms target average case performance. • Aho-Corasick can scan once and output all matches, but its state machine is too big to fit on-chip.
The Aho-Corasick Algorithm • Given a finite set P of patterns, build a deterministic finite automaton G accepting the set of all patterns in P.
An AC Automaton Example • Example: P = {he, she, his, hers} • The construction: linear time. • The search for all patterns in P: linear time. • [Figure: the AC automaton for P with states 0 to 9. State 0 is the initial state, edges of the transition function are labeled with the characters h, e, r, s, i, and accepting states are marked. Edges pointing back to State 0 are not shown.]
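The example automaton can be reproduced with a minimal Python sketch. Note the swap: this version uses the goto/failure-link form of Aho-Corasick rather than a fully expanded transition table, and the names `build_ac` and `search` are illustrative assumptions. With the patterns inserted in the order he, she, his, hers, the trie states come out numbered 0 to 9, with accepting states 2, 5, 7 and 9.

```python
from collections import deque


def build_ac(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]      # goto[state][char] -> next state (trie edges)
    out = [set()]    # out[state] -> patterns recognized on entering state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to state 0
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]  # inherit matches that end here too
    return goto, fail, out


def search(patterns, text):
    """Scan text once; return sorted (start_index, pattern) pairs."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


if __name__ == "__main__":
    print(search(["he", "she", "his", "hers"], "ushers"))
```

The input is read exactly once, and every pattern ending at the current character is reported from the `out` set of the current state.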
Linear Time: So What’s the Problem? • How to implement it on chip? • [Figure: a state table with one row per state (up to 16,384) and one column per input byte (0 to 255); each entry is a 14-bit next state pointer, so each row holds 256 next state pointers.] • Problem: size too big to be on-chip • ~10,000 nodes • 256 out edges per node • Requires 16,384 × 256 × 14 bits ≈ 7.3 MB for the next state pointers alone, i.e. on the order of 10 MB • Solution: partition into small state machines • Fewer strings per machine • Fewer out edges per machine
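The size estimate can be checked with a few lines of arithmetic. This is a sketch under the assumptions read off the slide (a dense table, 16,384 addressable states, 256 out edges per state); the helper name `ac_table_bits` is hypothetical. The product on the slide is in bits, so it is divided by 8 to get bytes.

```python
def ac_table_bits(num_states, fanout=256):
    """Bits needed for a dense next-state table.

    Each of the num_states rows holds `fanout` pointers, and each pointer
    must be wide enough to name any state.
    """
    pointer_bits = (num_states - 1).bit_length()
    return num_states * fanout * pointer_bits, pointer_bits


if __name__ == "__main__":
    bits, width = ac_table_bits(16_384)
    print(f"pointer width: {width} bits")
    print(f"table size:    {bits} bits = {bits / 8 / 1e6:.2f} MB")
```

Under these assumptions the pointer table alone comes to roughly 7.3 MB, the same order of magnitude as the figure quoted on the slide.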
Our Main Idea: Bit-Split • Partition the rules (P) into smaller sets (P0 to Pn) • Build an AC state machine for each subset • For each DFA Pi, rip the state machine apart into 8 tiny state machines (Bi0 through Bi7) • Each tiny machine tracks 1 bit of the 8-bit encoding of each input character • Only if all the different B machines agree can there actually be a match
Binary Encoding P0 = { he, she, his, hers }
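The encodings for the characters of P0 can be regenerated with a short Python sketch, assuming plain 8-bit ASCII written most significant bit first (which agrees with the bit patterns for h and s shown in the bit-split example). The helper name `encode` is an assumption.

```python
def encode(ch):
    """8-bit binary encoding of a character, most significant bit first."""
    return format(ord(ch), "08b")


if __name__ == "__main__":
    patterns = ["he", "she", "his", "hers"]
    # One line per distinct character used by the patterns.
    for ch in sorted(set("".join(patterns))):
        print(ch, encode(ch))
```

Bit-split machine Bij reads only column j of this encoding for each input character.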
An example of Bit-Split • P0 = { he, she, his, hers } • [Figure: the AC automaton P0 (states 0 to 9) next to the bit-split machine B03, which reads bit 3 of each character. Encodings shown include h = 0110 1000 and s = 0111 0011. Each B03 state stands for a set of P0 states: b0 {0}, b1 {0,1}, b2 {0,3}, b3 {0,1,2,6}, b4 {0,1,4}, b5 {0,3,7,8}, b6 {0,1,2,5,6}, b7 {0,3,9}. Edges are labeled 0 or 1. Edges pointing back to State 0 are not shown.]
Compact State Set • P0 = { he, she, his, hers } • [Figure: the same machines P0 and B03, with each B03 state now keeping only the accepting states of P0: b0 { }, b1 { }, b2 { }, b3 {2}, b4 { }, b5 {7}, b6 {2,5}, b7 {9}. Edges pointing back to State 0 are not shown.]
An example of Bit-Split • P0 = { he, she, his, hers } • [Figure: P0 with two of its bit-split machines, B03 and B04, in compact form. B03: b0 {}, b1 {}, b2 {}, b3 {2}, b4 {}, b5 {7}, b6 {2,5}, b7 {9}. B04: b0 {}, b1 {}, b2 {}, b3 {}, b4 {2}, b5 {}, b6 {2,5}, b7 {}, b8 {2,7}, b9 {9}. Edges pointing back to State 0 are not shown.]
Nice Properties • The number of states in Bij is rigorously bounded by the number of states in Pi • No exponential blow up in state • Linear construction time • Possible to traverse multiple edges at a time to multiply throughput
Matching on the example • [Figure: the AC automaton for { he, she, his, hers }, states 0 to 9, stepping through the input.] • Input stream: h x h e r s • Only scan the input stream once.
Matching on the example • [Figure: P0, B03 and B04 processing the input h x h e. B03 reads bit 3 of each character: 0 1 0 0. B04 reads bit 4 of each character: 1 1 1 0. The states reached by the two machines have only P0 state 2 (he) in common.] • How do you “combine” the results from the different state machines? • Only if all the state machines agree is there actually a match.
How to Implement • The AC state machine is equivalent to the 8 tiny state machines. • The 8 tiny state machines can run independently, which means in parallel • Intersection done with bit-wise AND. • 8 is intuitive but not optimal • How to build a system to implement this algorithm? • Our algorithm makes it feasible to be on-chip
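The points above can be tried out in software with the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, bits are numbered from the most significant bit (0) to the least (7), the 8 machines are stepped in a loop rather than in parallel, and partial match vectors are kept as integer bitmasks with one bit per pattern.

```python
from collections import deque


def build_ac_dfa(patterns):
    """Full Aho-Corasick DFA over bytes; returns (delta, mask).

    delta[state][byte] is the next state; mask[state] has bit i set when
    patterns[i] ends on entering that state.
    """
    trie, mask = [{}], [0]
    for idx, p in enumerate(patterns):
        s = 0
        for b in p:
            if b not in trie[s]:
                trie.append({})
                mask.append(0)
                trie[s][b] = len(trie) - 1
            s = trie[s][b]
        mask[s] |= 1 << idx
    delta = [[0] * 256 for _ in trie]
    fail = [0] * len(trie)
    queue = deque()
    for b, s in trie[0].items():
        delta[0][b] = s
        queue.append(s)
    while queue:
        r = queue.popleft()
        mask[r] |= mask[fail[r]]
        for b in range(256):
            if b in trie[r]:
                s = trie[r][b]
                fail[s] = delta[fail[r]][b]
                delta[r][b] = s
                queue.append(s)
            else:
                delta[r][b] = delta[fail[r]][b]
    return delta, mask


def bit_split(delta, mask, bit):
    """Subset-construct the binary machine that sees one bit per byte.

    Returns (nxt, pmv, sets): nxt[state][0 or 1], the partial match vector
    of each state, and the set of AC states each state stands for.
    """
    shift = 7 - bit
    start = frozenset([0])
    ids, sets, nxt, queue = {start: 0}, [start], [], deque([start])
    while queue:
        cur = queue.popleft()
        row = []
        for v in (0, 1):
            target = frozenset(delta[s][b] for s in cur
                               for b in range(256) if ((b >> shift) & 1) == v)
            if target not in ids:
                ids[target] = len(sets)
                sets.append(target)
                queue.append(target)
            row.append(ids[target])
        nxt.append(row)
    pmv = []
    for group in sets:
        m = 0
        for s in group:
            m |= mask[s]
        pmv.append(m)
    return nxt, pmv, sets


def match(patterns, data):
    """Run all 8 bit-split machines; return (end_index, pattern) pairs."""
    delta, mask = build_ac_dfa(patterns)
    machines = [bit_split(delta, mask, bit)[:2] for bit in range(8)]
    states, hits = [0] * 8, []
    for pos, byte in enumerate(data):
        vector = (1 << len(patterns)) - 1
        for bit, (nxt, pmv) in enumerate(machines):
            states[bit] = nxt[states[bit]][(byte >> (7 - bit)) & 1]
            vector &= pmv[states[bit]]  # intersection via bit-wise AND
        hits += [(pos, patterns[i]) for i in range(len(patterns))
                 if (vector >> i) & 1]
    return hits


if __name__ == "__main__":
    print(match([b"he", b"she", b"his", b"hers"], b"hxhers"))
```

Each binary machine has only two out edges per state, and its state count never exceeds that of the original automaton in this example (8 states for bit 3 against 10 AC states). A pattern is reported only when its bit survives the AND across all 8 partial match vectors.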
A Hardware Implementation • A rule module is equivalent to an AC state machine • All rule modules are structurally equivalent, as are all tiles • One tile stores one tiny bit-split state machine • All full match vectors are concatenated to indicate which strings are matched • [Figure, String Match Engine: each byte from the payload is fed to Rule Module 0 through Rule Module N; the full match vectors of all modules together form the complete set of matches for all rules.] • [Figure, Rule Module: a control block and four tiles (Tile 0 to Tile 3); each tile takes 2 bits from each byte ([0:1], [2:3], [4:5], [6:7]) and produces a 16-bit partial match vector; the four partial match vectors are combined into a 16-bit full match vector.] • [Figure, State Machine Tile: a memory addressed through a decoder by the 8-bit current state (entries 0 to 255); each entry holds 4 next state pointers <8 bits each> and a partial match vector <16 bits>; a 4:1 mux selects the next state pointer using the 2-bit input; the tile also has a config data input and an output latch.]
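How a payload byte is shared among the four tiles of a rule module, and how much storage one tile needs, can be sketched as follows. The widths (4 next state pointers of 8 bits, a 16-bit partial match vector) are read off the figure labels; the 256-entry depth is inferred from the 8-bit current state, and the bit numbering (bit 0 is the most significant) is an assumption. The function names are hypothetical.

```python
def split_byte(byte):
    """Split one payload byte into four 2-bit tile inputs.

    Assumption: bits are numbered from the most significant bit, so
    Tile 0 sees bits [0:1] and Tile 3 sees bits [6:7].
    """
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def tile_bits(states=256, pointers=4, pointer_bits=8, match_bits=16):
    """Storage for one tile: per state, the next state pointers plus the
    partial match vector."""
    return states * (pointers * pointer_bits + match_bits)


if __name__ == "__main__":
    print(split_byte(ord("h")))   # 'h' is 01 10 10 00 in binary
    print(tile_bits(), "bits per tile")
```

Reading 2 bits per tile means 4 tiles cover a byte instead of 8, at the price of 4 out edges per state rather than 2.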
An Efficient Implementation • [Figure: the input stream h x h e is fed to Tile 0, Tile 1, Tile 2 and Tile 3; each tile consumes 2 bits of every byte.]
Performance of Hardware Key Metric: Throughput*Character/Area
Related Work • Software based • Good for ~100 Mb/s, common case • FPGA-based • Many schemes map rules down to a specialized circuit • Near optimal utilization of hardware resources • Implementing state machines on block-RAMs [Cho and Mangione-Smith] • Concurrent to our work: mapping state machines to on-chip SRAM [Aldwairi et al.] • Bloom filters [Dharmapurikar et al.] • Excellent filter in the common case • TCAM-based • Require all patterns to be shorter than or equal to the TCAM width • Cutting long patterns: 2 Gbps with 295 KB TCAM [Yu et al.]
Conclusions • New Tile-based Architecture • 0.4 MB and 10 Gbps for the Snort rule set (>10,000 characters) • Could be used for other applications, e.g. IP lookups, packet classification. • New Bit-split Algorithm • General purpose enough for many other applications, e.g. spam detection, peephole optimization, IP lookups, packet classification, etc. • Feasible to implement on other tile-based architectures.