
A High Throughput String Matching Architecture for Intrusion Detection and Prevention

Lin Tan, U of Illinois, Urbana Champaign. Tim Sherwood, UC, Santa Barbara.


Presentation Transcript


  1. A High Throughput String Matching Architecture for Intrusion Detection and Prevention • Lin Tan, U of Illinois, Urbana Champaign • Tim Sherwood, UC, Santa Barbara

  2. Outline • Why String Matching • Matching against multiple strings • The Aho-Corasick Algorithm • The Devil in the Constants • A Bit-Split Algorithm • Hardware Design and Analysis • Conclusions

  3. To Protect and Serve • Our machines are constantly under attack • We cannot rely on end users; we need networks that actively defend themselves • IDS/IPS are promising ways of providing protection • Market for such systems: $918.9 million by the end of 2007 • Snort: a widely accepted open source IDS • This requires the protection system to operate at 10 to 40 Gb/s (we aim at current and next generation networks)

  4. Our Contributions • String Matching Architecture: • 0.4MB and 10Gbps for Snort rule set (>10,000 characters) • Bit-Split String Matching Algorithm • Reduces out edges from 256 to 2 • Performance/area beats the best techniques we examined by a factor of 10 or more

  5. Scanning for Intrusions • Most IDS define a set of rules; a string defines a suspicious transmission • Example rule (CodeRed worm): web flow established, uricontent with “/root.exe” • [Figure: Traffic In, scanned by a software IDS, then Traffic Out] • We are not building a full IDS, but rather the primitives from which full systems can be built

  6. Multiple String Matching • A string can be anywhere in the payload of a packet • The multiple string matching algorithm: • Input: A set of strings/patterns S, and a buffer b • Output: Every occurrence of an element of S in b • Extra constraint: b is really a stream • How to implement: Option 1) search for each string independently; Option 2) combine strings together and search all at once • [Figure: an example input stream and set of strings over the letters A, B, C, D, F]
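
Option 1 can be sketched in a few lines of Python. This is only an illustration of the problem statement: the function name `find_all_naive` and the sample buffer are assumptions, and the pattern set is borrowed from the later Aho-Corasick slides rather than from this slide's figure.

```python
def find_all_naive(patterns, buffer):
    """Option 1: scan the buffer once per pattern and collect (start, pattern) hits."""
    hits = []
    for p in patterns:
        start = buffer.find(p)
        while start != -1:
            hits.append((start, p))
            # Advance by one character so overlapping occurrences are kept.
            start = buffer.find(p, start + 1)
    return sorted(hits)


# Hypothetical buffer; the patterns are the example set used later in the talk.
hits = find_all_naive(["he", "she", "his", "hers"], "ushers")
print(hits)
```

The buffer is rescanned once per pattern, so the work grows with the size of the rule set, which is what Option 2 is meant to avoid.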

  7. Why hardware • Snort: >1,000 rules, growing at 1 rule/day or more • Active research into automated rule building • Strings are not limited to just [a-z]+ • We need a high speed string matching technique with stringent worst case performance • Many algorithms target average case performance • Aho-Corasick can scan once and output all matches, but it is too big to be on-chip

  8. Outline • Why String Matching • Matching against multiple strings • The Aho-Corasick Algorithm • The Devil in the Constants • A Bit-Split Algorithm • Hardware Design and Analysis • Conclusions

  9. The Aho-Corasick Algorithm • Given a finite set P of patterns, build a deterministic finite automaton G accepting the set of all patterns in P.

  10. An AC Automaton Example • Example: P = {he, she, his, hers} • The construction: linear time • The search for all patterns in P: linear time • [Figure: AC automaton with states 0 to 9; the legend marks the initial state, the transition function, and the accepting states; edges pointing back to State 0 are not shown]
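
A minimal Python sketch of the automaton for P = {he, she, his, hers} follows. It uses the textbook goto/failure-link formulation rather than a fully expanded transition table, and the names `build_ac` and `search` are illustrative assumptions. With the patterns inserted in the order shown, the state numbers should come out as 1 = h, 2 = he, 3 = s, 4 = sh, 5 = she, 6 = hi, 7 = his, 8 = her, 9 = hers.

```python
from collections import deque


def build_ac(patterns):
    """Return (goto, fail, out) for the Aho-Corasick automaton of `patterns`."""
    goto, out = [{}], [set()]
    for p in patterns:                      # phase 1: build the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())         # depth-1 states fail to state 0
    while queue:                            # phase 2: breadth-first failure links
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]          # inherit matches ending at the fail state
            queue.append(t)
    return goto, fail, out


def search(patterns, text):
    """Scan `text` once and return (start, pattern) for every occurrence."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in sorted(out[s]))
    return hits


goto, fail, out = build_ac(["he", "she", "his", "hers"])
hits = search(["he", "she", "his", "hers"], "hxhers")
print(len(goto), hits)
```

The input `hxhers` is the stream used on the later matching slide; every character is consumed exactly once.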

  11. Linear Time: So what’s the problem • How to implement it on chip? • [Figure: state table with one row per state (up to 16,384) and 256 next-state pointers per row (columns 0 to 255), each pointer 14 bits wide] • Problem: size too big to be on-chip • ~10,000 nodes • 256 out edges per node • Requires 16,384 × 256 × 14 bits, on the order of 10MB • Solution: partition into small state machines • Fewer strings per machine • Fewer out edges per machine
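
The size estimate can be checked with a few lines of arithmetic. This sketch counts only the next-state pointers, assuming 16,384 = 2^14 table rows (the next power of two above ~10,000 nodes); any per-state match information would come on top of it.

```python
states = 16_384            # 2**14 rows, enough for ~10,000 nodes
edges_per_state = 256      # one next-state pointer per possible input byte
pointer_bits = 14          # enough to address 16,384 states

total_bits = states * edges_per_state * pointer_bits
total_bytes = total_bits // 8
print(total_bits, total_bytes, round(total_bytes / 1e6, 2))
```

The pointers alone come to roughly 7.3 million bytes, which is the same order of magnitude as the ~10MB quoted on the slide and far beyond a small on-chip memory.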

  12. Outline • Why String Matching • Matching against multiple strings • The Aho-Corasick Algorithm • The Devil in the Constants • A Bit-Split Algorithm • Hardware Design and Analysis • Conclusions

  13. Our Main Idea: Bit-Split • Partition rules (P) into smaller sets (P0 to Pn) • Build an AC state machine for each subset • For each DFA Pi, rip the state machine apart into 8 tiny state machines (Bi0 through Bi7) • Each of which searches for 1 bit in the 8 bit encoding of an input character • Only if all the different B machines agree can there actually be a match

  14. Binary Encoding P0 = { he, she, his, hers }
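
The encoding table from this slide did not survive in the transcript. Assuming plain 8-bit ASCII, it can be regenerated for the five distinct characters of P0 as follows (variable names are illustrative):

```python
patterns = ["he", "she", "his", "hers"]
alphabet = sorted(set("".join(patterns)))

# 8-bit ASCII code of each character, most significant bit first.
encoding = {ch: format(ord(ch), "08b") for ch in alphabet}
for ch, bits in encoding.items():
    print(ch, bits[:4], bits[4:])
```

Each of the 8 bit positions in these codes is handled by its own tiny state machine.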

  15. An example of Bit-Split • P0 = { he, she, his, hers }; h = 0110 1000, s = 0111 0011 • [Figure: the AC automaton P0 (states 0 to 9) beside the bit-split machine B03, whose states are sets of P0 states: b0 {0}, b1 {0,1}, b2 {0,3}, b3 {0,1,2,6}, b4 {0,1,4}, b5 {0,3,7,8}, b6 {0,1,2,5,6}, b7 {0,3,9}; edges pointing back to State 0 are not shown]
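
The construction of one bit-split machine can be sketched as a subset construction over the full AC transition table, following only the input bytes whose selected bit has the given value. The function names below are assumptions, and so is the bit numbering (bit 0 = most significant bit), chosen because it appears to reproduce the B03 state sets listed for this slide.

```python
from collections import deque


def build_dfa(patterns):
    """Full Aho-Corasick DFA over bytes: delta[state][byte] -> next state."""
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for c in p.encode("ascii"):
            if c not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    fail = [0] * len(goto)
    delta = [[0] * 256 for _ in goto]
    for c, t in goto[0].items():
        delta[0][c] = t
    queue = deque(goto[0].values())
    while queue:                              # breadth-first, shallow states first
        s = queue.popleft()
        for c in range(256):
            if c in goto[s]:
                t = goto[s][c]
                fail[t] = delta[fail[s]][c]
                out[t] |= out[fail[t]]
                delta[s][c] = t
                queue.append(t)
            else:
                delta[s][c] = delta[fail[s]][c]
    return delta, out


def bit_split(delta, bit):
    """Subset construction on one bit of the input (bit 0 = most significant)."""
    shift = 7 - bit
    start = frozenset([0])
    index, states, edges = {start: 0}, [start], []
    i = 0
    while i < len(states):
        row = []
        for value in (0, 1):
            nxt = frozenset(delta[s][c] for s in states[i]
                            for c in range(256) if ((c >> shift) & 1) == value)
            if nxt not in index:
                index[nxt] = len(states)
                states.append(nxt)
            row.append(index[nxt])
        edges.append(row)                     # exactly two out edges per state
        i += 1
    return states, edges


delta, out = build_dfa(["he", "she", "his", "hers"])
states, edges = bit_split(delta, bit=3)
for i, members in enumerate(states):
    print(f"b{i}", sorted(members), "0 -> b%d, 1 -> b%d" % tuple(edges[i]))
```

Each bit-split state keeps only two next-state pointers instead of 256, which is where the memory saving comes from.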

  16. Compact State Set • P0 = { he, she, his, hers } • [Figure: the same machines P0 and B03, with each B03 state reduced to the accepting P0 states it contains: b0 { }, b1 { }, b2 { }, b3 {2}, b4 { }, b5 {7}, b6 {2,5}, b7 {9}; edges pointing back to State 0 are not shown]

  17. An example of Bit-Split • P0 = { he, she, his, hers } • [Figure: P0 with two of its bit-split machines in compact form. B03: b0 {}, b1 {}, b2 {}, b3 {2}, b4 {}, b5 {7}, b6 {2,5}, b7 {9}. B04: b0 {}, b1 {}, b2 {}, b3 {}, b4 {2}, b5 {}, b6 {2,5}, b7 {}, b8 {2,7}, b9 {9}. Edges pointing back to State 0 are not shown]

  18. Nice Properties • The number of states in Bij is rigorously bounded by the number of states in Pi • No exponential blow up in state • Linear construction time • Possible to traverse multiple edges at a time to multiply throughput

  19. Matching on the example • Input stream: h x h e r s • Only scan the input stream once • [Figure: the AC automaton for { he, she, his, hers } (states 0 to 9) traversed on the input stream]

  20. Matching on the example • Input stream: h x h e • [Figure: P0, B03, and B04 in compact form; for this input B03 reads the bits 0 1 0 0 and B04 reads the bits 1 1 1 0, and P0 ends in state 2] • How do you “combine” the results from the different state machines? • Only if all the state machines agree is there actually a match

  21. How to Implement • The AC state machine is equivalent to the 8 tiny state machines. • The 8 tiny state machines can run independently, which means in parallel • Intersection done with bit-wise AND. • 8 is intuitive but not optimal • How to build a system to implement this algorithm? • Our algorithm makes it feasible to be on-chip
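
A software sketch of this scheme is given below: eight bit-split machines step independently on one bit each, and their partial match vectors are combined with a bit-wise AND. Match vectors are held as Python integers with one bit per pattern. The function names and the bit numbering (bit 0 = most significant) are assumptions, and the loop runs the machines sequentially where hardware would run them in parallel.

```python
from collections import deque

PATTERNS = ["he", "she", "his", "hers"]


def build_dfa(patterns):
    """AC DFA over bytes; out[s] is a bitmask of the patterns that end at state s."""
    goto, out = [{}], [0]
    for k, p in enumerate(patterns):
        s = 0
        for c in p.encode("ascii"):
            if c not in goto[s]:
                goto.append({})
                out.append(0)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s] |= 1 << k
    fail = [0] * len(goto)
    delta = [[0] * 256 for _ in goto]
    for c, t in goto[0].items():
        delta[0][c] = t
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c in range(256):
            if c in goto[s]:
                t = goto[s][c]
                fail[t] = delta[fail[s]][c]
                out[t] |= out[fail[t]]
                delta[s][c] = t
                queue.append(t)
            else:
                delta[s][c] = delta[fail[s]][c]
    return delta, out


def bit_split(delta, out, bit):
    """Return (edges, pmv): two out edges and a partial match vector per state."""
    shift = 7 - bit
    start = frozenset([0])
    index, states, edges = {start: 0}, [start], []
    i = 0
    while i < len(states):
        row = []
        for value in (0, 1):
            nxt = frozenset(delta[s][c] for s in states[i]
                            for c in range(256) if ((c >> shift) & 1) == value)
            if nxt not in index:
                index[nxt] = len(states)
                states.append(nxt)
            row.append(index[nxt])
        edges.append(row)
        i += 1
    pmv = []
    for members in states:                   # OR the outputs of all member states
        vector = 0
        for s in members:
            vector |= out[s]
        pmv.append(vector)
    return edges, pmv


def scan(patterns, data):
    """Return (end position, pattern) for every match found in the byte string."""
    delta, out = build_dfa(patterns)
    machines = [bit_split(delta, out, b) for b in range(8)]
    current = [0] * 8
    hits = []
    for pos, byte in enumerate(data):
        full = (1 << len(patterns)) - 1      # start from "every pattern matches"
        for b, (edges, pmv) in enumerate(machines):
            current[b] = edges[current[b]][(byte >> (7 - b)) & 1]
            full &= pmv[current[b]]          # intersection via bit-wise AND
        hits.extend((pos, p) for k, p in enumerate(patterns) if (full >> k) & 1)
    return hits


hits = scan(PATTERNS, b"hxhers")
print(hits)
```

A pattern is reported only when all eight machines have it set in their partial match vector at the same input position.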

  22. A Hardware Implementation • A rule module is equivalent to an AC state machine • Rule modules are structurally equivalent to one another, as are tiles • All full match vectors are concatenated to indicate which strings are matched • One tile stores one tiny bit-split state machine • [Figure: the String Match Engine consists of Rule Module 0 to Rule Module N; it takes a byte from the payload and outputs the complete set of matches for all rules. Each rule module holds Tile 0 to Tile 3 and a control block; each tile receives 2 bits from each byte (input bits [0:1], [2:3], [4:5], [6:7]) and produces a 16-bit partial match vector, and the partial match vectors are combined into a 16-bit full match vector. Each state machine tile stores, per state, 4 next-state pointers of 8 bits each and a 16-bit partial match vector, with an 8-bit current state register, a decoder, a 4:1 mux, input and output latches, and a config data port]
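
The widths in the figure allow a rough per-tile storage estimate. This is a back-of-the-envelope sketch, not a figure taken from the talk: it assumes that the 8-bit current state register means 256 states per tile, and that every state stores exactly the 4 pointers and one partial match vector shown.

```python
states_per_tile = 2 ** 8          # assumed from the 8-bit current state register
pointers_per_state = 4            # one per value of the 2-bit input
pointer_bits = 8
match_vector_bits = 16            # partial match vector, one bit per string

bits_per_state = pointers_per_state * pointer_bits + match_vector_bits
tile_bytes = states_per_tile * bits_per_state // 8
module_bytes = 4 * tile_bytes     # four tiles per rule module
print(bits_per_state, tile_bytes, module_bytes)
```

Under these assumptions a tile needs 48 bits per state, so a whole rule module fits in a few kilobytes.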

  23. An efficient Implementation • [Figure: the input h x h e is fed to Tile 0, Tile 1, Tile 2, and Tile 3, each tile receiving 2 bits of every character]
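
How the input is sliced for the four tiles can be sketched as follows. The assignment of bit pairs to tiles (Tile 0 gets the two most significant bits, Tile 3 the two least significant) and the name `tile_inputs` are assumptions for illustration.

```python
def tile_inputs(data):
    """Split each byte into four 2-bit symbols, one stream per tile."""
    # Shifts 6, 4, 2, 0 select bit pairs [0:1], [2:3], [4:5], [6:7] (MSB first).
    return [[(byte >> shift) & 0b11 for byte in data] for shift in (6, 4, 2, 0)]


streams = tile_inputs(b"hxhe")
for tile, symbols in enumerate(streams):
    print(f"Tile {tile}:", symbols)
```

With 2-bit symbols each state needs 4 next-state pointers, and one rule module consumes a full byte per step using four tiles instead of eight.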

  24. Performance of Hardware • Key Metric: Throughput*Character/Area

  25. Related Work • Software based • Good for ~100Mb/s, common case • FPGA-based • Many schemes map rules down to a specialized circuit • Near optimal utilization of hardware resources • Implementing state machines on block-RAMs [Cho and Mangione-Smith] • Concurrent to our work: mapping state machines to on-chip SRAM [Aldwairi et al.] • Bloom filters [Dharmapurikar et al.] • Excellent filter in the common case • TCAM-based • Require all patterns to be shorter than or equal to the TCAM width • Cutting long patterns: 2Gbps with 295KB TCAM [Yu et al.]

  26. Conclusions • New Tile-based Architecture • 0.4MB and 10Gbps for Snort rule set (>10,000 characters) • Can be used for other applications, e.g. IP lookups, packet classification • New Bit-split Algorithm: • General purpose enough for many other applications, e.g. spam detection, peephole optimization, IP lookups, packet classification, etc. • Feasible to implement on other tile-based architectures

  27. Thank you! Questions?

  28. Backup Slides

  29. An efficient Implementation • [Figure: the input h x h e is fed to Tile 0, Tile 1, Tile 2, and Tile 3, each tile receiving 2 bits of every character]
