
Efficient Memory Utilization on Network Processors for Deep Packet Inspection

Piti Piyachon and Yan Luo, Electrical and Computer Engineering Department, University of Massachusetts Lowell.


Presentation Transcript


  1. Efficient Memory Utilization on Network Processors for Deep Packet Inspection • Piti Piyachon, Yan Luo • Electrical and Computer Engineering Department, University of Massachusetts Lowell

  2. Our Contributions • Study parallelism of a pattern matching algorithm • Propose Bit-Byte Aho-Corasick Deterministic Finite Automata • Construct memory model to find optimal settings to minimize the memory usage of DFA U Mass Lowell

  3. DPI and Pattern Matching • Deep Packet Inspection (DPI): inspects the packet header and payload; detects computer viruses, worms, spam, etc.; used by network intrusion detection applications such as Bro and Snort. • Pattern matching requirements: match multiple predefined patterns (keywords or strings) at the same time; keywords can be any size; keywords can be anywhere in the payload of a packet; matching must run at line speed; the matcher must be flexible enough to accommodate new rule sets.

  4. Classical Aho-Corasick (AC) DFA: example 1 • A set of keywords: {he, her, him, his} • Figure: the DFA has one start state and four accept states. • Failure edges back to state 1 are shown as dashed lines. • Failure edges back to state 0 are not shown.
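A minimal sketch of the classical Aho-Corasick automaton for the keyword set on slide 4. It is not taken from the slides: the function names `build_ac` and `search` are hypothetical, and the sketch stores explicit failure links instead of the fully expanded 256-way DFA table that the memory model on the next slide assumes.

```python
from collections import deque

def build_ac(keywords):
    """Build the goto trie, failure links, and output sets."""
    goto = [{}]      # goto[state][char] -> next state
    out = [set()]    # out[state] -> keywords recognized on entering state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail to the start state
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit matches ending at the suffix
    return goto, fail, out

def search(text, goto, fail, out):
    """Return (end_index, keyword) for every keyword occurrence in text."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend((i, word) for word in sorted(out[state]))
    return hits

goto, fail, out = build_ac(["he", "her", "him", "his"])
print(len(goto))                                  # 7 states: start, h, he, her, hi, him, his
print(search("ushers his", goto, fail, out))      # [(3, 'he'), (4, 'her'), (9, 'his')]
```

All patterns are matched in a single pass over the payload, which is the property the pattern matching requirements on slide 3 ask for.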

  5. Memory Matrix Model of AC DFA • Snort (Dec’05): 2733 keywords • 256 next-state pointers per state • pointer width = 15 bits • > 27,000 states (27,538 in the calculation) • keyword-ID width = 2733 bits • 27538 x (2733 + 256 x 15) bits ≈ 22 MB • 22 MB is too big for on-chip RAM.
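The arithmetic on slide 5 can be checked with a few lines. This is only a sketch of the memory matrix model as the slide states it (one row per state, each row holding a keyword-ID vector plus the next-state pointers); the function name `matrix_bits` is hypothetical, and the result is in bits.

```python
def matrix_bits(states, keyword_id_bits, pointers, pointer_bits):
    """Bits in a memory matrix with one row per DFA state."""
    row_bits = keyword_id_bits + pointers * pointer_bits
    return states * row_bits

states = 27538
# 15 bits is the smallest pointer width that can address 27,538 states.
pointer_bits = (states - 1).bit_length()

ac_bits = matrix_bits(states, keyword_id_bits=2733, pointers=256,
                      pointer_bits=pointer_bits)
print(pointer_bits)                          # 15
print(ac_bits)                               # 181007274 bits
print(round(ac_bits / 8 / 1e6, 1), "MB")     # 22.6 MB (decimal megabytes)
```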

  6. Bit-AC DFA (Tan-Sherwood’s Bit-Split) • Need 8 bit-DFAs

  7. Memory Matrix of Bit-AC DFA • Snort (Dec’05): 2733 keywords • 2 next-state pointers per state • pointer width = 9 bits • 361 states per DFA • keyword-ID width = 16 bits • 1368 DFAs • 1368 x 361 x (16 + 2 x 9) bits ≈ 2 MB

  8. Bit-AC DFA Techniques • Shrinking the width of the keyword-ID: from 2733 to 16 bits, by dividing the 2733 keywords into 171 subsets of up to 16 keywords each. • Reducing the next-state pointers: from 256 to 2 pointers, by splitting each input byte into single bits, which requires 8 bit-DFAs. • Extra benefits: the number of states per DFA drops from ~27,000 to ~300, and the width of a next-state pointer drops from 15 to 9 bits. • Memory: reduced from 22 MB to 2 MB. • Number of DFAs: with 171 subsets and 8 DFAs per subset, the total is 171 x 8 = 1,368 DFAs. • What more can we do to reduce the memory usage?
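The two bit-split steps above (grouping keywords into subsets, then feeding each bit of an input byte to its own DFA) can be sketched as follows. The helper names are hypothetical, and the bit order returned by `split_byte` (least significant bit first) is an assumption; the slides do not say which order the original design uses.

```python
import math

def bit_split_plan(num_keywords, keywords_per_subset, bits_per_dfa=1):
    """Return (subsets, DFAs per subset, total DFAs) for a bit-split design."""
    subsets = math.ceil(num_keywords / keywords_per_subset)
    dfas_per_subset = 8 // bits_per_dfa
    return subsets, dfas_per_subset, subsets * dfas_per_subset

def split_byte(byte):
    """Split one input byte into 8 single-bit inputs, one per bit-DFA."""
    return [(byte >> i) & 1 for i in range(8)]

subsets, per_subset, total = bit_split_plan(2733, 16)
print(subsets, per_subset, total)       # 171 8 1368

# Memory with the slide 7 figures: 361 states, 16-bit keyword-ID,
# 2 next-state pointers of 9 bits each, in every one of the 1368 DFAs.
total_bits = total * 361 * (16 + 2 * 9)
print(total_bits)                       # 16790832 bits, roughly 2 MB

print(split_byte(ord("h")))             # [0, 0, 0, 1, 0, 1, 1, 0]
```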

  9. Classical AC DFA: example 2 • 28 states • Failure edges are not shown.

  10. Byte-AC DFA • Considering 4 bytes at a time • 4 DFAs • < 9 states per DFA • 256 next-state pointers! • Similar to Dharmapurikar-Lockwood’s JACK DFA, ANCS’05

  11. Bit-Byte-AC DFA • 4 bytes at a time • Each byte is divided into bits. • 32 DFAs (= 4 x 8) • < 6 states per DFA • 2 next-state pointers

  12. Memory Matrix of Bit-Byte-AC DFA • Snort (Dec’05): 2733 keywords • 4 bytes at a time • < 36 states per DFA • 2 next-state pointers • pointer width = 6 bits • keyword-ID width = 3 bits • 29152 DFAs (= 911 subsets x 32) • 29152 x 36 x (3 + 2 x 6) bits ≈ 1.9 MB • 1.9 MB is only a little better than 2 MB, because: this is not an optimal setting; each DFA has a different number of states; and there is no need to provide the same size of memory matrix for every DFA.

  13. Bit-Byte-AC DFA Techniques • Keeps the keyword-ID as narrow as in Bit-DFA. • Keeps the number of next-state pointers as small as in Bit-DFA. • Reduces the states per DFA by skipping bytes and by exploiting more shared states than Bit-DFA. • Results of reducing states per DFA: from ~27,000 to 36 states, and the next-state pointer width drops from 15 to 6 bits.

  14. Construction of Bit-Byte AC DFA • Figure: bit 3 of byte 0; 4 bytes considered at a time

  15. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  16. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  17. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  18. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  19. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  20. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  21. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  22. Construction of Bit-Byte AC DFA • 4 bytes considered at a time

  23. Construction of Bit-Byte AC DFA • Failure edges are not shown.

  24. Construction of Bit-Byte AC DFA

  25. Construction of Bit-Byte AC DFA • 32 bit-byte DFAs need to be constructed.
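The construction figures are not preserved in the transcript, so the following is only a sketch of how a keyword might be projected onto the 32 bit-byte DFAs. It assumes that the DFA for (byte offset j, bit i) is built from bit i of every B-th keyword byte starting at offset j, which is one reading of "skipping bytes" and "bit 3 of byte 0"; the authors' construction may differ, for example in how it handles keywords that start at other alignments. The function name `bit_byte_inputs` is hypothetical.

```python
def bit_byte_inputs(keyword, byte_offset, bit, B=4):
    """Bit `bit` of every B-th byte of `keyword`, starting at `byte_offset`."""
    return [(c >> bit) & 1 for c in keyword[byte_offset::B]]

B = 4
keyword = b"memory"

# One input sequence per (byte offset, bit position): B x 8 = 32 sequences.
all_inputs = {
    (j, i): bit_byte_inputs(keyword, j, i, B)
    for j in range(B)
    for i in range(8)
}

print(len(all_inputs))          # 32
# Byte offset 0 selects 'm' and 'r'; bit 3 of 'm' (0x6D) is 1, of 'r' (0x72) is 0.
print(all_inputs[(0, 3)])       # [1, 0]
```

Each sequence is much shorter than the keyword and uses a binary alphabet, which is why each DFA needs few states and only 2 next-state pointers.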

  26. Bit-Byte-DFA: Searching

  27. Bit-Byte-DFA: Searching • A failure edge is shown where necessary.

  28. Bit-Byte-DFA: Searching

  29. Bit-Byte-DFA: Searching • A failure edge is shown where necessary.

  30. Bit-Byte-DFA: Searching • Match => keyword ‘memory’ • A match is reported only when all 32 bit-DFAs find it on their own.
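The rule that a keyword matches only when every bit-DFA agrees can be sketched as a bitwise AND over per-DFA keyword-ID vectors, as in bit-split designs generally. The vectors below are made-up illustrative values for a subset of k = 3 keywords, not data from the slides, and the function name `combine` is hypothetical.

```python
from functools import reduce
from operator import and_

def combine(partial_match_vectors):
    """AND the keyword-ID bit vectors reported by all bit-DFAs."""
    return reduce(and_, partial_match_vectors)

# 32 bit-DFAs, 3 keywords per subset: bit n set means "keyword n may match here".
# Every DFA agrees on keyword 1, so only keyword 1 survives the AND.
vectors = [0b010] * 30 + [0b011, 0b110]
matched = combine(vectors)
print(bin(matched))      # 0b10

# A single dissenting DFA is enough to suppress the match.
no_match = combine([0b010] * 31 + [0b000])
print(no_match)          # 0
```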

  31. Find the optimal settings to minimize memory • k = keywords per subset: the width of the keyword-ID is k bits; k = 1, 2, 3, …, K, where K is the number of keywords in the whole set; for Snort (Dec. 2005), K = 2733 keywords. • b = bits extracted from each byte: b = 1, 2, 4, 8; the number of next-state pointers is 2^b; example 2 uses b = 1; b > 8 would need more than 256 next-state pointers. • B = bytes considered at a time: B = 1, 2, 3, …; example 2 uses B = 4. • Total memory T is a function of k, b, and B: T = f(k, b, B).
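How the three parameters determine the number of DFAs and the pointers per state can be sketched as below. The relation "subsets x B x 8/b" is inferred from the counts on slides 7 and 12 (171 x 8 and 911 x 32) and is not stated as a formula in the transcript; the function names are hypothetical.

```python
import math

K = 2733  # keywords in the Snort (Dec. 2005) rule set, per the slides

def num_dfas(k, b, B, total_keywords=K):
    """DFAs needed: one group of B * (8 / b) DFAs per subset of k keywords."""
    subsets = math.ceil(total_keywords / k)
    return subsets * B * (8 // b)

def next_state_pointers(b):
    """Each DFA consumes b bits per step, so each state has 2**b pointers."""
    return 2 ** b

print(num_dfas(k=16, b=1, B=1))     # 1368, the Bit-AC setting on slide 7
print(num_dfas(k=3, b=1, B=4))      # 29152, the Bit-Byte setting on slide 12
print([next_state_pointers(b) for b in (1, 2, 4, 8)])   # [2, 4, 16, 256]
```

The counts alone do not give T, because the number of states in each DFA also changes with k, b, and B and has to be measured by actually building the DFAs.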

  32. T’s Formula • T is the total memory of all bit-ACs in all subsets. • [The equations on this slide are not preserved in the transcript.]

  33. Find the optimal k • Each pair (b, B) has one optimal k that gives a minimal T. • Plot (x-axis: keywords per subset): T_min at k = 12

  34. Find the optimal b • Each setting of k, b, and B has a different optimal point. • Only the optimal settings are compared. • b = 2 is the best. • (plot x-axis: keywords per subset)

  35. Find the optimal B • With b = 2, T decreases non-linearly as B increases. • For B > 16, T begins to increase. • B = 16 is the best for Snort (Dec’05). • (plot x-axis: keywords per subset)

  36. Comparing with Existing Works • Tan-Sherwood’s, Brodie-Cytron-Taylor’s, and ours • Our Bit-Byte DFA with B = 16: the optimal point is at b = 2 and k = 12, using 272 KB • 14% of 2001 KB (Tan’s) • 4% of 6064 KB (Brodie’s) • (plot x-axis: keywords per subset)
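The percentages on slide 36, and the 86% reduction quoted in the conclusion, follow directly from the three memory figures. A quick check, using only the numbers given on the slides:

```python
ours_kb, tan_kb, brodie_kb = 272, 2001, 6064

share_of_tan = 100 * ours_kb / tan_kb          # our size as a share of Tan's
share_of_brodie = 100 * ours_kb / brodie_kb    # our size as a share of Brodie's
reduction_vs_tan = 100 - share_of_tan          # memory saved relative to Tan's

print(f"{share_of_tan:.1f}% of Tan's")         # 13.6% of Tan's
print(f"{share_of_brodie:.1f}% of Brodie's")   # 4.5% of Brodie's
print(f"{reduction_vs_tan:.1f}% reduction")    # 86.4% reduction
```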

  37. Comparing with Existing Works • Tan-Sherwood’s and ours at B = 1 • Tan’s (on ASIC): 2001 KB; k = 16 is not the optimal setting for B = 1; every bit-DFA uses the same storage capacity, sized to fit the largest one (worst case). • Ours (on NP): 396 KB < 2001 KB; k = 3 is the optimal setting for B = 1; each bit-DFA uses exactly the memory space needed to hold it. • (plot x-axis: keywords per subset)
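The difference between worst-case sizing and exact sizing can be illustrated with a small sketch. The state counts below are hypothetical, chosen only to show the effect, and letting the pointer width vary per DFA is an assumption of this sketch; it does not account for word alignment on real NP memory.

```python
def pointer_bits(states):
    """Smallest pointer width that can address `states` states."""
    return max(1, (states - 1).bit_length())

def row_bits(states, k, b):
    """One matrix row: k-bit keyword-ID plus 2**b next-state pointers."""
    return k + (2 ** b) * pointer_bits(states)

def uniform_bits(state_counts, k, b):
    """Every DFA gets a matrix sized for the largest DFA (worst case)."""
    worst = max(state_counts)
    return len(state_counts) * worst * row_bits(worst, k, b)

def exact_bits(state_counts, k, b):
    """Every DFA gets a matrix sized for its own number of states."""
    return sum(n * row_bits(n, k, b) for n in state_counts)

counts = [361, 120, 75, 40]   # hypothetical state counts for four bit-DFAs
print(uniform_bits(counts, k=16, b=1))   # 49096 bits
print(exact_bits(counts, k=16, b=1))     # 19244 bits
```

The gap grows with the spread in DFA sizes, which is consistent with the slide's point that fitting every DFA to the largest one wastes space.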

  38. Results with an NP Simulator • NePSim2: an open-source IXP24xx/28xx simulator • NP architecture based on IXP2855: 16 MicroEngines (MEs), 512 KB, 1.4 GHz • Bit-Byte AC DFA: b = 2, B = 16, k = 12 • T = 272 KB • 5 Gbps

  39. Conclusion • The Bit-Byte DFA model can reduce memory usage by up to 86%. • Implementing on an NP uses on-chip memory more efficiently, without wasting space, compared to an ASIC. • An NP has the flexibility to accommodate the optimal setting of k, b, and B; different sizes of Bit-Byte DFA; and new rule sets in the future, for which the optimal setting may change. • The performance (measured with an NP simulator) satisfies line speed at up to 5 Gbps throughput.

  40. Thank you • Questions? • Piti_Piyachon@student.uml.edu • Yan_Luo@uml.edu
