FPGA Based String Matching for Network Processing Applications. Janardhan Singaraju, John A. Chandy. ENGG*3050 RCS Winter 2014, March 24, 2014. Presented by: Justin Riseborough, Albert Tirtariyadi.
Content • Introduction • String Lookup Cache • Architectures • System Interaction • Systems comparison • Network Intrusion Detection • Architectures • System Interaction • Implementations • Critique
Keywords • Network processing • String matching • Content Addressable Memory (CAM) & Cache • Bottlenecks • Fixed-Size/Non-Fixed-Size keys • Cascading, propagating • Parallelism
Introduction • String matching is used in search engines and network intrusion detection • Network processing applications require frequent string matching for specific keywords • As networks get faster, it becomes more difficult for general-purpose processors (GPPs) to keep up • Bottlenecks are found in memory and in slow algorithm implementations
Current Implementations • Software algorithms: Rabin-Karp (compares hashes of inputs instead of matching characters directly); Knuth-Morris-Pratt (character-by-character matching that skips ahead past positions that cannot match); Boyer-Moore (uses precomputed functions to determine the shift distance) • Hardware implementations: finite automata methods (translate finite automata graphs into FPGA circuitry); CAMs; caches and lookup tables; cellular automata; finite state machines
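As an illustration of the hash-comparison idea behind Rabin-Karp, here is a minimal Python sketch. It is not taken from the article; the function name `rabin_karp` and the `base` and `mod` constants are assumptions chosen for the example.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare hashes first; confirm with a direct comparison to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

Calling `rabin_karp("AFOOBARCD", "FOO")` scans the stream once and only falls back to a character comparison when the window hash equals the pattern hash.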
Section I String Lookup Cache
String Lookup Cache • Hardware implementation based on CAMs, cellular automata and caching • Caches retain frequently used values, reducing the need to repeatedly look up address values • Compatible with parallel processing, prefix sharing and pattern partitioning • Very high throughput with low area overhead • A drawback of CAMs and hardware caches is their reliance on fixed-size keys • Implementations for non-fixed-size keys require additional overhead
Content Addressable Memory • Hardware implementation of a 2D associative array (the abstract data type) • In VLSI, the cells are built from transistors • In an FPGA, the storage cells are registers and the comparators are XOR gates
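A software model can make the lookup-by-content idea concrete. The sketch below is only an analogy, assuming integer words and a hypothetical class name `ContentAddressableMemory`; real hardware compares all rows at once, which the loop merely imitates.

```python
class ContentAddressableMemory:
    """Toy CAM: search by stored content, get back the matching addresses."""

    def __init__(self, depth: int):
        self.cells = [None] * depth  # one stored word (an int) per row

    def write(self, address: int, word: int) -> None:
        self.cells[address] = word

    def search(self, key: int) -> list[int]:
        # Hardware compares every row in parallel; this loop models that.
        # XOR of two equal words is 0, so a zero result signals a match.
        return [addr for addr, word in enumerate(self.cells)
                if word is not None and (word ^ key) == 0]
```

Unlike an ordinary RAM, the caller supplies the data and receives the address, which is why the structure suits keyword lookup.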
CAM as Character Match Array (CMA) • Takes characters from the network processor on successive clock cycles • Each column corresponds to one character of a keyword • The input character is applied simultaneously to all n columns • A column's match signal goes high if all input bits match • A storage cell is used to indicate the end of a keyword
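The column behaviour described above can be sketched in Python as follows. The names `Column`, `build_columns` and `column_matches` are hypothetical, and the layout (keywords stored back to back, one character per column) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Column:
    char: str             # character stored in this column
    end_of_keyword: bool  # storage cell marking the last character of a keyword


def build_columns(keywords: list[str]) -> list[Column]:
    """Lay the keywords out back to back, one character per column."""
    return [Column(ch, i == len(kw) - 1)
            for kw in keywords
            for i, ch in enumerate(kw)]


def column_matches(columns: list[Column], input_char: str) -> list[bool]:
    """Apply one input character to every column; True models a high match signal."""
    # A column matches when the XOR of the stored and input character codes is zero.
    return [(ord(col.char) ^ ord(input_char)) == 0 for col in columns]
```

One call to `column_matches` corresponds to one clock cycle: every column sees the same input character and raises its own match signal independently.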
Processor Element (PE) Array • An array of finite state machines that carries out the approximate match algorithm • May contain multiple keywords from the CAM • Takes the match signals from the CAM and sets a PE flag, which is forwarded to subsequent PEs • Evaluates an entire input string in time linear in the length of the input stream
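The flag forwarding between PEs can be simulated with a short Python sketch. This is a simplification that handles exact matching only, not the approximate match algorithm mentioned above, and the function name `scan` and the one-PE-per-character layout are assumptions.

```python
def scan(keywords: list[str], stream: str) -> list[tuple[int, int]]:
    """Return (stream position, PE index) for every keyword that ends at that position."""
    chars, starts, ends = [], [], []
    for kw in keywords:
        for i, ch in enumerate(kw):
            chars.append(ch)
            starts.append(i == 0)           # first character of a keyword
            ends.append(i == len(kw) - 1)   # last character of a keyword
    flags = [False] * len(chars)            # PE flags held from the previous cycle
    hits = []
    for pos, ch in enumerate(stream):       # one input character per clock cycle
        prev = flags
        # A PE fires when its column matches and either it starts a keyword
        # or the previous PE fired on the previous cycle.
        flags = [ch == chars[i] and (starts[i] or prev[i - 1])
                 for i in range(len(chars))]
        hits.extend((pos, i) for i, fired in enumerate(flags) if fired and ends[i])
    return hits
```

The outer loop visits each input character once, which reflects the linear-time behaviour: in hardware the inner list comprehension is done by all PEs in parallel within a single cycle.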
Map Table and Outputs • The map table takes the PE number and outputs the address of the value, or an indirect pointer to the value object • The map table has as many slots as there are PEs • Long keywords can leave holes in the map table
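One way to picture the holes is the Python sketch below. It assumes that only the PE at a keyword's last character ever indexes the table, so every other slot stays empty; that reading, the function name `build_map_table` and the example addresses are all assumptions rather than details from the article.

```python
def build_map_table(entries: dict[str, int]) -> list:
    """Build a table with one slot per PE (one PE per keyword character).

    entries maps each keyword to the (hypothetical) address of its value.
    """
    table = []
    for keyword, address in entries.items():
        table.extend([None] * (len(keyword) - 1))  # non-terminal PEs: holes
        table.append(address)                      # terminal PE: address of the value
    return table
```

With the entries `{"FOO": 0x10, "INTRUSION": 0x20}` the table has 12 slots but only 2 of them hold addresses, and the longer keyword accounts for most of the unused slots.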
Section II Network Intrusion Detection
Network Intrusion Detection • The process of identifying and analyzing packets that may contain threats to the organization’s network • A time-consuming process whose cost grows quickly as the defined rule set (the signatures) grows • String matching is the most computationally intensive part of intrusion detection • Every incoming packet is compared against several pre-defined signatures
Problems in the CAM Architecture • CAM-based designs cannot easily handle regular expressions • NIDS signatures are not of a fixed size • (e.g., the CAM contains FOO and BAR and the input stream is AFOOBARCD. With a 3-character alignment, comparisons are made against AFO, OBA and RCD; none of these match, so both signatures slip through the detection system) • CAM arrays are very large in area
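The alignment problem in the FOO/BAR example can be reproduced with a few lines of Python. The helper names `aligned_windows`, `aligned_hits` and `unaligned_hits` are hypothetical and serve only to illustrate the point.

```python
def aligned_windows(stream: str, width: int) -> list[str]:
    """Cut the stream into consecutive fixed-size windows."""
    return [stream[i:i + width] for i in range(0, len(stream), width)]


def aligned_hits(stream: str, signatures: list[str], width: int) -> list[str]:
    """Signatures found when only aligned windows are compared."""
    return [w for w in aligned_windows(stream, width) if w in signatures]


def unaligned_hits(stream: str, signatures: list[str]) -> list[tuple[int, str]]:
    """Signatures found when every starting offset is checked."""
    return [(i, s) for i in range(len(stream))
            for s in signatures if stream.startswith(s, i)]
```

For the stream `AFOOBARCD`, the aligned comparison sees only `AFO`, `OBA` and `RCD` and reports nothing, while checking every offset finds FOO at position 1 and BAR at position 4.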
Proposed Solution • Use discrete comparators instead of CAMs • This sacrifices the ability to update signatures dynamically, a fair tradeoff since signatures change relatively infrequently • Use p rows of comparators for parallelism, matching several characters in one clock cycle • Remove the aligned-keyword approach, since incoming streams may not be aligned to a fixed size boundary
Processor Element Flow • Start at the beginning of the signature • Each PE's output depends on the previous PE's match signal and its own character match • If both the previous signal and the current signal are matches, propagate the match signal toward the end of the signature • At the end of the signature, if the entire signature matched, assert the sig_match output
Signature Match Processor Example • Input string ‘144’ is processed over 2 clock cycles • ‘1’ is checked in the first cycle and sends a match signal into the SMA • ‘4’ is checked in the second cycle and sends a match signal into the SMA • The match signal for ‘1’ is still present from the previous clock cycle
Signature Match Processor Example • The ‘4’ is duplicated, so it simply propagates the first match signal to the second as a carry • Since this is the end of the signature, the output is a match due to the propagated match signals && sig_end
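The propagation in the ‘144’ example can be followed in a small behavioural model. This Python sketch is not the article's circuit: it assumes one PE per signature character and p = 2 input characters per clock cycle, and the function name `match_signature` is hypothetical.

```python
def match_signature(signature: str, stream: str, p: int = 2) -> list[tuple[int, int]]:
    """Return (clock cycle, stream index) for each position where the signature ends."""
    flags = [False] * len(signature)  # match signals carried between characters
    hits = []
    for cycle, start in enumerate(range(0, len(stream), p)):
        # The p characters of this cycle model the p comparator rows.
        for offset, ch in enumerate(stream[start:start + p]):
            prev = flags
            # A PE propagates the match only if its own character matches and
            # the previous PE matched on the previous character.
            flags = [ch == signature[i] and (i == 0 or prev[i - 1])
                     for i in range(len(signature))]
            if flags[-1]:  # propagated match signal && sig_end
                hits.append((cycle, start + offset))
    return hits
```

Under these assumptions, `match_signature("144", "144")` reports the match in cycle 1, the second clock cycle, and the two ‘4’ characters are distinguished only by the match signal each one receives from its predecessor.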
Address Output Logic • For the Signature Match Processor (SMP) to be useful, we also need to know which signatures caused the match • This is handled by the word match buffer, which maintains the position of each signature match • When the last character being processed has been reached, the match address output logic begins working on the buffer entries
Address Output Logic • A binary tree is used to locate the matching signatures • Decoding starts, and a signal is sent to the control circuitry indicating that there are matches • A pointer then propagates up the tree, generating a bit of the final address based on matches • Binary trees are fast and efficient: processing takes ~M cycles, where M is the number of matches
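A binary-tree address encoder of this general kind can be sketched in Python. The details are assumptions, not the article's design: the number of buffer entries is taken to be a power of two, the lowest-numbered match is reported first, and each reported entry is cleared before the next pass. The names `first_match_address` and `drain_matches` are hypothetical.

```python
def first_match_address(flags: list[bool]):
    """Encode the lowest-numbered set flag into an address, or return None."""
    # Each node is (any match in this subtree, address within this subtree).
    nodes = [(f, 0) for f in flags]
    level = 0
    while len(nodes) > 1:
        merged = []
        for left, right in zip(nodes[0::2], nodes[1::2]):
            if left[0]:
                merged.append((True, left[1]))                   # new address bit is 0
            elif right[0]:
                merged.append((True, right[1] | (1 << level)))   # new address bit is 1
            else:
                merged.append((False, 0))
        nodes = merged
        level += 1  # each tree level contributes one address bit
    found, address = nodes[0]
    return address if found else None


def drain_matches(flags: list[bool]) -> list[int]:
    """Report every matching address, one per pass through the tree."""
    pending = list(flags)
    addresses = []
    while (addr := first_match_address(pending)) is not None:
        addresses.append(addr)
        pending[addr] = False  # clear the entry so the next match can surface
    return addresses
```

Each pass through `first_match_address` yields one address, so draining a buffer with M matches takes on the order of M passes, in line with the ~M cycles quoted above.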
FPGA Implementation • As parallelism increases, throughput increases, but clock frequency decreases due to the added complexity • As the number of characters increases, area increases while frequency and throughput decrease
Critique • Introduces new terms and refers to unfamiliar prior work • Difficult to follow in some areas due to inconsistencies and the way the topic is presented • Heavy on implementation procedure and methodology • Very detailed work • Good examples that reinforce the theoretical explanations • Implementation data given for comparison purposes
References • All figures and information used in this presentation are taken from the article • Janardhan Singaraju, John A. Chandy, FPGA Based String Matching For Network Processing, ScienceDirect, Microprocessors and Microsystems, December 14, 2007