A Memory-Efficient Parallel String Matching for Intrusion Detection Systems
HyunJin Kim, Hyejeong Hong, Hong-Sik Kim, and Sungho Kang, Member, IEEE
Outline • INTRODUCTION • PROPOSED PARALLEL STRING MATCHING • Architecture of String Matcher • Gray Code-Based Sorting • Bit Position Grouping • PERFORMANCE EVALUATION
INTRODUCTION • The DFA-based string matcher provides both regularity and scalability with lower time complexity [1]. • However, its memory requirements are proportional to both the number of states and the number of input symbols.
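As a rough illustration of why DFA memory grows with both quantities, the Python sketch below counts the states of an Aho-Corasick DFA for the four example patterns used in the bit position grouping slides. It then sizes a full next-state table for 8-bit input symbols. The helper names are hypothetical. A pointer width of ⌈log₂ S⌉ bits and the omission of match (output) storage are simplifying assumptions.

```python
import math


def trie_states(patterns):
    """Count Aho-Corasick states, which equal the nodes of the pattern trie (root included)."""
    prefixes = {b""}
    for p in patterns:
        for k in range(1, len(p) + 1):
            prefixes.add(p[:k])
    return len(prefixes)


def dfa_table_bits(num_states, num_symbols):
    """Bits for a full next-state table: one pointer per (state, input symbol) pair."""
    pointer_bits = max(1, math.ceil(math.log2(num_states)))
    return num_states * num_symbols * pointer_bits


patterns = [b"he", b"has", b"his", b"hers"]
s = trie_states(patterns)           # 9 states
print(s, dfa_table_bits(s, 256))    # 9 9216
```

Doubling either the number of states or the alphabet size roughly doubles the table. Splitting each byte into narrow bit slices attacks the second factor.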
INTRODUCTION • To reduce the memory requirements of DFA-based string matching, bit-split string matching using the Aho-Corasick algorithm [2] was proposed in [3]. • Bit-split string matching partitions a lexicographically sorted list of target patterns into subgroups.
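The partitioning step can be sketched minimally, assuming a hypothetical fixed subgroup size (`group_size`, for example the PMV width). The patterns are sorted lexicographically, and the sorted list is cut into consecutive subgroups, each assumed to go to one string matcher. The partitioning in [3] must also respect per-tile state limits, which this sketch ignores.

```python
def partition_sorted(patterns, group_size):
    """Sort patterns lexicographically, then cut the list into consecutive subgroups.

    `group_size` is a hypothetical cap on patterns per subgroup (e.g. the PMV width).
    """
    ordered = sorted(patterns)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


print(partition_sorted([b"his", b"he", b"hers", b"has", b"she"], 2))
# [[b'has', b'he'], [b'hers', b'his'], [b'she']]
```

Sorting first places patterns with shared prefixes in the same subgroup, so their trie paths overlap and fewer states are needed.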
INTRODUCTION • Because bit transitions are biased within each bit position group, memory usage can be unbalanced across the FSM tiles of a string matcher.
PROPOSED PARALLEL STRING MATCHING • The architecture of the string matcher is based on the string matching engine in [3], which is summarized as follows: • In a string matcher, each homogeneous FSM tile takes 𝑛 bits of one character (one byte) as input per cycle. • In each state of an FSM tile, pattern identifications are stored as a partial match vector (PMV), where the 𝑖-th bit indicates whether the 𝑖-th pattern is matched in that state.
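The bit-split idea can be sketched in Python under a few assumptions. Each FSM tile is an Aho-Corasick DFA over the 2-bit symbols selected by its bit position group, and each state carries its PMV as an integer bitmask. A pattern is reported when its bit survives the AND of all tiles' PMVs. The names `Tile`, `slice_symbol`, and `bit_split_match` are hypothetical. Patterns are given as bytes so that each character is exactly eight bits.

```python
from collections import deque

# Hypothetical setup: four 2-bit tiles, bit positions counted from the LSB (1).
FIXED = [(8, 7), (6, 5), (4, 3), (2, 1)]
PATTERNS = [b"he", b"has", b"his", b"hers"]


def slice_symbol(byte, group):
    """Form an n-bit tile input from the selected bit positions of one byte."""
    sym = 0
    for pos in group:
        sym = (sym << 1) | ((byte >> (pos - 1)) & 1)
    return sym


class Tile:
    """Aho-Corasick DFA over one bit position group, with a PMV per state."""

    def __init__(self, patterns, group):
        self.group = group
        n_sym = 1 << len(group)          # 2^n next-state pointers per state
        self.goto = [[-1] * n_sym]       # state 0 is the root
        self.pmv = [0]
        # Build the trie; bit i of a state's PMV marks pattern i ending there.
        for i, pattern in enumerate(patterns):
            s = 0
            for byte in pattern:
                sym = slice_symbol(byte, group)
                if self.goto[s][sym] == -1:
                    self.goto[s][sym] = len(self.goto)
                    self.goto.append([-1] * n_sym)
                    self.pmv.append(0)
                s = self.goto[s][sym]
            self.pmv[s] |= 1 << i
        # Breadth-first pass: fill missing transitions through failure links
        # (giving a full DFA) and OR in the PMV of each state's failure target.
        fail = [0] * len(self.goto)
        queue = deque()
        for sym in range(n_sym):
            nxt = self.goto[0][sym]
            if nxt == -1:
                self.goto[0][sym] = 0
            else:
                queue.append(nxt)        # depth-1 states fail to the root
        while queue:
            s = queue.popleft()
            self.pmv[s] |= self.pmv[fail[s]]
            for sym in range(n_sym):
                nxt = self.goto[s][sym]
                if nxt == -1:
                    self.goto[s][sym] = self.goto[fail[s]][sym]
                else:
                    fail[nxt] = self.goto[fail[s]][sym]
                    queue.append(nxt)


def bit_split_match(text, patterns, groups):
    """Run all tiles in lockstep; AND their PMVs and report (end_index, pattern)."""
    tiles = [Tile(patterns, g) for g in groups]
    states = [0] * len(tiles)
    hits = []
    for pos, byte in enumerate(text):
        vec = (1 << len(patterns)) - 1
        for k, tile in enumerate(tiles):
            states[k] = tile.goto[states[k]][slice_symbol(byte, tile.group)]
            vec &= tile.pmv[states[k]]
        hits.extend((pos, p) for i, p in enumerate(patterns) if vec >> i & 1)
    return hits


print(bit_split_match(b"ushers", PATTERNS, FIXED))  # [(3, b'he'), (5, b'hers')]
```

The four groups together cover all eight bit positions. A bit that survives the AND therefore means every bit slice of that pattern matched at the same position, so the report is exact even though no single tile sees whole characters.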
Architecture of String Matcher • Each state in an FSM tile has 2^𝑛 next-state pointers, one for each 𝑛-bit input. Therefore, the memory size of a string matcher is given by: • The main difference between the proposed string matcher and the string matching engine in [3] is that the input bits of each FSM tile are selected from the eight bits of one character by eight 8:1 multiplexers, which supports bit position grouping.
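A back-of-the-envelope memory estimate follows. It is not the paper's own equation. Assume a string matcher has 8/𝑛 tiles and each tile provides 𝑆 states. Each state stores 2^𝑛 next-state pointers of ⌈log₂ 𝑆⌉ bits plus a 𝑊-bit PMV. The symbols 𝑆 and 𝑊 are introduced here for illustration. Then:

$$M \approx \frac{8}{n}\, S \left( 2^{n} \lceil \log_2 S \rceil + W \right) \ \text{bits}$$

With hypothetical values 𝑛 = 2, 𝑆 = 256 and 𝑊 = 16, this gives 4 × 256 × (4 × 8 + 16) = 49,152 bits. If every tile is built with the same capacity, 𝑆 must cover the largest tile. This is why unbalanced tiles waste memory.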
Gray Code-Based Sorting • Target patterns are sorted by their binary-reflected Gray code (BRGC) values to reduce bit transitions between successive patterns. • When the character code values in the prefixes of target patterns are not evenly distributed, the effectiveness of Gray code-based sorting is limited.
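One plausible reading of BRGC-based sorting is sketched in Python below. It treats each pattern's concatenated 8-bit codes as a single Gray-coded bit string and orders patterns by their position in the binary-reflected Gray code sequence. That position comes from prefix-XORing the bits, which is the Gray-to-binary conversion. Treating whole patterns as one bit string, and placing a prefix before its extensions, are assumptions rather than details from the slides. The function names are hypothetical.

```python
def pattern_bits(pattern):
    """Concatenate the 8-bit codes of a pattern, most significant bit first."""
    return [(byte >> k) & 1 for byte in pattern for k in range(7, -1, -1)]


def brgc_rank_key(bits):
    """Rank of a bit string in binary-reflected Gray code order.

    Gray-to-binary conversion: rank bit i is the XOR of Gray bits 0..i.
    For equal-length strings, comparing these tuples reproduces BRGC order;
    a shorter string that is a prefix of a longer one sorts first.
    """
    key, acc = [], 0
    for b in bits:
        acc ^= b
        key.append(acc)
    return tuple(key)


def brgc_sort(patterns):
    """Order patterns by the BRGC rank of their concatenated character codes."""
    return sorted(patterns, key=lambda p: brgc_rank_key(pattern_bits(p)))


print(brgc_sort([b"he", b"has", b"his", b"hers"]))
```

In BRGC order, equal-length neighbors differ in exactly one bit. This is the property the sorting relies on to keep bit transitions low between successive patterns.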
Bit Position Grouping • Assume that a string matcher has four FSM tiles with two input bits each, and that “he,” “has,” “his,” and “hers” are the patterns to be mapped. • For all string matchers in [3], the set of bit position groups for the four FSM tiles is fixed as {(8, 7), (6, 5), (4, 3), (2, 1)}, where each number is a bit position within a character, counted from the LSB.
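To make the imbalance concrete, the following Python sketch counts the states each FSM tile needs for the four example patterns under the fixed grouping. The helper names are hypothetical. Aho-Corasick adds failure transitions but no states, so a tile's state count equals the number of nodes in the trie of its 2-bit symbol sequences. `slice_symbol` performs the per-tile bit selection, with positions counted from the LSB as in the slide.

```python
# Bit position groups as in the slides: positions counted from the LSB (1).
PATTERNS = [b"he", b"has", b"his", b"hers"]
FIXED = [(8, 7), (6, 5), (4, 3), (2, 1)]


def slice_symbol(byte, group):
    """Select the bits named in `group` from one byte and pack them into a symbol."""
    sym = 0
    for pos in group:
        sym = (sym << 1) | ((byte >> (pos - 1)) & 1)
    return sym


def tile_state_count(patterns, group):
    """States of one tile = nodes of the trie over its symbol sequences (root included)."""
    prefixes = {()}
    for p in patterns:
        seq = tuple(slice_symbol(b, group) for b in p)
        prefixes.update(seq[:k] for k in range(1, len(seq) + 1))
    return len(prefixes)


def state_counts(patterns, groups):
    return [tile_state_count(patterns, g) for g in groups]


print(state_counts(PATTERNS, FIXED))  # [5, 5, 9, 6]
```

Tile (4, 3) needs nine states while tiles (8, 7) and (6, 5) need only five. This is because bits 8, 7 and 6 take the same value in every character of these patterns.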
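Bit Position Grouping • In these patterns, bits 8 and 7 are 0 and 1, respectively, in every character, so the fixed tile (8, 7) receives the same input symbol for every character. After grouping these MSB positions with other bits, an optimal set of bit position groups can be {(8, 4), (7, 3), (6, 5), (2, 1)}.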
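The following sketch searches every way to pair the eight bit positions, which gives 105 candidates for two-bit tiles. It keeps the set whose largest tile is smallest and breaks ties by total states. This objective is an illustrative assumption; the criterion behind the slide's "optimal" set may differ. The helper names are hypothetical.

```python
PATTERNS = [b"he", b"has", b"his", b"hers"]
FIXED = [(8, 7), (6, 5), (4, 3), (2, 1)]
PROPOSED = [(8, 4), (7, 3), (6, 5), (2, 1)]


def slice_symbol(byte, group):
    """Pack the bits at the given positions (LSB = 1) of one byte into a symbol."""
    sym = 0
    for pos in group:
        sym = (sym << 1) | ((byte >> (pos - 1)) & 1)
    return sym


def tile_state_count(patterns, group):
    """Trie nodes (root included) over one tile's symbol sequences."""
    prefixes = {()}
    for p in patterns:
        seq = tuple(slice_symbol(b, group) for b in p)
        prefixes.update(seq[:k] for k in range(1, len(seq) + 1))
    return len(prefixes)


def pairings(positions):
    """Yield every way to split `positions` into unordered pairs."""
    if not positions:
        yield []
        return
    first, rest = positions[0], positions[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


def best_grouping(patterns, positions=(8, 7, 6, 5, 4, 3, 2, 1)):
    """Minimize (largest tile, total states) over all pairings (assumed objective)."""
    best = None
    for groups in pairings(list(positions)):
        counts = [tile_state_count(patterns, g) for g in groups]
        key = (max(counts), sum(counts))
        if best is None or key < best[0]:
            best = (key, groups, counts)
    return best[1], best[2]


print([tile_state_count(PATTERNS, g) for g in FIXED])     # [5, 5, 9, 6]
print([tile_state_count(PATTERNS, g) for g in PROPOSED])  # [7, 7, 5, 6]
print(best_grouping(PATTERNS))
```

For the slide's set, the largest tile shrinks from nine states to seven while the total stays at 25. The tiles therefore fill more evenly.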
Bit Position Grouping • Bit position grouping for one string matcher has constant time complexity, O(1). • When all target patterns are mapped onto multiple string matchers, the time complexity is O(𝑇). • The time complexity of pattern sorting is O(𝑇 log₂ 𝑇).
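Assume, for illustration, that grouping examines every way to pair the eight bit positions for two-bit tiles. The number of candidates per string matcher is then a constant that does not depend on 𝑇:

$$\frac{8!}{(2!)^{4}\, 4!} = \frac{40320}{16 \cdot 24} = 105$$

If each matcher also holds a bounded number of patterns, the number of matchers grows linearly with 𝑇, and the total preprocessing cost is:

$$O(T \log_2 T) + O(T) = O(T \log_2 T)$$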
Bit Position Grouping • However, because bit position grouping has a large constant factor, pattern sorting does not dominate the total running time unless the number of target patterns 𝑇 is sufficiently large.
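Writing the two costs with hypothetical constants 𝑐_g (grouping) and 𝑐_s (sorting) shows when sorting takes over:

$$c_s\, T \log_2 T > c_g\, T \iff T > 2^{c_g / c_s}$$

A large 𝑐_g therefore pushes the crossover point to very large pattern sets.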
PERFORMANCE EVALUATION • Target patterns were extracted from Snort v2.8 rules [4]. • Following the design analysis in [3], each FSM tile was assumed to take two bits of each character as input.
PERFORMANCE EVALUATION • As shown in Table I, the number of string matchers required was reduced by 4.44% on average compared with the existing bit-split string matching in [3].
PERFORMANCE EVALUATION • Combining all patterns of the Snort rule sets yielded a total rule set of 7766 unique patterns, with an average pattern length of 18.6 characters. • The total number of unused states across all FSM tiles was reduced by 13.46% on average.
PERFORMANCE EVALUATION • When a string matcher did not adopt the fixed set of bit position groups, the proposed algorithm mapped more target patterns onto the string matcher than the method in [3].
PERFORMANCE EVALUATION • As shown in Table III, up to 33.33% of the string matchers did not adopt the fixed set of bit position groups.
PERFORMANCE EVALUATION • Given these improvements, the proposed parallel string matching can reduce memory costs without sacrificing the regularity and scalability of string matching.