Pattern-Based DFA for Memory-Efficient Multiple Regular Expression Matching Presenter: Junchen Jiang (Tsinghua University) Yang Xu (Polytechnic Institute of NYU) Tian Pan (Tsinghua University) Bin Liu (Tsinghua University) Email: livejc@gmail.com
Outline • Background • Regular expression • DFA space explosion • Problem statement & Idea of pattern grouping • Pattern-Based DFA • Grouping algorithms • Results • Summary
Background (cont.) • RegEx (pattern) matching is now widely used • Network Intrusion Detection Systems (SNORT) • L7-filter: protocol identification • Example: ^220[\x09-\x0d -~]*ftp • Common Technique • Deterministic Finite Automaton (DFA) • Challenges • High memory requirement • Low processing speed
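The example pattern above can be tried directly with a backtracking engine. The sketch below uses Python's `re` module rather than a DFA; the function name `is_ftp_banner`, the sample payloads, and the case-insensitive flag are assumptions made for illustration.

```python
import re

# The L7-filter style FTP pattern from the slide, compiled over bytes.
# Assumption: matching is case-insensitive.
FTP_PATTERN = re.compile(rb"^220[\x09-\x0d -~]*ftp", re.IGNORECASE)


def is_ftp_banner(payload: bytes) -> bool:
    """Return True if the payload looks like an FTP greeting (hypothetical helper)."""
    return FTP_PATTERN.search(payload) is not None


print(is_ftp_banner(b"220 Welcome to the example ftp service"))
print(is_ftp_banner(b"220 mail.example.org ESMTP ready"))
```

A DFA-based matcher accepts the same language but reads each input byte exactly once, which is why it is the common technique for line-rate inspection.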
Background (cont.) • Space Problem – DFA state explosion • Exponential worst-case space complexity • Solution – Pattern Grouping • Example [figure]: one big DFA built from patterns P1–P5 must sit in slow memory; after partitioning the patterns into two groups, the two smaller DFAs fit in fast memories
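The exponential worst case can be seen with a textbook pattern family that is not taken from the slides: `(a|b)*a(a|b){n}`. The sketch below runs subset construction on a hand-built NFA for that family and counts the reachable DFA states; the function name `dfa_states` is an assumption.

```python
def dfa_states(n: int) -> int:
    """Count DFA states produced by subset construction for (a|b)*a(a|b){n}.

    NFA states: 0 loops on every symbol and moves to 1 on 'a';
    states 1..n advance on any symbol; state n + 1 accepts.
    """
    def step(subset, ch):
        nxt = {0}  # state 0 is always active because of its self-loop
        for s in subset:
            if s == 0:
                if ch == "a":
                    nxt.add(1)
            elif s <= n:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen, work = {start}, [start]
    while work:
        cur = work.pop()
        for ch in "ab":
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


for n in range(4):
    print(n, dfa_states(n))  # the count doubles with each extra position
```

The NFA has n + 2 states, while the DFA has to remember which of the last n + 1 symbols were `a`, giving 2^(n+1) states.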
Problem Statement & Idea (cont.) • Goal: minimize the number of groups (speed) while greatly reducing DFA size (space) • For a general-purpose processor architecture • All groups are stored in one shared memory and processed sequentially • For a multi-parallel processor architecture • Each group is stored in its own memory and handled by its own parallel processor • Challenge • Quantify the influence of each pattern!
Problem Statement & Idea (cont.) • Traditional approach – group patterns that interact little with each other. • Patterns p and q interact iff the DFA built from p and q together is larger than the DFA of p and the DFA of q combined. • In our evaluation, only 23.6% of pattern pairs in L7-filter, and only about 5% of pattern pairs, have no interaction! • Pairwise interaction is therefore not an accurate measure for grouping patterns! • Our contribution • Add a new specification to the DFA structure with which we can quantify the influence of each pattern on the final DFA. • Based on the new DFA structure, give more refined grouping algorithms
Problem Statement & Idea (cont.) • Why is a traditional DFA insufficient? • Observation: no information about individual patterns is preserved in the resulting DFA (renumbered or not) • Pattern-based DFA (P-DFA) • Objective: store information about each pattern in the states
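The interaction test defined above can be sketched as follows. The two transition tables are hand-written toy DFAs for "input contains ab" and "input contains bc" over the alphabet `abc`; they, the helper names, and the choice to count reachable product states without minimization are all assumptions made for illustration.

```python
ALPHABET = "abc"

# Toy DFAs as {(state, symbol): next_state}; state 0 is the start state
# and state 2 is an absorbing accepting state.
DFA_AB = {(0, "a"): 1, (0, "b"): 0, (0, "c"): 0,
          (1, "a"): 1, (1, "b"): 2, (1, "c"): 0,
          (2, "a"): 2, (2, "b"): 2, (2, "c"): 2}
DFA_BC = {(0, "a"): 0, (0, "b"): 1, (0, "c"): 0,
          (1, "a"): 0, (1, "b"): 1, (1, "c"): 2,
          (2, "a"): 2, (2, "b"): 2, (2, "c"): 2}


def combined_size(dfas):
    """Number of reachable states when the given DFAs run in lockstep."""
    start = (0,) * len(dfas)
    seen, work = {start}, [start]
    while work:
        cur = work.pop()
        for ch in ALPHABET:
            nxt = tuple(d[s, ch] for d, s in zip(dfas, cur))
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


def interacts(p, q):
    """True iff the combined DFA is larger than both single DFAs together."""
    return combined_size([p, q]) > combined_size([p]) + combined_size([q])


print(combined_size([DFA_AB]), combined_size([DFA_BC]),
      combined_size([DFA_AB, DFA_BC]))
print(interacts(DFA_AB, DFA_BC))
```

Note that this test only answers yes or no for a pair; it does not say how much a single pattern contributes to a larger combined DFA, which is the gap the slide points out.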
Pattern-Based DFA (P-DFA) (cont.) • Construction • [Figure: Traditional DFA: P1, P2, P3 → one NFA → DFA. P-DFA: P1, P2, P3 → one NFA and one DFA per pattern → P-DFA. The two results are equivalent.]
Pattern-Based DFA (P-DFA) (cont.) • Each state in a P-DFA contains several sub-states, each of which is derived from one RegEx pattern. • Example: state 0,3,6 (sub-state 0: P1, 3: P2, 6: P3) • Stored in the Pattern-Based Structure (PBS) • [Figure: DFA of P1 (states 0, 1, 2), DFA of P2 (states 3, 4, 5), DFA of P3 (states 6, 7, 8), and the P-DFA of P1, P2, P3 with states 0,3,6; 1,3,7; 2,3,6; 1,3,8; 0,4,6; 1,4,6; 2,4,6; 0,5,6; 1,5,6; 1,4,7; 1,4,8]
Pattern-Based DFA (P-DFA) (cont.) • Adding a pattern to a P-DFA is trivial • Removing one pattern • remove its sub-states + merge identical states • We can predict the size of the P-DFA when any pattern is removed. • [Figure: removing P1 from the P-DFA of P1, P2, P3 (delete all of P1's sub-states, shown in red, and merge identical states) gives the P-DFA of P2, P3 with states 3,6; 3,7; 3,8; 4,6; 4,7; 4,8; 5,6]
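A minimal sketch of the idea that every P-DFA state is a tuple of per-pattern sub-states. It assumes simple "input contains this literal" patterns and numbers sub-states consecutively across patterns, as in the slide's example (0–2, 3–5, 6–8). The helpers `literal_dfa` and `build_pdfa` and the three toy patterns are hypothetical and are not the DFAs drawn in the figure.

```python
def literal_dfa(pattern, alphabet, base):
    """Transitions of a DFA for 'input contains pattern'.

    Sub-state base + k means that k characters of the pattern are matched.
    """
    n = len(pattern)
    delta = {}
    for k in range(n + 1):
        for ch in alphabet:
            if k == n:
                j = n  # the accepting sub-state is absorbing
            else:
                seen = pattern[:k] + ch
                j = min(len(seen), n)
                # longest prefix of the pattern that is a suffix of `seen`
                while not seen.endswith(pattern[:j]):
                    j -= 1
            delta[base + k, ch] = base + j
    return delta


def build_pdfa(patterns, alphabet):
    """Return (start_state, table); each state is a tuple of sub-states."""
    dfas, start, base = [], [], 0
    for p in patterns:
        dfas.append(literal_dfa(p, alphabet, base))
        start.append(base)
        base += len(p) + 1
    start = tuple(start)
    table, work = {}, [start]
    while work:
        state = work.pop()
        if state in table:
            continue
        table[state] = {ch: tuple(d[s, ch] for d, s in zip(dfas, state))
                        for ch in alphabet}
        work.extend(table[state].values())
    return start, table


start, table = build_pdfa(["ab", "bc", "ca"], "abc")
print(start)        # (0, 3, 6): one sub-state per pattern
print(len(table))   # number of P-DFA states
```

Because each position in the tuple belongs to exactly one pattern, the table itself plays the role of the PBS: the contribution of a pattern can be read off its column.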
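The removal step can be sketched on the state labels that survive from the figure. The state tuples below are transcribed from the slide; the function name `remove_pattern` is an assumption, and transitions are left out, so the sketch only predicts the resulting number of states.

```python
# P-DFA states of P1, P2, P3 as (P1 sub-state, P2 sub-state, P3 sub-state).
PDFA_STATES = [
    (0, 3, 6), (1, 3, 7), (2, 3, 6), (1, 3, 8),
    (0, 4, 6), (1, 4, 6), (2, 4, 6),
    (0, 5, 6), (1, 5, 6), (1, 4, 7), (1, 4, 8),
]


def remove_pattern(states, index):
    """Drop one pattern's sub-state from every state and merge duplicates."""
    return sorted({s[:index] + s[index + 1:] for s in states})


without_p1 = remove_pattern(PDFA_STATES, 0)
print(without_p1)
print(len(PDFA_STATES), "->", len(without_p1))

# Predicted P-DFA size after removing each pattern in turn.
print([len(remove_pattern(PDFA_STATES, i)) for i in range(3)])
```

Removing P1 leaves the seven states 3,6; 3,7; 3,8; 4,6; 4,7; 4,8; 5,6, which agrees with the P-DFA of P2, P3 in the figure. The per-pattern sizes in the last line are the quantity a greedy grouping step can compare.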
Grouping Algorithms • General scheme of pattern grouping using P-DFA. • Core idea: build a P-DFA of all patterns first, then greedily remove the pattern whose removal shrinks the P-DFA the most. • [Figure: greedy pattern grouping algorithm. RegEx patterns #1 … #k are each compiled to a DFA with its PBS; software operations (combine, delete) on the PBS produce P-DFAs #1 … #t; the resulting DFAs are used for matching in the hardware implementation.]
Grouping Algorithms • General processor architecture (Group1) • Generate the complete P-DFA • Repeat: split the currently largest group into two smaller groups • Until the total size of all groups is smaller than the given limit L. • Multi-parallel processor architecture (Group2) • For each group • If the size of its P-DFA is larger than the limit L, then • Extract a pattern from the group so that the size of its P-DFA moves closer to the limit L
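A rough sketch of a Group2-style loop under stated assumptions: patterns are "input contains this literal" toy patterns, P-DFA size is the number of reachable state tuples, "closer to the limit" is read as the smallest absolute distance to L, and extracted patterns are collected into a new group that is processed in turn. The slides do not specify these details, and all function names are hypothetical.

```python
def literal_dfa(pattern, alphabet):
    """DFA for 'input contains pattern'; state k = k characters matched."""
    n = len(pattern)
    delta = {}
    for k in range(n + 1):
        for ch in alphabet:
            j = n
            if k < n:
                seen = pattern[:k] + ch
                j = min(len(seen), n)
                while not seen.endswith(pattern[:j]):
                    j -= 1
            delta[k, ch] = j
    return delta


def pdfa_states(patterns, alphabet):
    """Reachable P-DFA states, each a tuple with one sub-state per pattern."""
    dfas = [literal_dfa(p, alphabet) for p in patterns]
    start = (0,) * len(dfas)
    seen, work = {start}, [start]
    while work:
        cur = work.pop()
        for ch in alphabet:
            nxt = tuple(d[s, ch] for d, s in zip(dfas, cur))
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def drop(states, i):
    """State set after removing pattern i: project and merge duplicates."""
    return {s[:i] + s[i + 1:] for s in states}


def group2(patterns, limit, alphabet):
    """Split patterns into groups whose P-DFA size is at most `limit`."""
    groups, pending = [], [list(patterns)]
    while pending:
        group = pending.pop()
        states = pdfa_states(group, alphabet)
        extracted = []
        while len(states) > limit and len(group) > 1:
            # pick the pattern whose removal leaves a size closest to the limit
            i = min(range(len(group)),
                    key=lambda k: abs(len(drop(states, k)) - limit))
            extracted.append(group.pop(i))
            states = drop(states, i)
        groups.append(group)
        if extracted:
            pending.append(extracted)
    return groups


print(group2(["ab", "bc", "ca", "abc", "cab"], 12, "abc"))
```

The point of the sketch is that every candidate removal is evaluated by projection on the stored state tuples, so no DFA has to be rebuilt inside the inner loop.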
Experimental Result (cont.) • Evaluation database: randomly select 300 RegEx patterns from Snort’s web pcre ruleset • General processor architecture
Experimental Result (cont.) • Multi-parallel processor architecture
Summary • RegEx pattern matching is challenging • Careful grouping of RegEx patterns eases memory inflation • We present P-DFA, a new method to construct DFAs • Quantifies the influence of each pattern • Stores information about each pattern in the states • Experiments show that our approach reduces the number of groups by almost half compared with the traditional method.
Questions? Email: livejc@gmail.com