180 likes | 336 Views
Deep Packet Inspection with Regular Expression Matching. Min Chen, Danny Guo {michen, guo}@cs.ucr.edu CSE Dept, UC Riverside 03/14/2007. Outline. Motivation Background and challenges Evaluation metrics Algorithm comparison Implementation details Regular expression DFA & NFA
E N D
Deep Packet Inspection with Regular Expression Matching Min Chen, Danny Guo {michen, guo}@cs.ucr.edu CSE Dept, UC Riverside 03/14/2007
Outline • Motivation • Background and challenges • Evaluation metrics • Algorithm comparison • Implementation details • Regular expression • DFA & NFA • Detection engine • Result • Future work
Motivation • Aclass of packet processing applications need to inspect packets deeper than the protocol headers and analyze its payload • Network Security • HTTP load balancing • XML processing • Content-based billing and forwarding
Deep Packet Inspection (DPI) • Shallow packet inspection • Checks the header portion of a packet only • Deep packet inspection • A form of computer network packet filtering that examines the data part of a through-passing packet, searching for non-protocol compliance or predefined criteria to decide if the packet can pass
Challenges for DPI • Operates at wire speed • Large number of signatures (i.e. string patterns) • Patterns highly complex and have overlaps • Location of signatures is unknown
DPI Evaluation Metrics • Packet processing rate • Memory requirement • SRAM, DRAM, TCAM • Power consumption • TCAM • Scalability • The time to process new signatures and insert them into the system
DPI Algorithms • Fixed string matching • Parallel Boyer-Moore (BM) • Aho-Corasick Boyer-Moore (AC_BM) • Setwise Boyer-Moore-Horspool • Bloom Filter • CAM Based • Regular expression matching • Deterministic Finite Automation (DFA) • Non-deterministic Finite Automation (NFA)
Regular Expression (RE) • Expressive power and flexibility for describing useful patterns • Linux Application Protocol Classifier (L7-filter) • the Snort intrusion detection system (1131 out of 4867 rules using regular expressions as of February 2006)
Example of RE • “^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80”
DFA Vs. NFA • Performance comparison • For 1 RE with length n • DFA • Higher processing speed • Acceptable construction time and memory consumption with lazy-DFA (DFA+NFA) • More efficient in software implementation
Detection Engine RE1 DFA RE2 DFA Input buffer RE3 DFA RE8 DFA Content Scanner 1 outgoing Streams Incoming Streams Content Scanner 2 Dispatcher … Content Scanner 16
Detection Engine Setup • # of Content Scanner (optimal) • SRAM 128bits (input) • Processing unit: 8bits/char • Processing power: 128/8 = 16 chars/cycle • # of REs for each Content Scanner • SRAM 128bits (output) • Processing unit: 1bit (accept:1 else:0) • # of streams: 16 (best throughput) • Each stream could be processed with 128/16=8 REs concurrently
Environment on Grep application • Input stream: 70MB file • RE: • For speed test: “[1-9]* [0-9]\.*[0-9]+” • For area test: “U\.?S\.?(D\.?)?[\ ]*(\$[\ ]*)?([0-9]+,[0-9]+,[0-9]+|[0-9]+\.[0-9]+\.[0-9]+|[0-9]+(\.[0-9]+)?[\ ]*milli?on)”
Result • Optimal throughput • 16 * 8bits * 200MHz = 25.6Gbps • Processing speedup • Logic consumption • 9% Slice Flip-fllop • 6% 4-input LUT
Future work • SNORT • More powerful application • Input stream preprocessing • TCP/IP packet • Packet arrival interval latency
Special thanks to John and Betul for the instruction on ISE and ROCCC