Accelerating Multi-Patterns Matching on Compressed HTTP Traffic. Authors: Anat Bremler-Barr, Yaron Koral. Presenter: Chia-Ming Chang. Date: 2009.9.1. Publisher/Conf.: IEEE INFOCOM 2009, April 2009, pp. 397-405.
Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
Outline • Introduction • Background • Naive Decompression with Aho-Corasick vs. Compressed HTTP based Aho-Corasick • Experiment • Conclusions
Introduction • HTTP compression, also known as content encoding, is a publicly defined way to compress textual content transferred from web servers to browsers. • This standards-based method of delivering compressed content is built into HTTP 1.1, and most modern browsers that support HTTP 1.1 support GZIP compression.
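A small illustration with Python's standard gzip module (no HTTP stack involved): a repetitive HTML body shrinks substantially when compressed and round-trips exactly. The HTML fragment is made up; a server would send the compressed bytes with the header `Content-Encoding: gzip`.

```python
import gzip

# A repetitive HTML fragment, typical of textual web content (made-up example).
html = b"<tr><td>item</td><td>price</td></tr>\n" * 200

compressed = gzip.compress(html)        # body a server would send with Content-Encoding: gzip
restored = gzip.decompress(compressed)  # what the browser has to undo before using the page

print(len(html), "->", len(compressed), "bytes")
assert restored == html
```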
Introduction: GZIP compression (encode) (1) LZ77 compression. The idea of LZ77 is that a series of bytes (characters) can be compressed if the same series has already appeared earlier. For example, the text "abcdefabcd" is compressed to "abcdef(6,4)", i.e., go back 6 bytes and copy 4 bytes from that point. (2) Huffman coding. In GZIP, Huffman coding then encodes both the uncompressed bytes (literals) and the pointers (as numbers).
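The LZ77 example above can be reproduced in a few lines. This sketch assumes a token list in which a literal is a one-character string and a pointer is a (distance, length) tuple; it ignores the Huffman layer and the real DEFLATE bit format.

```python
from typing import List, Tuple, Union

# Assumed token format: a literal character or a (distance, length) back-reference.
Token = Union[str, Tuple[int, int]]


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)                # literal: emit as-is
        else:
            distance, length = tok
            start = len(out) - distance    # go back `distance` bytes
            for i in range(length):        # copy byte by byte so overlapping copies work
                out.append(out[start + i])
    return "".join(out)


print(lz77_decode(list("abcdef") + [(6, 4)]))  # abcdefabcd
```

Copying one byte at a time matters when distance < length: for example `["a", (1, 3)]` decodes to `"aaaa"`.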
Introduction: GZIP compression (decode) • 1) Remove the HTTP header and store the Huffman dictionary of the specific session in memory. Note that different HTTP sessions may have different Huffman dictionaries. • 2) Decode each Huffman symbol to the original byte or pointer representation, using the session's Huffman dictionary table. • 3) Decode the LZ77 part. • 4) Perform multi-patterns matching on the uncompressed traffic.
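In practice a library performs steps 2 and 3 together. The sketch below assumes the HTTP headers have already been stripped and the body is gzip-encoded: Python's `gzip.decompress` undoes the Huffman and LZ77 layers, and a plain per-pattern `find` loop stands in for step 4 (it is not Aho-Corasick). The function name `naive_scan` is hypothetical.

```python
import gzip
from typing import List, Tuple


def naive_scan(body_gz: bytes, patterns: List[bytes]) -> List[Tuple[int, bytes]]:
    """Decompress the whole body (steps 2-3), then search it for every pattern (step 4)."""
    data = gzip.decompress(body_gz)
    hits = []
    for pat in patterns:
        pos = data.find(pat)
        while pos != -1:
            hits.append((pos + len(pat) - 1, pat))  # record the end offset of each match
            pos = data.find(pat, pos + 1)           # continue, allowing overlapping matches
    return sorted(hits)


body = gzip.compress(b"xxnbaxxabcdyynbc")
print(naive_scan(body, [b"abcd", b"nba", b"nbc"]))
# [(4, b'nba'), (10, b'abcd'), (15, b'nbc')]
```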
Background: LZ77 compression. We encode a series of bytes that already appeared earlier (called a repeated string) by the pair (distance, length): (1) distance is a number between 1 and 32768 (32KB) that indicates how many bytes back the earlier occurrence of the repeated string starts; (2) length is a number between 3 and 258 that indicates the length of the string in bytes. [Figure legend: U = uncompressed traffic.]
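A deliberately naive encoder, for illustration only: it searches the whole 32KB window by brute force and greedily takes the longest match, respecting the distance (1-32768) and length (3-258) limits above. Real gzip implementations use hash chains and lazy matching, so their output can differ. Tokens are assumed to be one-character strings for literals and (distance, length) tuples for pointers; this brute-force search is only practical for short strings.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]   # assumed token format

MAX_DISTANCE = 32768                  # 32KB sliding window
MIN_LENGTH, MAX_LENGTH = 3, 258       # DEFLATE match-length limits


def lz77_encode(text: str) -> List[Token]:
    """Greedy, brute-force LZ77: at each position take the longest earlier match."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Try the closest candidates first so that ties prefer the smaller distance.
        for start in range(i - 1, max(0, i - MAX_DISTANCE) - 1, -1):
            length = 0
            # The match may run past position i (an overlapping copy), as LZ77 allows.
            while (length < MAX_LENGTH and i + length < len(text)
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
                if best_len == MAX_LENGTH:
                    break
        if best_len >= MIN_LENGTH:
            tokens.append((best_dist, best_len))   # emit a pointer
            i += best_len
        else:
            tokens.append(text[i])                 # emit a literal
            i += 1
    return tokens


print(lz77_encode("abcdefabcd"))  # ['a', 'b', 'c', 'd', 'e', 'f', (6, 4)]
```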
Naive Decompression with Aho-Corasick pattern matching: Pattern set = {abcd, nba, nbc}. [Figure: example with (distance, length) pointers; legend: u = unmatched, m = matched.]
Naive Decompression with Aho-Corasick pattern matching: Definitions. (1) Trf: the input compressed traffic (after Huffman decoding). (2) SWin_{1…32KB}: the LZ77 sliding window (the stored data that pointers refer to). (3) SWin_j: the information about the uncompressed byte located j bytes before the current byte. (4) FSM(state, byte): the Aho-Corasick FSM; it receives a state and a byte and returns the next state (combining the failure and transition functions), where startState_FSM is the initial FSM state. (5) Match(state): if state is a match state, it returns information about the matched pattern; otherwise NULL (the output function).
Naive Decompression with Aho-Corasick pattern matching: the Aho-Corasick-based algorithm. [Figure: labels "pointer area (repeated string)", "not pointer area", "decompress traffic".]
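A minimal sketch of the naive approach, assuming the Huffman layer has already been removed so that Trf is a list of tokens (one-character strings for literals, (distance, length) tuples for pointers). The class `AhoCorasick` and its methods `fsm` and `match` are hypothetical names standing in for FSM(state, byte) and Match(state); state 0 plays startState_FSM. Every uncompressed byte, inside or outside a pointer area, goes through the FSM.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple, Union

Token = Union[str, Tuple[int, int]]   # literal char or (distance, length) pointer


class AhoCorasick:
    """Aho-Corasick FSM: fsm() plays FSM(state, byte), match() plays Match(state)."""

    def __init__(self, patterns: List[str]):
        self.goto: List[Dict[str, int]] = [{}]   # trie edges; state 0 is the start state
        self.fail: List[int] = [0]               # failure links
        self.out: List[List[str]] = [[]]         # patterns recognised at each state
        for pat in patterns:                     # build the trie
            s = 0
            for ch in pat:
                if ch not in self.goto[s]:
                    self.goto[s][ch] = len(self.goto)
                    self.goto.append({}); self.fail.append(0); self.out.append([])
                s = self.goto[s][ch]
            self.out[s].append(pat)
        queue = deque(self.goto[0].values())     # BFS: failure links and merged outputs
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def fsm(self, state: int, ch: str) -> int:
        while state and ch not in self.goto[state]:
            state = self.fail[state]             # follow failure links
        return self.goto[state].get(ch, 0)

    def match(self, state: int) -> Optional[List[str]]:
        return self.out[state] or None           # NULL when not a match state


def naive_scan(tokens: List[Token], ac: AhoCorasick) -> List[Tuple[int, str]]:
    """Decompress every token and run the FSM over every uncompressed byte."""
    window: List[str] = []        # uncompressed output; a real SWin keeps only the last 32KB
    state, matches = 0, []
    for tok in tokens:
        if isinstance(tok, str):                 # not a pointer area: a literal
            window.append(tok)
            produced = 1
        else:                                    # pointer area: copy the repeated string
            dist, length = tok
            start = len(window) - dist
            for i in range(length):              # byte by byte, so overlaps work
                window.append(window[start + i])
            produced = length
        for pos in range(len(window) - produced, len(window)):
            state = ac.fsm(state, window[pos])   # no skipping: every byte is scanned
            for pat in ac.match(state) or []:
                matches.append((pos, pat))
    return matches


ac = AhoCorasick(["abcd", "nba", "nbc"])
tokens = list("nbabcd") + ["x", (7, 6)] + list("nbc")   # "nbabcdx" + "nbabcd" + "nbc"
print(naive_scan(tokens, ac))
# [(2, 'nba'), (5, 'abcd'), (9, 'nba'), (12, 'abcd'), (15, 'nbc')]
```

The pattern set is the one from the slides; the token list is made up. Note that the repeated string "nbabcd" is rescanned byte by byte even though its matches were already found, which is the waste the improved algorithm targets.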
Compressed HTTP based Aho-Corasick • Improved method: (1) Our key idea is to store the information produced while scanning the uncompressed traffic. (2) When the pattern-matching algorithm later encounters traffic it has already scanned (i.e., a pointer that repeats it), we use the stored information either to find possible matches or to skip that traffic.
Compressed HTTP based Aho-Corasick: Definitions (1) to (5), namely Trf, SWin_{1…32KB}, SWin_j, FSM(state, byte) with startState_FSM, and Match(state), are the same as for the naive algorithm.
Compressed HTTP based Aho-Corasick: Definitions (continued). (6) Depth: the depth of a state s is the number of edges on the shortest simple path from the start state to state s in the FSM. (7) CDepth: a constant parameter (threshold) of the improved algorithm. (8) SWin_j: the information about the j-th byte is a record of two fields: SWin_j.b, the byte, and SWin_j.st, its status.
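One possible layout for SWin is a 32KB ring buffer of (b, st) records indexed by j, the number of bytes back from the current byte. The three status values (`MATCH`, `CHECK`, `UNCHECK`) and the hypothetical rule `status_of`, which derives a status from the FSM state's depth, whether a pattern matched, and CDepth, are assumptions for illustration; the slides only say that st is a status and that CDepth is a depth threshold.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):          # hypothetical status values for SWin_j.st
    UNCHECK = 0              # no match, state depth < CDepth
    CHECK = 1                # no match, state depth >= CDepth
    MATCH = 2                # a pattern match ended at this byte


@dataclass
class Record:
    b: str                   # SWin_j.b: the uncompressed byte
    st: Status               # SWin_j.st: its status


def status_of(depth: int, matched: bool, cdepth: int) -> Status:
    """Assumed rule: MATCH wins, otherwise compare the FSM state's depth with CDepth."""
    if matched:
        return Status.MATCH
    return Status.CHECK if depth >= cdepth else Status.UNCHECK


class SlidingWindow:
    """Ring buffer holding the last 32KB of records; swin[j] is SWin_j."""
    SIZE = 32768

    def __init__(self) -> None:
        self.buf: List[Optional[Record]] = [None] * self.SIZE
        self.count = 0                              # records pushed so far

    def push(self, rec: Record) -> None:
        self.buf[self.count % self.SIZE] = rec      # overwrite the oldest slot
        self.count += 1

    def __getitem__(self, j: int) -> Record:
        if not 1 <= j <= min(self.count, self.SIZE):
            raise IndexError("SWin_j out of range")
        rec = self.buf[(self.count - j) % self.SIZE]
        assert rec is not None
        return rec


# Depths as they would be for "nba" with the pattern set {abcd, nba, nbc}, CDepth = 2.
swin = SlidingWindow()
for ch, depth, matched in [("n", 1, False), ("b", 2, False), ("a", 3, True)]:
    swin.push(Record(ch, status_of(depth, matched, cdepth=2)))
print(swin[1], swin[3].st)  # swin[1] is the "a" record (MATCH), swin[3] is "n" (UNCHECK)
```

With CDepth = 0 no byte can be `UNCHECK`, because every depth is at least 0.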
Compressed HTTP based Aho-Corasick: improvement using the CDepth threshold.
[Figure: Case 1, the left boundary; labels "pointer area (repeated string)" and "not pointer area".]
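The sketch below is a simplified reading of the improved algorithm, not the authors' exact procedure; every detail beyond the slides is an assumption: the three status values, an extra per-position list of matched patterns (`hits`) used to report matches copied from the referred area, an unbounded buffer instead of a 32KB window, and the right-boundary rule. Inside a pointer it (1) scans the left boundary with the FSM until the state's depth is at most the number of pointer bytes scanned, so the state depends only on bytes inside the repeated string; (2) in the internal area, copies matches from the referred bytes when the pattern fits inside the pointer, without running the FSM; and (3) at the right boundary, restarts the FSM from the start state at most CDepth - 1 bytes before the last byte whose status is `UNCHECK` (depth < CDepth), since those bytes then determine the state. With CDepth = 0 no byte is `UNCHECK` and every byte is scanned, as in the naive algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple, Union

Token = Union[str, Tuple[int, int]]  # literal char or (distance, length) pointer
UNCHECK, CHECK, MATCH = 0, 1, 2      # hypothetical values for SWin_j.st


class AC:
    """Compact Aho-Corasick FSM with per-state depth and output (Match) sets."""

    def __init__(self, patterns: List[str]):
        # goto edges, failure links, state depths, output sets; state 0 = start state
        self.goto: List[Dict[str, int]] = [{}]
        self.fail, self.depth, self.out = [0], [0], [[]]
        for pat in patterns:                      # build the trie
            s = 0
            for ch in pat:
                if ch not in self.goto[s]:
                    self.goto[s][ch] = len(self.goto)
                    self.goto.append({}); self.fail.append(0)
                    self.depth.append(self.depth[s] + 1); self.out.append([])
                s = self.goto[s][ch]
            self.out[s].append(pat)
        queue = deque(self.goto[0].values())      # BFS: failure links, merged outputs
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def fsm(self, state: int, ch: str) -> int:    # FSM(state, byte)
        while state and ch not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(ch, 0)


def acch_scan(tokens: List[Token], ac: AC, cdepth: int) -> Tuple[List[Tuple[int, str]], int]:
    """Return (matches as (end_pos, pattern), number of FSM steps executed)."""
    text: List[str] = []             # SWin_j.b, kept unbounded for simplicity
    status: List[int] = []           # SWin_j.st
    hits: Dict[int, List[str]] = {}  # patterns ending at each position (extra info)
    state, scanned = 0, 0

    def scan(pos: int) -> None:      # one real FSM step on text[pos]
        nonlocal state, scanned
        state = ac.fsm(state, text[pos])
        scanned += 1
        if ac.out[state]:
            hits[pos], status[pos] = list(ac.out[state]), MATCH
        else:
            status[pos] = CHECK if ac.depth[state] >= cdepth else UNCHECK

    for tok in tokens:
        if isinstance(tok, str):     # not a pointer area: scan the literal
            text.append(tok); status.append(CHECK)
            scan(len(text) - 1)
            continue
        dist, length = tok
        p = len(text)
        for k in range(length):      # materialise the repeated string
            text.append(text[p - dist + k]); status.append(CHECK)
        j = 0                        # left boundary: scan until the state's
        while j < length:            # string lies entirely inside the pointer
            scan(p + j)
            j += 1
            if ac.depth[state] <= j:
                break
        last_uncheck = -1
        for k in range(j, length):   # internal area: reuse stored info, no FSM
            q = p - dist + k
            found = [pat for pat in hits.get(q, []) if len(pat) <= k + 1]
            if found:
                hits[p + k], status[p + k] = found, MATCH
            elif status[q] == UNCHECK:
                status[p + k], last_uncheck = UNCHECK, k
            else:
                status[p + k] = CHECK  # conservative: true depth may be smaller
        if j < length:               # right boundary: recover the exact FSM state
            resume = j               # default: replay the FSM from the left boundary
            if last_uncheck >= 0:    # depth < cdepth there, so the last
                end = p + last_uncheck  # cdepth - 1 bytes fix the state
                state = 0
                for pos in range(max(0, end - cdepth + 2), end + 1):
                    state = ac.fsm(state, text[pos])
                    scanned += 1
                resume = last_uncheck + 1
            for k in range(resume, length):  # matches here were already copied
                state = ac.fsm(state, text[p + k])
                scanned += 1
    return [(pos, pat) for pos in sorted(hits) for pat in hits[pos]], scanned


ac = AC(["abcd", "nba", "nbc"])
tokens = list("nbaqrstu") + [(8, 8)] + list("nbc")  # "nbaqrstu" twice, then "nbc"
print(acch_scan(tokens, ac, cdepth=2))
# ([(2, 'nba'), (10, 'nba'), (18, 'nbc')], 13)
```

In this made-up input the second copy of "nbaqrstu" costs one FSM step at the left boundary and one at the right boundary instead of eight, and the "nba" match inside it is copied rather than rescanned; with `cdepth=0` the same call scans all 19 bytes.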
Experiment • Snort: rules published on June 08. • ModSecurity: an open-source web application firewall. • Reduced set: the Snort rules, with the rules that have no effect on textual HTML search removed. • CDepth = 0 represents the naive algorithm.
Conclusion • Our algorithm eliminates up to 75% of data scans, based on information stored in the compressed data, and improves the performance of the multi-patterns matching algorithm by up to 70%. • To the best of our knowledge, this is the first paper that analyzes the problem of on-the-fly multi-patterns matching on compressed HTTP traffic and suggests a solution.