
Accelerating Multi-Patterns Matching on Compressed HTTP Traffic

Accelerating Multi-Patterns Matching on Compressed HTTP Traffic. Authors: Anat Bremler-Barr, Yaron Koral. Presenter: Chia-Ming Chang. Date: 2009.9.1. Publisher/Conf.: IEEE INFOCOM 2009, April 2009, Page(s): 397-405.



  1. Accelerating Multi-Patterns Matching on Compressed HTTP Traffic. Authors: Anat Bremler-Barr, Yaron Koral. Presenter: Chia-Ming Chang. Date: 2009.9.1. Publisher/Conf.: IEEE INFOCOM 2009, April 2009, Page(s): 397-405. Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

  2. Outline • Introduction • Background • Naive Decompression with Aho-Corasick vs. Compressed HTTP based Aho-Corasick • Experiment • Conclusions

  3. Introduction • HTTP compression, also known as content encoding, is a publicly defined way to compress textual content transferred from web servers to browsers. • This standards-based method of delivering compressed content is built into HTTP/1.1, and most modern browsers that support HTTP/1.1 also support GZIP compression.

  4. Introduction GZIP compression (encode) (1) LZ77 compression: a series of bytes (characters) can be compressed if the same series has already appeared earlier in the data. For example, the text "abcdefabcd" is compressed to "abcdef(6,4)", i.e., go back 6 bytes and copy 4 bytes from that point. (2) Huffman coding: in HTTP compression, Huffman coding encodes both the uncompressed bytes and the pointers (which are represented as numbers).
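The pointer example on this slide can be checked with a minimal sketch of LZ77 expansion. The token format here (a Python list mixing integer byte values with `(distance, length)` tuples) and the name `lz77_decode` are assumptions made for illustration; real GZIP streams carry these tokens in Huffman-coded form.

```python
def lz77_decode(tokens):
    """Expand a list of literal bytes and (distance, length) pointers."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            start = len(out) - distance
            if start < 0:
                raise ValueError("pointer reaches before the start of the data")
            # Copy one byte at a time so that overlapping copies
            # (length > distance) repeat the most recent bytes correctly.
            for i in range(length):
                out.append(out[start + i])
        else:
            out.append(tok)  # literal byte value
    return bytes(out)


# "abcdef(6,4)": six literals, then go back 6 bytes and copy 4.
tokens = list(b"abcdef") + [(6, 4)]
decoded = lz77_decode(tokens)
print(decoded)  # b'abcdefabcd'
```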

  5. Introduction GZIP compression (decode) • 1) Remove the HTTP header and store the Huffman dictionary of the specific session in memory. Note that different HTTP sessions have different Huffman dictionaries. • 2) Decode the Huffman mapping of each symbol to the original byte or pointer representation, using that session's Huffman dictionary table. • 3) Decode the LZ77 part. • 4) Perform multi-patterns matching on the uncompressed traffic.
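The four steps above can be sketched with the Python standard library, as a rough illustration rather than the authors' implementation. The sample body and the helper `find_patterns` are made up for this example, and the sketch starts from a GZIP body with the HTTP header already removed (step 1). `zlib.decompress` performs the Huffman and LZ77 decoding (steps 2 and 3) in a single call, and a plain substring search stands in for the multi-pattern matcher of step 4.

```python
import gzip
import zlib


def find_patterns(data, patterns):
    """Return sorted (end_offset, pattern) pairs for every occurrence."""
    hits = []
    for p in patterns:
        pos = data.find(p)
        while pos != -1:
            hits.append((pos + len(p), p))
            pos = data.find(p, pos + 1)
    return sorted(hits)


# Hypothetical response body, compressed the way a web server might send it.
body = b"<html>nbc news and nba scores: abcd abcd</html>"
compressed = gzip.compress(body)

# Steps 2-3: undo Huffman coding and LZ77.
# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
plain = zlib.decompress(compressed, 16 + zlib.MAX_WBITS)

# Step 4: scan the fully decompressed bytes.
hits = find_patterns(plain, [b"abcd", b"nba", b"nbc"])
for end, pattern in hits:
    print(end, pattern)
```

Every decompressed byte is scanned here, including bytes that came from LZ77 pointers and were therefore already scanned once; that repeated work is what the later slides aim to reduce.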

  6. Background LZ77 Compression: a repeated series of bytes (a repeated string) is encoded by the pair (distance, length). (1) distance is a number between 1 and 32768 (32 KB) that indicates how many bytes back the repeated string starts. (2) length is a number between 3 and 258 that indicates the length of the repeated string in bytes. U: uncompressed traffic.

  7. Naive Decompression with Aho-Corasick pattern matching [Figure: example scan. Pattern set = {abcd, nba, nbc}. Labels: distance, length; u = unmatch, m = match.]

  8. Naive Decompression with Aho-Corasick pattern matching Definitions: (1) Trf: the input, compressed traffic (after Huffman decoding). (2) SWin[1..32KB]: the sliding window of LZ77 (the stored data that pointers refer back to). (3) SWin_j: the information about the uncompressed byte located j bytes before the current byte. (4) FSM(state, byte): the AC FSM receives a state and a byte and returns the next state, where startStateFSM is the initial FSM state (failure function and transition function). (5) Match(state): if state is a "match state", it stores information about the matched pattern; otherwise NULL (output function).
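A minimal sketch of how these definitions fit together in the naive approach, under several assumptions: `trf` is modeled as a Python list of integer byte values and `(distance, length)` tuples, the window is unbounded rather than limited to 32 KB, and the names `build_ac`, `fsm`, `match`, and `naive_scan` are illustrative, not taken from the paper. The token stream in the example is made up; the pattern set is the one from the previous slide.

```python
from collections import deque

START_STATE = 0  # plays the role of startStateFSM


def build_ac(patterns):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = START_STATE
        for b in p:
            if b not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].append(p)
    # Breadth-first pass: failure links for states at depth >= 2.
    queue = deque(goto[START_STATE].values())
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, START_STATE)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def fsm(tables, state, byte):
    """FSM(state, byte): follow failure links, then take the transition."""
    goto, fail, _ = tables
    while state and byte not in goto[state]:
        state = fail[state]
    return goto[state].get(byte, START_STATE)


def match(tables, state):
    """Match(state): matched patterns for a match state, otherwise None."""
    return tables[2][state] or None


def naive_scan(trf, patterns):
    """Decompress every token and feed every resulting byte to the FSM."""
    tables = build_ac(patterns)
    swin = bytearray()  # sliding window; unbounded in this sketch
    state, hits = START_STATE, []
    for tok in trf:
        distance, length = tok if isinstance(tok, tuple) else (None, 1)
        for _ in range(length):
            # swin[-distance] is the byte located `distance` bytes back.
            byte = tok if distance is None else swin[-distance]
            swin.append(byte)
            state = fsm(tables, state, byte)
            found = match(tables, state)
            if found:
                hits.append((len(swin), found))
    return bytes(swin), hits


# Made-up traffic: "abcdnb" followed by a pointer that copies "abc".
trf = list(b"abcdnb") + [(6, 3)]
text, hits = naive_scan(trf, [b"abcd", b"nba", b"nbc"])
print(text)  # b'abcdnbabc'
print(hits)  # [(4, [b'abcd']), (7, [b'nba'])]
```

The second hit, `nba`, starts in the literal bytes and ends inside the pointer area, which shows why the boundaries of a repeated string need care even when its interior was scanned before.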

  9. Naive Decompression with Aho-Corasick pattern matching [Figure: Aho-Corasick based algorithm. Labels: pointer area (repeated string), non-pointer area, decompressed traffic.]

  10. Compressed HTTP based Aho-Corasick • Improved method (1) The key idea is to store data that is produced while scanning the uncompressed traffic. (2) When the pattern matching algorithm encounters traffic that has already been scanned, the stored data is used either to find a possible match or to skip that traffic.

  11. Compressed HTTP based Aho-Corasick Definitions: (1) Trf: the input, compressed traffic (after Huffman decoding). (2) SWin[1..32KB]: the sliding window of LZ77 (the stored data that pointers refer back to). (3) SWin_j: the information about the uncompressed byte located j bytes before the current byte. (4) FSM(state, byte): the AC FSM receives a state and a byte and returns the next state, where startStateFSM is the initial FSM state (failure function and transition function). (5) Match(state): if state is a "match state", it stores information about the matched pattern; otherwise NULL (output function).

  12. Compressed HTTP based Aho-Corasick Definitions (continued): (6) Depth: the depth of a state s is the number of edges on the shortest simple path from the start state to s in the FSM. (7) CDepth: a constant parameter (threshold) of the improved algorithm. (8) SWin_j: in the improved algorithm, the information about the j-th byte is a record of two fields: SWin_j.b (the byte) and SWin_j.st (the status).
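The per-byte record can be illustrated with a small sketch. The slides do not list the possible status values, so the three labels used below ("match", "check" when the depth is at least CDepth, "uncheck" otherwise) are an assumption, as is the function name `scan_status`. To keep the example short, the depth is computed by brute force instead of by running the FSM: in an Aho-Corasick automaton, the depth of the current state equals the length of the longest suffix of the scanned text that is also a prefix of some pattern.

```python
def scan_status(text, patterns, cdepth):
    """Return one (byte, status) record per scanned byte of text."""
    prefixes = {p[:k] for p in patterns for k in range(1, len(p) + 1)}
    max_len = max(len(p) for p in patterns)
    records = []
    for end in range(1, len(text) + 1):
        # Depth of the AC state after reading text[:end]: the longest
        # suffix of the scanned text that is a prefix of some pattern.
        depth = 0
        for k in range(min(max_len, end), 0, -1):
            if text[end - k:end] in prefixes:
                depth = k
                break
        if any(text[:end].endswith(p) for p in patterns):
            status = "match"      # a pattern ends at this byte
        elif depth >= cdepth:
            status = "check"      # assumed label: deep enough to matter later
        else:
            status = "uncheck"    # assumed label: shallow state
        records.append((text[end - 1:end], status))
    return records


records = scan_status(b"xabcdn", [b"abcd", b"nba", b"nbc"], cdepth=2)
for byte, status in records:
    print(byte, status)
# b'x' uncheck, b'a' uncheck, b'b' check, b'c' check, b'd' match, b'n' uncheck
```

Under this labeling, a long run of "uncheck" bytes inside a repeated string is a candidate for skipping, since no deep partial match was in progress there during the first scan. With `cdepth=0` no byte is ever labeled "uncheck", so nothing would qualify for skipping.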

  13. Compressed HTTP based Aho-Corasick [Figure: the improved algorithm. Labels: CDepth, threshold.]

  14. [Figure: Case 1, left boundary. Labels: pointer area (repeated string), non-pointer area.]

  15. Experiment

  16. Experiment • SNORT: rules published on June 08. • ModSecurity: an open source web application firewall. • Reduced set: the Snort rules after removing those that have no effect on textual HTML search. • CDepth equal to 0 represents the naive algorithm.

  17. Conclusion • Our algorithm eliminates up to 75% of data scans, based on information stored in the compressed data, and gains up to 70% improvement in the performance of the multi-patterns matching algorithm. • As far as we know, this is the first paper that analyzes the problem of "on-the-fly" multi-patterns matching on compressed HTTP traffic and suggests a solution.
