1 / 27

Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP

Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP. Author: Anat Bremler-Barr, Yaron Koral , Shimrit Tzur David, David Hay Publisher: IEEE INFOCOM , 2012 Presenter: Kai-Yang, Liu Date: 2011/12/21. INTRODUCTION.

Download Presentation

Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP Author: Anat Bremler-Barr, Yaron Koral , Shimrit Tzur David, David Hay Publisher: IEEE INFOCOM , 2012 Presenter: Kai-Yang, Liu Date: 2011/12/21

  2. INTRODUCTION • Gzip or Deflate work well as the compression for each individual response, but in many cases there is a lot of common data shared by a group of pages. • Therefore, compression methods of the next generation are inter-file, where there is one dictionary that can be referenced by several files.

  3. The VCDIFF Compression Algorithm • VCDIFF encoding process uses three types of instructions, called delta instructions: • ADD(i , str) means to append to the output i bytes, which are specified in parameter str. • RUN(i , b) means to append i times the byte b. • COPY(p , x) means that the interval [p , p + x) should be copied from the dictionary.

  4. Example Dictionary : DBEAACDBCABC The plain-text that should be considered is therefore ABDDBEAAAACDBCABABCAACBCDBADBC

  5. Aho-Corasick Algorithm • patterns set: { E,BE,BD,BCD,BCAA,CDBCAB }

  6. The Offline Phase • The dictionary is scanned from the first symbol using the Aho-Corasick algorithm. State array : Match list :

  7. The Offline Phase Dictionary : DBEAACDBCABC

  8. Four Kinds of Matches • Patterns that are fully contained within an ADD or COPY instruction. • Patterns whose prefix is within a COPY instruction. • Patterns whose suffix is within a COPY instruction.

  9. The Online Phase • Scanning the delta file by the AC algorithm. • ADD instruction : simply scanning it by traversing the automaton. • COPY (p , x) instruction : Step1: Scan the copied symbols from the dictionary one by one, until when scanning a symbol bp+i we reach a state in the automaton whose depth is less or equal to i. ※ Find all the patterns of fourth category

  10. The Online Phase Step2: We check the Matched list to find any patterns in the dictionary that ends within interval [ p, p+x). ※ Find all the patterns ofsecond category Step3: We obtain the state State[p+x-1]. From that state, we follow failure transitions in the automaton, until we reach a state s whose depth is less or equal to x.

  11. Dictionary : DBEAACDBCABC Match List:

  12. Dictionary : DBEAACDBCABC Match List:

  13. Dictionary : DBEAACDBCABC Match List:

  14. Dictionary : DBEAACDBCABC Match List:

  15. Dictionary : DBEAACDBCABC Match List:

  16. Dictionary : DBEAACDBCABC Match List:

  17. Dictionary : DBEAACDBCABC Match List:

  18. Dictionary : DBEAACDBCABC Match List:

  19. Dictionary : DBEAACDBCABC Match List:

  20. Dictionary : DBEAACDBCABC Match List:

  21. Optimizations • Efficient pattern lookups in the Matched list: • Save the Matched list as a balanced tree. • Add an array of pointers of the dictionary size. • Given a COPY (p , x)instruction, one can cache the corresponding internal matches within [p , p+x-1] in a hash-table whose key is “(p , x)”.

  22. REGULAR EXPRESSIONS INSPECTION • Anchors are extracted from the regular expression offline. Then, our algorithm is applied on the SDCH-compressed traffic with the anchors as the patterns set. • For example the regular expression \d{6}ABCDE\s+123456\d*XYZ$ if we matched the anchor ABCDE at position x1 and the anchor XYZ at position x2, the interval [x1-10,x2] should be passed to the regular expressions engine for re-examination.

  23. Experimental Results • Data Sets : We first downloaded the dictionary from google.com and used the 1000 most popular Google search queries. • Pattern Sets : The signatures data sets are drawn from a snapshot of Snort rules as of October 2010. • we also constructed for each input file a synthetic patterns file.

  24. Experimental Results

  25. Experimental Results

  26. Experimental Results

  27. Experimental Results

More Related