Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams

Presentation Transcript


  1. Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams Robert Schweller Ashish Gupta Elliot Parsons Yan Chen Computer Science Department, Northwestern University

  2. Online Change Detection • Network anomalies are common • Flash crowds, failures, DoS, worms, … Online Detection over Data Streams • Data Stream: key/update pairs (k,u) • Heavy hitters (lots of prior work) • Heavy changes

  3. k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003] • First to detect flow-level heavy changes in massive data streams at network traffic speeds. [Figure: H hash tables (rows 1 … j … H), each with K buckets (columns 0, 1, …, K-1)]

  4. k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003] • Update (k, u): Tj[hj(k)] += u (for all j) • Estimate v(S, k): sum of updates for key k [Figure: key k mapped by h1(k), …, hj(k), …, hH(k) to one bucket (0 … K-1) in each of the H tables]
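The update and estimate steps on slide 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the salted BLAKE2b stand-in for the hash functions hj, and the default sizes are all assumptions. The estimator (subtract the expected share of other traffic in the bucket, then take the median across tables) follows the usual description of the k-ary sketch.

```python
import hashlib
from statistics import median


class KarySketch:
    """H hash tables of K counters each (names and hashing are illustrative)."""

    def __init__(self, num_tables=5, num_buckets=4096):
        self.H = num_tables
        self.K = num_buckets
        self.tables = [[0] * num_buckets for _ in range(num_tables)]
        self.total = 0  # sum of all updates seen so far

    def _bucket(self, j, key):
        # Hypothetical stand-in for h_j: salted BLAKE2b digest reduced mod K.
        digest = hashlib.blake2b(
            str(key).encode(), digest_size=8, salt=j.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.K

    def update(self, key, u):
        # Update (k, u): T_j[h_j(k)] += u for every table j.
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        # Correct each bucket for the expected share of other keys' traffic,
        # then take the median of the per-table estimates.
        K = self.K
        per_table = [
            (self.tables[j][self._bucket(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        ]
        return median(per_table)


sketch = KarySketch()
sketch.update("129.105.56.23", 4)
sketch.update("129.105.56.23", 6)
print(sketch.estimate("129.105.56.23"))
```

Note that nothing in this structure records which keys were inserted, which is why reporting the heavy keys requires the reverse-hashing machinery discussed in the later slides.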

  6. • Requires very little space • E.g. 5 hash tables with 16 K buckets = 80 KB • Fits in high-speed memory • Main problem • Cannot efficiently report keys with heavy change • Our contribution • Determine the set of keys that have “large” estimates in the sketch

  7. Reverse Sketch Problem • Input: sketch, threshold • Output: set of keys that hash to heavy buckets in a majority of (or all) hash tables [Figure: hash tables 1 to 5 with “heavy” buckets marked]

  8. Outline • Streaming data recording: key → IP mangling → modular hashing → k-ary sketch (value) • Heavy change detection: k-ary sketch + change threshold → reverse hashing algorithms → heavy change keys [Figure: block diagram of the two stages, with paths labeled “fast” and “slow”]

  9. Taking Intersections • H = 5, K = 2^12, #keys = 2^32 (IP addresses) • Intersect A1, A2, A3, A4, A5 • E[false positives] << 1

  10. The problem with simple intersection • Why is this difficult? • Each set Ai can be very large! • H = 5, K = 2^12, #keys = 2^32 (IP addresses) • |A1| = 2^32 / 2^12 = 2^20
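The numbers on slides 9 and 10 can be checked with a few lines of arithmetic. The sketch below assumes each hash function spreads keys uniformly and independently over the K buckets, which the slides do not state explicitly; the variable names are illustrative.

```python
from fractions import Fraction

H = 5               # number of hash tables
K = 2 ** 12         # buckets per table
NUM_KEYS = 2 ** 32  # size of the key space (IPv4 addresses)

# Expected number of keys that map to one given bucket, i.e. the size of each A_i.
keys_per_bucket = NUM_KEYS // K

# Probability that an arbitrary key lands in the chosen bucket in all H tables,
# and the resulting expected number of such keys over the whole key space.
p_all_tables = Fraction(1, K) ** H
expected_false_positives = NUM_KEYS * p_all_tables

print(keys_per_bucket == 2 ** 20)
print(expected_false_positives == Fraction(1, 2 ** 28))
```

So the intersection itself is almost free of false positives (roughly 2^-28 expected), but computing it naively means materializing five sets of about a million keys each.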

  11. The problem with simple intersection • Why is this difficult? • Each set Ai can be very large! • Solution: Modular hashing

  12. Modular hashing reduces the set size • 32-bit key split into four 8-bit words: 10010100 10101011 10010101 10100011 • Each word hashed by h() to 3 bits: 010 110 001 101 • Concatenated result: 12 bits

  13. Modular hashing reduces the set size • 32-bit key split into four 8-bit words: 10010100 10101011 10010101 10100011 • Words hashed separately by h1(), h2(), h3(), h4(): 010 110 001 101 • Greatly reduces size of reverse mapped sets

  14. Modular hashing reduces the set size • 32-bit key split into four 8-bit words: 10010100 10101011 10010101 10100011 • Words hashed separately by h1(), h2(), h3(), h4(): 010 110 001 101 • Each 3-bit output reverse maps to 2^8 / 2^3 = 2^5 word values • Greatly reduces size of reverse mapped sets
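A minimal sketch of the modular hashing idea on slides 12 to 14, assuming each 8-bit word is hashed through its own lookup table. The balanced random tables, the seed handling, and all function names are assumptions made for illustration; the slides only specify the 8-bit to 3-bit word hashing and the 12-bit concatenated result.

```python
import random

WORDS = 4     # a 32-bit key is split into four 8-bit words
OUT_BITS = 3  # each word hashes to 3 bits, giving a 12-bit bucket index


def make_word_tables(seed):
    """One lookup table per word; each 3-bit value gets exactly 32 byte values."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        table = [v % (1 << OUT_BITS) for v in range(256)]
        rng.shuffle(table)
        tables.append(table)
    return tables


def split_words(key):
    # Most significant byte first, e.g. the four octets of an IPv4 address.
    return [(key >> (8 * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]


def modular_hash(tables, key):
    # Hash each word separately and concatenate the 3-bit results.
    bucket = 0
    for table, word in zip(tables, split_words(key)):
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket


def reverse_word(tables, i, chunk):
    # All values of word i that hash to the 3-bit value `chunk`.
    return {w for w in range(256) if tables[i][w] == chunk}


tables = make_word_tables(seed=1)
key = 0b10010100_10101011_10010101_10100011  # the key shown on the slide
bucket = modular_hash(tables, key)
first_chunk = bucket >> (OUT_BITS * (WORDS - 1))
print(len(reverse_word(tables, 0, first_chunk)))
```

The point of the construction is that a bucket's preimage no longer has to be stored as 2^20 keys: it is described by four sets of 2^5 word values each.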

  15. Modular hashing reduces the set size • A1: 2^5 × 2^5 × 2^5 × 2^5 • Intersection: only 32 elements per partition [Figure: hash tables 1 to 5 with heavy buckets b1 to b5]

  16. Modular hashing reduces the set size • A1: 2^5 × 2^5 × 2^5 × 2^5 • A2: 2^5 × 2^5 × 2^5 × 2^5 • Intersection: only 32 elements per partition [Figure: hash tables 1 to 5 with heavy buckets b1 to b5]
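For the simple case of exactly one heavy bucket per table, the per-partition intersection on slides 15 and 16 can be sketched as below. This is an illustration under assumptions (balanced random lookup tables, hypothetical function names), not the reverse hashing algorithm from the tech report, which also has to handle several heavy buckets per table.

```python
import random
from itertools import product

WORDS, OUT_BITS, H = 4, 3, 5


def make_tables(seed):
    # One balanced 8-bit -> 3-bit lookup table per word (assumed construction).
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        table = [v % (1 << OUT_BITS) for v in range(256)]
        rng.shuffle(table)
        tables.append(table)
    return tables


def split_words(key):
    return [(key >> (8 * (WORDS - 1 - i))) & 0xFF for i in range(WORDS)]


def modular_hash(tables, key):
    bucket = 0
    for table, word in zip(tables, split_words(key)):
        bucket = (bucket << OUT_BITS) | table[word]
    return bucket


def reverse_hash(all_tables, heavy_buckets):
    """Keys that land in heavy_buckets[j] under hash function j, for every j."""
    per_word = []
    for i in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - i)
        candidates = set(range(256))
        for tables, bucket in zip(all_tables, heavy_buckets):
            chunk = (bucket >> shift) & ((1 << OUT_BITS) - 1)
            # Intersect 32-element sets, one word position at a time.
            candidates &= {w for w in range(256) if tables[i][w] == chunk}
        per_word.append(sorted(candidates))
    keys = []
    for words in product(*per_word):  # recombine the surviving word values
        key = 0
        for w in words:
            key = (key << 8) | w
        keys.append(key)
    return keys


all_tables = [make_tables(seed) for seed in range(H)]
heavy_key = 0x81693817  # 129.105.56.23
heavy = [modular_hash(t, heavy_key) for t in all_tables]
print(heavy_key in reverse_hash(all_tables, heavy))
```

Because a key is in a bucket's preimage exactly when each of its words is in the corresponding word-level preimage, intersecting word by word gives the same result as intersecting the full sets A1 … A5, while only ever touching sets of 32 elements.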

  17. Handling Multiple Intersections… • 2^H different intersections • Much more difficult: need sophisticated reverse hashing algorithms (see tech report) [Figure: hash tables 1 to 5, each with two heavy buckets marked]

  18. Problem: Too many collisions • 32-bit keys sharing a prefix: 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... • 12-bit hashes all of the form 7 . 4 . 0 . *

  19. Problem: Too many collisions • 32-bit keys sharing a prefix: 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... • 12-bit hashes all of the form 7 . 4 . 0 . * • Solution: IP Mangling

  20. IP-mangling

  21. Invertible Modular Linear Equation • f(x) ≡ a·x mod n • To be invertible: a and n must be relatively prime • a is odd, chosen randomly
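The mangling function on slide 21 and its inverse can be sketched as below, assuming n = 2^32 (32-bit IPv4 keys); the slide does not state n, and the function names are illustrative. With n a power of two, any odd a is relatively prime to n, so a modular inverse exists. `pow(a, -1, n)` requires Python 3.8 or later.

```python
import math
import random

N = 2 ** 32  # assumed modulus: 32-bit IPv4 keys


def pick_multiplier(rng):
    # A random odd a; odd numbers are relatively prime to a power of two.
    a = rng.randrange(1, N, 2)
    assert math.gcd(a, N) == 1
    return a


def mangle(a, x):
    # f(x) = a * x mod n
    return (a * x) % N


def unmangle(a, y):
    # Multiply by the modular inverse of a to recover the original key.
    a_inv = pow(a, -1, N)
    return (a_inv * y) % N


rng = random.Random(7)
a = pick_multiplier(rng)
ip = 0x81693817  # 129.105.56.23
print(unmangle(a, mangle(a, ip)) == ip)
```

Keys are mangled before modular hashing and unmangled after reverse hashing, so addresses that share a prefix no longer share their leading words when they reach the per-word hash functions.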

  22. Modular Hashing Optimal Hashing

  23. Modular Hashing Optimal Hashing Modular Hashing with IP Mangling

  24. Recap: value stored value Streaming data recording reversible k-ary sketch IP mangling Modular hashing key change threshold Heavy change detection reversible k-ary sketch heavy change keys Reverse hashing Reverse IP mangling

  25. Evaluation • Traffic traces from Northwestern University edge router • 5-minute intervals, averaging 7.5 GB of traffic per interval • Compared with ground truth • 6 hash tables, 4K buckets each, 192 KB of memory in total • Up to 140 true heavy change keys in 1.5 seconds • Over 95% TPP • Less than 2% FPP • All missing changes are due to boundary effects

  26. Conclusions / Future Work • Sketches: efficient summary structures • Our contribution: Reversible Sketches • Efficient online detection of keys with heavy changes • Work in progress (see tech report) • Improved reverse hashing • Statistical guarantee on detection accuracy • More advanced applications: • Hierarchical change detection • E.g. 129.105.100.* shows a big change!

  27. Thank you! See tech report for more! http://list.cs.northwestern.edu
