Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University
Online Change Detection
• Network anomalies are common
  • Flash crowds, failures, DoS, worms, …
Online Detection over Data Streams
• Data stream: key/update pairs (k, u)
• Heavy hitters (lots of prior work)
• Heavy changes
k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
• First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables (rows 1 … j … H), each with K buckets (0, 1, …, K-1); key k is hashed to bucket h_j(k) in table j]
• Update (k, u): T_j[h_j(k)] += u (for all j)
• Estimate v(S, k): sum of updates for key k
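The update and estimate rules on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `KarySketch`, the linear hash family `((a*k + b) mod p) mod K`, and the seed are hypothetical. The collision correction and the median in `estimate` follow a common reading of the cited k-ary sketch estimator; the slide itself only states what is being estimated.

```python
import random
import statistics


class KarySketch:
    """H hash tables of K buckets each, updated as T_j[h_j(k)] += u."""

    def __init__(self, num_tables=5, num_buckets=4096, seed=0):
        self.H = num_tables
        self.K = num_buckets
        self.tables = [[0] * num_buckets for _ in range(num_tables)]
        self.total = 0  # running sum of all updates
        rng = random.Random(seed)
        self._p = (1 << 61) - 1  # a prime larger than any 32-bit key
        # Hypothetical hash family: one (a, b) pair per table.
        self._coeffs = [(rng.randrange(1, self._p), rng.randrange(self._p))
                        for _ in range(num_tables)]

    def bucket(self, j, key):
        a, b = self._coeffs[j]
        return ((a * key + b) % self._p) % self.K

    def update(self, key, u):
        # Update (k, u): add u to the key's bucket in every table.
        for j in range(self.H):
            self.tables[j][self.bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        # Assumed estimator: subtract the expected share of other keys'
        # traffic from each table's bucket, then take the median over tables.
        per_table = []
        for j in range(self.H):
            raw = self.tables[j][self.bucket(j, key)]
            per_table.append((raw - self.total / self.K) / (1 - 1 / self.K))
        return statistics.median(per_table)


sketch = KarySketch()
sketch.update(0x81693817, 100)  # 129.105.56.23 as a 32-bit integer
sketch.update(0x8169381C, 5)    # 129.105.56.28
estimate = sketch.estimate(0x81693817)
print(round(estimate, 2))
```

The median over tables is what makes the estimate tolerant of a collision in one or two tables, since a minority of inflated buckets does not move it.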
• Requires very little space
  • E.g. 5 hash tables with 16 K buckets = 80 KB
  • Fits in high-speed memory
• Main problem
  • Cannot efficiently report keys with heavy change
• Our contribution
  • Determine the set of keys that have “large” estimates in the sketch
Reverse Sketch Problem
• Input: sketch, threshold
• Output: set of keys that hash to heavy buckets in a majority of (or all) hash tables
[Figure: hash tables 1 to 5 with “heavy” buckets marked]
Outline
[Figure: system diagram. Streaming data recording: key, value, IP mangling, modular hashing, k-ary sketch. Heavy change detection: k-ary sketch, change threshold, reverse hashing algorithms, heavy change keys. Paths are labeled “fast” and “slow”.]
• Improve heavy change detection
Taking Intersections
• H = 5, K = 2^12, #keys = 2^32 (IP addresses)
• Intersect A1, A2, A3, A4, A5
• E[false positives] << 1
The problem with simple intersection
• Why is this difficult?
  • Each set Ai can be very large!
  • H = 5, K = 2^12, #keys = 2^32 (IP addresses)
  • |A1| = 2^32 / 2^12 = 2^20
• Solution: modular hashing
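To make the cost of simple intersection concrete, here is a toy Python sketch that finds the heavy bucket in each table, builds every reverse-mapped set Ai by enumerating the whole key space, and intersects them. The sizes are deliberately shrunk (2^16 keys, K = 2^6) so the enumeration finishes quickly; the hash family, the traffic, and the threshold are illustrative assumptions. With independent, uniform hashes and one heavy bucket per table, the expected number of false positives is roughly #keys / K^H, which for the slide's parameters is 2^32 / 2^60 = 2^-28.

```python
import random

KEY_BITS = 16      # toy key space; the slides use 2**32 IP addresses
H, K = 5, 2 ** 6   # toy sizes; the slides use H = 5, K = 2**12
P = (1 << 61) - 1  # prime modulus for the hypothetical hash family

rng = random.Random(1)
coeffs = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(H)]


def bucket(j, key):
    a, b = coeffs[j]
    return ((a * key + b) % P) % K


# Record one heavy key plus light background traffic (made-up data).
tables = [[0] * K for _ in range(H)]
heavy_key = 12345
updates = [(heavy_key, 1000)]
updates += [(rng.randrange(2 ** KEY_BITS), 1) for _ in range(200)]
for key, u in updates:
    for j in range(H):
        tables[j][bucket(j, key)] += u

# Heavy buckets: those whose value reaches the threshold.
threshold = 500
heavy_buckets = [{b for b in range(K) if tables[j][b] >= threshold}
                 for j in range(H)]

# A_j: every key that hashes into a heavy bucket of table j.
# This full enumeration is the step that does not scale to 2**32 keys.
reverse_sets = [{k for k in range(2 ** KEY_BITS)
                 if bucket(j, k) in heavy_buckets[j]}
                for j in range(H)]
candidates = set.intersection(*reverse_sets)

set_sizes = [len(a) for a in reverse_sets]
expected_false_positives = 2 ** KEY_BITS / K ** H
print(set_sizes, sorted(candidates), expected_false_positives)
```

Each Ai here holds roughly 2^16 / 2^6 keys; at the slide's scale the same step would materialize about 2^20 keys per table, which is what motivates modular hashing.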
Modular hashing reduces the set size
• The 32-bit key (e.g. 10010100 10101011 10010101 10100011) is split into four 8-bit words
• Each word is hashed separately by h1(), h2(), h3(), h4() to 3 bits (e.g. 010 110 001 101); the results are concatenated into a 12-bit bucket index
• Per word, each 3-bit value has 2^8 / 2^3 = 2^5 preimages
• Greatly reduces the size of the reverse-mapped sets
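A minimal Python sketch of modular hashing as described on this slide. The per-word hash functions are modeled as random balanced lookup tables, which is an assumption made so that every 3-bit value has exactly 2^5 preimages; the slides do not say how h1() to h4() are built. The names `make_word_hash`, `modular_hash`, and `reverse_word` are hypothetical.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3  # 32-bit key, four 8-bit words, 3 bits out


def make_word_hash(rng):
    """Random balanced table: each 3-bit output has exactly 2**5 preimages."""
    table = [v % (1 << OUT_BITS) for v in range(1 << WORD_BITS)]
    rng.shuffle(table)
    return table


def modular_hash(word_hashes, key):
    """Hash each 8-bit word separately and concatenate into a 12-bit index."""
    index = 0
    for w in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - w)
        word = (key >> shift) & 0xFF
        index = (index << OUT_BITS) | word_hashes[w][word]
    return index


def reverse_word(word_hashes, w, chunk):
    """All byte values of word w that hash to the given 3-bit chunk."""
    return {byte for byte, out in enumerate(word_hashes[w]) if out == chunk}


rng = random.Random(2)
word_hashes = [make_word_hash(rng) for _ in range(WORDS)]

key = 0x94AB95A3  # 10010100 10101011 10010101 10100011, the key on the slide
index = modular_hash(word_hashes, key)
first_chunk = index >> (OUT_BITS * (WORDS - 1))  # top 3 bits: hash of word 1
preimage = reverse_word(word_hashes, 0, first_chunk)
print(format(index, "012b"), len(preimage))
```

Because each word is hashed on its own, reversing a bucket means reversing four small tables instead of enumerating 2^32 keys.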
Modular hashing reduces the set size
• A1: 2^5 * 2^5 * 2^5 * 2^5
• A2: 2^5 * 2^5 * 2^5 * 2^5
• Intersection: only 32 elements per partition
[Figure: hash tables 1 to 5 with heavy buckets b1, b2, b3, b4, b5]
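The per-partition intersection can be sketched as follows for the simple case of one heavy bucket per table. For each 8-bit word, the five 32-element reverse sets are intersected, and candidate keys are then assembled from the surviving byte values. As before, the balanced lookup tables and all names are illustrative assumptions, not the algorithm from the tech report.

```python
import random
from itertools import product

H, WORDS = 5, 4


def make_word_hash(rng):
    """Random balanced table mapping a byte to 3 bits (assumed construction)."""
    table = [v % 8 for v in range(256)]
    rng.shuffle(table)
    return table


def split_words(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    return [(key >> (8 * (WORDS - 1 - w))) & 0xFF for w in range(WORDS)]


rng = random.Random(3)
# tables[j][w] is the lookup table that hashes word w in hash table j.
tables = [[make_word_hash(rng) for _ in range(WORDS)] for _ in range(H)]

heavy_key = 0x81693817  # 129.105.56.23
# The heavy bucket of each table, kept as its four 3-bit chunks.
heavy_chunks = [[tables[j][w][byte]
                 for w, byte in enumerate(split_words(heavy_key))]
                for j in range(H)]

# Intersect per word: each set involved has only 32 elements.
per_word = []
for w in range(WORDS):
    sets = [{b for b in range(256) if tables[j][w][b] == heavy_chunks[j][w]}
            for j in range(H)]
    per_word.append(set.intersection(*sets))

# Candidate keys are all combinations of the surviving bytes.
candidates = []
for words in product(*per_word):
    key = 0
    for byte in words:
        key = (key << 8) | byte
    candidates.append(key)

print([len(s) for s in per_word], [hex(k) for k in candidates])
```

The work per word is five intersections over 32-element sets, compared with intersecting sets of about 2^20 keys in the unpartitioned approach.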
Handling Multiple Intersections…
• 2^H different intersections
• Much more difficult: needs sophisticated reverse hashing algorithms (see tech report)
[Figure: hash tables 1 to 5, each with two heavy buckets marked]
Problem: too many collisions
• 32-bit keys, 12-bit bucket index
• Addresses that share a prefix, e.g. 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ..., all hash to 7 . 4 . 0 . *
• Solution: IP mangling
Invertible Modular Linear Equation
• f(x) ≡ a·x mod n
• To be invertible: a and n must be relatively prime
• a is odd, chosen randomly
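The invertible mapping f(x) ≡ a·x mod n can be sketched in Python as below. The modulus n = 2^32 is an assumption based on the 32-bit IP keys used elsewhere in the slides; with that choice, any odd a is relatively prime to n and therefore has a modular inverse. The function names and the seed are hypothetical.

```python
import random

N = 2 ** 32  # assumed modulus: the 32-bit IPv4 key space

rng = random.Random(4)
a = rng.randrange(1, N, 2)  # random odd multiplier, so gcd(a, N) == 1
a_inv = pow(a, -1, N)       # modular inverse of a (needs Python 3.8+)


def mangle(x):
    """f(x) = a * x mod n."""
    return (a * x) % N


def unmangle(y):
    """Inverse mapping: multiply by the modular inverse of a."""
    return (a_inv * y) % N


def ip_to_int(ip):
    b0, b1, b2, b3 = (int(part) for part in ip.split("."))
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3


def int_to_ip(x):
    return ".".join(str((x >> s) & 0xFF) for s in (24, 16, 8, 0))


ips = ["129.105.56.23", "129.105.56.28", "129.105.56.109"]
for ip in ips:
    mangled = mangle(ip_to_int(ip))
    print(ip, "->", int_to_ip(mangled), "->", int_to_ip(unmangle(mangled)))
```

Since the mapping is a bijection on the key space, mangling before hashing and unmangling after reverse hashing loses no information about the original keys.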
[Figure: three panels titled “Modular Hashing”, “Optimal Hashing”, and “Modular Hashing with IP Mangling”]
Recap
[Figure: system diagram. Streaming data recording: key, value, IP mangling, modular hashing, value stored in the reversible k-ary sketch. Heavy change detection: reversible k-ary sketch, change threshold, reverse hashing, reverse IP mangling, heavy change keys.]
Evaluation
• Traffic traces from the Northwestern University edge router
  • 5-minute intervals, averaging 7.5 GB of traffic per interval
• Compared with ground truth
• 6 hash tables with 4K buckets each, 192 KB of memory in total
• Up to 140 true heavy change keys in 1.5 seconds
  • Over 95% TPP
  • Less than 2% FPP
• All missing changes are due to boundary effects
Conclusions / Future Work
• Sketches: efficient summary structures
• Our contribution: reversible sketches
  • Efficient online detection of keys with heavy changes
Work in Progress (see tech report)
• Improved reverse hashing
• Statistical guarantee on detection accuracy
• More advanced applications:
  • Hierarchical change detection
  • E.g. 129.105.100.* shows a big change!
Thank you! See tech report for more! http://list.cs.northwestern.edu