A Resource-minimalist Flow Size Histogram Estimator
Bruno Ribeiro, Don Towsley (UMass Amherst)
Tao Ye (Sprint)
Flow size histogram
Internet core router: TCP flows
• Flow size
  • e.g. # of packets in a TCP flow
• Flow size histogram used for:
  • Traffic profiling
  • Anomaly detection
• Histogram hard to obtain
  • TCP flows: hundreds of millions of flows/hour (OC-48 router)
• Estimating flow size histograms
  • Random packet sampling is inaccurate [Ribeiro et al. 2006]
  • Flow sampling: more memory, and an accurate tail needs packet sampling
  • Current data streaming methods have slow estimators
Bruno Ribeiro, Tao Ye, Don Towsley, "A Resource-minimalist flow size histogram estimator"
Outline
• Related work
• Our resource-minimalist approach
• Experiment
• Conclusions
Related work [Kumar et al. 2004]
• Sketch phase (router): a universal hash function maps each packet to a counter; flows that hash to the same counter collide
• Estimation phase (powerful backend server): recovers the flow size histogram from the counters despite hash collisions
• Estimator complexity: O((maximum flow size)^3)
[Figure: packets hashed into a counter array, e.g. 0 1 2 1 1 0 2 0 0, with a packet hash collision marked]
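The sketch phase on this slide can be illustrated with a minimal Python sketch. It assumes a single counter array indexed by a hash of the flow ID; SHA-256 stands in for the universal hash function, and the names `counter_index`, `sketch_phase` and `counter_value_histogram` are hypothetical, not taken from [Kumar et al. 2004].

```python
import hashlib
from collections import Counter

def counter_index(flow_id: str, num_counters: int) -> int:
    """Map a flow ID to a counter slot (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(flow_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_counters

def sketch_phase(packet_flow_ids, num_counters: int) -> list:
    """Increment one counter per packet; colliding flows share a counter."""
    counters = [0] * num_counters
    for flow_id in packet_flow_ids:
        counters[counter_index(flow_id, num_counters)] += 1
    return counters

def counter_value_histogram(counters) -> Counter:
    """Histogram of nonzero counter values, the input to the estimation phase."""
    return Counter(value for value in counters if value > 0)

# Two flows of 3 and 2 packets; a single slot forces them to collide.
packets = ["flow-a"] * 3 + ["flow-b"] * 2
counters = sketch_phase(packets, num_counters=1)
```

With a single slot the two flows collide and the counter holds the sum of both flow sizes. Separating such sums back into individual flow sizes is the job of the estimation phase, which is not sketched here.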
Resource-minimalist Approach
• Insight: Don't need to count every flow size
• Idea: Group large flow sizes into bins
  • Fine-grained flow histogram for < k packets
  • Coarse-grained flow histogram for > k packets
• Approach: Probabilistic counting
  • Reduces counters to 6 bits
  • Requires: Low collision probability (e.g. 2 counters per flow)
  • Result: O(k^3 + log(W)) estimator, e.g., k = 16 and W = 10^7
• Problem: Low collision probability → more memory (2 counters / flow)
• Approach: Counter folding
  • Negligible increase in estimator error
  • Requires one extra bit / counter
  • Result: Reduces number of counters by half
Group large flow sizes & Probabilistic counting [Morris 78]
• Counter increments (probabilistic): with m_a = 2^a, a 6-bit counter bins flow sizes up to W = 10^14
[Figure: a packet is hashed to a counter with states 0, 1, 2, …, k-1, k, k+1, k+2; the transition k → k+1 happens with probability p = 1/m_1 and k+1 → k+2 with p = 1/m_2, so on average m_1 and m_2 packets arrive in states k and k+1]
• Counter value k → flow sizes = [k, k+m_1-1]
• Counter value k+1 → flow sizes = [k+m_1, k+m_1+m_2-1]
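The increment rule and the bin ranges above can be written out as a small Python sketch. It assumes that counters below k increment on every packet, that a counter at value k+a-1 increments with probability 1/m_a with m_a = 2^a, and that the counter saturates at its maximum value; the saturation rule and the function names are assumptions made for illustration.

```python
import random

def m(a: int) -> int:
    """Bin width m_a = 2^a, as on the slide."""
    return 2 ** a

def increment(counter: int, k: int, max_value: int, rng: random.Random) -> int:
    """Return the counter value after one packet arrives."""
    if counter >= max_value:      # assumption: saturate at the top value
        return counter
    if counter < k:               # fine-grained region: exact counting
        return counter + 1
    a = counter - k + 1           # value k uses m_1, value k+1 uses m_2, ...
    if rng.random() < 1.0 / m(a):
        return counter + 1
    return counter

def bin_range(counter: int, k: int) -> tuple:
    """Flow sizes represented by a counter value, as an inclusive range."""
    if counter < k:
        return (counter, counter)
    lower = k + sum(m(i) for i in range(1, counter - k + 1))
    upper = lower + m(counter - k + 1) - 1
    return (lower, upper)

k = 16
first_bin = bin_range(k, k)        # [k, k + m_1 - 1]
second_bin = bin_range(k + 1, k)   # [k + m_1, k + m_1 + m_2 - 1]
top_bin = bin_range(63, k)         # largest value of a 6-bit counter
```

Under these assumptions and k = 16, the first two bins are [16, 17] and [18, 21], and the top bin of a 6-bit counter ends above 10^14, which is consistent with the range quoted on the slide.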
Counter folding: Detecting some collisions
• Maximum hash value = M
• M/2 counters
• If hash(packet) < M/2 → red
• Otherwise (hash(packet) mod M/2) → blue
• Detectable blue-red collision: 1 bit required
[Figure: flows 7, 8 and 9 hashed into M/2 counters (6 1 2 0 2 1 6 0 0), marking a detectable blue-red collision and an undetectable collision]
Counter folding
• Collision policy:
  • “red flow cannot increment blue counter”
  • “blue flow overwrites red counter”
  • counters = 0 are red
[Figure: Flows; Counters: 6 1 2 0 2 1 3 0 0; Counter colors (extra bit): 1 1 0 0 1 0 1 0 0]
• Result: e.g. if 1 counter / flow
  • All red counters are also blue counters = 0
  • Virtually expands hash table by ≈50% (virtual 2 counters / flow)
  • Blue counters evict red counters
  • Flow sampling effect: Discards 15% of flows at random
Folding: interesting fact
[Figure: axis "Number of foldings"; labels "Policy: Evict newest flow (color = flow ID)" and "Flow sampling"]
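The color assignment and the collision policy can be sketched in Python as follows. The sketch assumes hash values in the range [0, M), plain (non-probabilistic) increments, and that a blue packet overwriting a red counter resets it to 1; the reset value and the class name `FoldedCounters` are assumptions for illustration, not details given on the slides.

```python
RED, BLUE = 0, 1

class FoldedCounters:
    """M/2 counters with one extra color bit each (illustrative sketch)."""

    def __init__(self, max_hash: int):
        self.half = max_hash // 2
        self.values = [0] * self.half
        self.colors = [RED] * self.half   # counters equal to 0 start as red

    def update(self, hash_value: int) -> None:
        """Process one packet whose flow hashes to hash_value."""
        if hash_value < self.half:
            index, color = hash_value, RED
        else:
            index, color = hash_value % self.half, BLUE
        if color == self.colors[index]:
            self.values[index] += 1       # same color: ordinary increment
        elif color == BLUE:
            self.colors[index] = BLUE     # blue flow overwrites red counter
            self.values[index] = 1        # assumption: restart the count at 1
        # otherwise: a red flow cannot increment a blue counter (packet ignored)

fc = FoldedCounters(max_hash=8)
fc.update(1)   # red flow, counter 1
fc.update(1)
fc.update(5)   # blue flow folds onto counter 1 and evicts the red flow
fc.update(1)   # red packet is ignored: counter 1 is now blue
```

After these four packets, counter 1 is blue with value 1: the red flow was evicted by the blue one, and its later packet was dropped, which is the flow-sampling effect the slide describes.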
Experiment
• Evaluated with simulations
• Our worst result with Internet core traces
  • 9.5 million flows
  • 8MB of memory
  • k = 16
  • W = 10^14
• Same accuracy without counter folding requires 13MB of memory
Conclusions
Insights
• Group large flow sizes using probabilistic counters
• Counter folding
• Fast quasi-random sampling
Our Estimator
• Time complexity
  • Sketch phase
    • Universal hash cost
    • Two additions
    • One subtraction
  • Estimation phase
    • O(k^3 + log(W))
• Space complexity
  • ≈ 1/4 memory usage of [Kumar et al. 2004]