Counter Braids: A novel counter architecture for network measurement
Data Measurement: Background
Accurate data measurement is needed in network and database systems.
• Internet Backbones
  • Accounting/billing by ISPs: usage-based pricing
  • Traffic engineering
  • Network diagnostics and forensics: intrusion detection, denial-of-service attacks
  • Products: NetFlow (Cisco), cflowd (Juniper), NetStream (Huawei)
  • Ref. Lin, Xu, Kumar, Sung, Ramabhadran, Varghese, Estan, Shah, Iyer, Prabhakar, McKeown, et al.
• Database Systems
  • Sketches, synopses of data streams
  • Usage and access logs
  • Ref. Cormode, Muthukrishnan, Babcock, Babu, Motwani, Widom, Charikar, Chen, Farach-Colton, Alon, et al.
• Data Centers, Cloud Computing
  • Monitoring network usage, diagnostics, real-time load balancing, network planning
  • E.g. Amazon, Facebook, Google, Microsoft, Yahoo!
The Problem
• A flow is a sequence of packets with common properties, e.g. the same source-destination address pair, a common IP header 5-tuple, or a common prefix in the routing table
  • An email, a web download, a video or voice stream, ...
• Problem: measure the number of bytes and packets sent by each flow in a measurement interval
[Figure: a stream of packets updating an array of per-flow packet counters]
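The counting task on this slide can be sketched in a few lines. This is a minimal illustration, not part of the original work; the packet dictionary field names (`src`, `dst`, `sport`, `dport`, `proto`, `len`) are assumptions.

```python
from collections import defaultdict

def count_flows(packets):
    # Aggregate per-flow packet and byte counts over a measurement
    # interval; a flow is identified here by its IP header 5-tuple.
    counts = defaultdict(lambda: [0, 0])  # flow key -> [packets, bytes]
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        counts[key][0] += 1
        counts[key][1] += pkt["len"]
    return dict(counts)
```

Keeping one exact counter per flow like this is what the following slides show to be infeasible at backbone scale.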
The Constraints
• Large number of active flows in a measurement interval
  • Several million over a 5-min interval on backbone links (CAIDA)
  • Problem: need lots of counters, i.e. a large memory
• Very high link speeds require fast memory updates
  • E.g. the per-packet update time on a 40 Gbps link is roughly 15 nanoseconds
  • Problem: need memory with fast access times
• Memories are either large (DRAM) or fast (SRAM); can't get both
  • DRAMs have access times of 50-60 ns
  • SRAMs have access times of 4.5-7 ns, but are limited to around 50-60 Mb (Micron Tech.)
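The per-packet budget quoted above can be sanity-checked with back-of-the-envelope arithmetic; the ~75-byte average on-the-wire packet size used here is an illustrative assumption chosen to match the slide's figure.

```python
# Per-packet arrival time on a 40 Gbps link, assuming an average
# on-the-wire packet size of about 75 bytes (an illustrative figure).
link_bps = 40e9
avg_packet_bytes = 75
ns_per_packet = avg_packet_bytes * 8 / link_bps * 1e9
print(f"{ns_per_packet:.1f} ns per packet")  # 15.0 ns per packet
```

At 15 ns per packet, a 50-60 ns DRAM access cannot keep up, which is why the architectures below lean on SRAM.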
Building Counter Arrays: A naïve approach
• Allocate one counter per flow
• On each packet arrival, look up the flow ID in memory and update the corresponding counter
[Figure: five flows, each mapped to its own counter]
• Problems
  • Too much space taken up by counters, especially because not all flows are large
  • Need to perform the flow-to-counter association for every packet, i.e. need a perfect hash function
Previous Approaches 1. Hybrid SRAM-DRAM architecture (Shah et al. '01, Ramabhadran and Varghese '03, Zhao et al. '06)
• Pros: obtains exact counts; reduces the total memory cost of the counters
• Cons:
  • Need to know the flow-to-counter association
  • High-complexity counter management
[Figure: small counters in fast memory (SRAM), flushed by a counter-management unit into full counters in slow memory (DRAM)]
Previous Approaches 2. Approximate Counting (Estan and Varghese '02)
• Approximate large-flow detection: measures data from large flows (elephants) and ignores small flows (mice)
• Exploits the Pareto distribution of network traffic (Fang, Peterson '99; Feldmann et al. '00)
[Figure: counters for the large flows only, held in fast memory (SRAM)]
• Pros:
  • Strikes a neat trade-off: accuracy for simplicity
  • Flow-to-counter association becomes manageable
• Cons: flow counts are approximate
Summary • Exact counting methods • Hybrid architecture, complex • Approximate methods • Focus on large flows, inaccurate • Problems to address • Get rid of flow-to-counter association problem • Save counter space (reduce memory size)
Getting Rid of Flow-to-Counter Association
• Use multiple hash functions
  • A single hash function leads to collisions
  • With two or more hash functions, the redundancy can be used to recover the flow sizes
[Figure: flows hashed into a counter array; the hash choices form the 0-1 adjacency matrix M, and the counters accumulate c = Mf]
• Need an efficient decoding algorithm for solving c = Mf
  • c: counts, M: adjacency matrix, f: flow sizes
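The relation c = Mf can be sketched directly: each flow's k hashed counter positions define the ones in a row of the sparse 0-1 matrix M, and the counter array accumulates c = Mf. The salted-CRC construction below stands in for the k hash functions and is an assumption of this sketch.

```python
import zlib

def counter_indices(flow_id, k, m):
    # k hash functions simulated by salting a single hash (illustrative).
    return [zlib.crc32(f"{s}:{flow_id}".encode()) % m for s in range(k)]

def measure(flow_sizes, k, m):
    # Accumulate the counter vector c = M f: flow i adds its size to
    # each of its k hashed counter positions.
    c = [0] * m
    for fid, size in flow_sizes.items():
        for a in counter_indices(fid, k, m):
            c[a] += size
    return c
```

Since every flow contributes its size to k counters, the counter values always sum to k times the total traffic, collisions or not.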
Saving Counter Space
• "Braid" the counters: share the more significant bits across flows
[Figure: the high-order bits of several per-flow counters merged into shared second-layer counters]
Updating Counter Braids
[Figure: counter values before and after an update on the same example]
The Overall Architecture
• Status bit: indicates overflow
• Mouse traps: many, shallow counters
• Elephant traps: few, deep counters
[Figure: flows hashed into the shallow first-layer counters; overflows carried into the deep second-layer counters]
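A sketch of the two-layer update implied by this architecture: shallow first-layer counters overflow, set their status bit, and carry into a shared deep second-layer counter. The sizes, names (`FIRST_LAYER_DEPTH`, `CounterBraid`) and hash construction are illustrative assumptions, not the paper's exact design.

```python
import zlib

FIRST_LAYER_DEPTH = 16  # shallow counters hold values 0..15 (assumed 4 bits)

class CounterBraid:
    def __init__(self, m1, m2, k=3):
        self.c1 = [0] * m1          # many shallow first-layer counters
        self.status = [False] * m1  # status bit: has this counter overflowed?
        self.c2 = [0] * m2          # few deep second-layer counters
        self.k = k

    def _layer1(self, flow_id):
        return [zlib.crc32(f"{s}:{flow_id}".encode()) % len(self.c1)
                for s in range(self.k)]

    def _layer2(self, idx):
        return zlib.crc32(f"L2:{idx}".encode()) % len(self.c2)

    def update(self, flow_id, inc=1):
        for a in self._layer1(flow_id):
            self.c1[a] += inc
            while self.c1[a] >= FIRST_LAYER_DEPTH:
                # Overflow: keep the low-order bits, carry the rest upward.
                self.c1[a] -= FIRST_LAYER_DEPTH
                self.status[a] = True
                self.c2[self._layer2(a)] += 1
```

The invariant is conservation of traffic: sum(c1) + FIRST_LAYER_DEPTH * sum(c2) always equals k times the number of unit increments, which is what the offline decoder exploits.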
System Diagram
• Online: count using Counter Braids; the counter values live in the CB structure, alongside a list of flow IDs
• Offline: the decoder recovers a list of <flow label, size> pairs from the counter values
[Figure: online counting stage feeding counter values and flow IDs to the offline decoder]
Will Describe • Decoding algorithms • Sizing of Counter Braids • Comparison with Linear Programming Decoder
Decoding Algorithms • Typical set decoder • Message-passing decoder
Typical set decoder
• Assumptions on the flow size distribution P:
  • At most power-law tails
  • Decreasing digit entropy: writing the flow size fi in digits fi(1), fi(2), ..., the entropy of fi(l) is decreasing in l
  • Both assumptions are satisfied by real traffic distributions
• Typical set decoder: let f' = (f1', ..., fn') and let P' denote its empirical distribution. If f' is the unique vector that solves c = Mf with D(P' || P) < δ, then output f'; otherwise, output error.
• Theorem: Asymptotic Optimality (Lu, Montanari, Prabhakar '07). For any rate r > H(P) there exists a sequence of reliable sparse Counter Braids with asymptotic rate r. Hence, with the typical set decoder, Counter Braids is optimal.
• However, the typical set decoder is too complex to implement
Message Passing Decoder (Lu et al. '08)
• Linear-complexity decoding algorithm
• In each layer, the decoder solves the set of linear equations c = Mf
[Figure: the carry-overs from the shallow first-layer counters act as second-layer "flows" (these exist only conceptually)]
A Practical Algorithm: Count-Min
• Count-Min (Cormode and Muthukrishnan '05)
• Algorithm: estimate each flow's size as the minimum of the counters it hits
• The flow sizes in the example would be estimated as 34, 34, 32
[Figure: three flows hashed into counters holding 34, 34 and 32]
• Pros: fast and easy to implement
• Cons:
  • Decision based on local information; needs lots of counters for accurate estimation
  • The error magnitude is unknown; in fact, we don't even know whether there is an error
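Count-Min itself is only a few lines; in this sketch, salted CRC32 hashes stand in for the k independent hash functions (an assumption of the example, not the original paper's choice).

```python
import zlib

def _idx(flow_id, seed, m):
    # One of k hash functions, simulated by salting CRC32 (illustrative).
    return zlib.crc32(f"{seed}:{flow_id}".encode()) % m

def cm_update(counters, k, flow_id, inc=1):
    for seed in range(k):
        counters[_idx(flow_id, seed, len(counters))] += inc

def cm_estimate(counters, k, flow_id):
    # Estimate: the minimum of the k counters the flow hashes into.
    return min(counters[_idx(flow_id, seed, len(counters))] for seed in range(k))
```

Because every counter a flow touches also absorbs colliding flows, the estimate is always an upper bound on the true size; it never reveals how far off it is, which is the "cons" point above.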
Count-Min as Message Passing
• Iteration 0 reproduces Count-Min: each counter sends its full value, so the estimates are upper bounds on the true flow sizes
[Figure: counters with values 34, 34 and 32 sending messages to the flows at iterations 0 and 1]
Message Passing continues
[Figure: the next iteration on the same example; each flow-to-counter message excludes the receiving counter's own contribution]
Message Passing continues
• The even iterations yield lower bounds on the true flow sizes
[Figure: iterations 0 through 4 on the same example, with the upper and lower bounds closing in on the true sizes]
Message Passing Decoder
Initialize: ν_ia = 0 ∀ i and ∀ a
Iterations: for t = 1 to T
  μ_ai(t) = max{ c_a − Σ_{j≠i} ν_ja(t−1), 1 }
  ν_ia(t) = min_{b≠a} μ_bi(t) if t is odd
          = max_{b≠a} μ_bi(t) if t is even
Final estimate:
  f̂_i(T) = min_a μ_ai(T) if T is odd
          = max_a μ_ai(T) if T is even
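The decoder on this slide fits in a few lines of Python. This is a sketch on a toy example: the flow-to-counter assignments are chosen by hand rather than by hashing, and the function name `mp_decode` is an assumption.

```python
def mp_decode(edges, c, T=10):
    # edges[i]: the counters flow i hashes into (assume k >= 2 per flow).
    n, m = len(edges), len(c)
    counter_nb = [[] for _ in range(m)]
    for i, counters in enumerate(edges):
        for a in counters:
            counter_nb[a].append(i)
    nu = {(i, a): 0 for i in range(n) for a in edges[i]}  # flow -> counter
    mu = {}                                               # counter -> flow
    for t in range(1, T + 1):
        for a in range(m):
            total = sum(nu[(j, a)] for j in counter_nb[a])
            for i in counter_nb[a]:
                # Counter message: what remains after subtracting the other
                # flows' current estimates (every present flow has size >= 1).
                mu[(a, i)] = max(c[a] - (total - nu[(i, a)]), 1)
        pick = min if t % 2 == 1 else max  # odd: upper bounds, even: lower
        for i in range(n):
            for a in edges[i]:
                nu[(i, a)] = pick(mu[(b, i)] for b in edges[i] if b != a)
    pick = min if T % 2 == 1 else max
    return [pick(mu[(a, i)] for a in edges[i]) for i in range(n)]
```

On a toy instance with three flows of sizes 1, 2 and 3, each hashed into two of three counters, the decoder recovers the exact sizes within a few iterations.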
Anti-monotonicity Property
Lemma: Anti-monotonicity Property (Lu et al. '08). If ν and ν' are such that ν_ia(t−1) ≤ ν'_ia(t−1) ≤ f_i for every i and a, then ν_ia(t) ≥ ν'_ia(t) ≥ f_i.
Consequently, since f̂(0) = 0, f̂(2t) ≤ f component-wise and f̂(2t) is component-wise non-decreasing. Similarly, f̂(2t+1) ≥ f and is component-wise non-increasing.
• Corollary: incorrect flow size estimates can be identified: they are exactly the flows whose upper bounds differ from their lower bounds.
• When does the gap close?
[Figure: upper- and lower-bound estimates of flow size vs. flow index]
Convergence
Definitions:
• n flows, m counters, k hash functions
• Error measure: P_err, the fraction of flows decoded incorrectly
• β = m/n, the number of counters per flow
Threshold: β* = 0.72 (no. of flows = 10,000)
• β = 0.71: 21% of flows are not decoded, but we know exactly which ones are in error
• β = 0.725: converges in 25 iterations
• β = 0.74: converges faster
[Figure: P_err vs. iteration number for each β, with Count-Min's error shown for comparison]
Threshold
Let λ = nk/m be the average degree of a counter node, and let ε = P(f_i > 1). Let
  λ* = sup { λ ∈ ℝ : x = g(x) has no solution ∀ x ∈ (0,1] }.
Theorem: The Threshold (Lu et al. '08). β* = k/λ* is such that, in the large-n limit:
(i) if β > β*, then P_err → 0; the lower bounds and upper bounds converge to the true flow sizes.
(ii) if β ≤ β*, then P_err > 0; a positive proportion of flows have lower bounds that differ from their upper bounds, i.e. some flows are incorrectly decoded.
Density Evolution
Lemma: Locally Tree-like Property (Modern Coding Theory, Richardson and Urbanke). Consider running the algorithm for a finite number of iterations. The error probability, averaged over the class of random graphs, converges as the system size grows to the error probability on a finite tree, where messages are independent of each other.
• Flow nodes have degree k
• Counter nodes have random degrees with a Poisson distribution of mean nk/m
Density Evolution
• Error probability of a counter-to-flow message, conditioned on the degree j of the counter node; j has a Poisson distribution with mean nk/m
Density Evolution
• Error probability of a counter-to-flow message, averaged over j ~ Poisson(λ), computed via the moment generating function of the Poisson distribution
Density Evolution
• Error probability of a counter-to-flow message
• Error probability of a flow-to-counter message: one expression when t is odd and one when t is even, where k is the number of hash functions and ε = P(f > 1)
• Substituting y_t and repeating for odd and even iterations gives a one-dimensional recursion x_{t+1} = g(x_t); now look for the fixed points of g(x) = x
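The equations for this recursion were lost in the slide export. The following is a hedged reconstruction, using only the quantities defined on these slides (λ = nk/m, ε = P(f > 1), k hash functions) and the standard Poisson moment-generating-function step; it is consistent with the slides' definitions but should not be read as a verbatim copy of the paper's formulas.

```latex
y_t = 1 - e^{-\lambda x_t}
  \qquad \text{(counter-to-flow error, averaged over the Poisson degrees)}

x_{t+1} = \epsilon\, y_t^{\,k-1}
  \qquad\Longrightarrow\qquad
  g(x) = \epsilon \left(1 - e^{-\lambda x}\right)^{k-1}
```

Under this form, small λ (equivalently, large β = k/λ) makes g(x) < x on (0,1], so x = g(x) has no solution and decoding succeeds, matching the threshold statement above.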
Computation of Threshold
• Above the threshold (β > β*), the decoding error goes to 0
• Below the threshold (β < β*), the decoding error stays positive
• The threshold β = β* is where g(x) is tangential to the diagonal
[Figure: g(x) vs. x for β > β*, β = β* and β < β*, and the fraction of flows incorrectly decoded vs. iteration number]
Relation to Compressed Sensing
• Compressed sensing: storing sparse vectors using random linear transformations
  • Baraniuk, Candes, Donoho, Gilbert, Hassibi, Indyk, Milenkovic, Muthukrishnan, Romberg, Strauss, Tanner, Tao, Tropp, Vershynin, Wainwright, et al.
• Problem statement: minimize ||f||_1 subject to c = Mf
• LP decoding: worst-case cubic complexity
• We consider sparse non-negative solutions and compare the Linear Programming (LP) decoder and the Message Passing (MP) decoder
  • Given an ε-sparse vector f (with a proportion ε of non-zero entries), let m = n·ρ*(ε) be the number of equations needed for each algorithm to recover f as n → ∞, with high probability over the space of random matrices and the space of vectors f
Comparison (Lu, Montanari, Prabhakar '08)
Theorem: LP Threshold (Donoho, Tanner '05). Consider an m × n matrix M with columns drawn independently from a multivariate normal distribution with nonsingular covariance matrix. For ε ∈ [0,1], the threshold ρ*_LP(ε) is numerically obtainable and plotted in the graph.
Theorem: MP Threshold (Lu, Montanari, Prabhakar '08). Consider sparse matrices with given degree distributions. For any ε ∈ [0,1], ρ*_MP(ε) ≤ 1/2.
[Figure: ρ*(ε) for the LP decoder vs. our message-passing algorithm]
Implementation
• Flow label collection (Lu, Prabhakar '09)
  • Flow labels are collected and stored in DRAM, with a small fraction (e.g. 0.5%) missing
  • The resulting problem is equivalent to decoding with noise
• FPGA implementation (Lu, Luo, Prabhakar, submitted)
  • Simple to implement: 2512 slice flip-flops and 5342 4-input LUTs on NetFPGA, with 2.8 MB of memory
  • 65-nm ASIC synthesis: 0.0487 mm² for logic and 15.66 mm² for memory
  • Parallel architecture to achieve high throughput: 84 Gbps at a 125 MHz clock
Trace Simulation
• CAIDA trace (2 hours of data)
  • Measurement interval: 5 min
  • Number of flows: ~0.9 million
[Table: memory usage of Counter Braids vs. alternative schemes; the footnotes below refer to its entries]
  † Includes Bloom filters for flow label collection and status bits for first-layer counters
  * Hash table and hybrid assume 1.5x over-provisioning using d-left hash tables
  # Multi-stage filters assume 4000 x 4 x 3 bytes, plus 3000 x 32 bytes of flow memory
• Second simulation: 100,000 flows, largest flow ~3000
• Decoding complexity: 1 million flows take 15 seconds to decode on a 2.6 GHz machine