Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices • Qi Zhao*, Abhishek Kumar*, Jia Wang+ and Jun (Jim) Xu* • *College of Computing, Georgia Tech • +AT&T Labs - Research
Traffic and flow matrices • Flow matrix FM: FM[i, j, f] = the size of flow f from node i to node j • Useful in: computing usage patterns of ISPs; detecting flapping routes; detecting DDoS attacks • Traffic matrix TM: TM[i, j] = the traffic volume from node i to node j • Useful in: capacity planning and forecasting; routing configuration; network fault/reliability diagnosis; provisioning for SLAs
Existing approaches • Traffic matrix: indirect inference (holistic) from SNMP link counts, the routing matrix, and a network model; direct measurement by sampling or by our approach • Flow matrix: not well studied yet; the straightforward approach is sampling
Data streaming algorithms • Data streaming: processing a long stream of data items in one pass, using a small working memory, to answer a class of queries about the stream. • Our context • Packet arrival rate is high (e.g., on 10-40 Gbps links) • Small but fast memory is used: SRAM (10 ns per access). • Challenge: how to use the SRAM to retain as much information pertinent to the traffic/flow matrix as possible?
Two data streaming schemes • The bitmap-based scheme • Traffic matrix • The counter array-based scheme • Flow matrix • Traffic matrix
System model • [Figure: node i and node j each run an online streaming module; a server runs the data analysis module]
The bitmap-based scheme • Online streaming module • Data analysis module
Online streaming module • The data digest data-structure is a bit array (bitmap) initially set to all 0’s. • It is updated upon each packet arrival. • Measurement proceeds in epochs.
Example • Input to the hash function H(.): the invariant packet header + the first 8 bytes of the payload • The bit at index i = H(packet) in the bitmap (indices 0 to b-1) is set to 1 • U := U-1 • If U/b < Threshold: save the bitmap and start a new epoch • [Snoeren et al. SIGCOMM’01] shows that these 28 bytes are sufficient to differentiate almost all non-identical packets.
Complexities • Computational complexity • One hash function computation • One write to the memory • Storage complexity • Each packet only produces a little more than one bit as its digest. • This can be further reduced using sampling.
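The per-packet update described on the preceding slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `BitmapDigest`, the use of `hashlib.blake2b` as a stand-in for H(.), and the one-byte-per-bit storage are all assumptions made for readability, and U is read here as the count of 0 bits left in the bitmap, decremented only when a bit flips from 0 to 1.

```python
import hashlib


class BitmapDigest:
    """Per-node bitmap digest, updated once per packet (illustrative sketch)."""

    def __init__(self, b, threshold):
        self.b = b                  # number of bits in the bitmap
        self.threshold = threshold  # epoch ends when (0 bits)/b drops below this
        self.saved = []             # bitmaps of finished epochs
        self._reset()

    def _reset(self):
        self.bits = bytearray(self.b)  # one byte per bit, for clarity only
        self.zeros = self.b            # U on the slide (assumed: count of 0 bits)

    def update(self, packet_key: bytes):
        # Stand-in hash (assumption); a router would use a fast hardware hash.
        digest = hashlib.blake2b(packet_key, digest_size=8).digest()
        i = int.from_bytes(digest, "big") % self.b
        if self.bits[i] == 0:
            self.bits[i] = 1
            self.zeros -= 1
        # Too few 0 bits left: save this bitmap and start a new epoch.
        if self.zeros / self.b < self.threshold:
            self.saved.append(bytes(self.bits))
            self._reset()
```

Here `packet_key` would be the invariant header bytes plus the first 8 payload bytes; each call costs one hash computation and at most one memory write, matching the complexity claims above.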
The bitmap-based scheme • Online streaming module • Data analysis module
Data analysis module • What do we have so far (for TM[i, j])? • BMi, generated by the traffic at node i (Ti), and • BMj, generated by the traffic at node j (Tj) • What we want to estimate: TM[i, j] = |Ti ∩ Tj|
Estimation based on BMi and BMj • [Whang et al. 1990] proposed a method to infer |T| from BM, i.e., D_T = b ln(b / U_T), where b is the size of BM in bits and U_T is the number of “0”s in BM. • |Ti ∪ Tj| can be inferred from the bitwise-OR of BMi and BMj. • An estimator of TM[i, j] is given by D_Ti + D_Tj - D_(Ti ∪ Tj) • We derive the variance of the estimator
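The estimation step can be illustrated with a short sketch. It assumes both bitmaps have the same size and were filled with the same hash function, and it represents a bitmap as a plain sequence of 0/1 values; the function names are hypothetical. It applies the linear-counting estimate b·ln(b/U) of Whang et al. to each bitmap and to their bitwise OR, then combines them by inclusion-exclusion.

```python
import math


def linear_count(bitmap):
    """Linear-counting estimate of the number of distinct items in a bitmap."""
    b = len(bitmap)
    u = sum(1 for bit in bitmap if bit == 0)  # number of 0 bits
    if u == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return b * math.log(b / u)


def estimate_tm(bm_i, bm_j):
    """Estimate |Ti ∩ Tj| as |Ti| + |Tj| - |Ti ∪ Tj|."""
    if len(bm_i) != len(bm_j):
        raise ValueError("bitmaps must have the same size")
    bm_or = [x | y for x, y in zip(bm_i, bm_j)]  # digest of Ti ∪ Tj
    return linear_count(bm_i) + linear_count(bm_j) - linear_count(bm_or)
```

Because each term is itself an estimate, the result is noisy and can even be slightly negative when the true overlap is small relative to the bitmap load.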
Multipaging • [Figure: bitmap epochs (pages) 1-4 at node i and 1-3 at node j on a timeline, with times t1 and t2 marked]
Eliminating the effects of clock offset and packets in transit • [Figure: pages 1-4 at node i and pages 1-3 at node j on a timeline, with interval t marked] • T1: a tight upper bound on the clock offset (e.g., 50 ms in an NTP-enabled network) • If t < T1, then overlap(1,2) = 1 • Combining with packets in transit: T2 is a tight upper bound on the packet traversal time • If t < T1+T2, then overlap(1,2) = 1
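One way to read the overlap rule is sketched below. The slide does not define t precisely, so this sketch assumes t is the time gap between two epochs as measured by the nodes' local clocks; the function names, the `(start, end)` epoch representation, and the default values are hypothetical.

```python
def overlap(epoch_i, epoch_j, t1=0.050, t2=0.0):
    """Return 1 if two epochs should be treated as overlapping, else 0.

    epoch_i, epoch_j: (start, end) times in seconds, per local clocks.
    t1: upper bound on clock offset; t2: upper bound on packet traversal time.
    """
    start_i, end_i = epoch_i
    start_j, end_j = epoch_j
    # Gap between the two intervals; zero or negative when they intersect.
    gap = max(start_i, start_j) - min(end_i, end_j)
    return 1 if gap <= 0 or gap < t1 + t2 else 0


def overlapping_pairs(epochs_i, epochs_j, t1=0.050, t2=0.0):
    """List the (page at node i, page at node j) index pairs that overlap."""
    return [
        (p, q)
        for p, e_i in enumerate(epochs_i)
        for q, e_j in enumerate(epochs_j)
        if overlap(e_i, e_j, t1, t2)
    ]
```

Under this reading, the bitmaps of each overlapping pair would be compared, and pairs separated by more than T1+T2 would be skipped.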
Counter array based scheme • Online streaming module • Data analysis module
Online streaming module • The data digest data-structure is a counter array. • It is updated upon each packet arrival. • Measurement proceeds in epochs.
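Example • Input to the hash function H(.): the flow label of the packet • The counter at index i = H(flow label) in the counter array (indices 0 to b-1) is incremented from n to n+1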
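The counter-array update can be sketched in a few lines. As with the bitmap sketch, the class name `CounterArrayDigest` and the use of `hashlib.blake2b` as a stand-in for H(.) are assumptions; the slide only specifies that the counter selected by hashing the flow label is incremented by one per packet.

```python
import hashlib


class CounterArrayDigest:
    """Per-node counter array, updated once per packet (illustrative sketch)."""

    def __init__(self, b):
        self.counters = [0] * b  # b counters, all initially 0

    def index(self, flow_label: bytes) -> int:
        # Stand-in hash (assumption); all nodes must use the same function.
        digest = hashlib.blake2b(flow_label, digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.counters)

    def update(self, flow_label: bytes):
        # n -> n+1 at the counter selected by the flow label.
        self.counters[self.index(flow_label)] += 1
```

Since every node hashes a given flow label to the same index, a flow's packets land in the same counter position at its ingress and egress nodes, which is what the matching step relies on.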
Counter array based scheme • Online streaming module • Data analysis module
Data analysis module • Principle: find a good matching of counter values between ingress nodes and egress nodes • Challenge: hash collisions make one-to-one matching fail. • Method: iterative elephant-first matching • Accuracy: works well for medium-to-large flow matrix elements, owing to the Zipfian nature of Internet traffic.
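Elephant-first matching • [Figure: counter a1 at node i and counter a2 at node j, before and after matching] • If a1 > a2: FM[i, j, f] = a2; the counters become a1-a2 and 0 • If a1 <= a2: FM[i, j, f] = a1; the counters become 0 and a2-a1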
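The matching rule can be sketched for a single counter index as follows. This is one possible reading of "iterative elephant-first matching": at each step the largest remaining ingress counter is paired with the largest remaining egress counter, which is an assumption about the selection order. The function name and the dict-based inputs are hypothetical.

```python
def elephant_first_match(ingress, egress):
    """Match counter values at one array index across nodes.

    ingress, egress: dicts mapping node name -> counter value at that index.
    Returns a list of (ingress node, egress node, estimated flow size).
    """
    ingress, egress = dict(ingress), dict(egress)  # work on copies
    matches = []
    while ingress and egress:
        i = max(ingress, key=ingress.get)  # largest remaining ingress counter
        j = max(egress, key=egress.get)    # largest remaining egress counter
        a1, a2 = ingress[i], egress[j]
        if a1 == 0 or a2 == 0:
            break  # nothing left to match on one side
        size = min(a1, a2)        # the smaller counter is the flow-size estimate
        matches.append((i, j, size))
        ingress[i] = a1 - size    # a1 > a2: a1-a2 remains; otherwise 0
        egress[j] = a2 - size     # a1 <= a2: a2-a1 remains; otherwise 0
    return matches
```

For example, with ingress counters `{"A": 100, "B": 7}` and egress counters `{"X": 98, "Y": 9}`, the sketch first attributes 98 to (A, X), then 7 to (B, Y), then the leftover 2 to (A, Y). The small residual matches are the least reliable, which is consistent with the accuracy claim above about medium-to-large elements.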
Evaluation • Ideally, evaluation would require packet-level traces collected simultaneously at hundreds of ingress and egress routers in an ISP over a given period of time. • We construct synthetic experiments from 16 publicly available packet-level traces from NLANR.
Evaluation: traffic matrix • [Figure: two panels, one for the bitmap scheme and one for the counter array scheme]
Conclusion • A novel data streaming algorithm that produces traffic matrix estimates much more accurate than those of existing approaches. • Another data streaming algorithm that very accurately estimates the flow matrix, a finer-grained characterization than the traffic matrix. • Both algorithms are designed to operate in very high-speed networks.
Thank You! • Questions?