REU 2009-Traffic Analysis of IP Networks Daniel S. Allen, Mentor: Dr. Rahul Tripathi

Presentation Transcript


  1. REU 2009-Traffic Analysis of IP Networks
Daniel S. Allen, Mentor: Dr. Rahul Tripathi

• Data Streams
  • Data streams are long sequences of data packets.
  • Information travels over computer networks in the form of data streams.
  • Data streams are often very large (millions of elements).

• Streaming Algorithms
  • We wish to analyze the elements of a data stream to discover anomalies, reveal patterns in traffic, etc.
  • By doing so, misuse of network resources can be detected.
  • Entropy can be used to obtain this information. Entropy measures how unpredictable the value of each stream element is.
  • We consider a data stream of length m with values in the range {1, 2, 3, …, n}.
  • The entropy H of a stream is defined as follows:

    $$H = -\sum_{i=1}^{n} \frac{m_i}{m} \log \frac{m_i}{m}$$

    where m_i is the frequency of the ith element.
  • When all stream elements are identical, H = 0; when all m elements are distinct, H attains its maximum value of log (m). More generally, a stream spread evenly over k distinct values has H = log (k).

• The Algorithm
  • Since streams are typically large, an algorithm with minimal (i.e. sub-linear) space requirement is ideal.
  • Lall et al. show that an algorithm that relies on approximation alone (deterministic) or on randomization alone (exact) needs space linear in m.
  • Therefore, a combined approach is needed: the algorithm is both randomized and approximate. Rather than compute the entropy value H for a stream, the algorithm computes S, defined by

    $$S = \sum_{i=1}^{n} m_i \log m_i$$

    Since H = log (m) − S/m, an estimate of S yields an estimate of H.
  • The algorithm described by Lall et al. is an (ε, δ)-approximation: it returns an answer with a relative error of at most ε with probability at least 1 − δ.
  • The algorithm has three phases:
    • Pre-processing: A number of random locations in the stream are chosen.
    • Online: For each random location chosen, a new counter is created. Each active counter is updated.
    • Post-processing: Counts are arranged in a matrix. Estimated S values are calculated from the counts, then the mean of each row is taken, and the median of the means is returned as the final estimated value. Averaging reduces the variance and the median makes a large deviation unlikely, which guarantees a tight error bound on the estimated value of S.
  • Pseudocode:

    Pre-processing stage
        z := 32 log m / ε^2, g := 2 log (1/δ)
        choose z ∗ g locations in the stream at random
    Online stage
        for each item a[j] in the stream do
            if a[j] already has one or more counters then
                increment all of a[j]'s counters
            if j is one of the randomly chosen locations then
                start keeping a count for a[j], initialized at 1
    Post-processing stage
        // View the g ∗ z counts as a matrix c of size g × z
        for i := 1 to g do
            for j := 1 to z do
                X[i,j] := m ∗ (c[i,j] log c[i,j] − (c[i,j] − 1) log (c[i,j] − 1))
        for i := 1 to g do
            avg[i] := the average of the Xs in group i
        return the median of avg[1], . . . , avg[g]

• Experiments
  • The algorithm was implemented in C++.
  • Several experiments were performed simulating a data stream, with the following specifications: n = 1000 and ε = δ = 0.25.
  • The "stream" elements take on values from 0 through 999.
  • Multiple sets of values representing different data flows were used:
    • Fig. 1: A close to uniform distribution. The counts for all values are reasonably close to a uniform distribution. This stream contains 25,000 elements.
    • Fig. 2: The performance of the algorithm. The approximated and actual entropies of streams of increasing length. These streams follow the same distribution as in Fig. 1.

• Conclusions
  • Entropy is useful in detecting unusual volumes or distributions of traffic flow.
  • The algorithm performs reasonably well for a close to uniform distribution of values. As the entropy of the stream decreases, the time required by the algorithm increases.
  • The algorithm also produces estimates that are closer to the entropy H as defined in the formula for greater values of S and for streams of greater length m.

• References
  • S. Muthukrishnan. Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science, Vol. 1, No. 2, pp. 117-236, 2005.
  • A. Lall, V. Sekar, M. Ogihara, J. Xu, and H. Zhang. Data streaming algorithms for estimating entropy of network traffic. In Proceedings of the ACM SIGMETRICS conference, pp. 145-156, 2006.

Department of Computer Science & Engineering
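
As a point of reference for the entropy definition in the Streaming Algorithms section, the following is a minimal C++ sketch that computes S and H exactly from full frequency counts. It assumes base-2 logarithms, the standard empirical entropy H = -Σ (m_i/m) log (m_i/m), and S = Σ m_i log m_i. The names `EntropyStats` and `exact_entropy` are hypothetical and do not come from the poster. It keeps one counter per distinct value, so it is the linear-space baseline that a streaming algorithm tries to avoid, not the streaming algorithm itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Exact reference values for a stream held fully in memory.
struct EntropyStats {
    double S;  // sum over distinct values of m_i * log2(m_i)
    double H;  // empirical entropy in bits: log2(m) - S / m
};

EntropyStats exact_entropy(const std::vector<int>& stream) {
    assert(!stream.empty());  // entropy of an empty stream is left undefined here

    // One frequency counter per distinct value (linear space in the worst case).
    std::unordered_map<int, std::uint64_t> freq;
    for (int a : stream) ++freq[a];

    const double m = static_cast<double>(stream.size());
    double S = 0.0;
    for (const auto& kv : freq) {
        const double mi = static_cast<double>(kv.second);
        S += mi * std::log2(mi);  // values with m_i = 1 contribute 0
    }
    return {S, std::log2(m) - S / m};
}
```

For example, a stream of four identical elements gives H = 0, and a stream of four distinct elements gives H = log2(4) = 2, matching the two extreme cases described in the transcript.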
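
The three-stage pseudocode in The Algorithm section can be sketched in C++ roughly as follows. This is an illustration under stated assumptions, not the poster author's implementation: logarithms are base 2, z and g are rounded up to integers (and kept at least 1), the z ∗ g locations are drawn independently and uniformly (so a location may repeat), the median of an even number of group averages is taken as the mean of the two middle values, and the stream is passed as a vector for convenience even though the loop reads it only once. The function names `estimate_S` and `estimate_entropy` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <unordered_map>
#include <utility>
#include <vector>

// x * log2(x), with the convention 0 * log 0 = 0.
static double xlog2x(double x) { return x > 0.0 ? x * std::log2(x) : 0.0; }

// Estimates S = sum_i m_i * log2(m_i) in one pass over the stream.
double estimate_S(const std::vector<int>& stream, double eps, double delta,
                  std::mt19937_64& rng) {
    assert(!stream.empty() && eps > 0.0 && delta > 0.0 && delta < 1.0);
    const std::size_t m = stream.size();
    const double md = static_cast<double>(m);

    // Pre-processing: g groups of z counters, each tied to a random location.
    const std::size_t z = std::max<std::size_t>(
        1, static_cast<std::size_t>(std::ceil(32.0 * std::log2(md) / (eps * eps))));
    const std::size_t g = std::max<std::size_t>(
        1, static_cast<std::size_t>(std::ceil(2.0 * std::log2(1.0 / delta))));
    std::uniform_int_distribution<std::size_t> pick(0, m - 1);
    std::vector<std::pair<std::size_t, std::size_t>> starts;  // (location, counter id)
    starts.reserve(g * z);
    for (std::size_t k = 0; k < g * z; ++k) starts.emplace_back(pick(rng), k);
    std::sort(starts.begin(), starts.end());

    // Online stage: bump every counter of the current item, then start new ones.
    std::vector<double> c(g * z, 0.0);
    std::unordered_map<int, std::vector<std::size_t>> active;  // value -> counter ids
    std::size_t next = 0;
    for (std::size_t j = 0; j < m; ++j) {
        auto it = active.find(stream[j]);
        if (it != active.end())
            for (std::size_t k : it->second) c[k] += 1.0;
        for (; next < starts.size() && starts[next].first == j; ++next) {
            c[starts[next].second] = 1.0;
            active[stream[j]].push_back(starts[next].second);
        }
    }

    // Post-processing: per-counter estimates, group averages, then the median.
    std::vector<double> avg(g, 0.0);
    for (std::size_t i = 0; i < g; ++i) {
        for (std::size_t j = 0; j < z; ++j) {
            const double cij = c[i * z + j];
            avg[i] += md * (xlog2x(cij) - xlog2x(cij - 1.0));
        }
        avg[i] /= static_cast<double>(z);
    }
    std::sort(avg.begin(), avg.end());
    return g % 2 == 1 ? avg[g / 2] : 0.5 * (avg[g / 2 - 1] + avg[g / 2]);
}

// Converts the estimate of S into an estimate of H via H = log2(m) - S / m.
double estimate_entropy(const std::vector<int>& stream, double eps, double delta,
                        std::mt19937_64& rng) {
    const double md = static_cast<double>(stream.size());
    return std::log2(md) - estimate_S(stream, eps, delta, rng) / md;
}
```

Note that an item with many active counters costs more work per arrival, which is consistent with the poster's observation that running time grows as the entropy of the stream decreases. For short streams, z ∗ g can exceed m, so the sub-linear space benefit only appears for long streams.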
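
The transcript does not say how the simulated streams in the Experiments section were generated. One plausible setup for the close-to-uniform case in Fig. 1, offered only as an assumption, draws each element independently and uniformly from 0 through n − 1. The helper name `make_uniform_stream` and the seed parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical generator: m values drawn independently and uniformly from
// {0, ..., n-1}. This is an assumed setup, not the poster's actual generator.
std::vector<int> make_uniform_stream(std::size_t m, int n, unsigned seed) {
    assert(n > 0);
    std::mt19937 rng(seed);  // fixed seed makes a run repeatable
    std::uniform_int_distribution<int> value(0, n - 1);
    std::vector<int> stream(m);
    for (int& a : stream) a = value(rng);
    return stream;
}
```

With the poster's parameters, `make_uniform_stream(25000, 1000, seed)` yields a 25,000-element stream over the values 0 through 999, where each value is expected to appear about 25 times.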
