
Internet Measurement, Section 6.3: Analyzing Network Traffic



Presentation Transcript


  1. Internet Measurement, Section 6.3: Analyzing Network Traffic. Mohammad Hassan Hajiesmaili, ECE Department, University of Tehran, Fall 2009

  2. Outline • Packet Capture • Data Management • Data Reduction • Inference

  3. Packet Capture • Passive Traffic Management • Packet Capture in General Purpose Systems • Packet Capture in Special Purpose Systems • Control Plane Traffic

  4. Packet Capture in General Purpose Systems • Libpcap • Promiscuous mode • Reports all packets received • Packet filter • Specifies how much of each packet should be captured • Parsing tools • Tcpdump • Ethereal • Commercial products

  5. Packet Capture in General Purpose Systems • Broadcast LANs • Switched LANs • Port mirroring • Libpcap • Unrestricted access • Pktd & scriptroute • Access policies set by the system administrator

  6. Packet Capture in Special Purpose Systems • Monitoring core network links • On optical links, the link speed can exceed what the PCI bus can carry • Specialized network interface and interface driver • Some examples • OC3MON • OC12MON

  7. Control Plane Traffic • Capture control packet traffic • Example • Local view of BGP system • Establishing a session with a BGP-speaking router • Tools • GNU Zebra • Quagga • Can capture other routing traffic • OSPF • RIP

  8. Data Management • Full packet capture is challenging • Limited bus bandwidth • Limited memory access speed • Limited speed and capacity of disk arrays • Limited processing power • Specialized tools • Use sophisticated algorithms for operating on large streams of data • Examples: • Smacq • Windmill

  9. Data Management • Database management problem • Very large data sets • Arriving incrementally over time • Continuous queries • Solution • Data Stream Management • Examples: • Tribeca • STREAMS • TelegraphCQ • Gigascope • Queries are expressed in GSQL

  10. Looking at the traffic • Too much data for a human • Do something smarter!

  11. Data Reduction • Traffic Counters • Flow Capture • Sampling • Summarization • Dimensionality Reduction • Probabilistic Model

  12. Counter • Use aggregation to form time series of counts of traffic statistics • Bytes or packets per unit time • Time series are constructed by periodic polling • Generically called SNMP counts • Benefits: • Capturing without much performance impact on router • Extremely compact compared to traffic traces
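
As a minimal sketch of how periodic polling turns a cumulative counter into a time series, the function below differences successive readings and divides by the polling interval. The 32-bit wrapping counter and the name `rates_from_counter` are assumptions for illustration, not something the slides specify.

```python
def rates_from_counter(samples, counter_bits=32):
    """Turn (timestamp_seconds, counter_value) polls into per-interval rates.

    Assumes the counter is cumulative and wraps at 2**counter_bits, and that
    it wraps at most once between two consecutive polls.
    """
    modulus = 1 << counter_bits
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = (c1 - c0) % modulus          # modular difference absorbs one wrap
        rates.append((t1, delta / (t1 - t0)))
    return rates


# Three polls, 300 s apart; the counter wraps between the second and third.
polls = [(0, 4294967000), (300, 4294967200), (600, 704)]
series = rates_from_counter(polls)
```

If a poll is lost (SNMP runs over UDP), the next successful poll simply yields one longer interval, which is one reason the resulting series is coarse.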

  13. Counters - Drawback • SNMP transport is via UDP • Measurement packets can be lost • Difficulty in obtaining synchronized time series across multiple interfaces • Too coarse-grained for many needs

  14. Flow Capture • Counters provide basic information • Almost all traffic semantics are absent • Capture and store packet trains or flows

  15. Packet Train • A burst of packets arriving from the same source and heading to the same destination. • If the spacing between two packets exceeds some inter-train gap, they are said to belong to different trains.

  16. Capture Packet Train • Can be used for • Monitoring basic network activity • Monitoring users and applications • Network planning • Security analysis • Accounting and Billing • Tools for capturing are present in all major routers

  17. Packet Train Record Content • IP header (5-tuple) • Source IP address, source port, destination IP address, destination port, protocol ID • Start time • End time • Number of packets • Number of bytes contained in the packet train • Dramatic decrease in trace size compared to full packet capture
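
The packet train definition and record layout above can be sketched as follows. The `Packet` tuple, the `packet_trains` name, and the 1-second default gap are illustrative assumptions; real routers use their own timeout rules.

```python
from collections import namedtuple

# Hypothetical packet representation used only for this sketch.
Packet = namedtuple("Packet", "time src_ip src_port dst_ip dst_port proto size")


def packet_trains(packets, gap=1.0):
    """Group packets into train records keyed by the 5-tuple.

    A packet arriving more than `gap` seconds after the previous packet of
    the same 5-tuple starts a new train.
    """
    open_trains = {}   # 5-tuple -> record still being extended
    finished = []
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        rec = open_trains.get(key)
        if rec is not None and p.time - rec["end"] > gap:
            finished.append(rec)             # inter-train gap exceeded
            rec = None
        if rec is None:
            rec = {"key": key, "start": p.time, "end": p.time,
                   "packets": 0, "bytes": 0}
            open_trains[key] = rec
        rec["end"] = p.time
        rec["packets"] += 1
        rec["bytes"] += p.size
    finished.extend(open_trains.values())    # flush trains still open
    return finished
```

Each record keeps only the 5-tuple, two timestamps, and two counts, which is where the reduction in trace size comes from.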

  18. Packet Flows • Capturing packet flows rather than packet trains • Requires higher-level software for processing and interpreting the raw data

  19. Flow Capture • IETF standards for flow capture • IP Flow Information Export effort (IPFIX) • For Exporters: Providers of flow data • For Collectors: Consumers of flow data • Real Time Flow Metering (RTFM) • Meter MIB that can be accessed via SNMP • Meter Readers: Collect flow data • Managers: Coordinate meters and meter readers

  20. Sampling • In a sampling scheme, a subset of packets is chosen for capture • Two important questions • How should packets be chosen for sampling? • How should one correct or compensate for the sampling process when performing analysis? • Basic packet sampling • Trajectory sampling

  21. Basic Packet Sampling • The sampling process is performed independently on each link being monitored • Two categories: • Variable rate sampling • Constant rate sampling • Random sampling • Deterministic sampling • Stratified sampling
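
The three constant-rate schemes named above differ only in how the 1-in-n choice is made. The sketch below is one plausible reading of each, with function names chosen for illustration.

```python
import random


def deterministic_sample(packets, n):
    """Keep every n-th packet (periodic 1-in-n selection)."""
    return packets[::n]


def random_sample(packets, p, rng=random):
    """Keep each packet independently with probability p."""
    return [pkt for pkt in packets if rng.random() < p]


def stratified_sample(packets, n, rng=random):
    """Split the stream into consecutive buckets of n packets and keep
    one packet chosen uniformly at random from each bucket."""
    return [rng.choice(packets[i:i + n]) for i in range(0, len(packets), n)]
```

To compensate during analysis, a simple and common estimate of the original packet count is the sampled count multiplied by n (or by 1/p for random sampling).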

  22. Trajectory Sampling • Basic Packet Sampling • Packet capture at multiple points in a network • Cannot obtain per-packet delay • Trajectory Sampling Idea • If a packet is chosen for sampling at any point in the network, it is chosen at all points in the network. • Idea of implementation • Calculation of a hash function for each packet
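
A minimal sketch of the hash-based decision: every measurement point applies the same hash to the same packet bytes and samples the packet if the hash falls below a threshold. The use of SHA-256 and the name `is_sampled` are assumptions for illustration; the input must be restricted to content that does not change hop by hop (so not the TTL or header checksum).

```python
import hashlib


def is_sampled(invariant_bytes, rate=0.01):
    """Return the same sampling decision at every measurement point.

    `invariant_bytes` is assumed to hold only packet content that is
    identical at every hop. `rate` is the target sampling fraction.
    """
    digest = hashlib.sha256(invariant_bytes).digest()
    value = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    return value < rate * 2**64
```

Because the decision depends only on packet content, two routers that both see a packet either both record it or both skip it, which is what makes per-packet delay and path reconstruction possible.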

  23. Trajectory Sampling - Advantages • Easy to obtain metrics on customer performance • Per-customer packet delay • Detect routing loops • Trace denial of service attacks

  24. Summarization • Form compact summaries of large volumes of data • Bloom filters • Sketches • Other approaches

  25. Review: Bloom Filters • Given a set S = {x1, x2, x3, …, xn} on a universe U, want to answer queries of the form: is y in S? • Bloom filter provides an answer in • “Constant” time (time to hash). • Small amount of space. • But with some probability of being wrong. • Alternative to hashing with interesting tradeoffs.

  26. Bloom Filters • Start with an m-bit array B, filled with 0s. • Hash each item x_j in S k times. If H_i(x_j) = a, set B[a] = 1. • To check if y is in S, check B at H_i(y). All k values must be 1. • Possible to have a false positive: all k values are 1, but y is not in S. • Notation: n items, m = cn bits, k hash functions • [Figure: 16-bit array B, all 0s at the start and 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0 after the items are inserted]
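
The insert and lookup steps above can be sketched in a few lines. Deriving the k hash functions by salting SHA-256 with the index i is an assumption made for this example; any k reasonably independent hash functions would serve.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m            # the m-bit array B

    def _positions(self, item):
        # Stand-in for H_1..H_k: one hash salted with the function index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[a] for a in self._positions(item))


bf = BloomFilter(m=64, k=3)
bf.add("10.0.0.1")
```

A lookup such as `"10.0.0.1" in bf` never misses an inserted item; an item that was not inserted is usually rejected but can occasionally be reported as present.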

  27. False Positive Probability • Pr(specific bit of filter is 0) is $p' = (1 - 1/m)^{kn} \approx e^{-kn/m} = p$ • If $r$ is the fraction of 0 bits in the filter, then the false positive probability is $(1 - r)^k \approx (1 - p')^k \approx (1 - p)^k = (1 - e^{-kn/m})^k$ • Find the optimum at $k = (\ln 2)\,m/n$ by calculus. • So the optimal FPP is about $(0.6185)^{m/n}$

  28. Example • m/n = 8 • Optimal k = 8 ln 2 = 5.545...
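
The m/n = 8 example can be checked numerically with the standard approximation (1 - e^(-kn/m))^k for the false positive probability. The function names below are illustrative.

```python
import math


def false_positive_rate(m, n, k):
    """Approximate FPP of a Bloom filter with m bits, n items, k hashes."""
    return (1 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Real-valued k that minimizes the approximation above."""
    return math.log(2) * m / n


k_star = optimal_k(8, 1)                       # 8 ln 2
rates = {k: false_positive_rate(8, 1, k) for k in range(1, 11)}
best_integer_k = min(rates, key=rates.get)
```

Since k must be an integer in practice, one evaluates the neighbouring integers; for m/n = 8 both k = 5 and k = 6 give a false positive probability of roughly 2%.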

  29. Handling Deletions • Bloom filters can handle insertions, but not deletions. • If deleting x_i means resetting 1s to 0s, then deleting x_i will also “delete” any x_j that shares a bit with it. • [Figure: x_i and x_j hashing to overlapping positions in the bit array B]

  30. Counting Bloom Filters • Start with an array B of m counters, all set to 0. • Hash each item x_j in S k times. If H_i(x_j) = a, add 1 to B[a]. • To delete x_j, decrement the corresponding counters. • Can obtain a corresponding Bloom filter by reducing each counter to 0/1. • [Figure: counter array B before and after insertions and a deletion]
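
A counting variant can be sketched by replacing each bit with a small counter. As in any such sketch, the salted SHA-256 hashing is an assumption, and `remove` is only meaningful for items that were actually added.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # Stand-in for H_1..H_k: one hash salted with the function index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for a in self._positions(item):
            self.counters[a] += 1

    def remove(self, item):
        # Caller must only remove items that were previously added.
        for a in self._positions(item):
            self.counters[a] -= 1

    def __contains__(self, item):
        return all(self.counters[a] > 0 for a in self._positions(item))

    def to_bits(self):
        """Reduce to the corresponding plain Bloom filter (0/1 per cell)."""
        return [1 if c else 0 for c in self.counters]
```

Deleting one item leaves other items intact because shared cells are decremented rather than cleared.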

  31. Counting Bloom Filters: Overflow • Must choose counters large enough to avoid overflow. • Poisson approximation suggests 4 bits/counter. • Average load per counter, using k = (ln 2) m/n hash functions, is kn/m = ln 2. • Probability a counter has load at least 16: $\approx e^{-\ln 2}(\ln 2)^{16}/16! \approx 6.78 \times 10^{-17}$
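
The Poisson estimate behind the 4-bit recommendation can be checked directly. This sketch assumes counter loads are approximately Poisson with mean ln 2, as the slide states; the helper names are illustrative.

```python
import math


def poisson_pmf(lam, j):
    """Pr(load == j) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


def poisson_tail(lam, j_min, terms=60):
    """Pr(load >= j_min), truncated after `terms` terms (ample here)."""
    return sum(poisson_pmf(lam, j) for j in range(j_min, j_min + terms))


lam = math.log(2)                  # average load per counter
p_exactly_16 = poisson_pmf(lam, 16)
p_at_least_16 = poisson_tail(lam, 16)
```

The j = 16 term dominates the tail, so a 4-bit counter (maximum value 15) overflows with probability on the order of 10^-16 per counter.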

  32. Bloom Filters in Networking • Summarizing the contents of web caches to facilitate sharing of cached content • Peer-to-peer systems • Routing • Active queue management • Topology discovery • An extension • Multistage Bloom filters for storage of multisets

  33. Sketches • x: histogram of flow counts observed • x_i: number of packets observed for flow i • Random lower-dimensional projection • For m << n • m × n matrix P • Each column of P has a single 1 in a randomly chosen row • All other entries in the column are 0 • Forming the product Px is equivalent to constructing a single counter set of the multistage filter

  34. Sketches • Dimension-reducing random projection • Linear • Well suited to processing via wavelets • Applications • Traffic compression • Traffic similarity detection • Heavy-hitter estimation • Drawback • The set of keys encoded cannot easily be retrieved directly from the data structure
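
The product Px never needs the matrix P explicitly: since each column has a single 1, flow i simply adds its count to one of m counters. The sketch below picks the row with a hash, which is an assumption standing in for the random choice of row.

```python
import hashlib


def bucket(flow_key, m):
    """Row holding the single 1 in the column of P for this flow."""
    h = hashlib.sha256(repr(flow_key).encode()).digest()
    return int.from_bytes(h[:8], "big") % m


def project(x, m):
    """Compute y = P x for a histogram x given as {flow_key: packet_count}."""
    y = [0] * m
    for flow, count in x.items():
        y[bucket(flow, m)] += count
    return y


x1 = {"flowA": 10, "flowB": 3}
x2 = {"flowA": 5, "flowC": 7}
y1, y2 = project(x1, 4), project(x2, 4)
```

Two properties from the slides are visible here: the projection is linear (the sketch of a merged histogram is the element-wise sum of the sketches), and the flow keys themselves cannot be read back out of `y`.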

  35. Summarization approaches • Trie data structure • Constructed based on address prefix • Each node in the trie stores the traffic volume corresponding to all addresses contained in the prefix • Approaches used to count the number of distinct values in a traffic trace • Probabilistic counting • Bitmap algorithms
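
A minimal binary trie over IPv4 address bits illustrates the per-prefix volume idea. The class and method names are made up for this sketch, and a practical version would prune or compress low-volume nodes rather than store every bit level.

```python
import ipaddress


class PrefixTrie:
    def __init__(self):
        self.volume = 0        # bytes seen for all addresses under this prefix
        self.children = {}     # next address bit ("0" or "1") -> PrefixTrie

    def add(self, ip, nbytes):
        """Credit nbytes to every prefix that contains the address."""
        bits = format(int(ipaddress.IPv4Address(ip)), "032b")
        node = self
        node.volume += nbytes
        for b in bits:
            node = node.children.setdefault(b, PrefixTrie())
            node.volume += nbytes

    def volume_of(self, prefix):
        """Total bytes recorded for a prefix such as '10.0.0.0/8'."""
        net = ipaddress.IPv4Network(prefix)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self
        for b in bits:
            node = node.children.get(b)
            if node is None:
                return 0
        return node.volume


trie = PrefixTrie()
trie.add("10.0.0.1", 100)
trie.add("10.0.1.1", 50)
trie.add("192.168.0.1", 7)
```

Querying at different prefix lengths (`/8`, `/24`, and so on) gives the same traffic at different granularities from one structure.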

  36. Summarization approaches • Approaches that maintain traffic summary for a while • Landmark window model • Sliding window model
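
The two window models differ in what they forget. In this sketch (class names are illustrative, and a plain byte total stands in for whatever summary is maintained), the landmark counter covers everything since a fixed start time, while the sliding-window counter covers only the most recent `width` seconds.

```python
from collections import deque


class LandmarkCounter:
    """Summarize all traffic since a fixed landmark time."""

    def __init__(self, landmark):
        self.landmark = landmark
        self.total = 0

    def add(self, t, nbytes):
        if t >= self.landmark:
            self.total += nbytes


class SlidingWindowCounter:
    """Summarize only the traffic of the last `width` seconds."""

    def __init__(self, width):
        self.width = width
        self.items = deque()   # (time, bytes), oldest first
        self.total = 0

    def add(self, t, nbytes):
        self.items.append((t, nbytes))
        self.total += nbytes
        # Expire observations that have fallen out of the window.
        while self.items and self.items[0][0] <= t - self.width:
            _, old = self.items.popleft()
            self.total -= old
```

The sliding window has to remember individual observations so it can expire them, which is why it is the more expensive model to maintain.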

  37. Dimensionality Reduction • Approaches for solving the problem of high dimensionality in traffic measurements. • Dimension reduction approaches • Tend to find an alternate representation of data that exposes the true (low-dimensional) structure in the data • Clustering • Principal Component Analysis

  38. Who is using my link?

  39. Looking at traffic aggregates • Aggregating on individual packet header fields gives useful results but • Traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.) • Cannot show aggregates defined over multiple fields (e.g. which network uses which application) • The traffic analysis tool should automatically find aggregates over the right fields at the right granularity • [Figure: header fields Src. IP, Dest. IP, Src. port, Dest. port, Protocol, Src. net, Dest. net, annotated with the questions “Where does the traffic come from?”, “What apps are used?”, “Which network uses web and which one kazaa?” and the finding “Most traffic goes to the dorms …”]

  40. Ideal traffic report • Web is the dominant application • This is a Denial of Service attack!! • The library is a heavy user of web • That’s a big flash crowd! • This paper is about giving the network administrator insightful traffic reports

  41. Clustering • Similarity metrics • Defined on the set of traffic features • Specific form of Vector Quantization • Challenges • Discovering a set of cluster definitions that succinctly describe the traffic • Search problem in high dimensional space

  42. Principal Component Analysis • Clustering • Nonlinear dimensionality reduction • PCA • Linear • Optimal, in the sense of capturing maximum variability in the data using a minimum number of dimensions

  43. Principal Component Analysis (PCA) • A way to • Identify “patterns” in data • Express the data so as to highlight correlations such as similarities and dissimilarities • Why important? • Hard to visualize the patterns of high-dimensional data • How to take advantage of PCA • Compress data by reducing the number of dimensions, “hopefully” without losing much information

  44. Background Mathematics • Linear Algebra • Matrix representation of data • Statistics Concepts • Mean: expectation of the data distribution • Covariance: how two dimensions of the data vary together • Build Covariance Matrix (CM) • The covariance matrix tells us the correlations between dimensions of the data • CMij > 0 -> when the ith dimension increases, so does the jth • CMij < 0 -> when the ith dimension increases, the jth decreases • CMij = 0 -> no linear correlation (uncorrelated, which does not by itself imply independence)
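
As a small worked example of the covariance matrix, the function below uses the sample covariance (dividing by n - 1); the choice of that normalization and the toy data are assumptions made for illustration.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of data given as a list of records.

    Each record is a list with one value per dimension.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


# Dimension 1 rises with dimension 0; dimension 2 falls as dimension 0 rises.
data = [[1, 2, 3], [2, 4, 2], [3, 6, 1]]
cm = covariance_matrix(data)
```

Here `cm[0][1]` is positive and `cm[0][2]` is negative, matching the sign rules listed on the slide, and the diagonal holds each dimension's variance.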

  45. PCA (1) • For any given dataset, PCA finds a new coordinate system that maps maximum variability in the data to a minimum number of coordinates • New axes are called Principal Axes or Components

  46. Principal Component Analysis • Suppose that the original variables X1, X2, . . . , Xm form a coordinate system in m-dimensional space. • Each variable Xi represents an n × 1 vector, where n is the number of records. • The standardized variable Zi is the n × 1 vector $Z_i = (X_i - \mu_i)/\sigma_{ii}$, where $\mu_i$ is the mean of $X_i$ and $\sigma_{ii}$ is the standard deviation of $X_i$ • In matrix notation: $Z = (V^{1/2})^{-1}(X - \mu)$, where $V^{1/2}$ is the diagonal matrix with the standard deviations $\sigma_{ii}$ on its diagonal (nonzero entries only on the diagonal)
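
The standardization step can be sketched directly from the definition of Z_i. The second function goes one step beyond the slide and is only one simple way to proceed: it estimates the first principal axis by power iteration on the correlation matrix of the standardized variables. Both function names are illustrative, and the code assumes no variable is constant (so every standard deviation is nonzero).

```python
import math


def standardize(columns):
    """Z_i = (X_i - mu_i) / sigma_ii for each variable X_i (one list each)."""
    z = []
    for col in columns:
        n = len(col)
        mu = sum(col) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
        z.append([(v - mu) / sigma for v in col])
    return z


def first_principal_axis(z, iterations=200):
    """Dominant eigenvector of the correlation matrix, by power iteration."""
    m, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(m)] for i in range(m)]
    v = [1.0 / (i + 1) for i in range(m)]      # arbitrary asymmetric start
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Two perfectly correlated variables measured on four records.
Z = standardize([[1, 2, 3, 4], [2, 4, 6, 8]])
axis = first_principal_axis(Z)
```

For this toy data the two standardized variables are identical, so the first principal axis weights them equally and a single coordinate captures all of the variability.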
