
Internet Measurement

Internet Measurement. CS 7260 Nick Feamster February 13, 2006. Administrivia. Project meetings: today and tomorrow (next Monday as needed) New problem set out Wednesday 3-4 (shorter) problems, 9 days. Today’s Lecture. Why measure? Troubleshooting and debugging


Presentation Transcript


  1. Internet Measurement • CS 7260 • Nick Feamster • February 13, 2006

  2. Administrivia • Project meetings: today and tomorrow (next Monday as needed) • New problem set out Wednesday • 3-4 (shorter) problems, 9 days

  3. Today’s Lecture • Why measure? • Troubleshooting and debugging • Incorporation into existing protocols • Performance monitoring • Science • What to measure? • Routing, traffic volumes, application-level statistics, paths, etc. • Pitfalls • Measurement wishlist

  4. Traffic: Why Measure? • Billing • Measure usage on links to/from customers • Traffic engineering and capacity planning • Measure the traffic matrix (i.e., offered load) • Tune routing protocol or add new capacity • Troubleshooting • Anomaly detection (Lecture 12, next week)

  5. Traffic Engineering/Capacity Planning • Need to estimate offered traffic loads (Figure labels: BGP policy configuration, topology, BGP routing model, offered traffic, eBGP routes, flow of traffic through the network)

  6. Billing for Internet Usage • 95th Percentile billing • Customer network pays for “committed information rate” (CIR) • Throughput measured every 5 minutes (typically with SNMP; flow statistics also can be used for billing) • Customer billed based on 95th percentile
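
The billing rule above can be sketched in a few lines. This is a minimal illustration, not a carrier's actual procedure: it assumes the nearest-rank definition of the 95th percentile and assumes the customer is charged the larger of the CIR and the measured percentile. The function names and the sample data are hypothetical.

```python
def percentile_95(samples_mbps):
    """Return the 95th-percentile sample (nearest-rank method, an assumption)."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Nearest rank = ceil(0.95 * n), computed with integers to avoid float rounding.
    rank = -(-95 * len(ordered) // 100)
    return ordered[rank - 1]


def billable_rate(samples_mbps, cir_mbps):
    """Assumed policy: bill the larger of the CIR and the 95th-percentile rate."""
    return max(cir_mbps, percentile_95(samples_mbps))


# One day of 5-minute samples (288 intervals): 10 Mbps baseline, one hour at 90 Mbps.
day = [10.0] * 276 + [90.0] * 12
print(percentile_95(day))        # the one-hour burst falls in the top 5% and is ignored
print(billable_rate(day, 20.0))  # the CIR acts as the floor in this sketch
```

Because the top 5% of samples are discarded, a burst shorter than about 72 minutes per day (5% of 24 hours) does not raise the bill under this definition.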

  7. Passive vs. Active Measurement • Passive Measurement: Collection of packets and flow statistics from traffic that is already flowing on the network • Packet traces • Flow statistics • Application-level logs • Active Measurement: Inject “probing” traffic to measure various characteristics • Traceroute • Ping • Application-level probes (e.g., Web downloads)

  8. Passive Traffic Data Measurement • SNMP byte/packet counts: everywhere • Packet monitoring: selected locations • Flow monitoring: typically at edges (if possible) • Direct computation of the traffic matrix • Input to denial-of-service attack detection • Deep Packet Inspection: also at edge, where possible

  9. Simple Network Management Protocol • Management Information Base (MIB) • Information store • Unique variables named by OIDs • Accessed with SNMP • Specific MIBs for byte/packet counts (per link) (Figure labels: SNMP manager, agent, managed objects, DB)

  10. SNMP (Passive) • Advantage: ubiquitous • Supported on all networking equipment • Multiple products for polling and analyzing data • Disadvantages: see Lecture 6 • Coarse granularity • Cannot express complex queries on the data • Unreliable delivery of the data using UDP • Utility • Link utilization (billing) • Traffic matrix inference
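
Link utilization is typically derived from two successive polls of a per-interface byte counter (for example, the IF-MIB objects ifInOctets, 32-bit, or ifHCInOctets, 64-bit). The sketch below shows only the arithmetic, not the SNMP polling itself. The function names are hypothetical, and it assumes the counter wraps at most once between polls.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def link_utilization(prev_octets, curr_octets, interval_s, link_bps, bits=32):
    """Average utilization (0.0 to 1.0) of a link over one polling interval."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_sent = 8 * counter_delta(prev_octets, curr_octets, bits)
    return bits_sent / (interval_s * link_bps)


# 375 MB transferred in a 5-minute interval on a 100 Mbps link.
print(link_utilization(0, 375_000_000, 300, 100_000_000))
# A 32-bit counter that wrapped between polls still yields the right delta.
print(counter_delta(4_294_967_000, 704))
```

The coarse granularity noted on the slide is visible here: the result is an average over the whole polling interval, so short bursts inside the interval are invisible.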

  11. Packet-level Monitoring • Passive monitoring to collect full packet contents (or at least headers) • Advantages: lots of detailed information • Precise timing information • Information in packet headers • Disadvantages: overhead • Hard to keep up with high-speed links • Often requires a separate monitoring device

  12. Full Packet Capture (Passive) • Example: Georgia Tech OC3Mon • Rack-mounted PC • Optical splitter • Data Acquisition and Generation (DAG) card (Source: endace.com)

  13. Packet-Level Monitoring http://optimusprime.noc.gatech.edu/current/ • Fine-grained timing and application-level information (port numbers, TCP flags, URLs, etc.)

  14. Port-level information Example: Traffic to and from CoC • Note: lots of port 22 traffic (different from other GT subnets)

  15. Detailed Traffic Accounting • Traffic monitoring by host • Daily summaries

      Host             In (MB)     Out (MB)    Total (MB)
      199.77.129.53    26319.49    27064.46    53383.95
      199.77.128.194    2278.77    18858.23    21137.00
      199.77.129.73     5176.44    15797.16    20973.60
      199.77.128.193    1321.03    11799.69    13120.71

      (Figure callouts: Planetlab, Spamhaus)

  16. Traffic Flow Statistics • Flow monitoring (e.g., Cisco Netflow) • Statistics about groups of related packets (e.g., same IP/TCP headers and close in time) • Recording header information, counts, and time • More detail than SNMP, less overhead than packet capture • Typically implemented directly on line card

  17. What is a flow? • Source IP address • Destination IP address • Source port • Destination port • Layer 3 protocol type • TOS byte (DSCP) • Input logical interface (ifIndex)

  18. Cisco Netflow • Basic output: “Flow record” • Most common version is v5 • Current version (9) is being standardized in the IETF (template-based) • More flexible record format • Much easier to add new flow record types (Figure labels: core network; collector (PC); export packets of approximately 1500 bytes, 20-50 flow records, sent more frequently if traffic increases; collection and aggregation)

  19. Flow Record Contents • Source and Destination, IP address and port • Packet and byte counts • Start and end times • ToS, TCP flags Basic information about the flow… …plus, information related to routing • Next-hop IP address • Source and destination AS • Source and destination prefix

  20. Aggregating Packets into Flows • Criterion 1: Set of packets that “belong together” • Source/destination IP addresses and port numbers • Same protocol, ToS bits, … • Same input/output interfaces at a router (if known) • Criterion 2: Packets that are “close” together in time • Maximum inter-packet spacing (e.g., 15 sec, 30 sec) • Example: flows 2 and 4 are different flows due to time (Figure: timeline of packets grouped into flows 1-4)
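
The two criteria can be combined in a small sketch: packets are keyed by their header fields, and a gap longer than the inactivity timeout closes the current record and opens a new one. This illustrates the idea only and is not how a router's flow cache is implemented. The `Packet` type, the 5-tuple key, and the function name are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical packet summary; a real monitor would parse these from headers.
Packet = namedtuple("Packet", "time src dst sport dport proto size")


def aggregate_flows(packets, inactive_timeout=15.0):
    """Group packets into flow records by 5-tuple and inter-packet gap."""
    active = {}    # key -> currently open flow record
    finished = []  # records closed by the inactivity timeout
    for p in sorted(packets, key=lambda pkt: pkt.time):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        # Criterion 2: a long gap means this packet starts a new flow.
        if flow is not None and p.time - flow["end"] > inactive_timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = {"key": key, "start": p.time, "end": p.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = p.time
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f["start"])


packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1500),
    Packet(1.0, "10.0.0.3", "10.0.0.2", 5555, 22, "tcp", 100),
    Packet(5.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1500),
    Packet(40.0, "10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 40),
]
for record in aggregate_flows(packets):
    print(record)
```

In the sample data, the first and last packets share a 5-tuple but are 35 seconds apart, so they land in different records, mirroring the slide's example of two flows separated only by time.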

  21. Netflow Processing • Create and update flows in NetFlow cache • Expiration • Inactive timer expired (15 sec is default) • Active timer expired (30 min (1800 sec) is default) • NetFlow cache is full (oldest flows are expired) • RST or FIN TCP flag • Aggregation? (No/Yes; e.g., Protocol-Port aggregation scheme) • Export version • Aggregated flows: export version 8 or 9 • Non-aggregated flows: export version 5 or 9 • Transport protocol • Export packet: header plus payload (flows)

  22. Reducing Measurement Overhead • Filtering: on interface • destination prefix for a customer • port number for an application (e.g., 80 for Web) • Sampling: before insertion into flow cache • Random, deterministic, or hash-based sampling • 1-out-of-n or stratified based on packet/flow size • Two types: packet-level and flow-level • Aggregation: after cache eviction • packets/flows with same next-hop AS • packets/flows destined to a particular service

  23. Packet Sampling • Packet sampling before flow creation (Sampled Netflow) • 1-out-of-m sampling of individual packets (e.g., m=100) • Creation of flow records over the sampled packets • Reducing overhead • Avoid per-packet overhead on (m-1)/m packets • Avoid creating records for a large number of small flows • Increasing overhead (in some cases) • May split some long transfers into multiple flow records • … due to larger time gaps between successive packets (Figure: timeline where unsampled packets leave a gap longer than the timeout, yielding two flows)
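
The flow-splitting effect is easy to reproduce. The sketch below assumes deterministic 1-out-of-m sampling and a 15-second inactivity timeout; the function names and the synthetic transfer are hypothetical.

```python
def sample_one_in_m(timestamps, m):
    """Deterministic 1-out-of-m sampling: keep every m-th packet."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return timestamps[m - 1::m]


def count_flow_records(timestamps, inactive_timeout=15.0):
    """Flow records produced for one 5-tuple under an inactivity timeout."""
    records = 0
    last = None
    for t in sorted(timestamps):
        if last is None or t - last > inactive_timeout:
            records += 1  # first packet, or gap exceeded the timeout
        last = t
    return records


# A long transfer: one packet every 4 seconds for 400 seconds.
transfer = [4.0 * i for i in range(100)]
sampled = sample_one_in_m(transfer, 5)  # gaps grow from 4 s to 20 s

print(count_flow_records(transfer))  # unsampled: gaps stay under the timeout
print(count_flow_records(sampled))   # sampled: every gap exceeds the timeout
print(len(sampled) * 5)              # packet count estimate, scaled by m
```

Scaling the sampled packet count by m still recovers the packet total in this example, but the number of flow records grows because each sampled packet is now separated from the next by more than the timeout.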

  24. Sampling: Flow-Level Sampling • Sampling of flow records evicted from flow cache • When evicting flows from table or when analyzing flows • Stratified sampling to put weight on “heavy” flows • Select all long flows and sample the short flows • Reduces the number of flow records • Still measures the vast majority of the traffic (Figure: Flow 1, 40 bytes; Flow 2, 15580 bytes; Flow 3, 8196 bytes; Flow 4, 5350789 bytes; Flow 5, 532 bytes; Flow 6, 7432 bytes; labels: sample with 0.1% probability, sample with 10% probability, sample with 100% probability)
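
One way to realize "select all long flows and sample the short flows" is size-dependent threshold sampling: keep a flow of x bytes with probability min(1, x/z) and credit each kept flow with max(x, z) bytes, which keeps the byte total unbiased. This is a swapped-in technique for illustration; the slide's figure uses fixed probability tiers whose boundaries are not recoverable from the transcript. The threshold value, seed, and function name are assumptions.

```python
import random


def sample_flows(flow_bytes, threshold, rng):
    """Keep each flow with probability min(1, size / threshold).

    Returns (size, estimate) pairs. Crediting a kept flow with
    max(size, threshold) bytes makes the expected total equal the true total.
    """
    kept = []
    for size in flow_bytes:
        keep_probability = min(1.0, size / threshold)
        if rng.random() < keep_probability:
            kept.append((size, max(size, threshold)))
    return kept


# Flow sizes from the slide's figure, in bytes.
flows = [40, 15580, 8196, 5350789, 532, 7432]
rng = random.Random(0)  # arbitrary fixed seed so runs are repeatable
kept = sample_flows(flows, threshold=10000, rng=rng)

print(kept)
print(sum(estimate for _, estimate in kept), "estimated vs", sum(flows), "actual")
```

Flows at or above the threshold are always kept, so the one 5 MB flow that carries nearly all of the bytes is never lost, while most of the tiny records are dropped.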

  25. Traffic Engineering/Capacity Planning • Need to estimate offered traffic loads (Figure labels: BGP policy configuration, topology, BGP routing model, offered traffic, eBGP routes, flow of traffic through the network; callout: “How to get this?”)

  26. Traffic Matrix Estimation • From link counts to the traffic matrix (Figure labels: Sources, 5 Mbps, 3 Mbps, 4 Mbps, 4 Mbps, Destinations)

  27. Formalization • Source-destination pairs • p is a source-destination pair of nodes • x_p is the (unknown) traffic volume for this pair • Links in the network • l is a unidirectional edge • y_l is the observed traffic volume on this link • Routing • R_lp = 1 if link l is on the path for src-dest pair p • Or, R_lp is the proportion of p’s traffic that traverses l • y = Rx (now work back to get x)

  28. Problem: Underconstrained • Linear system is underdetermined • Number of nodes n • Number of links e is around O(n) • Number of src-dest pairs c is O(n^2) • Dimension of solution sub-space at least c - e • Multiple observations can help • k independent observations (over time) • Stochastic model with src-dest counts Poisson & i.i.d. • Maximum likelihood estimation to infer traffic matrix • Using NetFlow to augment byte counts can help constrain the problem • Lots of recent work on traffic matrix estimation
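
A small worked instance shows why y = Rx cannot be inverted directly. It assumes the numbers on slide 26 describe two sources sending 5 and 3 Mbps and two destinations receiving 4 and 4 Mbps; the figure itself is not in the transcript, so that layout is an assumption. Each pair p is written as (i, j), source i to destination j.

```latex
\[
y =
\begin{pmatrix} 5 \\ 3 \\ 4 \\ 4 \end{pmatrix},
\qquad
R =
\begin{pmatrix}
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1
\end{pmatrix},
\qquad
x =
\begin{pmatrix} x_{11} \\ x_{12} \\ x_{21} \\ x_{22} \end{pmatrix},
\qquad
y = Rx .
\]
% Rows 1 and 2 of R sum to the same vector as rows 3 and 4, so rank(R) = 3 < 4.
\[
x(t) =
\begin{pmatrix} t \\ 5 - t \\ 4 - t \\ t - 1 \end{pmatrix},
\qquad 1 \le t \le 4 .
\]
```

Every t in that range gives a non-negative traffic matrix that reproduces all four link counts, so the observations alone leave a one-dimensional family of solutions. Extra information, such as repeated observations or NetFlow data, is what narrows it down.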

  29. Trend: “Deep Packet Inspection” • Intrusion detection capabilities in the router • Analysis of packet payloads • Recognition of application type • Analyzes and identifies application traffic in real time • Implemented with programmable hardware • Different types of analysis techniques • Signature-based • Rule-based • Statistical

  30. NetFlow vs. NBAR

  31. Passive Traffic Data Measurement • SNMP byte/packet counts: everywhere • Packet monitoring: selected locations • Flow monitoring: Tracking the application mix • Direct computation of the traffic matrix • Input to denial-of-service attack detection

  32. Active Measurement Tools • Send probes and measure a response • A few common tools: • Ping: RTT and loss • Traceroute: path and RTT • iPerf: generation of CBR flows for throughput estimation • Pathchar: per-hop bandwidth, latency, loss measurement • Pchar, clink: open-source reimplementation of pathchar • Nettimer (Lai): bottleneck bandwidth using packet pair method

  33. Using Traceroute to Measure Paths • Send packets with increasing TTL values (Figure labels: TTL=1, TTL=2, TTL=3, ICMP “time exceeded”)

  34. Problems with Traceroute • Can’t unambiguously identify one-way outages • Failure to reach host: failure of reverse path? • ICMP messages may be filtered or rate-limited • IP address of “time exceeded” packet may be the outgoing interface of the return packet (Figure labels: TTL=1, TTL=2, TTL=3)

  35. Routing: Why Measure? • Detect anomalies (e.g., route hijacks, routing loops, oscillations, etc.) • More likely: to help explain a problem observed with traffic.

  36. Common Mode: iBGP “Monitor” • iBGP session to software box • Sees only the best route • Two types of logs: • Table dumps • BGP updates (Figure label: iBGP session)

  37. “CPR”: Network Measurement at Georgia Tech • Detection of non-catastrophic failures • Discovered only when users call to complain • “The Internet is slow this morning” • This probably doesn’t coincide with a warning in OpenView. • Little quantitative data to verify problems or determine the possible location of the issue.

  38. Today’s Paper • Paxson, Strategies for Sound Internet Measurement • Lessons • Meta-data • Self-consistency checks • Examining the semantics of the data (e.g., TCP acknowledgements without data) • Analysis of small subsets of data (especially for large datasets) • Automated analysis of long-running measurements

  39. End-to-End Routing Behavior • Routing loops • 3 hours vs. more than 12 hours (two modes) • Erroneous routing • Very long paths • Connectivity altered mid-stream • Fluttering • Infrastructure failures • Two important stability notions • Prevalence • Persistence
