Internet Measurement

Internet Measurement Jennifer Rexford

Outline • Measurement overview • Why measure? Why model measurements? • What to measure? Where to measure? • Internet challenges • Measurement tools • Active: ping, traceroute, and pathchar • Passive: logs, SNMP, packet, and flow monitoring • Operational applications of measurement • Discussion

Why Measure? • The Internet is a man-made system, so why do we need to measure it? • Because we still don’t really understand it • Because sometimes things go wrong • Measurement for network operations • Detecting and diagnosing problems • What-if analysis of future changes • Measurement for scientific discovery • Characterizing a complex system as an organism • Creating accurate models that represent reality • Identifying new features and phenomena

Why Build Models of Measurements? • Compact summary of measurements • Efficient way to represent a large data set • E.g., exponential distribution with mean 100 sec • Expose important properties of measurements • Reveals underlying cause or engineering question • E.g., mean RTT to help explain TCP throughput • Generate random but realistic data as input • Generate new data that agree in key properties • E.g., topology models to feed into simulators • “All models are wrong, but some models are useful.” – George Box
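
As a rough illustration of the "compact summary" and "generate realistic data" bullets, the sketch below fits an exponential model to a set of samples and draws new samples from it. The function names and the synthetic input are assumptions made for this example; for an exponential distribution the maximum-likelihood estimate of the mean is simply the sample mean.

```python
import random
import statistics


def fit_exponential_mean(samples):
    """Maximum-likelihood estimate of the mean of an exponential model."""
    if not samples:
        raise ValueError("need at least one sample")
    return statistics.fmean(samples)


def generate_exponential(mean, n, seed=0):
    """Draw n synthetic samples from an exponential distribution."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), not the mean itself.
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


# Stand-in for measured data: 50,000 synthetic durations with mean 100 sec.
synthetic = generate_exponential(mean=100.0, n=50_000, seed=1)
estimated_mean = fit_exponential_mean(synthetic)
```

A single number (the mean) then stands in for the whole data set, which is only reasonable if the measurements really are close to exponential.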

What Can be Measured? • Traffic • Load statistics • Packet or flow traces • Performance of paths • Application performance, e.g., Web download time • Transport performance, e.g., TCP bulk throughput • Network performance, e.g., packet delay and loss • Network structure • Topology, and paths on the topology • Dynamics of the routing protocol

Where to Measure? • Short answer • Anywhere you can! • End hosts • Application logs, e.g., Web server logs • Sending active probes to measure performance • Individual links/routers • Load statistics, packet traces, flow traces • Configuration state • Routing-protocol messages or table dumps • Alarms

Internet Challenges Make Measurement an Art • Stateless routers • Routers do not routinely store packet/flow state • Measurement is an afterthought, adds overhead • IP narrow waist • IP measurements cannot see below network layer • E.g., link-layer retransmission, tunnels, etc. • Violations of end-to-end argument • E.g., firewalls, address translators, and proxies • Not directly visible, and may block measurements • Decentralized control • Autonomous Systems may block measurements • No global notion of time

Active Measurement: Ping • Adding traffic for purposes of measurement • Trade-offs between accuracy and overhead • Need careful methods to avoid introducing bias • Ping • Host sends an ICMP ECHO packet to a target • … and captures the ICMP ECHO REPLY • Useful for checking connectivity, and RTT • Only requires control of one of the two end-points • Problems with ping • Round-trip rather than one-way delays • Some hosts might not respond
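
To make the ICMP ECHO step concrete, here is a minimal sketch that builds an echo request and computes its Internet checksum. It only constructs the bytes; actually sending them needs a raw socket and usually elevated privileges, which is left out. The function names and the example identifier, sequence number, and payload are assumptions for illustration.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for ECHO; the reply uses type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP, IP, UDP, and TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP ECHO message (without the IP header)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
```

A ping tool would timestamp the request, match the ECHO REPLY on identifier and sequence number, and report the difference as the RTT.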

Active Measurement: Traceroute • Time-To-Live field in IP packet header • Source sends a packet with a TTL of n • Each router along the path decrements the TTL • “TTL exceeded” sent when TTL reaches 0 • Traceroute tool exploits this TTL behavior • Send packets with TTL=1, 2, 3, … and record source of “time exceeded” message • [Figure: source sends probes with TTL=1 and TTL=2 toward the destination; routers on the path return “Time exceeded”]
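
The TTL mechanism can be sketched as a small simulation. This is not a working traceroute (no packets are sent); it models a path as a list of hypothetical interface addresses ending at the destination and shows which element would answer each probe.

```python
def probe(path, ttl):
    """Simulate one probe sent with the given TTL along `path`.

    `path` lists the routers in order, followed by the destination.
    Returns (responder, kind).
    """
    for router in path[:-1]:
        ttl -= 1  # each router decrements the TTL
        if ttl == 0:
            return router, "time exceeded"
    return path[-1], "reached destination"


def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, 3, ... and record who responds."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = probe(path, ttl)
        hops.append(responder)
        if kind == "reached destination":
            break
    return hops


# Hypothetical path: three router interfaces, then the destination host.
example_path = ["10.0.0.1", "10.0.1.1", "10.0.2.1", "192.0.2.80"]
discovered = traceroute(example_path)
```

In this idealized model every router answers and every probe follows the same path, so `discovered` equals `example_path`.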

Active Measurement: Challenges of Traceroute • Measuring multiple paths • Successive probes may traverse different paths • Non-participating network elements • Some routers and firewalls don’t reply • Inaccurate delay information • Includes processing delays on the router CPU • Round-trip vs. one-way measurements • Paths may have asymmetric properties • Interfaces, not routers • Returns IP address of interfaces, not routers

Active Measurement: Applications of Traceroute • Network troubleshooting • Identify forwarding loops and black holes • Identify long and convoluted paths • See how far the probe packets get • Network topology inference • Launch traceroute probes from many places • … toward many destinations • Join together to fill in parts of the topology • … though traceroute undersamples the edges
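
A minimal sketch of the "join together" step, assuming each traceroute result is a list of responding addresses with None for hops that did not reply. The names and sample traces are hypothetical. Because traceroute reports interfaces rather than routers, the result is an interface-level graph; mapping interfaces to routers is a separate problem not attempted here.

```python
def infer_links(traces):
    """Merge traceroute hop lists into a set of undirected links."""
    links = set()
    for hops in traces:
        for a, b in zip(hops, hops[1:]):
            # A silent hop (None) hides the link on both sides of it.
            if a is None or b is None or a == b:
                continue
            links.add(tuple(sorted((a, b))))
    return links


# Hypothetical traces launched from vantage points A and E.
traces = [
    ["A", "B", "C", "D"],
    ["E", "B", "C", "F"],
    ["A", None, "C"],  # the middle hop did not respond
]
topology = infer_links(traces)
```

The shared link between B and C appears once in the result, and the trace with the silent hop contributes nothing.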

Active Measurement: Pathchar for Links • Per-link delay from successive hops: rtt(i+1) − rtt(i) • Three delay components • [Figure: min. RTT(L) plotted against L, with slope = 1/c and d marked] • How to infer d and c?
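
One way to answer "how to infer d, c?" is a straight-line fit. The sketch below assumes that L is the probe packet size, that the minimum per-link RTT difference follows min RTT(L) = d + L/c, and that d is therefore the intercept and 1/c the slope. That reading of the figure, the function names, and the synthetic measurements are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def infer_link(min_rtt_by_size):
    """Estimate (d, c) from {packet size in bytes: min RTT difference in sec}."""
    sizes = sorted(min_rtt_by_size)
    rtts = [min_rtt_by_size[size] for size in sizes]
    slope, intercept = fit_line(sizes, rtts)
    return intercept, 1.0 / slope  # d in seconds, c in bytes per second


# Synthetic link: d = 2 ms, c = 1.25e6 bytes/sec (10 Mbit/s).
true_d, true_c = 0.002, 1_250_000.0
measurements = {size: true_d + size / true_c for size in (64, 256, 512, 1024, 1500)}
d_est, c_est = infer_link(measurements)
```

Taking the minimum over many probes of each size is what removes the variable component of delay, leaving the part that a line can explain.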

Passive Measurement: Logs at Hosts • Web server logs • Host, time, URL, response code, content length, … • E.g., 122.345.131.2 - - [15/Oct/1998:00:00:25 -0400] "GET /images/wwwtlogo.gif HTTP/1.0" 304 - "http://www.aflcio.org/home.htm" "Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-" • DNS logs • Request, response, time • Useful for workload characterization, troubleshooting, etc.
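
The log entry above can be split into its fields with a regular expression. This is a sketch that covers only the leading fields (host, time, request, response code, content length); the pattern and function name are assumptions, and real logs vary in format.

```python
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<length>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # "-" means no body was sent (e.g., a 304 Not Modified response).
    record["length"] = None if record["length"] == "-" else int(record["length"])
    return record


line = (
    '122.345.131.2 - - [15/Oct/1998:00:00:25 -0400] '
    '"GET /images/wwwtlogo.gif HTTP/1.0" 304 - '
    '"http://www.aflcio.org/home.htm" '
    '"Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-"'
)
record = parse_log_line(line)
```

Once parsed, records can be grouped by host, URL, or response code for workload characterization.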

Passive Measurement: SNMP • Simple Network Management Protocol • Coarse-grained counters on the router • E.g., byte and packet counts • Polling • Management system can poll the counters • E.g., once every five minutes • Limitations • Extremely coarse-grained statistics • Delivered over UDP! • Advantages: ubiquitous
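
Turning two polled counter values into a load statistic is simple arithmetic, with one catch: counters are fixed-width and wrap around. The sketch below assumes a 32-bit byte counter and at most one wrap between polls; the function names and sample values are hypothetical.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping counter."""
    # Modular arithmetic handles a single wrap between the two polls.
    return (current - previous) % (1 << bits)


def average_rate_bps(prev_bytes, curr_bytes, interval_sec, bits=32):
    """Average link load in bits per second over one polling interval."""
    return counter_delta(prev_bytes, curr_bytes, bits) * 8 / interval_sec


# Two polls five minutes apart.
rate = average_rate_bps(0, 37_500_000, interval_sec=300)
# A counter that wrapped past 2**32 between polls.
wrapped = counter_delta(4_294_967_000, 704)
```

The result is an average over the whole interval, which is why five-minute polling cannot reveal short bursts. On a fast link a 32-bit counter can wrap more than once per interval, in which case this calculation silently undercounts.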

Passive Measurement: Packet Monitoring • Tapping a link • Shared media (Ethernet, wireless): monitor attached alongside the hosts • Multicast switch: switch copies traffic between hosts to a monitor port • Splitting a point-to-point link: monitor taps the link between Router A and Router B • Line card that does packet sampling (on Router A)

Packet Monitoring: Selecting the Traffic • Filter to focus on a subset of the packets • IP addresses/prefixes (e.g., to/from specific Web sites, client machines, DNS servers, mail servers) • Protocol (e.g., TCP, UDP, or ICMP) • Port numbers (e.g., HTTP, DNS, BGP, Napster) • Collect first n bytes of packet (snap length) • Medium access control header (if present) • IP header (typically 20 bytes) • IP+UDP header (typically 28 bytes) • IP+TCP header (typically 40 bytes) • Application-layer message (entire packet)
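
Filtering and snap-length truncation can be sketched on raw bytes. The code below assumes each packet starts at the IPv4 header (no medium access control header) and ignores fragments and IPv6. The helper that fabricates test packets fills most header fields with zeros, so it is for illustration only; all names are assumptions.

```python
import struct

TCP, UDP = 6, 17  # IP protocol numbers


def matches(packet: bytes, protocol: int, port: int) -> bool:
    """True if the packet uses `protocol` and `port` as source or destination."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False  # too short, or not IPv4
    header_len = (packet[0] & 0x0F) * 4  # IP header length in bytes
    if packet[9] != protocol or len(packet) < header_len + 4:
        return False
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return port in (src_port, dst_port)


def capture(packets, protocol, port, snaplen):
    """Keep matching packets, storing only the first `snaplen` bytes of each."""
    return [p[:snaplen] for p in packets if matches(p, protocol, port)]


def make_packet(protocol, src_port, dst_port, payload=b""):
    """Fabricate a simplified IPv4 packet (zeroed checksums and flags)."""
    rest_of_header = bytes(16 if protocol == TCP else 4)
    transport = struct.pack("!HH", src_port, dst_port) + rest_of_header
    total_len = 20 + len(transport) + len(payload)
    ip_header = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, protocol, 0,
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    return ip_header + transport + payload


web = make_packet(TCP, 1043, 80, b"x" * 512)
dns = make_packet(UDP, 5353, 53, b"y" * 30)
headers_only = capture([web, dns], protocol=TCP, port=80, snaplen=40)
```

With a snap length of 40 bytes the monitor keeps the IP and TCP headers of the Web packet and discards the 512-byte payload.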

Tcpdump Output (three-way TCP handshake and HTTP request message) • Fields annotated on the slide: timestamp, client address and port number, Web server (port 80), SYN flag, sequence number, TCP options
23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: S 617756405:617756405(0) win 32120 <mss 1460,sackOK,timestamp 46339 0,nop,wscale 0> (DF)
23:40:21.036758 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: S 2598794605:2598794605(0) ack 617756406 win 16384 <mss 512>
23:40:21.036789 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: . 1:1(0) ack 1 win 32120 (DF)
23:40:21.037372 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P 1:513(512) ack 1 win 32256 (DF)
23:40:21.085106 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: . 1:1(0) ack 513 win 16384
23:40:21.085140 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P 513:676(163) ack 1 win 32256 (DF)
23:40:21.124835 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: P 1:179(178) ack 676 win 16384

Analysis of Packet Traces • IP header • Traffic volume by IP addresses or protocol • Burstiness of the stream of packets • Packet properties (e.g., sizes, out-of-order, etc.) • TCP header • Traffic breakdown by application (e.g., Web) • TCP congestion and flow control • Number of bytes and packets per session • Application header • URLs, HTTP headers (e.g., cacheable response?) • DNS queries and responses, user key strokes, …

Aggregating Packets into IP Flows • Set of packets that “belong together” • Source/destination IP addresses and port numbers • Same protocol, ToS bits, … • Same input/output interfaces at a router (if known) • Packets that are “close” together in time • Maximum spacing between packets (e.g., 15 sec, 30 sec) • Example: flows 2 and 4 are different flows due to time • [Figure: timeline of packets grouped into flows 1 through 4]
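
The two grouping rules (same header fields, close together in time) can be sketched as follows. The Packet record, the five-field key, and the 30-second default are assumptions for this example; a router's flow key may also include ToS bits and interfaces.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "time src dst sport dport proto size")


def aggregate_flows(packets, timeout=30.0):
    """Group packets into flows by header fields and inter-packet spacing."""
    active = {}    # flow key -> flow record still open
    finished = []  # flows closed because of a gap longer than the timeout
    for pkt in sorted(packets, key=lambda p: p.time):
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        flow = active.get(key)
        if flow is not None and pkt.time - flow["end"] > timeout:
            finished.append(flow)  # same key, but too far apart in time
            flow = None
        if flow is None:
            flow = {"key": key, "start": pkt.time, "end": pkt.time,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["end"] = pkt.time
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f["start"])


web = ("10.0.0.1", "192.0.2.80", 1043, 80, 6)
dns = ("10.0.0.1", "192.0.2.53", 5353, 53, 17)
packets = [
    Packet(0.0, *web, 60), Packet(1.0, *web, 552), Packet(2.0, *web, 40),
    Packet(1.5, *dns, 70),
    Packet(100.0, *web, 60), Packet(101.0, *web, 40),  # after a 98 sec gap
]
flows = aggregate_flows(packets)
```

The Web packets share one key but end up in two flows because of the long gap, mirroring the example where two flows differ only in time.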

Packet vs. Flow Measurement • Basic statistics (available from both techniques) • Traffic mix by IP addresses, port numbers, and protocol • Average packet size • Traffic over time • Both: traffic volumes on a medium-to-large time scale • Packet: burstiness of the traffic on a small time scale • Statistics per TCP connection • Both: number of packets & bytes transferred over the link • Packet: frequency of lost or out-of-order packets, and the number of application-level bytes delivered • Per-packet info (available only from packet traces) • TCP seq/ack #s, receiver window, per-packet flags, … • Probability distribution of packet sizes • Application-level header and body (full packet contents)

Measurement Challenges for Operators • Network-wide view • Crucial for evaluating control actions • Multiple kinds of data from multiple locations • Large scale • Large number of high-speed links and routers • Large volume of measurement data • Poor state-of-the-art • Working within existing protocols and products • Technology not designed with measurement in mind • The “do no harm” principle • Don’t degrade router performance • Don’t require disabling key router features • Don’t overload the network with measurement data

Network Operations Tasks • Reporting of network-wide statistics • Generating basic information about usage and reliability • Performance/reliability troubleshooting • Detecting and diagnosing anomalous events • Security • Detecting, diagnosing, and blocking security problems • Traffic engineering • Adjusting network configuration to the prevailing traffic • Capacity planning • Deciding where and when to install new equipment

Basic Reporting • Producing basic statistics about the network • For business purposes, network planning, ad hoc studies • Examples • Proportion of transit vs. customer-customer traffic • Total volume of traffic sent to/from each private peer • Mixture of traffic by application (Web, Napster, etc.) • Mixture of traffic to/from individual customers • Usage, loss, and reliability trends for each link • Requirements • Network-wide view of basic traffic and reliability statistics • Ability to “slice and dice” measurements in different ways (e.g., by application, by customer, by peer, by link type)

Troubleshooting • Detecting and diagnosing problems • Recognizing and explaining anomalous events • Examples • Why a backbone link is suddenly overloaded • Why the route to a destination prefix is flapping • Why DNS queries are failing with high probability • Why a route processor has high CPU utilization • Why a customer cannot reach certain Web sites • Requirements • Network-wide view of many protocols and systems • Diverse measurements at different protocol levels • Thresholds for isolating significant phenomena

Security • Detecting and diagnosing problems • Recognizing suspicious traffic or disruptions • Examples • Denial-of-service attack on a customer or service • Spread of a worm or virus through the network • Route hijack of an address block by adversary • Requirements • Detailed measurements from multiple places • Including deep-packet inspection, in some cases • Online analysis of the data • Installing filters to block the offending traffic

Traffic Engineering • Adjusting resource allocation policies • Path selection, buffer management, and link scheduling • Examples • OSPF weights to divert traffic from congested links • BGP policies to balance load on peering links • Link-scheduling weights to reduce delay for “gold” traffic • Requirements • Network-wide view of the traffic carried in the backbone • Timely view of the network topology and configuration • Accurate models to predict impact of control operations (e.g., the impact of RED parameters on TCP throughput)

Capacity Planning • Deciding whether to buy/install new equipment • What? Where? When? • Examples • Where to put the next backbone router • When to upgrade a link to higher capacity • Whether to add/remove a particular peer • Whether the network can accommodate a new customer • Whether to install a caching proxy for cable modems • Requirements • Projections of future traffic patterns from measurements • Cost estimates for buying/deploying the new equipment • Model of the potential impact of the change (e.g., latency reduction and bandwidth savings from a caching proxy)

Examples of Public Data Sets • Network-wide data • Abilene and GEANT backbones • Netflow, IGP, and BGP traces • CAIDA DatCat • Data catalogue maintained by CAIDA • http://imdc.datcat.org/ • Interdomain routing • RouteViews and RIPE-NCC • BGP routing tables and update messages • Traceroute and looking glass servers • http://www.traceroute.org/ • http://www.nanog.org/lookingglass.html

Discussion • How important is accuracy of the data? • How can we validate measurement studies? (If we know the answer already, why are we measuring?) • How to do controlled experiments with measurement techniques? • Can we move measurement to a science rather than an art? • Can we identify incentives for making measurement possible and data available? • Distributed analysis of measurement data? • An architecture for router or line-card support for traffic and performance measurement? • Trade-offs between security and privacy?
