Berkeley-Helsinki Summer Course Lecture #7: Network Measurement and Monitoring. Randy H. Katz Computer Science Division Electrical Engineering and Computer Science Department University of California Berkeley, CA 94720-1776. Outline. Web Traffic Measurement Multi-layer Tracing and Analysis
Berkeley-Helsinki Summer Course Lecture #7: Network Measurement and Monitoring Randy H. Katz Computer Science Division Electrical Engineering and Computer Science Department University of California Berkeley, CA 94720-1776
Outline • Web Traffic Measurement • Multi-layer Tracing and Analysis • Network Distance Mapping • SLA Verification • Service Management
Measuring/Characterizing Web Traffic • Motivation for Measurement • Insights into Web site design • Managing Proxies and Servers • Operating IP Networks • Measurement Process • Monitoring from some network location • Generate measurement records in some format • Preprocessing for subsequent analysis • Based on Chapter 9, “Web Traffic Measurement,” in Web Protocols and Practice, Krishnamurthy and Rexford, Addison Wesley, Reading, MA, 2001.
Web Measurement • Content Creators • Measurements of user browsing patterns • Number of visitors and site stickiness influence advertising revenue • Optimize for common user sequences • User perceived latency influences server and placement decisions • Web Hosting Company • Number of response messages/bytes served influences load balancing strategy among multiple hosted sites • Mix of busy day sites/busy night sites • Managing persistent connections • Resource usage influences billing • When to introduce more servers, better connectivity
Web Measurement • Network Operators • Resource decisions: where to add bandwidth, when to upgrade links, where to place proxies, caches, how to modify routing within the provider cloud, etc. • User community: relative mix of clients with low vs. high bandwidth connectivity • Web/Networking Researchers • Evaluating performance of protocols and software • Drive evolution of protocols, policies, algorithms • Better understanding of Internet traffic dynamics
Measurement Techniques • Server Logging • Log entry per HTTP request • Requesting client • Could be a user, a proxy, or a cache; the latter two represent aggregated patterns • Identified by an IP address • Could represent the workload of multiple users • Dynamically assigned addresses not correlated with same user each time encountered • Request time • Request/response messages • Coarse grained, aggregated times • NOTE: proxy/cache satisfied requests filtered before reaching the server • Hard to obtain!
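A minimal sketch of what "log entry per HTTP request" can look like in practice. The slides do not name a log format; the NCSA Common Log Format is assumed here, and the sample line and the `parse_clf` helper are illustrative only.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident authuser [date] "method url protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # One-second timestamp resolution: this is the "coarse grained" time.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

# Hypothetical log line; the host field is an IP address that may stand for
# a single user, a proxy, or a cache.
sample = '192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_clf(sample)
```

Note that nothing in such a record says whether the client address is one user or an aggregate behind a proxy, which is the ambiguity the slide points out.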
Measurement Techniques • Proxy Logging • Proxies can be associated with clients or servers, e.g., proxy for UC Berkeley vs. proxy for Google • Former provides insights into client behavior aggregated by administrative domain; more detailed information about individual clients may be available • Degree of aggregation depends on how close proxy is to clients (close implies small community, far implies large community) • Limited scope, accesses filtered by browser caches • Hard to obtain!
Measurement Techniques • Packet Monitoring • Network level logging (HTTP, IP, TCP) • Fine grained time stamping possible • Some requests satisfied from client caches; encrypted packets present collection difficulties • Monitor needs to be placed so as to be able to eavesdrop on packets
Measurement Techniques • Active Measurement • Generate requests in a controlled manner, observe their performance • Issues: • Where to locate the modified user agents: geographical placement, quality of connectivity to wide-area network • What requests to generate, e.g., based on profile of popular web sites • What measurements to collect: DNS queries, TCP timeouts, and proxy interception make it difficult to distinguish sources of latency
Inferences from Measurement Data • Limitation of HTTP Header Information • Incomplete header logging • Heuristics needed to reconstruct behavior from log • Ambiguous Client/Server Identity • Client identity/unique IP address • Many IP addresses associated with same server • Inferring User Actions • Difficult to correlate user level actions like mouse clicks with observed network activity • One click, many HTTP requests • Detecting Resource Modifications • Web level actions typically miss modifications • Incomplete use of Last-Modified and Date fields by servers
Web Workload Characterization • Applications of Workload Models • Identifying performance problems • High latency/low thruput under specific load scenarios • Benchmarking Web components • Selecting among competing architectures • Capacity planning • “Right sizing” net b/w, CPU, disk, memory given expected loads • Workload Parameters • Protocols: Request method/Response code • Resources: Content type, Resource size, Response size, Popularity, Modification frequency, Temporal locality, Number of embedded resources • Users: Session interarrival times, Number of clicks per session, Request interarrival times
Workload Characteristics • HTTP Requests/Responses • GET method predominates, small number of POSTs (forms), OK responses • More intelligent protocols for communicating with caches may change distribution of requests (e.g., HEAD) • Web Resources • Text and images dominate, increasing audio/video content • Small resource size dominates, average HTML file size is 4-8 KB, image file size 14 KB, wide variation around the mean implies Pareto distribution (“heavy tailed”) • Higher b/w connections imply larger web objects over time • Response Sizes • Users likely to abort large transfers, so median response size smaller than median resource size; very heavy tail • Effect of higher b/w connections on response size?
Workload Characteristics • Resource Popularity • Zipf’s Law: a small number of objects are highly popular • Effectiveness of caching at all levels (client browser cache, site proxy cache, even DNS name cache) • Resource Changes • Static content vs. script-based descriptions • Periodic changes (“young die young”) • Temporal Locality • Correlated access to resources in time • Embedded Resources • Web pages have median of 8-20 embedded resources, heavy tailed distribution
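To illustrate why Zipf-like popularity makes caching effective, here is a small sketch that computes the share of requests going to the most popular resources. The exponent `alpha = 1.0` and the catalog size are assumptions for illustration, not measurements from the lecture.

```python
def zipf_weights(n, alpha=1.0):
    """Relative popularity of the i-th most popular resource: 1 / i**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]

def top_k_share(n, k, alpha=1.0):
    """Fraction of all requests that go to the k most popular of n resources."""
    weights = zipf_weights(n, alpha)
    return sum(weights[:k]) / sum(weights)

# Hypothetical site with 10,000 resources; how much traffic hits the top 1%?
share = top_k_share(n=10_000, k=100)
```

Under these assumptions the top 1% of resources attract a little over half of all requests, so even a small cache can absorb a large part of the load.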
Workload Characteristics • User Behavior • Session and request arrivals • Infer session via repeated access to same server • Burst of HTTP requests, think time • Clicks per session • 4-10 clicks on average; distinguish between “sticky” sites and directory/redirection sites • Heavy user vs. light user • Request interarrival times • Activity punctuated with think times • Request interarrivals order of 60 seconds
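A sketch of session inference from request times, assuming the input is the list of request timestamps (in seconds) from one client to one server. The 30-minute idle threshold is a common heuristic chosen for illustration; it is not specified in the slides.

```python
def split_sessions(timestamps, gap=1800.0):
    """Group request times into sessions separated by idle periods longer than gap."""
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)   # within think-time range: same session
        else:
            sessions.append([t])     # long idle period: start a new session
    return sessions

# Hypothetical request times for one (client, server) pair.
times = [0, 5, 70, 130, 4000, 4010, 9000]
sessions = split_sessions(times)
requests_per_session = [len(s) for s in sessions]
```

The choice of `gap` directly shapes the resulting statistics on session interarrival times and requests per session, so it should be reported alongside them.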
Research Perspectives on Measurement • Packet monitoring of HTTP traffic • Analyzing Web server logs • Publicly available logs and traces • Measuring multimedia streams
Packet Monitoring of HTTP Traffic • Tapping a link carrying IP packets • Capturing packets from HTTP transfers • Demux packets into TCP connections • Reconstructing ordered stream of bytes • Extracting HTTP messages from byte stream • Generating a log of HTTP messages
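The steps above can be sketched on synthetic data. Packets are modeled as hypothetical tuples `(src, sport, dst, dport, seq, payload)`; a real monitor would obtain them from a capture library. The sketch only handles client-to-server streams carrying requests without bodies.

```python
from collections import defaultdict

def demux(packets):
    """Group packets by connection 4-tuple (src, sport, dst, dport)."""
    conns = defaultdict(list)
    for src, sport, dst, dport, seq, payload in packets:
        conns[(src, sport, dst, dport)].append((seq, payload))
    return conns

def reassemble(segments):
    """Order segments by sequence number, dropping retransmitted bytes."""
    stream, next_seq = b"", None
    for seq, payload in sorted(segments):
        if next_seq is None:
            next_seq = seq
        if seq > next_seq:
            break                      # hole in the capture: stop here
        if seq + len(payload) <= next_seq:
            continue                   # pure retransmission
        stream += payload[next_seq - seq:]
        next_seq = seq + len(payload)
    return stream

def request_lines(stream):
    """Extract the request line from each header block in the byte stream."""
    heads = stream.split(b"\r\n\r\n")
    return [h.split(b"\r\n", 1)[0].decode("ascii") for h in heads if h]

conn = ("10.0.0.1", 4000, "10.0.0.2", 80)
packets = [
    conn + (133, b"GET /b.gif HTTP/1.1\r\nHost: x\r\n\r\n"),  # arrives out of order
    conn + (100, b"GET /a.html HTTP/"),
    conn + (100, b"GET /a.html HTTP/"),                        # retransmission
    conn + (117, b"1.1\r\nHost: x\r\n\r\n"),
]

log = []
for key, segments in demux(packets).items():
    log.extend(request_lines(reassemble(segments)))
```

Here the first request is split across two segments, one segment is retransmitted, and the second request arrives first; the log still lists the requests in stream order.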
Analyzing Web Server Logs • Parsing and Filtering • Logs in multiple formats • Interleaved log records • Timestamp diversity • Transforming • Remove erroneous records • Diverse formats for URLs, conversion to unique integers for easier processing
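One possible way to map diverse URL spellings to unique integers, as mentioned in the transforming step. The normalization rules below (lower-case host, drop default HTTP port and fragment) are assumptions for illustration; real logs may need more rules.

```python
from urllib.parse import urlsplit

def canonical(url):
    """Lower-case scheme and host, drop port 80 and any fragment."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.port not in (None, 80):
        host += f":{parts.port}"
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{path}{query}"

class UrlIds:
    """Assign consecutive integers to canonical URLs in order of first appearance."""
    def __init__(self):
        self.ids = {}

    def id_for(self, url):
        return self.ids.setdefault(canonical(url), len(self.ids))

ids = UrlIds()
a1 = ids.id_for("http://WWW.Example.com:80/a")
a2 = ids.id_for("http://www.example.com/a#top")
b = ids.id_for("http://www.example.com/b")
```

Integer identifiers keep later analysis passes (popularity counts, temporal locality) compact and fast compared with repeated string comparisons.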
Publicly Available Logs and Traces • Internet Traffic Archive • http://www.acm.org/sigcomm/ita • World Wide Web Consortium’s Web Characterization Group Repository • http://www.purl.org/net/repository • NLANR • http://ircache.nlanr.net/Cache/ • CAnet Squid logs • http://ardnoc41.canet2.net/cache/
Measuring Multimedia Streams • Static analysis of multimedia resources • Locating video content at various web sites • Acquiring copies • Computing statistics • Multimedia server logs • VCR-like operations • User access patterns, frequency of early abort • Packet monitoring of multimedia streams • Infer session identity from src/dst IP address, port #, protocol • Multilayer packet monitoring • Correlation of control and data streams
Probability Distributions in Web Workload Models • Exponential: Session interarrival times • Pareto: • Response Sizes (tail of distribution) • Resource Sizes (tail of distribution) • Number of Embedded Images • Request Interarrival Times • Lognormal: • Response sizes (body of distribution) • Resource sizes (body of distribution) • Temporal locality • Zipf-like: Resource popularity
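A sketch of how a synthetic workload generator might draw from these distributions using Python's standard `random` module. Every parameter value below (means, shape parameters, tail probability) is a made-up placeholder, not a fitted value from the lecture.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

def session_gap(mean_seconds=600.0):
    """Exponential session interarrival time."""
    return rng.expovariate(1.0 / mean_seconds)

def response_size(body_mu=8.5, body_sigma=1.2,
                  tail_alpha=1.2, tail_min=100_000, p_tail=0.05):
    """Lognormal body with a Pareto tail, mixed with probability p_tail."""
    if rng.random() < p_tail:
        return tail_min * rng.paretovariate(tail_alpha)  # heavy tail
    return rng.lognormvariate(body_mu, body_sigma)       # body

sizes = [response_size() for _ in range(2000)]
mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)
```

With a heavy tail the sample mean sits well above the median, which is why workload studies usually report medians and tail behavior rather than means alone.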
Outline • Web Traffic Measurement • Multi-layer Tracing and Analysis • Network Distance Mapping • SLA Verification • Service Management
Wireless Link Management • Modeling GSM data network layers • Media access, link, routing, and transport • Validated ns modeling suite and BONES simulator • GSM channel error models from Ericsson • Reliable Link Protocols • Wireless links have high error rates (> 1%) • Reliable transport protocols (TCP) interpret errors as congestion • Need tools to determine multi-layer interaction effects • Large amounts of data: 120 bytes/s • Important for design of next generation networks • One solution: use a reliable link layer (ARQ) protocol • However, retransmissions introduce jitter • Alternative: use error-resilient algorithms to allow apps to handle corrupted data (only protect network protocol headers) • Less end-to-end delay, constant jitter, higher throughput
Testbed, Protocols, Tools • [Figure: testbed diagram with Mobile Host (Unix BSDi 3.0), GSM BTS, GSM Network, PSTN, and Fixed Host (Unix BSDi 3.0)] • Protocol stack on each host: H.263+ Encoder/Decoder, Packetization/De-Packetization, RTP, Socket Interface, UDP / UDP Lite, IP, PPP, Transparent/Non-transparent • Tracing tools: SocketDUMP and RLPDUMP on each host, MultiTracer • Plotting & Analysis (MATLAB)
Outline • Web Traffic Measurement • Multi-layer Tracing and Analysis • Network Distance Mapping • SLA Verification • Service Management
Applications of Network Distance Mapping • Mirror Selection • Cache-infrastructure Configuration • Service Redirection • Service Placement • Overlay Routing/Location
Distance Mapping Framework Goal: Develop scalable, robust distance information collection/sharing infrastructure • Feasible distance metrics • Number of hops • Latency • Bandwidth • Continuous measurement • Provide approximate distance information • Continue to operate in the presence of components changes/failures • Scale the measurement by self-adaptation
Distance Mapping Challenges • Select how many probes/monitors to deploy • Monitor placement • Choose appropriate monitor for given client • Statistically quantify estimation error: e.g., x% of the estimates within a factor of actual distances • How stable are these clusterings?
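The error criterion "x% of the estimates within a factor of actual distances" can be computed as in the sketch below. The factor of 2 and the sample distances are illustrative assumptions.

```python
def fraction_within_factor(estimates, actuals, factor=2.0):
    """Fraction of estimates e with actual/factor <= e <= actual*factor."""
    ok = sum(
        1 for e, a in zip(estimates, actuals)
        if a / factor <= e <= a * factor
    )
    return ok / len(actuals)

# Hypothetical estimated vs. measured distances (e.g., milliseconds of latency).
estimates = [10, 25, 90, 40]
actuals = [12, 20, 30, 100]
within = fraction_within_factor(estimates, actuals)
```

Sweeping `factor` over a range of values gives a full error profile rather than a single number.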
IDMaps Project • Internet-wide infrastructure to collect distance information • IDMaps provides: • Long-term approximate distances • Distance estimation between any 2 points on the Internet • IDMaps does not provide: • End-to-end application-level performance • Available bandwidth or current delay • Characteristics of any specific path
IDMaps Components • Tracers: autonomous instrumentation boxes • Tracers measure distances among themselves and to APs • APs (Address Prefixes): regions of the Internet; hosts within an AP are equidistant from the rest of the Internet • Cost: T*T + AP, where T = number of tracers and AP = number of APs • [Figure: a tracer and the hosts in an AP near the tracer] Courtesy of IDMaps group
IDMaps Architecture Courtesy of IDMaps group
IDMaps Results and Limitations • Simulation results on synthetic and static network topology • [Plot: axes labeled "Complementary distribution function" and "Percentage of correct answers"] • Cyan: random selection • Others: various heuristics & algorithms Courtesy of IDMaps group
IDMaps Limitations • Based on triangulation inequality • [Figure: clients A and B, monitors C and D; is AB = AC + CD + DB?] • Consider only number of hops • Ignore the dynamics of Internet, no stability study
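The question AB = AC + CD + DB can be made concrete with a small sketch: the distance between two clients is estimated by routing through each client's nearest monitor. The data structures and the distance values are hypothetical; this is not the IDMaps implementation.

```python
def estimate(a, b, to_monitor, monitor_dist):
    """Estimate distance a-b as (a to its monitor) + (monitor to monitor) + (monitor to b).

    to_monitor[host] = (nearest monitor, distance to it)
    monitor_dist[frozenset({m1, m2})] = distance between two monitors
    """
    c, ac = to_monitor[a]
    d, db = to_monitor[b]
    cd = 0 if c == d else monitor_dist[frozenset((c, d))]
    return ac + cd + db

# Hypothetical measurements (hop counts).
to_monitor = {"A": ("C", 2), "B": ("D", 3)}
monitor_dist = {frozenset(("C", "D")): 10}
estimated_ab = estimate("A", "B", to_monitor, monitor_dist)

actual_ab = 6  # assumed direct path that bypasses both monitors
overestimate = estimated_ab / actual_ab
```

When the real path between A and B does not pass near the monitors, the sum can overestimate the true distance considerably, which is one reason the equality carries a question mark.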
Wide-area Network Measurement and Monitoring Services Goal: Understand behavior of Internet/provide adaptation to Internet apps thru monitoring services • Layered Architecture • Bottom layer a common core shared across multiple apps with generic metrics • More application-specific at the top layer • Modularity • Separation of functionality • Clear definition of interaction between different layers • Ease of customization and modification
Layered Architecture • [Diagram: layers, listed from the application side: Decision/Design Procedures; Dissemination Layer; Federation for Sharing Layer; Measurement Collection, Transformation and Storage Layer; Measurement Layer; labeled with pull-/push-based APIs] • What to measure, what tools? • Probe placement & density
Current Focus at Berkeley: Internet “Iso-bar” • Regions of network that perceive similar performance to the Internet, i.e., spatial correlation • How to find it without knowing the topology? • Used to determine # and placement of monitors • High-dimensional feature space for iso-bar clustering • Each host collects distance values to m hosts as m-dim feature vector • Use K-means for high-dimension clustering • Choose site closest to the cluster center as monitor • Initially m can be the total number of clients, later it may be the number of representative monitoring sites
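A minimal sketch of the clustering step described above: each host has an m-dimensional vector of distances, K-means groups the hosts, and the host nearest each cluster center is picked as monitor. The host names, RTT values, and the naive initialization (first k vectors) are assumptions for illustration.

```python
import math

def kmeans(vectors, k, iters=20):
    """Plain K-means; centers start at the first k vectors (naive, assumed init)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def pick_monitors(hosts, vectors, k):
    """For each cluster, choose the host closest to the cluster center."""
    centers, labels = kmeans(vectors, k)
    monitors = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: math.dist(vectors[i], centers[c]))
            monitors.append(hosts[best])
    return monitors, labels

# Hypothetical RTTs (ms) from six hosts to m = 3 reference hosts.
hosts = ["w1", "e1", "w2", "e2", "w3", "e3"]
vectors = [
    [10, 80, 85], [80, 10, 15],
    [12, 78, 88], [82, 12, 13],
    [14, 82, 83], [78, 14, 11],
]
monitors, labels = pick_monitors(hosts, vectors, k=2)
```

No topology information is used: hosts end up in the same cluster only because their distance vectors look alike, which is the point of the iso-bar idea.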
Iso-bar Experiments • Remove triangulation inequality assumption • Stationarity: Predictability of network properties – temporal correlation • Global stationarity: change of the total number of clusters • Local stationarity: expansion and shrinkage of each cluster • Experiments with NLANR Active Measurement Project (AMP) data set • 119 sites in the US and New Zealand • Traceroute between every pair of hosts every minute • Use daily average round-trip time (RTT) • Color the clustered hosts and map them on US map with longitude and latitude info (imprecise mapping)
Underlying Topology of NLANR Sites Most of the NLANR sites use Abilene Network
Stationarity of Iso-bar • Global stationarity quite good • Local stationarity still under investigation • Will apply more statistical learning methods, e.g., Gaussian mixture model, kernel methods for clustering and its dynamics • Will evaluate its prediction with real measurement data
Inferring Internet Topology Goal: Determine hierarchy amongst autonomous systems (AS) based on types of relationships among them • Assume two types of relationships • Provider-Customer • Peer-Peer • Providers are above customers in the hierarchy; peers mostly in same level in the hierarchy. • Inferences • 5-level hierarchy in the Internet • Connectivity across levels is strictly non-hierarchical
Inferring Internet Topology • CAIDA & Mercator • Traceroutes from diff locations to get connectivity • Whois & BGP dumps to find IP addr ownership • Krishnamurthy et al. • BGP dumps to find IP addr ownership • Use web server logs to cluster IP addrs by behaviour • GT-ITM • Generated topologies • Useful for testing on specific cases, but not actual Internet • Our work • BGP dumps to find AS connectivity • BGP dumps to find amount of paths carried by each link • BGP dumps to find AS preferences for links
Inferring Type of Relationship Assumption: With high probability, an ISP does not forward BGP advertisements from its peers or providers to other peers or providers • Implication: If the assumption holds completely, every AS path is “valley-free” (no traversal from peer/provider to customer and back to peer/provider) • Features of inference algorithm • Collected large # of BGP dumps; partial views of Internet from different sources • Assign every AS a rank based on every dump; apply dominance/clustering rules to find type of relationships
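The valley-free property can be checked mechanically once each link is labeled. The sketch below assumes a hypothetical `rel` table giving the relationship of each directed link (`c2p` customer-to-provider, `p2p` peer-to-peer, `p2c` provider-to-customer); the AS names are made up. It is a check on labeled paths, not the inference algorithm from the slides.

```python
def is_valley_free(path, rel):
    """True if the path is: zero or more c2p links, at most one p2p, then only p2c."""
    descending = False  # set once the path crosses a peer or provider-to-customer link
    for a, b in zip(path, path[1:]):
        kind = rel[(a, b)]
        if descending and kind != "p2c":
            return False  # climbing or peering again after descending: a valley
        if kind in ("p2p", "p2c"):
            descending = True
    return True

# Hypothetical relationships: C1 buys transit from P1 and P3, P1 peers with P2,
# and C2 buys transit from P2.
rel = {
    ("C1", "P1"): "c2p", ("P1", "C1"): "p2c",
    ("C1", "P3"): "c2p", ("P3", "C1"): "p2c",
    ("P1", "P2"): "p2p", ("P2", "P1"): "p2p",
    ("P2", "C2"): "p2c", ("C2", "P2"): "c2p",
}

good = is_valley_free(["C1", "P1", "P2", "C2"], rel)  # up, across, down
bad = is_valley_free(["P1", "C1", "P3"], rel)         # customer used as transit
```

The second path is the case the assumption rules out: customer C1 would be forwarding a route learned from one provider to another provider.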
Layers in the Internet • Layer 0 (Strong Core) • Dense sub-graph (peering links) of the Internet topology consisting of only Tier-1 ISPs • Layer 1 (Transit Core) • Consists of all top transit providers/large national ISPs • Layer 2 (Outer Core) • Last layer where any two ASs have peering relationship • Layer 3 (Regional) • Collection of regional ISPs that support small customer base • Layer 4 (Customers) • Large collection (87%) of ASs that are only customers
Our Findings • Inner core of 20 ASs is highly connected • 271 edges (full clique = 380) • Full graph has 10,918 ASs • 24,598 edges out of 119,191,806 possible edges • Distribution of paths carried by edges
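The "possible edges" figures on this slide match a count of ordered AS pairs, n(n-1). The sketch below reproduces them and derives the edge densities, assuming the reported edge counts use the same directed convention.

```python
def directed_pairs(n):
    """Number of ordered pairs among n ASs, i.e. edges in a full directed clique."""
    return n * (n - 1)

core_possible = directed_pairs(20)      # full clique among the 20 inner-core ASs
all_possible = directed_pairs(10918)    # all ordered pairs in the full graph

core_density = 271 / core_possible      # share of possible inner-core edges present
graph_density = 24598 / all_possible    # share of possible edges present overall
```

Under that assumption roughly 71% of the possible inner-core edges exist, against about 0.02% for the graph as a whole, which is what "highly connected" means here.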
Quantifying the Layering • Columns: Layer | # of ASs | % | # Intra-Layer Edges | # Inter-Layer Edges • Strong Core | 20 | 0.2 | 329 | 9600 • Transit Core | 162 | 1.5 | 1052 | 6000 • Outer Core | 674 | 6.3 | 1070 | 3600 • Regional | 950 | 9.2 | 202 | 2400 • Customers | 8852 | 83.0 | 0 | 0 • Note: Edges directed from providers to customers; peer-peer links directed both ways