270 likes | 290 Views
This paper discusses the use of trajectory sampling for direct traffic observation in networks, and proposes a method using deterministic hash functions to determine a representative subset of packets. It explores implementation issues and compares the results with other common approaches.
E N D
Trajectory Sampling for Direct Traffic Oberservation N.G. Duffield and Matthias Grossglauser IEEE/ACM Transactions on Networking, Vol. 9, No. 3 June 2001
Circuit switched networks (e.g. telephone): Per-call state is maintained =>trivial IP networks: Don’t maintain per-flow information ? Problem: Which (spatial) path does traffic take?
Why is this interesting? • Quality of Service depends on traffic management • Traffic control • Timescale: seconds • no human intervention • Traffic engineering • Timescale: minutes - months • Resource allocation • Pricing • Failover strategies
Indirect measurement Uses information on Network model Network state Direct measurement Direct observation of traffic at multiple points in the network Options
Problems with indirect measurement • Behavior of network elements depends on vendor-specific design choices • Deliberate sources of randomness to avoid collision • Events outside domain (route advertising by neighboring domains) • Interactions may be too complex to predict
Direct measurement:Sampling of packets • Sample packets that traverse each link • Subset of packets used as representative Problem: How do we get the actual path?
Key idea of the paper • Use a deterministic hash function over the packet’s content to determine subset of packets • Use the same hash function throughout the domain • Use second hash function to label packets
Theory • Measurement domain represented as a directed graph • Packets • enter at ingress node • exit at egress node • Invariance function • Packet content without changing fields, e.g. time-to-live field which is decremented each hop
Sampling Hash Function • Decides whether or not a given packet should be sampled • Deterministic function of the invariant packet content • Same function on each link • Results in L-bit binary number
Identification Hash Function • Entire packet content could be used • Aim: limit traffic to measurement collection system • Results in m-bit binary number • Additional information may be included • Length of packet • Source, destination
Invariant content • Header: three categories of fields • Variable fields (not included) • E.g., TTL, header checksum, etc. • Low entropy fields (not included) • Content changes little between packets • E.g., version, header length, protocol • High entropy fields (included) • Source and destination IP, etc. • Part of remainder of packet
Dealing with ambiguities • Probability that trajectory can be disambiguated depends on network topology and traffic => renormalization of results necessary • Safer to discard all duplicate labels (greater loss of samples)
Specification of Hash Functions • Ordered bits of invariant part of packet content x are considered as binary integers: f(x) • Sampling hash function h(f(x)) = f(x) mod A • Identification hash function g(f(x)) = f(x) mod B with A, B positive integers
Identical Packets • Automatically ambiguous => lead to biased estimators Question: • How much packet content is needed to avoid collisions? Answer: • 40 bytes lead to collision probability smaller than 10-3
Implementation of hashing • 40 byte “numbers” are represented by vector of 16 bit words z = (zk ,zk-1,…,z0) = Si=0k zi 216i • Use 32 bit long division • Iteratively compute (zk ,zk-1,…,z0) mod A = (zk-1+ 216(zk mod A),…,z0) mod A
Sampling independent of packet content? • Note: IP address of source and destination are included in the invariant content! • Chi-squared test • 40 byte packet prefix => 95% confidence level • 20 byte packet prefix results in strong dependence
Optimal Sampling • Tradeoff • More unambiguous samples => more accuracy • More samples => more measurement traffic • Optimize for given measurement traffic mn (m bits per sample, n samples) • Small m increases collisions • Large m means smaller n
(Question to the authors • Doesn’t the measurement traffic itself get sampled and thereby add another source of error? • … may be part of their future work statement)
Example Service provider wants to determine what fraction of packets on a certain backbone link belongs to a certain customer • Compare • customer packets observed both on backbone and on access link • Total number of packets observed on backbone • Real and estimated fractions largely within error bars
Implementation issues • Can trajectory sampling be part of next generation of high-speed interfaces? • Authors claim “yes”: • Compute both hash functions in parallel • Processor cost negligible compared with cost of interface cards • Processor speed doubles every 18 months, maximum trunk speed every 21 months
Aggregation-based approaches e.g., sum of packets traversing a link Sampling-based approaches sample subset of observations Other Common Approaches
Aggregation-based Approaches • Link measurements (direct) • Traffic statistics (# of bytes / # of packets transferred / dropped) • Measurements reported periodically • Flow aggregation (indirect) • Flow: sequence of packets with common field in header • Relies on emulation of routing protocol
Sampling-based Approaches • Active end-to-end probes (direct) • Hosts send probe packets to one or more other hosts • Packet loss rate • Round-trip delay • End-to-end path characteristics • Variation: collect and exchange measurements of multicast session
Related Work • Measure end-to-end performance of individual flows • ATM cells sampled at ingress and egress points • Determine QoS for a single connection, e.g., delay and loss rate
Extensions and Other Applications • Distributed denial of service attacks • Attackers use packet spoofing • Filtering • A configurable packet filter may allow trajectory sampling for a subset of packets • Probe Packets • Packet content may be constructed to ensure sampling
Conclusions • Simple processing • No Router state required • Packets directly observed