Two Samples are Enough: Opportunistic Flow-level Latency Estimation using NetFlow

Myungjin Lee†, Nick Duffield‡, Ramana Rao Kompella†
†Purdue University, ‡AT&T Labs–Research

Per-hop Measurements are Important
[Figure: an IPTV/VoIP/VoD server and a path through routers R1, R2, and R3 spanning AS 1 and AS 2; the end-to-end latency is 100 ms.]
• Which router causes the problem? Why 100 ms?

Aggregate vs. Per-Flow
[Figure: the same path, annotated with aggregate per-hop latencies of 10 ms and 5 ms.]
• Aggregate latencies look all right. So why 100 ms?
• Need: flow-level latency measurements on a per-hop basis

Existing Approaches
• Active probes and tomography
  • Chen et al. [SIGCOMM'04], Duffield et al. [IMC'03]
  • Problems
    • Problem formulation is under-constrained
    • No per-flow latency measurements
• Lossy Difference Aggregator (LDA)
  • Kompella et al. [SIGCOMM'09]
  • Problems
    • Requires hardware modification
    • No per-flow latency measurements

Basic Framework: NetFlow
• A measurement framework widely deployed in routers
• Maintains per-flow state in the form of a flow record
  • Packet and byte counts
  • Flow duration (flow start and end timestamps)
• Usage
  • Normally used for accounting, traffic matrix estimation, etc.
  • Does not support per-flow latency measurements
• Goal: Enable per-flow latency estimation
  • Harness the flow start and end timestamps in the NetFlow framework
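The flow record described above can be sketched as follows. This is a minimal illustration, not NetFlow's actual export format: the 5-tuple key, the field names, and plain floating-point timestamps in seconds are all assumptions made for readability. The point is that the start timestamp is written once, by the first packet, while the end timestamp is overwritten by every packet.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical 5-tuple flow key: (src IP, dst IP, src port, dst port, protocol).
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class FlowRecord:
    """Minimal NetFlow-style flow record: counters plus start/end timestamps."""
    packets: int = 0
    byte_count: int = 0
    start_ts: Optional[float] = None  # timestamp of the first packet seen
    end_ts: Optional[float] = None    # timestamp of the latest packet seen

    def update(self, ts: float, size: int) -> None:
        self.packets += 1
        self.byte_count += size
        if self.start_ts is None:
            self.start_ts = ts  # written once, by the first packet
        self.end_ts = ts        # overwritten by every packet


@dataclass
class FlowCache:
    """Per-flow state, keyed by flow key, as one NetFlow instance keeps it."""
    records: Dict[FlowKey, FlowRecord] = field(default_factory=dict)

    def observe(self, key: FlowKey, ts: float, size: int) -> None:
        self.records.setdefault(key, FlowRecord()).update(ts, size)


cache = FlowCache()
key = ("10.0.0.1", "10.0.0.2", 5000, 80, 6)
for ts, size in [(1.000, 100), (1.250, 1500), (1.900, 40)]:
    cache.observe(key, ts, size)
rec = cache.records[key]
print(rec.packets, rec.byte_count, rec.start_ts, rec.end_ts)  # 3 1640 1.0 1.9
```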

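Obtaining Two Delay Samples
[Figure: packets 1 and 2 of a flow from the IPTV/VoIP/VoD server pass two NetFlow instances, A and B, on routers R1 and R2, between AS 1 and AS 2.]
• Flow start timestamp at B − flow start timestamp at A = first delay sample
• Flow end timestamp at B − flow end timestamp at A = second delay sample
• Two delay samples per flow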
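The two subtractions above can be written out directly. This sketch assumes that A is upstream of B, that the two routers' clocks are synchronized, and that both instances recorded the same first and last packets; the names `FlowTimes` and `two_delay_samples` are hypothetical.

```python
from typing import NamedTuple, Tuple


class FlowTimes(NamedTuple):
    """Start/end timestamps one NetFlow instance recorded for a flow (seconds)."""
    start_ts: float
    end_ts: float


def two_delay_samples(at_a: FlowTimes, at_b: FlowTimes) -> Tuple[float, float]:
    """Return (first-packet delay, last-packet delay) from A to B.

    Assumes A is upstream of B, both clocks are synchronized, and both
    instances recorded the same first and last packets of the flow.
    """
    first_delay = at_b.start_ts - at_a.start_ts
    last_delay = at_b.end_ts - at_a.end_ts
    return first_delay, last_delay


a = FlowTimes(start_ts=10.000, end_ts=10.500)
b = FlowTimes(start_ts=10.004, end_ts=10.512)
d1, d2 = two_delay_samples(a, b)
print(round(d1, 3), round(d2, 3))  # 0.004 0.012
```

The next two problems are exactly the cases where the "same first and last packets" assumption fails.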

Problem 1: Independent Packet Sampling
[Figure: of packets 1 and 2, one NetFlow instance updates its record with only the 1st packet and the other with only the 2nd.]
• No coordination between NetFlow instances

Solution: Hash-based Sampling
[Figure: packets are hashed into a hash space, and only those falling in the sampling space are selected; each packet is either sampled at both NetFlow instances or at neither.]
• Hash-based sampling achieves coordination
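The contrast between independent and hash-based sampling can be demonstrated in a few lines. This is a sketch in the spirit of RFC 5475 hash-based selection, not a compliant implementation: SHA-256 is a stand-in hash, the byte strings stand in for each packet's invariant contents, and the helper names are hypothetical. Because both routers evaluate the same deterministic function on the same bytes, they select the same packets; independent coin flips do not.

```python
import hashlib
import random


def hash_sampled(invariant: bytes, rate: float) -> bool:
    """Select a packet iff its hash falls in the lowest `rate` fraction of hash space.

    `invariant` stands for packet fields that do not change hop by hop
    (assumed here; a real selector must exclude TTL, checksum, etc.).
    """
    h = int.from_bytes(hashlib.sha256(invariant).digest()[:8], "big")
    return h < rate * 2**64


def random_sampled(rate: float, rng: random.Random) -> bool:
    """Independent sampling: each router flips its own coin."""
    return rng.random() < rate


packets = [f"pkt-{i}".encode() for i in range(10_000)]
rate = 0.1

# Hash-based: routers A and B run the same function, so they pick the same packets.
picked_a = {p for p in packets if hash_sampled(p, rate)}
picked_b = {p for p in packets if hash_sampled(p, rate)}

# Independent: each router uses its own random generator.
rng_a, rng_b = random.Random(1), random.Random(2)
ind_a = {p for p in packets if random_sampled(rate, rng_a)}
ind_b = {p for p in packets if random_sampled(rate, rng_b)}

print(picked_a == picked_b, len(ind_a & ind_b) < len(ind_a))  # True True
```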

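Problem 2: Packet Loss
[Figure: one NetFlow instance updates its record with both packets, but packet 2 is lost (X) before the other instance, which updates with only the 1st packet.]
• Packet losses may cause inconsistencies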

Solution: Packet Digests
[Figure: each instance keeps a digest alongside its timestamps; after the loss, comparing digests detects the unusable timestamp.]
• Packet digests achieve packet association

Consistent NetFlow (CNF) Architecture
• Issue I: No coordination between NetFlow instances
  • Different packets are sampled on different NetFlow instances
  • Solution: Hash-based sampling (filtering), as in RFC 5475
    • The same packets are selected on different NetFlow instances
    • IETF PSAMP working group
• Issue II: Packet losses
  • Discrepancy in selected packets due to packet losses
  • Solution: Maintaining packet digests
    • Hash of the invariant packet contents
    • Use timestamps iff packet digests match at the two routers
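Putting both fixes together, a CNF-style record might keep a digest next to each stored timestamp and release a delay sample only when the digests agree. This is a hypothetical sketch: the record layout, the salted SHA-256 selector, and the 4-byte digest are illustrative choices, not the authors' exact design. In the toy run, packet `p3` is lost before B, so B's end timestamp belongs to a different packet and only the first-packet delay survives.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

Packet = Tuple[str, bytes, float]  # (flow key, invariant content, timestamp at this router)


def digest(invariant: bytes) -> bytes:
    """Short digest of the packet's invariant contents (illustrative: 4 bytes of SHA-256)."""
    return hashlib.sha256(invariant).digest()[:4]


def selected(invariant: bytes, rate: float) -> bool:
    """Hash-based selection shared by all instances (RFC 5475 style filtering)."""
    # Salt the input so the selection hash differs from the digest.
    h = int.from_bytes(hashlib.sha256(b"sel" + invariant).digest()[:8], "big")
    return h < rate * 2**64


@dataclass
class CnfRecord:
    start_ts: float
    start_digest: bytes
    end_ts: float
    end_digest: bytes


def build_records(stream: List[Packet], rate: float) -> Dict[str, CnfRecord]:
    records: Dict[str, CnfRecord] = {}
    for key, inv, ts in stream:
        if not selected(inv, rate):
            continue
        d = digest(inv)
        rec = records.get(key)
        if rec is None:
            records[key] = CnfRecord(ts, d, ts, d)  # first selected packet
        else:
            rec.end_ts, rec.end_digest = ts, d      # latest selected packet
    return records


def usable_delays(a: CnfRecord, b: CnfRecord) -> List[float]:
    """Use a timestamp pair only if both routers saw the same packet."""
    out = []
    if a.start_digest == b.start_digest:
        out.append(b.start_ts - a.start_ts)
    if a.end_digest == b.end_digest:
        out.append(b.end_ts - a.end_ts)
    return out


# Three packets of one flow; rate=1.0 selects every packet in this toy run.
pkts = [b"p1", b"p2", b"p3"]
at_a = [("f", p, 1.0 + i) for i, p in enumerate(pkts)]
at_b = [("f", p, 1.0 + i + 0.01) for i, p in enumerate(pkts[:2])]  # p3 lost before B
ra, rb = build_records(at_a, 1.0)["f"], build_records(at_b, 1.0)["f"]
print(len(usable_delays(ra, rb)))  # 1: only the first-packet delay survives
```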

Trivial Estimator: Endpoint
• Use the two delay samples belonging to the same flow
• Obtains accurate latency estimates for small flows
• Problem: Accuracy penalty for large flows
• Solution: Multiflow estimator
[Figure: Endpoint estimator; a flow's latency is Avg of its own two delay samples; axes: Flow ID vs. Time.]
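The Endpoint estimator reduces to averaging whatever usable samples the flow itself produced, at most two. The function name is hypothetical, and returning `None` when no sample survives the digest check is an assumed convention.

```python
from typing import Optional, Sequence


def endpoint_estimate(samples: Sequence[float]) -> Optional[float]:
    """Endpoint estimator: mean of the flow's own usable delay samples (at most two)."""
    if not samples:
        return None  # no usable sample, e.g. both digests mismatched
    return sum(samples) / len(samples)


print(endpoint_estimate([0.004, 0.012]))
```

For a large flow, two samples taken at its first and last packets can miss most of the delay variation in between, which is the accuracy penalty noted above.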

Better Estimator: Multiflow
• Key insight: Packets experiencing the same queuing busy period will experience similar delays
• Use background delay samples from other flows
• Use only delay samples between the start and end of a flow
[Figure: Multiflow estimator; a flow's latency is Avg of all delay samples, from any flow, that fall between its start and end; axes: Flow ID vs. Time.]
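A simple version of the Multiflow idea is an unweighted average over every usable delay sample, from any flow, whose timestamp falls inside the target flow's [start, end] interval. This sketch assumes samples are indexed by their timestamp at the upstream router and kept sorted; the paper's estimator may weight or select samples differently, so treat it as an illustration of the windowing idea only.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

# One usable delay sample from any flow: (timestamp at upstream router A, delay A->B).
Sample = Tuple[float, float]


def multiflow_estimate(all_samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Average every delay sample (from any flow) taken within [start, end].

    `all_samples` must be sorted by timestamp. The flow's own two samples lie
    at `start` and `end`, so they are included automatically.
    """
    times = [t for t, _ in all_samples]  # rebuilt per call for brevity
    lo, hi = bisect_left(times, start), bisect_right(times, end)
    window = [d for _, d in all_samples[lo:hi]]
    return sum(window) / len(window) if window else None


# Background samples from other flows plus the target flow's samples at t=1.0 and t=2.0.
samples = sorted([(0.5, 0.020), (1.0, 0.004), (1.2, 0.010),
                  (1.6, 0.014), (2.0, 0.012), (2.5, 0.030)])
print(multiflow_estimate(samples, start=1.0, end=2.0))
```

Here the two background samples inside the window pull the estimate from the Endpoint value (the mean of 0.004 and 0.012) toward the delays other packets saw during the same period.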

Evaluation
• Simulation setting
• Endpoint vs. Multiflow estimators
• Comparison with Trajectory Sampling

Simulation Setting
• Modified YAF
  • Simulates a queuing model with the RED active queue management policy
• Dataset
  • CHIC trace
  • 1-minute trace collected from an OC-192 backbone link
  • About 13M packets and 1M flows

Multiflow vs. Endpoint Estimators
[Figure: estimation accuracy of the Endpoint and Multiflow estimators.]
• Endpoint obtains good accuracy
• Multiflow performs better than Endpoint

Trajectory Sampling
• Shares some similarity with the CNF architecture
  • Routers use a consistent hash function to sample packets
  • Facilitates direct observation of packet trajectories
• Requires flow ID and timestamps for per-flow latency estimation
  • Aggregate all sampled packets with the same flow key
  • Compute their average latency
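The two aggregation steps listed above amount to a group-by on flow key followed by a mean. In this sketch the input is assumed to be packets already selected at both routers by the shared hash, each carrying its flow key and both timestamps; the names are hypothetical. A flow with no sampled packet simply gets no estimate here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One packet sampled at both routers: (flow key, timestamp at A, timestamp at B).
Sampled = Tuple[str, float, float]


def trajectory_per_flow_latency(sampled: Iterable[Sampled]) -> Dict[str, float]:
    """Group sampled packets by flow key and average their A->B latencies."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for key, ts_a, ts_b in sampled:
        total[key] += ts_b - ts_a
        count[key] += 1
    return {k: total[k] / count[k] for k in total}


obs = [("f1", 1.00, 1.01), ("f1", 1.50, 1.53), ("f2", 2.00, 2.02)]
print(trajectory_per_flow_latency(obs))
```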

Comparison with Trajectory Sampling
[Figure: Multiflow vs. Trajectory Sampling at packet sampling rate = 0.01.]
• Multiflow is 2-3x better than Trajectory Sampling

Summary
• Our approach retrofits per-flow latency estimates into the NetFlow framework
• Two main ideas
  • The Consistent NetFlow architecture ensures that different routers record the same set of flows
  • The Multiflow estimator achieves significantly more accurate per-flow latency estimates than the prior approach

Questions?
