Tomography-based Overlay Network Monitoring and its Applications
Yan Chen
Joint work with David Bindel, Brian Chavez, Hanhee Song, and Randy H. Katz
UC Berkeley
Motivation
• Applications of end-to-end distance monitoring
  • Overlay routing/location
  • Peer-to-peer systems
  • VPN management/provisioning
  • Service redirection/placement
  • Cache-infrastructure configuration
• Requirements for an E2E monitoring system
  • Scalable & efficient: small amount of probing traffic
  • Accurate: captures congestion/failures
  • Incrementally deployable
  • Easy to use
Existing Work
• Static estimation
  • Global Network Positioning (GNP)
• Dynamic monitoring
  • Loss rates: RON (O(n²) measurements)
  • Latency: IDMaps, Dynamic Distance Maps, Isobar
  • Latency similarity under normal conditions doesn't imply similar loss rates!
• Network tomography
  • Focuses on inferring the characteristics of physical links rather than E2E paths
  • Limited measurements -> under-constrained system, unidentifiable links
Problem Formulation
[Figure: end hosts report topology measurements to an Overlay Network Operation Center]
Given n end hosts on an overlay network and O(n²) paths, how to select a minimal subset of paths to monitor so that the loss rates/latency of all other paths can be inferred.
• Key idea: select a basis set of k paths that completely describes all O(n²) paths (k ≪ O(n²))
  • Select and monitor k linearly independent paths to compute the loss rates of the basis set
  • Infer the loss rates of all other paths
• Applicable to any additive metric, such as latency
Path Matrix and Path Space
[Figure: a path from A to B with path loss rate p over links with loss rates lj]
• Path loss rate p, link loss rates lj; with s links in total, each path is a vector v where vj = 1 iff the path traverses link j
• Taking logs of 1 − p = ∏(1 − lj) makes the metric additive: b = Σj vj xj, with xj = log(1/(1 − lj)) and b = log(1/(1 − p)), as sketched below
• Path matrix G: put all r = O(n²) path vectors together as rows, so Gx = b
• Path loss rate vector b: the (log-transformed) loss rates of all r paths
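To make the log transform concrete, here is a minimal numpy sketch on a toy topology (the links, paths, and loss rates are invented for illustration, not from the paper):

```python
import numpy as np

# Toy topology: s = 3 links, r = 2 overlay paths.
# G[i, j] = 1 iff path i traverses link j.
G = np.array([[1, 1, 0],   # path 1 traverses links 1 and 2
              [0, 0, 1]])  # path 2 traverses link 3

l = np.array([0.02, 0.05, 0.10])  # assumed per-link loss rates

# The log transform turns multiplicative loss into a linear system Gx = b:
x = np.log(1.0 / (1.0 - l))  # x_j = log(1 / (1 - l_j))
b = G @ x                    # b_i = log(1 / (1 - p_i))

p = 1.0 - np.exp(-b)         # recover the path loss rates
print(p)                     # [0.069 0.1], since 1 - 0.98 * 0.95 = 0.069
```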
Sample Path Matrix
[Figure: end hosts A, B, C, D connected by links 1, 2, 3; a geometric view over link axes with the vector (1,1,0) in the row space (measured) and (1,−1,0) in the null space (unmeasured)]
• x1 − x2 unknown => cannot compute x1, x2 individually
• The vectors invisible to all measured paths form the null space
• To separate identifiable vs. unidentifiable components: x = xG + xN (sketched below)
• All E2E paths are in the path (row) space, i.e., G xN = 0
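The x = xG + xN split can be illustrated on the toy matrix from the previous slide; a sketch (projecting via the pseudoinverse is one standard way to compute the row-space component, not necessarily the paper's implementation):

```python
import numpy as np

G = np.array([[1, 1, 0],   # monitored path over links 1 and 2
              [0, 0, 1]])  # monitored path over link 3
x = np.array([0.03, 0.07, 0.02])  # true (unknown) link quantities

# Row-space (identifiable) component: project x onto the rows of G.
xG = np.linalg.pinv(G) @ (G @ x)
xN = x - xG                # null-space (unidentifiable) component

print(xG)       # [0.05 0.05 0.02]: what the measurements can determine
print(G @ xN)   # ~[0 0]: xN is invisible to every measured path
```

Here xN is proportional to (1, −1, 0): exactly the x1 − x2 direction the slide calls unidentifiable.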
Intuition through Topology Virtualization
• Virtual links: minimal path segments whose loss rates are uniquely identifiable
  • Can fully describe all paths
  • xG takes a similar form as virtual links
[Figure: example topologies with Rank(G) = 1, 2, 3; real links (solid) and overlay paths (dotted) going through them, with the corresponding virtual links 1', 2', ...]
Algorithms
• Select k = rank(G) linearly independent paths to monitor
  • Use a rank-revealing decomposition, e.g., QR with column pivoting (sketched below)
  • Leverage sparse matrices: time O(rk²) and memory O(k²)
  • E.g., 10 minutes for n = 350 (r = 61,075) and k = 2,958
• Compute the loss rates of all other paths
  • Time O(k²) and memory O(k²)
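A sketch of both steps with scipy (the tolerance choice and toy data are assumptions, and the paper's sparse implementation differs, but QR with column pivoting on Gᵀ is the stated technique):

```python
import numpy as np
from scipy.linalg import qr

def select_basis_paths(G):
    """Pick k = rank(G) linearly independent rows (paths) of G via
    rank-revealing QR with column pivoting applied to G^T."""
    _, R, piv = qr(G.T, mode='economic', pivoting=True)
    tol = np.abs(R[0, 0]) * max(G.shape) * np.finfo(float).eps
    k = int(np.sum(np.abs(np.diag(R)) > tol))  # numerical rank
    return piv[:k]  # indices of the paths to monitor

def infer_all_paths(G, monitored, b_monitored):
    """Solve for the row-space component xG from the k monitored
    paths, then infer b for every path as G @ xG."""
    xG, *_ = np.linalg.lstsq(G[monitored], b_monitored, rcond=None)
    return G @ xG

# Usage on a toy matrix: the third path is the sum of the first two.
G = np.array([[1., 1., 0.], [0., 0., 1.], [1., 1., 1.]])
b = np.array([0.10, 0.02, 0.12])
basis = select_basis_paths(G)               # only 2 paths need monitoring
print(infer_all_paths(G, basis, b[basis]))  # recovers all three entries
```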
How Much Measurement Is Saved?
• Is k ≪ O(n²)?
• For a power-law Internet topology with M nodes and N end hosts
  • There are O(M) links, and N > M/2 (with proof)
• If n = O(N), the overlay network covers O(n) IP links, so k = O(n)
When a Small Portion of End Hosts Are on the Overlay
• Internet has a moderate hierarchical structure [TGJ+02]
  • With a purely hierarchical structure (tree): k = O(n)
  • With no hierarchy at all (worst case, a clique): k = O(n²)
  • The Internet falls in between: for reasonably large n (e.g., 100), k = O(n log n)
[Figure: k vs. n on a BRITE 20K-node hierarchical topology and on the Mercator 284K-node real router topology]
Practical Issues
• Tolerance of topology measurement errors
  • We care about path loss rates rather than those of any interior links
  • Poor router alias resolution may present one physical link as several => assign similar loss rates to all of them
  • Unidentifiable routers => ignore them, as with virtualization
• Topology changes
  • Adding/removing/changing one path incurs O(k²) time
  • Topology is relatively stable on the order of a day => incremental detection
Evaluation
• Simulation
  • Topology
    • BRITE: Barabasi-Albert, Waxman, hierarchical: 1K – 20K nodes
    • Real router topology from Mercator: 284K nodes
  • Fraction of end hosts on the overlay: 1 – 10%
  • Loss rate distribution (90% of links are good)
    • Good link: 0–1% loss rate; bad link: 5–10% loss rate
    • Good link: 0–1% loss rate; bad link: 1–100% loss rate
  • Loss model (see the sketch below)
    • Bernoulli: independent packet drops
    • Gilbert: bursty packet drops
  • Path loss rate simulated via transmission of 10K packets
• Metrics: path loss rate estimation accuracy
  • Absolute/relative errors
  • Lossy path inference
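For reference, a sketch of the two loss models (the Gilbert transition probabilities here are illustrative assumptions, chosen only to make the drops bursty):

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_losses(n_pkts, loss_rate):
    """Each packet is dropped independently with probability loss_rate."""
    return rng.random(n_pkts) < loss_rate

def gilbert_losses(n_pkts, p_good_to_bad=0.01, p_bad_to_good=0.3):
    """Two-state Markov chain; packets sent while in the 'bad' state
    are dropped, which yields bursty losses."""
    drops = np.empty(n_pkts, dtype=bool)
    bad = False
    for i in range(n_pkts):
        if bad:
            bad = rng.random() >= p_bad_to_good  # stay bad unless recovered
        else:
            bad = rng.random() < p_good_to_bad
        drops[i] = bad
    return drops

# Path loss rate estimated from 10K simulated packets, as in the slide.
print(gilbert_losses(10_000).mean())
```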
Experiments on PlanetLab
• 51 hosts, each from a different organization
  • 51 × 50 = 2,550 paths
• Simultaneous loss rate measurement
  • 300 trials, 300 msec each
  • In each trial, send a 40-byte UDP packet to every other host
• Simultaneous topology measurement
  • Traceroute
• Experiments: 6/24 – 6/27
  • 100 experiments in peak hours
Tomography-based Overlay Monitoring Results
• Loss rate distribution [chart]
• Accuracy
  • On average k = 872 out of 2,550 paths monitored
  • Absolute error |p − p'|: average 0.0027 for all paths, 0.0058 for lossy paths
  • Relative error [chart]
Absolute and Relative Errors
• For each experiment, take the 95th-percentile absolute and relative errors over the estimates for all 2,550 paths (see the sketch below)
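A sketch of how such per-experiment numbers can be computed (the relative-error definition and the eps guard below are assumptions, not necessarily the paper's exact metric):

```python
import numpy as np

def percentile_errors(p_true, p_est, pct=95, eps=1e-3):
    """pct-th percentile absolute and relative errors over all paths.
    eps keeps the relative error of nearly loss-free paths bounded
    (an assumed convention)."""
    abs_err = np.abs(p_true - p_est)
    rel_err = abs_err / np.maximum(p_true, eps)
    return np.percentile(abs_err, pct), np.percentile(rel_err, pct)

# e.g., over the measured vs. inferred loss rates of the 2,550 paths:
# a95, r95 = percentile_errors(measured, inferred)
```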
Lossy Path Inference Accuracy
• 90 out of 100 runs have coverage over 85% and a false positive rate under 10%
• Many errors are caused by boundary effects around the 5% lossy-path threshold
Topology Measurement Error Tolerance
• Out of 13 sets of pair-wise traceroutes …
  • On average, 248 out of 2,550 paths have no or incomplete routing information
  • No router aliases resolved
• Conclusion: robust against topology measurement errors
Performance Improvement with Overlay
• With single-node relay
• Loss rate improvement
  • Among 10,980 lossy paths:
    • 5,705 paths (52.0%) have loss rates reduced by 0.05 or more
    • 3,084 paths (28.1%) change from lossy to non-lossy
• Throughput improvement
  • Estimated with a TCP throughput model, throughput ≈ (MSS/RTT) × √(1.5/p) (see the sketch below)
  • 60,320 paths (24%) have non-zero loss rate, so throughput is computable
  • Among them, 32,939 (54.6%) paths have improved throughput; 13,734 (22.8%) paths have throughput doubled or more
• Implication: use overlay paths to bypass congestion or failures
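The estimation step can be sketched with the well-known TCP-friendly throughput model of Mathis et al. (the MSS, RTT, and loss values below are placeholders, not measured data):

```python
import math

def tcp_throughput_bps(loss_rate, rtt_s, mss_bytes=1460):
    """Mathis et al. style model: throughput ~ (MSS/RTT) * sqrt(1.5/p).
    Only defined for paths with a non-zero loss rate."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate)

direct = tcp_throughput_bps(loss_rate=0.05, rtt_s=0.060)
relayed = tcp_throughput_bps(loss_rate=0.01, rtt_s=0.080)  # longer RTT, less loss
print(relayed / direct)  # ~1.7x: relaying around the lossy link pays off
```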
Adaptive Overlay Streaming Media
[Figure: streaming testbed with nodes at Stanford, UC San Diego, UC Berkeley, and HP Labs; X marks a congested link]
• Implemented with a Winamp client and a SHOUTcast server
• Congestion introduced with a Packet Shaper
• Skip-free playback: server buffering and rewinding
• Total adaptation time < 4 seconds
Conclusions
• A tomography-based overlay network monitoring system
  • Given n end hosts, characterize O(n²) paths with a basis set of O(n log n) paths
  • Selectively monitor O(n log n) paths to compute the loss rates of the basis set, then infer the loss rates of all other paths
• Both simulation and real Internet experiments are promising
• Built an adaptive overlay streaming media system on top of the monitoring services
  • Bypasses congestion/failures for smooth playback within seconds