Network Tomography

Venkat Padmanabhan, Lili Qiu
MSR Tab Meeting, 22 Oct 2001

Overview
• Goal: discover characteristics of internal links in the network using passive, end-to-end measurements
• Metrics: loss rate, bandwidth
• Why is this interesting?
  • finding trouble spots in the network
    • e.g., the AT&T-Sprint peering point could be congested
  • a Web site operator can keep tabs on his/her ISP and decide whether to sign up with new ISP(s)
  • deciding where to place server replicas
    • downstream of major trouble spots

[Figure: microsoft.com ("Why is it so slow?") and an end user ("Darn, it's slow!") connected through ISP networks: AT&T, Sprint, C&W, UUNET, Earthlink, AOL, Qwest]

Topological Metrics
• Topological metrics are poor predictors of packet loss rate
• Not all links are equal → need to identify the bad links

Prior Work
• Active probing to infer link loss rate
  • multicast probes
  • striped unicast probes
• Pros & cons
  • accurate, since individual loss events are identified
  • expensive, because of the extra probe traffic
[Figures: two diagrams with nodes S, A, and B]

Our Approach
• Passive observation of existing traffic
  • measure loss rate rather than loss events
• Active probing to discover network topology
  • can be done infrequently and in the background
• With li the loss rate of link i and pj the loss rate seen by client j:
  (1-l1)*(1-l2)*(1-l4) = (1-p1)
  (1-l1)*(1-l2)*(1-l5) = (1-p2)
  …
  (1-l1)*(1-l3)*(1-l8) = (1-p5)
• Under-constrained system of equations (8 unknown link loss rates, 5 client measurements)
[Figure: tree from the server through links l1 to l8 to clients with loss rates p1 to p5]
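
To make the under-constrained point concrete, here is a minimal Python sketch of the forward model. It uses only the three client paths spelled out in the equations above (p1, p2, p5) and assumes losses on different links are independent. The link loss values and the names PATHS and client_loss_rates are illustrative, not from the talk.

```python
from math import prod

# Client -> link indices on its path, copied from the three equations on the slide.
PATHS = {"p1": [1, 2, 4], "p2": [1, 2, 5], "p5": [1, 3, 8]}

def client_loss_rates(link_loss, paths=PATHS):
    """End-to-end loss per client: 1 - product of (1 - li) over the path."""
    return {
        client: 1.0 - prod(1.0 - link_loss[i] for i in links)
        for client, links in paths.items()
    }

# Hypothetical assignment A: the shared link l2 is lossy.
link_loss = {1: 0.0, 2: 0.10, 3: 0.0, 4: 0.0, 5: 0.0, 8: 0.05}
# Hypothetical assignment B: the two last-hop links l4 and l5 are lossy instead.
alt = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.10, 5: 0.10, 8: 0.05}

observed = client_loss_rates(link_loss)
alt_observed = client_loss_rates(alt)
```

Both assignments produce the same client loss rates, so the end-to-end measurements alone cannot tell them apart.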

#1: Random Sampling
• Randomly sample the solution space
• Repeat this several times
• Draw conclusions based on overall statistics
• How to do random sampling?
  • determine loss rate bound for each link using best downstream client
  • iterate over all links:
    • pick loss rate at random within bounds
    • update bounds for other links
• Problem: little tolerance for estimation error
[Figure: tree from the server through links l1 to l8 to clients with loss rates p1 to p5]
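
The sampling loop above might be sketched as follows. This is one reading of the slide, not the authors' implementation. It assumes a tree topology, client loss rates strictly below 1, links visited from the server outward, and that each client's last-hop link absorbs whatever loss is left so that every sample satisfies the path equations exactly. The paths are the three given earlier; the function names are hypothetical.

```python
import random

PATHS = {"p1": [1, 2, 4], "p2": [1, 2, 5], "p5": [1, 3, 8]}

def sample_once(paths, client_loss, rng):
    """Draw one random link-loss assignment consistent with the client loss rates."""
    # Success rate each client still needs from the links not yet assigned.
    residual = {c: 1.0 - p for c, p in client_loss.items()}
    link_loss = {}
    depth = max(len(p) for p in paths.values())
    for hop in range(depth):  # visit links from the server outward
        for link in sorted({p[hop] for p in paths.values() if hop < len(p)}):
            downstream = [c for c, p in paths.items() if link in p]
            # Bound comes from the best downstream client (highest residual success).
            bound = max(0.0, 1.0 - max(residual[c] for c in downstream))
            is_last_hop = any(paths[c][-1] == link for c in downstream)
            loss = bound if is_last_hop else rng.uniform(0.0, bound)
            link_loss[link] = loss
            for c in downstream:  # update bounds for links further down
                residual[c] /= 1.0 - loss
    return link_loss

def estimate(paths, client_loss, runs=1000, seed=0):
    """Average link loss over many samples (one possible summary statistic)."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(runs):
        for link, loss in sample_once(paths, client_loss, rng).items():
            totals[link] = totals.get(link, 0.0) + loss
    return {link: total / runs for link, total in totals.items()}

mean_loss = estimate(PATHS, {"p1": 0.10, "p2": 0.10, "p5": 0.05})
```

Because every sample reproduces the measured client loss rates exactly, an error in those measurements carries straight into the samples, which is the tolerance problem noted above.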

#2: Linear Optimization
• Goals
  • Parsimonious explanation
  • Robust to estimation error
• Li = log(1/(1-li)), Pj = log(1/(1-pj))
• minimize Σi Li + Σj |Sj| subject to
  L1+L2+L4 + S1 = P1
  L1+L2+L5 + S2 = P2
  …
  L1+L3+L8 + S5 = P5
  Li >= 0
• Can be turned into a linear program
[Figure: tree from the server through links l1 to l8 to clients with loss rates p1 to p5]
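
The slide does not say how the |Sj| terms are made linear. One standard trick, assumed here, is to split each slack into two non-negative parts, Sj = Sj+ - Sj-, and minimize Σ Li + Σ (Sj+ + Sj-). The Python sketch below builds that program as plain lists for the three paths given earlier. The function name build_lp and the variable ordering are illustrative choices.

```python
from math import log

PATHS = {"p1": [1, 2, 4], "p2": [1, 2, 5], "p5": [1, 3, 8]}

def build_lp(paths, client_loss):
    """Build: minimize c.x subject to A_eq x = b_eq, x >= 0.

    Variable order: x = [Li per link] + [Sj+ per client] + [Sj- per client].
    """
    links = sorted({i for p in paths.values() for i in p})
    clients = sorted(paths)
    n, m = len(links), len(clients)
    c = [1.0] * (n + 2 * m)  # objective: sum of Li plus sum of (Sj+ + Sj-)
    A_eq, b_eq = [], []
    for j, client in enumerate(clients):
        row = [0.0] * (n + 2 * m)
        for i in paths[client]:
            row[links.index(i)] = 1.0  # Li for each link on the path
        row[n + j] = 1.0               # + Sj+
        row[n + m + j] = -1.0          # - Sj-
        A_eq.append(row)
        b_eq.append(log(1.0 / (1.0 - client_loss[client])))  # Pj
    return c, A_eq, b_eq, links, clients

client_loss = {"p1": 0.10, "p2": 0.10, "p5": 0.05}
c, A_eq, b_eq, links, clients = build_lp(PATHS, client_loss)

# Sanity check with one feasible point: all loss on l2 and l8, zero slack.
x = [0.0] * len(c)
x[links.index(2)] = log(1.0 / 0.90)
x[links.index(8)] = log(1.0 / 0.95)
residuals = [sum(a * v for a, v in zip(row, x)) - b for row, b in zip(A_eq, b_eq)]
```

The lists can be handed to any LP solver. With SciPy available, for example, scipy.optimize.linprog(c, A_eq=A_eq, b_eq=b_eq) applies x >= 0 by default. A solved Li converts back to a loss rate as li = 1 - exp(-Li).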

Results
• Experimental setup
  • packet tracing machine at microsoft.com
  • client loss rates estimated from TCP traffic
  • trace analyzed: 2.12 hours, 100 million packets, 134,475 clients
• Validation
  • likely candidates for lossy links:
    • links that cross an inter-AS boundary
    • links that have a large delay

Random Sampling, Linear Optimization
• Of the 50 links identified as most lossy, 42-45 cross an inter-AS boundary and/or have delay > 100 ms
• Example lossy links found:
  • San Francisco (AT&T) → Indonesia (Indo.net)
  • Sprint → PacBell in California
  • Moscow → Tyumen, Siberia (Sovam Teleport)

Simulation Experiments
• Advantage: no uncertainty about link loss rate!
• Methodology
  • topologies used:
    • randomly generated: 1000 nodes, max degree = 5-50
    • real topology obtained by tracing paths to microsoft.com clients
  • randomly generated packet loss events at each link
    • loss rate 0-1% for 95% of links (non-lossy links), 5-10% for 5% of links (lossy links)
• Goodness metric: % of links classified correctly
  • randomly generated topologies: 90-94% accurate
    • lossy links alone: 85-95% found, but 30-90% false positives
  • real topology: 85-90% accurate
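
The goodness metric could be computed along these lines. This Python sketch rests on assumptions the slide does not state: a link counts as lossy when its loss rate is at least 5% (the lower end of the simulated lossy range), and the false positive figure is the share of flagged links that are not truly lossy. The function names are hypothetical.

```python
import random

LOSSY_THRESHOLD = 0.05  # assumption: simulated lossy links have loss of 5% or more

def assign_link_loss(num_links, rng, lossy_fraction=0.05):
    """Simulated loss rates: 5-10% for lossy links, 0-1% otherwise.

    Each link is lossy with probability lossy_fraction, so the 5% share
    holds in expectation rather than exactly.
    """
    return [
        rng.uniform(0.05, 0.10) if rng.random() < lossy_fraction
        else rng.uniform(0.0, 0.01)
        for _ in range(num_links)
    ]

def score(true_loss, inferred_loss, threshold=LOSSY_THRESHOLD):
    """Classification accuracy, share of lossy links found, false positive share."""
    truth = [loss >= threshold for loss in true_loss]
    guess = [loss >= threshold for loss in inferred_loss]
    correct = sum(t == g for t, g in zip(truth, guess))
    found = sum(t and g for t, g in zip(truth, guess))
    false_pos = sum(g and not t for t, g in zip(truth, guess))
    return {
        "accuracy": correct / len(truth),
        "found": found / sum(truth) if any(truth) else None,
        "false_positive": false_pos / sum(guess) if any(guess) else None,
    }

# Tiny hand-made example: links 2 and 3 are truly lossy; links 2 and 4 are flagged.
result = score([0.0, 0.06, 0.08, 0.005], [0.0, 0.07, 0.0, 0.09])
```

In the example, two of the four links are classified correctly, one of the two lossy links is found, and one of the two flagged links is a false positive.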

Ongoing and Future Work
• Large scale simulations with realistic topologies and traffic patterns
• Better validation in the Internet setting
  • correlation with packet loss rate for new clients
  • active measurements in real time
• Measurement from multiple sites (e.g., replicas)
• Other protocols and metrics
  • non-TCP traffic (e.g., streaming media); link bandwidth
• Refinement of techniques
  • “pseudo-passive” probing
  • selective active probing
