Network Tomography • Venkat Padmanabhan, Lili Qiu • MSR Tab Meeting, 22 Oct 2001
Overview • Goal: discover characteristics of internal links in network using passive, end-to-end measurements • Metrics: loss rate, bandwidth • Why is this interesting? • finding trouble spots in the network • e.g., AT&T-Sprint peering point could be congested • a Web site operator can keep tabs on his/her ISP and decide whether to sign up with new ISP(s) • deciding where to place server replicas • downstream of major trouble spots
[Figure: microsoft.com and its clients connected through AT&T, Sprint, C&W, UUNET, Earthlink, AOL, and Qwest; captions: "Why is it so slow?" and "Darn, it’s slow!"]
Topological Metrics • Topological metrics are poor predictors of packet loss rate • Not all links are equal, so we need to identify the bad links
Prior Work • Active probing to infer link loss rate • multicast probes • striped unicast probes • Pros & cons • accurate since individual loss events are identified • expensive because of extra probe traffic
[Figure: probe paths from source S to receivers A and B]
Our Approach • Passive observation of existing traffic • measure loss rate rather than loss events • Active probing to discover network topology • can be done infrequently and in the background
$(1-l_1)(1-l_2)(1-l_4) = 1-p_1$
$(1-l_1)(1-l_2)(1-l_5) = 1-p_2$
…
$(1-l_1)(1-l_3)(1-l_8) = 1-p_5$
• Under-constrained system of equations
[Figure: tree with the server at the root, links l1-l8, and clients p1-p5 at the leaves]
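A minimal Python sketch of why the system is under-constrained. The tree layout is an assumed reading of the figure: the slide only states the paths for p1, p2 and p5, so the paths for p3 and p4 are guesses made for illustration, and the names `PATHS` and `client_loss` are hypothetical.

```python
from math import prod

# Assumed topology: client -> links on its path from the server.
# Only the p1, p2 and p5 paths appear on the slide; p3 and p4 are guesses.
PATHS = {
    "p1": ["l1", "l2", "l4"],
    "p2": ["l1", "l2", "l5"],
    "p3": ["l1", "l3", "l6"],  # assumed
    "p4": ["l1", "l3", "l7"],  # assumed
    "p5": ["l1", "l3", "l8"],
}
LINKS = [f"l{i}" for i in range(1, 9)]


def client_loss(link_loss, paths=PATHS):
    """End-to-end loss per client: 1 - product of (1 - l_i) over its path."""
    return {
        client: 1.0 - prod(1.0 - link_loss[link] for link in links)
        for client, links in paths.items()
    }


# Two different link-level explanations of the same client observations:
# all loss on the shared first link, or the same loss on both second-level links.
shared_link = {link: 0.0 for link in LINKS}
shared_link["l1"] = 0.05

branch_links = {link: 0.0 for link in LINKS}
branch_links["l2"] = 0.05
branch_links["l3"] = 0.05

observed_a = client_loss(shared_link)
observed_b = client_loss(branch_links)
```

Both assignments give every client the same end-to-end loss rate, so five client measurements cannot by themselves pin down eight link loss rates.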
#1: Random Sampling • Randomly sample the solution space • Repeat this several times • Draw conclusions based on overall statistics • How to do random sampling? • determine loss rate bound for each link using best downstream client • iterate over all links: • pick loss rate at random within bounds • update bounds for other links • Problem: little tolerance for estimation error
[Figure: tree with the server at the root, links l1-l8, and clients p1-p5 at the leaves]
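The sampling steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the tree (`CHILDREN`, `CLIENT_OF`) is an assumed reading of the figure, links are visited top-down, and each last-hop link is assumed to absorb whatever loss is still unexplained so that every sample satisfies the client equations exactly.

```python
import random

# Assumed tree: interior link -> child links; last-hop link -> its client.
CHILDREN = {"l1": ["l2", "l3"], "l2": ["l4", "l5"], "l3": ["l6", "l7", "l8"]}
CLIENT_OF = {"l4": "p1", "l5": "p2", "l6": "p3", "l7": "p4", "l8": "p5"}


def clients_below(link):
    """All clients whose path crosses this link."""
    if link in CLIENT_OF:
        return [CLIENT_OF[link]]
    return [c for child in CHILDREN[link] for c in clients_below(child)]


def sample_solution(client_loss, rng, link="l1", survive=None, out=None):
    """Draw one random set of link loss rates consistent with client_loss.

    survive[c] is the delivery rate still to be explained for client c:
    (1 - p_c) divided by (1 - l) for every link already fixed upstream.
    """
    if survive is None:
        survive = {c: 1.0 - p for c, p in client_loss.items()}
        out = {}
    if link in CLIENT_OF:
        # Assumption: the last hop takes the remaining unexplained loss.
        out[link] = max(0.0, 1.0 - survive[CLIENT_OF[link]])
        return out
    downstream = clients_below(link)
    # Bound: a link cannot be lossier than its best downstream client allows.
    bound = max(0.0, min(1.0 - survive[c] for c in downstream))
    out[link] = rng.uniform(0.0, bound)
    for c in downstream:
        survive[c] /= 1.0 - out[link]  # tightens the bounds of links below
    for child in CHILDREN[link]:
        sample_solution(client_loss, rng, child, survive, out)
    return out


def mean_link_loss(client_loss, runs=500, seed=0):
    """Repeat the sampling and average each link's loss rate over all runs."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(runs):
        for link, loss in sample_solution(client_loss, rng).items():
            totals[link] = totals.get(link, 0.0) + loss
    return {link: total / runs for link, total in totals.items()}
```

Because every sample reproduces the measured client loss rates exactly, any error in those measurements carries straight into the samples, which is the "little tolerance for estimation error" problem noted on the slide.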
#2: Linear Optimization • Goals • Parsimonious explanation • Robust to estimation error
$L_i = \log\frac{1}{1-l_i}$, $P_j = \log\frac{1}{1-p_j}$
minimize $\sum_i L_i + \sum_j |S_j|$
subject to
$L_1 + L_2 + L_4 + S_1 = P_1$
$L_1 + L_2 + L_5 + S_2 = P_2$
…
$L_1 + L_3 + L_8 + S_5 = P_5$
$L_i \ge 0$
• Can be turned into a linear program
[Figure: tree with the server at the root, links l1-l8, and clients p1-p5 at the leaves]
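The objective contains $|S_j|$, which is not linear. One standard way to obtain a linear program, not necessarily the one the authors used, is to split each slack into two non-negative parts, $S_j = S_j^{+} - S_j^{-}$. The symbols $S_j^{+}$, $S_j^{-}$ and $\mathrm{path}(j)$ (the set of links on the path to client $j$) are introduced here for illustration only.

```latex
\begin{aligned}
\min_{L,\,S^{+},\,S^{-}} \quad
  & \sum_i L_i + \sum_j \left( S_j^{+} + S_j^{-} \right) \\
\text{subject to} \quad
  & \sum_{i \in \mathrm{path}(j)} L_i + S_j^{+} - S_j^{-} = P_j
    \quad \text{for every client } j, \\
  & L_i \ge 0, \qquad S_j^{+} \ge 0, \qquad S_j^{-} \ge 0 .
\end{aligned}
```

At an optimum at most one of $S_j^{+}$ and $S_j^{-}$ is non-zero, so $S_j^{+} + S_j^{-} = |S_j|$ and the two problems have the same minimum. Assuming the logarithm is natural, link loss rates are recovered as $l_i = 1 - e^{-L_i}$.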
Results • Experimental setup • packet tracing machine at microsoft.com • client loss rates estimated from TCP traffic • trace analyzed: 2.12 hours, 100 million packets, 134,475 clients • Validation • likely candidates for lossy links: • links that cross an inter-AS boundary • links that have a large delay
Random Sampling and Linear Optimization • Of the 50 links identified as most lossy, 42-45 cross an inter-AS boundary and/or have delay > 100 ms • Example lossy links found: • San Francisco (AT&T) to Indonesia (Indo.net) • Sprint to PacBell in California • Moscow to Tyumen, Siberia (Sovam Teleport)
Simulation Experiments • Advantage: the true link loss rates are known • Methodology • topologies used: • randomly generated: 1000 nodes, max degree = 5-50 • real topology obtained by tracing paths to microsoft.com clients • randomly generated packet loss events at each link • loss rate 0-1% for 95% of links (non-lossy links), 5-10% for 5% of links (lossy links) • Goodness metric: % of links classified correctly • randomly generated topologies: 90-94% accurate • lossy links alone: 85-95% found, but 30-90% false positives • real topology: 85-90% accurate
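A rough Python sketch of the simulated loss assignment and the goodness metric. The 95%/5% split and the loss ranges come from the slide; the 5% classification threshold, the definition of the false-positive share (flagged links that are not truly lossy), and the function names are assumptions made for illustration.

```python
import random


def random_link_losses(n_links, rng, lossy_fraction=0.05):
    """Loss rate per link: 5-10% for a lossy link, 0-1% otherwise."""
    losses = []
    for _ in range(n_links):
        if rng.random() < lossy_fraction:
            losses.append(rng.uniform(0.05, 0.10))
        else:
            losses.append(rng.uniform(0.0, 0.01))
    return losses


def score(true_loss, inferred_loss, threshold=0.05):
    """Compare inferred link classes against the simulated ground truth.

    threshold is an assumed cut-off: a link counts as lossy at or above it.
    """
    truth = [loss >= threshold for loss in true_loss]
    guess = [loss >= threshold for loss in inferred_loss]
    n_lossy = sum(truth)
    n_flagged = sum(guess)
    correct = sum(t == g for t, g in zip(truth, guess))
    found = sum(t and g for t, g in zip(truth, guess))
    false_pos = sum(g and not t for t, g in zip(truth, guess))
    return {
        # share of all links put in the right class
        "accuracy": correct / len(truth) if truth else 0.0,
        # share of truly lossy links that were flagged
        "lossy_found": found / n_lossy if n_lossy else 0.0,
        # share of flagged links that are not truly lossy (assumed definition)
        "false_positive_share": false_pos / n_flagged if n_flagged else 0.0,
    }
```

Since most links are non-lossy, overall accuracy can be high even when many flagged links are false positives, which fits the gap between the two figures reported on the slide.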
Ongoing and Future Work • Large-scale simulations with realistic topologies and traffic patterns • Better validation in the Internet setting • correlation with packet loss rate for new clients • active measurements in real time • Measurement from multiple sites (e.g., replicas) • Other protocols and metrics • non-TCP traffic (e.g., streaming media); link bandwidth • Refinement of techniques • “pseudo-passive” probing • selective active probing