User-level Internet Path Diagnosis Ratul Mahajan, Neil Spring, David Wetherall and Thomas Anderson Designed by Yao Zhao
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” (L. Lamport)
Motivation • Can end users, with no special privileges, identify and pinpoint faults inside the network that degrade the performance of their applications? • Why (unprivileged) end users? • Operators do not share the users’ view of the network • Operators may have no more insight than unprivileged users into problems inside other administrative domains • Users can directly contact the responsible ISP, leading to faster problem resolution • Many techniques are more effective and scalable with fault localization than when blindly trying all possibilities
Outline • Diagnosis architecture • Diagnosis Tool: Tulip • Evaluation • Recommendations • Conclusion
An Ideal Trace-based Solution • Routers log packet activity and make these traces available to users • The log at each router is recorded for both input and output interfaces • Impractical to deploy
Packet-based Solutions • Complete Embedding • Each router along the path records information into each packet that it forwards. • Barring two exceptions, the scheme above is equivalent to the path trace. • Reduced Embedding • Remove the step of embedding the complete input packet in the output packet • Constant Space Embedding • Sample TTL • Real Clocks • Unsynchronized clock • Finite precision
Outline • Diagnosis architecture • Diagnosis Tool: Tulip • Evaluation • Recommendations • Conclusion
Internet Approximations • Out-of-band measurement probes • ICMP timestamp requests to access time at the router • IP identifiers instead of per-flow counters
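To make the ICMP timestamp probe concrete, here is a minimal Python sketch that builds the 20-byte request message defined in RFC 792 (type 13, with originate, receive and transmit fields in milliseconds since midnight UT). The function names are illustrative assumptions, not taken from Tulip, and the sketch only constructs the packet; sending it would need a raw socket, which usually requires elevated privileges.

```python
import struct

ICMP_TIMESTAMP_REQUEST = 13  # RFC 792 message type; the reply is type 14


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_timestamp_request(ident: int, seq: int, originate_ms: int) -> bytes:
    """Build an ICMP timestamp request (hypothetical helper).

    originate_ms is the sender's clock in milliseconds since midnight UT.
    The receive and transmit fields are left at zero for the router to fill.
    """
    def pack(checksum: int) -> bytes:
        return struct.pack("!BBHHHIII", ICMP_TIMESTAMP_REQUEST, 0, checksum,
                           ident, seq, originate_ms, 0, 0)

    # The checksum is computed over the message with a zeroed checksum field.
    return pack(internet_checksum(pack(0)))
```

A router that answers such a request returns its own receive and transmit times, which is what gives an unprivileged user access to the clock at the router.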
Assumptions for Packet Loss • IP-IDs are consecutive • Holds 80% of the time for over 90% of the routers • Small packets usually have a low loss rate • In over 60% of the cases where any packet in the triplet was lost, only the data packet was lost • ICMP rate-limiting is not mistaken for packet loss • One more check packet
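The role of consecutive IP-IDs can be illustrated with a small Python sketch. It assumes a probe triplet of control, data, control, where every probe elicits a reply from the router and the router stamps replies from a single 16-bit counter. Under that assumption, a missing data reply with consecutive control IP-IDs suggests the data packet never reached the router, while a gap of two suggests the router replied and the reply was lost on the way back. The function names and return labels are hypothetical, and this is a reading of the technique rather than Tulip's actual code.

```python
def ipid_gap(first: int, second: int) -> int:
    """Distance between two 16-bit IP-ID values, allowing for wraparound."""
    return (second - first) % 65536


def classify_data_loss(ctrl1_ipid: int, data_reply_seen: bool,
                       ctrl2_ipid: int) -> str:
    """Guess where the data packet of a probe triplet was lost.

    Assumes both control replies arrived and that the router uses one
    shared, consecutive IP-ID counter (an assumption that does not
    always hold).
    """
    if data_reply_seen:
        return "no loss"
    gap = ipid_gap(ctrl1_ipid, ctrl2_ipid)
    if gap == 1:
        return "forward"    # router never generated a reply to the data packet
    if gap == 2:
        return "reverse"    # router replied, but the reply did not arrive
    return "ambiguous"      # other traffic consumed IP-IDs in between
```

The "ambiguous" outcome is why the consecutive IP-ID assumption matters: when the router sends unrelated packets between the control replies, the gap no longer identifies the direction of the loss.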
Packet Queuing • Similar to cing • Two practical problems: • ICMP generation time • Cable modems and wireless links
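One common way to estimate queuing from router timestamps, sketched below in Python, is to subtract the smallest observed delay from every sample, so that the constant offset between unsynchronized clocks cancels out. This is a simplified illustration under stated assumptions: no clock drift during the measurement, no midnight wraparound of the ICMP timestamp, and at least one probe that saw an empty queue. The function name and sample layout are hypothetical.

```python
from typing import List, Tuple


def queuing_delays(samples: List[Tuple[int, int]]) -> List[int]:
    """Estimate per-probe queuing delay in milliseconds.

    Each sample is (local_send_ms, router_receive_ms). The raw difference
    includes propagation delay plus an unknown clock offset; both are
    assumed constant, so subtracting the minimum leaves the queuing part.
    """
    raw = [router_ms - local_ms for local_ms, router_ms in samples]
    base = min(raw)  # assumed to correspond to an empty queue
    return [r - base for r in raw]


# Router clock is about 5 s ahead of the local clock in this made-up trace.
print(queuing_delays([(0, 5010), (100, 5105), (200, 5230)]))  # [5, 0, 25]
```

The ICMP generation time mentioned on the slide would appear here as extra noise in the raw differences, since the router may not stamp the reply at the moment the probe arrives.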
Tulip • Network Load • BL/W • Diagnosis time • 10 ~ 30 min per path • Parallel search vs Binary search • Two or more faults?
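The trade-off between parallel and binary search can be sketched as follows. The Python example assumes a hypothetical callback `fault_seen_to(hop)` that measures the path prefix ending at that hop and reports whether the fault is visible, and it assumes the end-to-end path is already known to be faulty. Binary search then needs roughly log2 of the hop count in measurement rounds, whereas a parallel search would measure every hop at once, trading more network load for a shorter diagnosis time.

```python
from typing import Callable, Tuple


def locate_fault_binary(num_hops: int,
                        fault_seen_to: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (first_faulty_hop, measurements_used).

    fault_seen_to is a hypothetical measurement callback; it must be
    monotone: once the fault is visible at a hop, it stays visible at
    every later hop.
    """
    lo, hi = 1, num_hops
    used = 0
    while lo < hi:
        mid = (lo + hi) // 2
        used += 1
        if fault_seen_to(mid):
            hi = mid          # fault is at or before mid
        else:
            lo = mid + 1      # fault is after mid
    return lo, used


# A 16-hop path with a single fault at hop 11 (made-up example).
print(locate_fault_binary(16, lambda hop: hop >= 11))  # (11, 4)
```

With two or more faults, this search converges on the faulty hop nearest the source; finding the remaining ones would take a further search over the rest of the path.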
Outline • Diagnosis architecture • Diagnosis Tool: Tulip • Evaluation • Recommendations • Conclusion
Methodology • Evaluate applicability • Diagnosis granularity • Three sources: MIT, U Washington and London • Destinations from Skitter • Validation
Validation • IP-IDs and ICMP timestamp vs End-to-end measurement • Tulip vs Sting • Consistency of Tulip’s inferences • Consistency between Tulip and Paths
Two facts • Locating Loss and Delay in the Internet • Persistence of Faults
Outline • Diagnosis architecture • Diagnosis Tool: Tulip • Evaluation • Recommendations • Conclusion
Limitations of Tulip • Out-of-band measurements • Stable routing path • IP-ID counters • Limitations of ICMP timestamps
In-band vs Out-of-band Diagnosis • Priority of protocols • Packet drop • Packet size • Loss rate • Reordering
Other Recommendations • Path Verification • IP Identifiers • Router Timestamps
Related Work • Diagnosis Approaches • Magpie • SPIE • NetFlow • Measurement Primitives • Overlay primitives • IPMP • Measurement Tools • ping, traceroute, pathchar, Sting
Conclusion • Tulip • Practical tool to diagnose packet reordering, loss and queuing • Diagnosis architecture • In-band • Lightweight