Detecting Shared Congestion of Flows Via End-to-end Measurement (and other inference problems)
Dan Rubenstein
joint work with Jim Kurose and Don Towsley, UMass Amherst
Network Inference
• What's going on in there?
• Where are packets getting lost / delayed?
• Where is congestion occurring?
• Where are the network hot spots?
• What are routers doing (WFQ, RED)?
• What version of TCP are end-hosts using?
Multiple Autonomous Systems
• What routing capabilities does your ISP provide? "That's proprietary info"
• Who's to blame for poor service? "Somebody else!"
• Consequence: who has to figure out what and where the problem is, and how to fix it?
Overview
• Overview of other inference work:
  • Identifying bottleneck capacities
  • Multicast inference of loss (MINC)
  • TCP inference (TBIT)
• Detecting shared points of congestion
Identifying bottleneck bandwidths
• Links have different capacities
• The "skinniest" link processes packets slowest: creates a rate bottleneck
• Can the bottleneck rate be identified?
• Lots of work here [Carter'96, Jacobson'97, Downey'99, Lai'99, Melander'99, Lai'00]
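As a minimal sketch of the packet-pair idea that underlies much of this work: back-to-back probes are spread apart by the skinniest link, so the receiver-side gap reveals its capacity. The function name, the numbers, and the median filtering below are illustrative, not taken from any of the cited tools.

    # Minimal sketch of the packet-pair idea behind bottleneck-capacity estimation.
    # Real estimators (bprobe, pathchar, etc.) filter cross-traffic noise far more carefully.

    def packet_pair_capacity(recv_gaps_s, packet_size_bytes):
        """Estimate bottleneck capacity (bits/s) from receiver-side gaps between
        back-to-back probe packets: the skinniest link spaces each pair out to
        roughly packet_size / capacity, so capacity ~ packet_size / gap."""
        # Robust estimators typically take the mode of the per-pair estimates;
        # the median is used here as a simple stand-in.
        estimates = sorted(packet_size_bytes * 8 / g for g in recv_gaps_s if g > 0)
        return estimates[len(estimates) // 2]

    # Example: 1500-byte pairs arriving ~8 ms apart suggest a ~1.5 Mb/s bottleneck.
    gaps = [0.0081, 0.0079, 0.0080, 0.0150, 0.0080]   # seconds, one cross-traffic outlier
    print(packet_pair_capacity(gaps, 1500))           # ~1.5e6 bits/s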
Multicast Inference
[Figure: multicast tree from source S to receivers R, with points of loss marked]
• Infer loss points on a multicast tree via correlation patterns of receivers within a multicast group [Ratnas'99, Caceres'99 (3), LoPresti'99, Adler'00]
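For the simplest instance of this idea, here is a hedged sketch: two receivers behind one shared link, with independent Bernoulli loss on each link, where the shared link's pass probability falls out of the receipt counts. The function and variable names are mine; the cited MINC work generalizes this to arbitrary trees and analyzes the estimators properly.

    # Toy MINC-style estimator for a two-receiver tree: one shared link feeding
    # two receiver links, independent loss on each link.

    def infer_two_receiver_tree(r1_got, r2_got):
        """r1_got, r2_got: lists of 0/1 receipt indicators, one entry per multicast packet.
        Returns estimated pass probabilities (shared link, link to R1, link to R2)."""
        n = len(r1_got)
        p1  = sum(r1_got) / n                                   # P(R1 receives)
        p2  = sum(r2_got) / n                                   # P(R2 receives)
        p12 = sum(a & b for a, b in zip(r1_got, r2_got)) / n    # P(both receive)
        a = p1 * p2 / p12      # shared-link pass prob: (a*b1)(a*b2) / (a*b1*b2) = a
        return a, p1 / a, p2 / a

    # Example: 5% loss on the shared link, 10% / 20% on the receiver links.
    import random
    random.seed(0)
    r1, r2 = [], []
    for _ in range(10000):
        shared = random.random() < 0.95
        r1.append(int(shared and random.random() < 0.90))
        r2.append(int(shared and random.random() < 0.80))
    a, b1, b2 = infer_two_receiver_tree(r1, r2)
    print(round(1 - a, 3), round(1 - b1, 3), round(1 - b2, 3))  # ~0.05, 0.10, 0.20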
TCP Inference (TBIT)
• Many versions of TCP exist: RENO, TAHOE, VEGAS
• Many "optional" components: SACK, ECN compliance
• Are specification requirements being met? (initial window sizes, slow start)
• TBIT: TCP Behavior Identification Tool [Padhye'00]
  • stress-tests a server's TCP by intentionally delaying / dropping various ACKs
  • different TCPs / TCP options respond differently to the delayed / dropped ACKs
Detecting Shared Pts of Congestion: Why bother?
[Figure: flows from a server to clients passing through one or more points of congestion]
• When flows share a common point of congestion (POC), bandwidth can be "transferred" between flows w/o impacting other traffic
• Applications: WWW servers, multi-flow (multi-media) sessions, multi-sender multicast
• Can limit "transfer" to flows w/ identical e2e data paths [Balak'99]
  • ensures flows have a common bottleneck
  • but limits applicability
Detecting Shared POCs
Q: Can we identify whether two flows share the same Point of Congestion (POC)?
Network Assumptions:
• routers use FIFO forwarding
• the two flows' POCs are either all shared or all separate
Techniques for detecting shared POCs
[Figure: two topologies — co-located senders (S1, S2 at the same site, sending to R1, R2) and co-located receivers (R1, R2 at the same site)]
• Requirement: flows' senders or receivers are co-located
• Packet ordering through a potential shared POC (SPOC) is the same as that at the co-located end-system
• Good SPOC candidates
Simple Queueing Models of POCs for two flows
[Figure: two queueing models — Separate POCs, where each foreground flow (FG Flow 1, FG Flow 2) shares its own queue with background (BG) traffic, vs. A Shared POC, where both foreground flows share a single queue with background traffic]
Approach (High level)
• Idea: packets passing through the same POC close in time experience loss and delay correlations [Moon'98, Yajnik'99]
• Using either loss or delay statistics, compute two measures of correlation:
  • Mc: cross-measure (correlation between flows)
  • Ma: auto-measure (correlation within a flow)
• such that
  • if Mc < Ma, then infer POCs are separate
  • else Mc > Ma, and infer POCs are shared
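A rough sketch of the loss-based version of this rule for co-located senders, under one simplified reading of the pairings defined on the next slide (each packet i is paired with the immediately preceding packet when it belongs to the other flow for Mc, and with the previous packet of its own flow for Ma). The trace format and names are assumptions, not the paper's exact procedure.

    def loss_measures(trace):
        """trace: time-ordered list of (flow_id, lost) records, flow_id in {1, 2},
        lost = 1 if the probe packet was lost (e.g. inferred from sequence-number gaps)."""
        mc_joint = mc_cond = ma_joint = ma_cond = 0
        last = None                      # (flow_id, lost) of the previous packet overall
        last_same = {1: None, 2: None}   # lost status of the previous packet of each flow
        for flow, lost in trace:
            # Cross-measure pairs: previous packet overall, when it is from the other flow.
            if last is not None and last[0] != flow and last[1]:
                mc_cond += 1
                mc_joint += lost
            # Auto-measure pairs: previous packet of the same flow.
            if last_same[flow]:
                ma_cond += 1
                ma_joint += lost
            last = (flow, lost)
            last_same[flow] = lost
        return mc_joint / mc_cond, ma_joint / ma_cond   # (Mc, Ma)

    # Decision rule from this slide: infer a shared POC iff Mc > Ma.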
The Correlation Statistics...

C(X,Y) = (E[XY] - E[X]E[Y]) / sqrt( (E[X^2] - E[X]^2) (E[Y^2] - E[Y]^2) )

[Figure: merged timeline of Flow 1 and Flow 2 packets, labeled ..., i-4, i-3, i-2, i-1, i, i+1, ...]

Loss-Corr for co-located senders:
• Mc = Pr(Lost(i) | Lost(i-1))
• Ma = Pr(Lost(i) | Lost(prev(i)))
Loss-Corr for co-located receivers: a bit more complex

Delay (either co-located topology):
• Mc = C(Delay(i), Delay(i-1))
• Ma = C(Delay(i), Delay(prev(i)))
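A companion sketch for the delay-based measures, computing C(X,Y) as reconstructed above over (Delay(i), Delay(i-1)) pairs across flows and (Delay(i), Delay(prev(i))) pairs within a flow. The pairing rule and trace format are the same simplification as in the loss sketch above.

    from math import sqrt

    def corr(xs, ys):
        """C(X,Y) = (E[XY] - E[X]E[Y]) / sqrt((E[X^2]-E[X]^2)(E[Y^2]-E[Y]^2))."""
        n = len(xs)
        ex, ey = sum(xs) / n, sum(ys) / n
        exy = sum(x * y for x, y in zip(xs, ys)) / n
        vx = sum(x * x for x in xs) / n - ex * ex
        vy = sum(y * y for y in ys) / n - ey * ey
        return (exy - ex * ey) / sqrt(vx * vy)

    def delay_measures(trace):
        """trace: time-ordered list of (flow_id, delay_seconds) records."""
        cross, auto = [], []             # (Delay(i), Delay(i-1)) and (Delay(i), Delay(prev(i)))
        last = None                      # (flow_id, delay) of the previous packet overall
        last_same = {1: None, 2: None}   # delay of the previous packet of each flow
        for flow, delay in trace:
            if last is not None and last[0] != flow:
                cross.append((delay, last[1]))
            if last_same[flow] is not None:
                auto.append((delay, last_same[flow]))
            last = (flow, delay)
            last_same[flow] = delay
        mc = corr([d for d, _ in cross], [d for _, d in cross])
        ma = corr([d for d, _ in auto],  [d for _, d in auto])
        return mc, ma                    # infer a shared POC iff Mc > Ma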
Intuition: Why the comparison works
• Recall: packets closer together exhibit higher correlation
• E[Tarr(i-1, i)] < E[Tarr(prev(i), i)]
• On avg, i is "more correlated" with i-1 than with prev(i)
• True for many arrival distributions, e.g.:
  • deterministic, any
  • Poisson, Poisson
• Rest of talk: assume Poisson, Poisson
[Figure: timeline marking the gaps Tarr(i-1, i) and Tarr(prev(i), i)]
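A small Monte Carlo check of this intuition for the Poisson, Poisson case (rates and seed are arbitrary; the pairing follows the merged-ordering reading used in the sketches above): the gap back to the immediately preceding packet averages roughly half the gap back to the previous packet of the same flow.

    import random
    random.seed(1)

    def poisson_times(rate, horizon):
        """Arrival times of a Poisson process of the given rate over [0, horizon]."""
        t, times = 0.0, []
        while True:
            t += random.expovariate(rate)
            if t > horizon:
                return times
            times.append(t)

    # Two independent 20 pkt/s Poisson probe flows, merged into one timeline.
    merged = sorted([(t, 1) for t in poisson_times(20.0, 1000.0)] +
                    [(t, 2) for t in poisson_times(20.0, 1000.0)])

    gap_prev_any, gap_prev_same = [], []
    last_any, last_same = None, {1: None, 2: None}
    for t, flow in merged:
        if last_any is not None and last_same[flow] is not None:
            gap_prev_any.append(t - last_any)            # Tarr(i-1, i)
            gap_prev_same.append(t - last_same[flow])    # Tarr(prev(i), i)
        last_any = t
        last_same[flow] = t

    print(sum(gap_prev_any) / len(gap_prev_any),     # ~ 1/(20+20) = 0.025 s
          sum(gap_prev_same) / len(gap_prev_same))   # ~ 1/20      = 0.050 s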
Analytical Results (as # samples → ∞)
• Loss-Correlation technique: assume POC(s) are M+M/M/1/K queues
  • Thm: co-located senders: Mc > Ma iff flows share POCs
  • Co-located receivers: Mc > Ma iff flows share POCs, shown via extensive tests using recursive solutions of Mc and Ma
• Delay-Correlation technique: assume POC(s) are M+G/G/1/∞ queues
  • Thm: both co-located topologies: Mc > Ma iff flows share POCs
Simulation Setup
• Co-located senders: Shared POCs
[Figure: simulation topology — co-located probe sources S1, S2 at 20 pkts/s each, sending to R1, R2; background traffic from on/off sources and TCP flows; a shared 1.5 Mb/s bottleneck link, other links 1000 Mb/s; per-link delays of 10-30 ms]
2nd Simulation Setup
• Co-located senders: Independent POCs
[Figure: simulation topology — co-located probe sources S1, S2 at 20 pkts/s each, sending to R1, R2; each flow crosses its own 1.5 Mb/s bottleneck with on/off and TCP background traffic; the link common to both flows is 1000 Mb/s; per-link delays of 10-30 ms]
Simulation results
[Plots: results for the Independent POCs and Shared POCs topologies]
• Delay-corr an order of magnitude faster than loss-corr
• The shared loss-corr dip: bias due to delayed Mc samples
• Similar results on co-located receiver topology simulations
Internet Experiments
• Goal: verify techniques using real Internet traces
• Experimental Setup:
  • Choose topologies where the POC status (shared or unshared) can be anticipated
  • Use traceroute to assess shared links and approximate per-link delays
[Figure: example topology among UMass, UCL, and ACIRI, with approximate delays of 264 ms, 30 ms, and 193 ms — separate POCs (?)]
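A rough helper in the spirit of the traceroute step above: run traceroute from the co-located end toward both remote ends and report the hops the two paths share. Hostnames are placeholders, and traceroute output parsing varies across platforms, so treat this as a sketch only.

    import subprocess

    def hops(dest):
        """Return the list of router addresses reported by `traceroute -n dest`."""
        out = subprocess.run(["traceroute", "-n", dest],
                             capture_output=True, text=True, check=False).stdout
        path = []
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0].isdigit():   # hop lines look like "12  128.119.40.1  30.1 ms ..."
                path.append(fields[1])           # router address, or "*" on a timeout
        return path

    def shared_prefix(dest_a, dest_b):
        """Common leading hops on the paths toward the two destinations: candidate shared links."""
        common = []
        for ha, hb in zip(hops(dest_a), hops(dest_b)):
            if ha != hb or ha == "*":
                break
            common.append(ha)
        return common

    # e.g. shared_prefix("remote-end-1.example.net", "remote-end-2.example.net")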
Experimental Results
• Sites: UMass (MA), Columbia (NY), UCL (UK), AT&T (Calif.), ACIRI (Calif.)
[Table: inference outcomes per experiment, classified as Correct / Inconclusive / Wrong; detailed counts not recoverable]
Summary
• E2E shared-POC detection techniques
• Delay-based techniques more accurate, take less time (order of magnitude)
• Future Directions:
  • Experiment with non-Poisson foreground traffic
  • Focus on making techniques more practical (e.g., see Byers @ BU CS for a recent TR)
• Paper available (SIGMETRICS'00)