The Impact of False Sharing on Shared Congestion Management • Srinivasa Aditya Akella • Joint work with Srini Seshan and Hari Balakrishnan • 28 Feb, 2001
Introduction • Predominant model for congestion control: slow-start and AIMD (additive increase, multiplicative decrease) • This model is not always optimal • Multiple concurrent flows from the same source to the same destination may share a bottleneck • Such flows compete for resources rather than cooperate • This is especially visible in the context of Web transfers
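As a rough illustration of why concurrent flows compete, the sketch below models one flow's slow-start and AIMD window once per RTT. It is a simplified Reno-style model, not any particular TCP implementation. The initial `ssthresh` of 64 packets, the halving rule and the 2-packet floor are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AimdWindow:
    """Simplified per-flow congestion window, updated once per RTT (assumed model)."""
    cwnd: float = 1.0       # congestion window, in packets
    ssthresh: float = 64.0  # hypothetical initial slow-start threshold

    def on_rtt_without_loss(self) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: double the window each RTT, capped at ssthresh.
            self.cwnd = min(self.cwnd * 2, self.ssthresh)
        else:
            # Congestion avoidance: additive increase of one packet per RTT.
            self.cwnd += 1

    def on_loss(self) -> None:
        # Multiplicative decrease: halve the window (2-packet floor is an assumption).
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = self.ssthresh


# Two flows to the same destination each run their own independent loop,
# so each probes for bandwidth separately instead of cooperating.
a, b = AimdWindow(), AimdWindow()
for _ in range(3):
    a.on_rtt_without_loss()
    b.on_rtt_without_loss()
a.on_loss()
print(a.cwnd, b.cwnd)  # 4.0 8.0
```

Only flow `a` backs off after its loss, while `b` keeps its larger window, even though both may be crossing the same bottleneck.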
Sharing Congestion Information... • Solution: share congestion information across flows • Granularity of sharing • A common destination host (network interface) • All destination hosts on the same IP subnet • A set of flows that shares congestion information is called a macroflow
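A minimal sketch of the two sharing granularities follows. It assumes IPv4 destinations and a hypothetical /24 prefix for the subnet case, since the slide does not say how a subnet is delimited. `macroflow_key` and `group_into_macroflows` are illustrative names and are not part of any real congestion-sharing API.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def macroflow_key(dst: str, granularity: str = "host", prefix_len: int = 24) -> str:
    """Map a destination address to a macroflow key.

    granularity="host": all flows to one destination host share state.
    granularity="subnet": all flows to hosts in the same prefix share state.
    prefix_len=24 is an illustrative assumption, not taken from the talk.
    """
    addr = ipaddress.ip_address(dst)
    if granularity == "host":
        return str(addr)
    if granularity == "subnet":
        return str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))
    raise ValueError(f"unknown granularity: {granularity}")


def group_into_macroflows(
    flows: Iterable[Tuple[str, str]], granularity: str = "host"
) -> Dict[str, List[str]]:
    """Group (flow_id, destination) pairs into macroflows."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for flow_id, dst in flows:
        groups[macroflow_key(dst, granularity)].append(flow_id)
    return dict(groups)


flows = [("f1", "10.0.1.7"), ("f2", "10.0.1.7"), ("f3", "10.0.1.9")]
print(group_into_macroflows(flows, "host"))
# {'10.0.1.7': ['f1', 'f2'], '10.0.1.9': ['f3']}
print(group_into_macroflows(flows, "subnet"))
# {'10.0.1.0/24': ['f1', 'f2', 'f3']}
```

The coarser subnet granularity aggregates more flows, which raises the chance that flows with different bottlenecks end up sharing state.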
False Sharing • Flows sharing congestion state might not share the same bottleneck • The sender has no knowledge of this • Causes of false sharing in the Internet • Flows are treated differently: service differentiation • Flows take different paths: path diversity
False Sharing • Service Differentiation • Integrated Services • Differentiated Services (DiffServ) • Path Diversity • Network Load Balancers • Network Address Translators (NATs)
Questions... • Impact on performance and correctness • Does false sharing compromise end-to-end congestion control? • Does it degrade the performance of individual flows? • Detection • Under what conditions can false sharing be detected? • Response • How should congestion-sharing systems be modified? • What effect do these modifications have? • What should the default behavior be?
Quantifying the Penalty • Analysis • False sharing reduces observed flow throughput: $l_{share} = \frac{l_1 l_2}{l_1 + l_2}$ • False sharing increases observed flow loss rate: $r_{noshare} = \sqrt{r_1 r_2}$ • $r_{share} = \frac{r_1 + r_2}{2}$
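Taking the slide's formulas at face value, a quick numeric check shows the direction of both effects. We assume $l_1, l_2$ are the throughputs and $r_1, r_2$ the loss rates the two flows would see without sharing. The inputs below are hypothetical.

```python
import math


def shared_throughput(l1: float, l2: float) -> float:
    """Throughput under false sharing, per the slide: l1*l2 / (l1 + l2)."""
    return l1 * l2 / (l1 + l2)


def loss_rate_noshare(r1: float, r2: float) -> float:
    """Loss-rate figure without sharing, per the slide: sqrt(r1 * r2)."""
    return math.sqrt(r1 * r2)


def loss_rate_share(r1: float, r2: float) -> float:
    """Loss-rate figure with sharing, per the slide: (r1 + r2) / 2."""
    return (r1 + r2) / 2


# Hypothetical inputs: a fast path (8 units) and a slow path (2 units).
print(shared_throughput(8.0, 2.0))  # 1.6, below min(8, 2)
r1, r2 = 0.01, 0.04
print(loss_rate_noshare(r1, r2), loss_rate_share(r1, r2))  # about 0.02 and 0.025
```

For positive inputs, $l_1 l_2/(l_1+l_2) < \min(l_1, l_2)$. An arithmetic mean is never below a geometric mean, so $(r_1+r_2)/2 \ge \sqrt{r_1 r_2}$, with equality only when $r_1 = r_2$.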
Service Differentiation • The network treats different flows differently • Bandwidth allocation and buffer resources • IETF DiffServ architecture • Three per-hop behaviors (PHBs): Assured Forwarding (AF), Expedited Forwarding (EF), Best Effort (BE) • Nortel's implementation of DiffServ • Experiments with two traffic classes: AF and BE • WRR (weighted round robin) for bandwidth sharing • RIO (for AF) and RED (for BE) for buffer management • Styles of buffer management: shared and unshared
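The following is a minimal weighted round-robin sketch showing how WRR apportions the link between the two classes. The 2:1 AF:BE weights and the per-packet granularity are assumptions, since real schedulers often weight by bytes. The sketch does not model RIO/RED buffer management.

```python
from collections import deque
from typing import Deque, Dict, List


def wrr_schedule(
    queues: Dict[str, Deque[str]], weights: Dict[str, int], rounds: int
) -> List[str]:
    """Weighted round robin: each round, serve up to weights[c] packets from class c."""
    sent: List[str] = []
    for _ in range(rounds):
        for cls, weight in weights.items():
            queue = queues[cls]
            for _ in range(weight):
                if not queue:
                    break  # class is empty; move on to the next class
                sent.append(queue.popleft())
    return sent


# Hypothetical 2:1 weighting of AF over BE (the talk does not give weights).
queues = {"AF": deque(["a1", "a2", "a3", "a4"]), "BE": deque(["b1", "b2", "b3"])}
print(wrr_schedule(queues, {"AF": 2, "BE": 1}, rounds=2))
# ['a1', 'a2', 'b1', 'a3', 'a4', 'b2']
```

Two flows in different classes therefore see different service rates even when they leave the same sender for the same destination, which is exactly the situation where shared congestion state can be false.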
Results... • Predicted throughput = XXX need to fill • The faster connection is slowed down by the slower one • Slower connection is never persistently overloaded • Loss rate for the slower connection does not increase appreciably with sharing
Path Diversity • Two flows taking different routes may not share a bottleneck • Two scenarios where path diversity leads to false sharing: dispersity routing and NATs • Three distinct categories • Unshared bottleneck: no shared bottleneck link • Semi-shared bottleneck: one of the unshared paths has a bottleneck • Fully shared bottleneck: no bottlenecks in the unshared portions, though the RTTs would be different
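One way to make the three categories concrete is to classify a pair of paths by where each flow's bottleneck lies. This sketch assumes that each flow has a single bottleneck at the lowest-capacity link on its path. The topology and capacities are hypothetical. The mapping from bottleneck placement to category is our reading of the slide.

```python
from enum import Enum
from typing import Dict, Sequence


class Category(Enum):
    UNSHARED = "unshared bottleneck"
    SEMI_SHARED = "semi-shared bottleneck"
    FULLY_SHARED = "fully shared bottleneck"


def bottleneck(path: Sequence[str], capacity: Dict[str, float]) -> str:
    """Assume a flow's bottleneck is the lowest-capacity link on its path."""
    return min(path, key=lambda link: capacity[link])


def classify(path_a: Sequence[str], path_b: Sequence[str],
             capacity: Dict[str, float]) -> Category:
    shared_links = set(path_a) & set(path_b)
    # How many of the two flows are bottlenecked on a link both paths traverse?
    n_shared = sum(bottleneck(p, capacity) in shared_links for p in (path_a, path_b))
    if n_shared == 2:
        return Category.FULLY_SHARED
    if n_shared == 1:
        return Category.SEMI_SHARED
    return Category.UNSHARED


# Hypothetical topology: private links a1/b1 feed a common segment s1 -> s2.
cap = {"a1": 5.0, "b1": 50.0, "s1": 10.0, "s2": 100.0}
print(classify(["a1", "s1", "s2"], ["b1", "s1", "s2"], cap))  # Category.SEMI_SHARED
```

Even in the fully shared case, the private portions can still give the two flows different RTTs. The classifier does not capture this.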
Results for Unshared Bottleneck • Bandwidth is close to the prediction • Loss rates followed a similar pattern to the DiffServ case
Delays and Losses... • The delays of the two flows vary independently of each other • Losses are uncorrelated • Variations in delay and losses within one flow are more correlated than those across flows
Fully Shared Bottleneck - How is it Different? • Variations in delay seem correlated • The two flows share a common point of congestion • Yet the flows should not share congestion information, since their RTTs differ
Detection • Test description • Rubenstein's delay- and loss-correlation tests • They need modifications to become part of the architecture, because • Flows might undergo false sharing if even one of their bottlenecks is unshared • Two differentially served flows might observe statistically dependent delays • The scheduler at the sender might apportion bandwidth non-uniformly • Congestion control schemes depend on RTTs • Aggregating flows with different RTTs would lead to false sharing
Loss-correlation Test • Idea: losses are likely to come in bursts • This should hold across flows from the same source when a bottleneck is shared • Rubenstein's tests compare the auto- and cross-correlation metrics for pairs of flows • They do not detect unshared bottlenecks • We need a test that detects whether all bottlenecks are shared • New test: Symmetric Loss Correlation • Defining the loss- and cross-correlation metrics in a manner independent of the flows solves the problem • However, packets across flows are assumed to be spaced more closely than those within a flow, which is not always true • A fix: schedule transmissions appropriately
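The auto- versus cross-correlation comparison can be sketched as conditional loss probabilities over a send-ordered trace. This is a simplified illustration in the spirit of the test, not Rubenstein's exact estimator. Packets of both flows are pooled so that the result does not depend on which flow is labelled first, and this is our assumption about what "independent of the flows" means. The cross metric pairs each packet with the immediately preceding packet from the other flow, so it relies on the close cross-flow spacing noted above. `Pkt` and the other names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Pkt:
    flow: int   # hypothetical flow identifier
    lost: bool  # whether the receiver reported this packet lost


def cond_loss(pairs: List[Tuple[Pkt, Pkt]]) -> Optional[float]:
    """Estimate P(cur lost | prev lost) over (prev, cur) pairs; None if prev never lost."""
    given = [cur for prev, cur in pairs if prev.lost]
    if not given:
        return None
    return sum(cur.lost for cur in given) / len(given)


def loss_corr_metrics(trace: Sequence[Pkt]) -> Tuple[Optional[float], Optional[float]]:
    """Return (auto, cross) loss-correlation estimates from a send-ordered trace.

    auto:  P(lost | previous packet of the SAME flow was lost)
    cross: P(lost | immediately preceding packet, from the OTHER flow, was lost)
    """
    auto_pairs: List[Tuple[Pkt, Pkt]] = []
    cross_pairs: List[Tuple[Pkt, Pkt]] = []
    last_of_flow: Dict[int, Pkt] = {}
    for i, cur in enumerate(trace):
        if cur.flow in last_of_flow:
            auto_pairs.append((last_of_flow[cur.flow], cur))
        if i > 0 and trace[i - 1].flow != cur.flow:
            cross_pairs.append((trace[i - 1], cur))
        last_of_flow[cur.flow] = cur
    return cond_loss(auto_pairs), cond_loss(cross_pairs)


def looks_shared(trace: Sequence[Pkt]) -> Optional[bool]:
    """Shared if cross-flow loss correlation exceeds within-flow; None if undecided."""
    auto, cross = loss_corr_metrics(trace)
    if auto is None or cross is None:
        return None
    return cross > auto


# Alternating sends; each loss burst hits both flows (as at a shared queue).
trace = [Pkt(i % 2, lost) for i, lost in enumerate(
    [False, False, True, True, False, False, True, True, False, False])]
print(loss_corr_metrics(trace), looks_shared(trace))  # (0.0, 0.5) True
```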
Delay-correlation Test • Delay = f(propagation time, queueing delay) • Queueing delay (Q) can vary significantly with time • The current Q is strongly related to recent values • Challenges with measuring delay • Clocks cannot be easily synchronized • Use the change in delay or the relative delay instead • Methodology of the tests • Use timestamps to compute delays • Compute correlations • Correlation is independent of constant differences, such as a fixed clock offset
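The claim that correlation ignores constant differences can be checked directly. Unsynchronized clocks add the same unknown offset to every one-way delay, and that offset shifts the series without changing its Pearson correlation. This sketch covers that idea only and is not Rubenstein's full delay-correlation test, which compares within-flow and cross-flow correlations. Clock skew is ignored, and the timestamps are hypothetical.

```python
import math
from typing import List, Sequence


def one_way_delays(send_ts: Sequence[float], recv_ts: Sequence[float]) -> List[float]:
    """Receiver timestamp minus sender timestamp.

    With unsynchronized clocks each value is off by the same unknown constant
    (clock skew is ignored in this sketch).
    """
    return [r - s for s, r in zip(send_ts, recv_ts)]


def relative_delays(delays: Sequence[float]) -> List[float]:
    """Delays relative to the smallest observed one, removing the constant offset."""
    base = min(delays)
    return [d - base for d in delays]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(vx * vy)


# Hypothetical shared queue: both flows see the same queueing delay q (ms),
# but different propagation delays and clock offsets.
q = [0.0, 5.0, 10.0, 5.0, 0.0, 2.0]
send = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
recv_a = [s + 20.0 + qi + 1000.0 for s, qi in zip(send, q)]  # +1000 ms clock offset
recv_b = [s + 35.0 + qi - 250.0 for s, qi in zip(send, q)]   # -250 ms clock offset
da, db = one_way_delays(send, recv_a), one_way_delays(send, recv_b)
print(round(pearson(da, db), 6))  # 1.0
```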
Out-of-Order Test • Flows might have fundamentally different delays • DelayCorr does not identify this • Loss and delay tests might help detect false sharing • Multipath routing where the bottleneck is shared • The out-of-order test handles this well • Look at packet reordering from a source • Reordering by more than 3 packets => no sharing • Limitation: packets must be delivered to the same physical destination • Cannot be applied to situations like NAT • Rely on RTTs in such situations
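Here is a minimal sketch of the reordering check. It assumes that the sender stamps packets of all flows in the macroflow from one hypothetical common sequence space and that the receiver reports arrival order. The threshold of 3 comes from the slide. For each packet, `max_reorder_depth` counts how many later-sent packets overtook it. The quadratic loop is for clarity only.

```python
from typing import Sequence

REORDER_THRESHOLD = 3  # from the slide: reordering by more than 3 packets => no sharing


def max_reorder_depth(arrivals: Sequence[int]) -> int:
    """Largest number of later-sent packets that arrived before some packet.

    `arrivals` lists sender-assigned sequence numbers in arrival order; we assume
    (hypothetically) one sequence space shared by all flows in the macroflow.
    """
    depth = 0
    for i, seq in enumerate(arrivals):
        overtaken_by = sum(1 for earlier in arrivals[:i] if earlier > seq)
        depth = max(depth, overtaken_by)
    return depth


def paths_differ(arrivals: Sequence[int], threshold: int = REORDER_THRESHOLD) -> bool:
    """True if reordering exceeds the threshold, i.e. the flows should not share."""
    return max_reorder_depth(arrivals) > threshold


print(max_reorder_depth([0, 1, 3, 2, 4]), paths_differ([0, 1, 3, 2, 4]))        # 1 False
print(max_reorder_depth([0, 2, 3, 4, 5, 1]), paths_differ([0, 2, 3, 4, 5, 1]))  # 4 True
```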
Evaluation of the Tests • Two metrics for each test • Detection time • Probability of correct decision • Which test is the best? • Out-of-order tests are mostly accurate • Loss tests are neither timely nor accurate • Delay tests are timely but not as accurate • The symmetric loss test outputs the correct result much more often than the asymmetric test
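The two metrics can be computed per test from a set of runs. This sketch uses a hypothetical `Trial` record. The values in the usage line are placeholders and are not results from these experiments.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Trial:
    test: str              # test label, e.g. "delay" (labels are illustrative)
    detection_time: float  # seconds until the test reached a decision
    correct: bool          # whether that decision matched the ground truth


def summarize(trials: Sequence[Trial]) -> Dict[str, Dict[str, float]]:
    """Per-test mean detection time and fraction of correct decisions."""
    by_test: Dict[str, List[Trial]] = {}
    for t in trials:
        by_test.setdefault(t.test, []).append(t)
    return {
        name: {
            "mean_detection_time": mean(t.detection_time for t in ts),
            "p_correct": sum(t.correct for t in ts) / len(ts),
        }
        for name, ts in by_test.items()
    }


runs = [Trial("delay", 2.0, True), Trial("delay", 4.0, False)]  # placeholder records
print(summarize(runs))  # {'delay': {'mean_detection_time': 3.0, 'p_correct': 0.5}}
```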
Response to False Sharing • Design Issues • Default behavior: share information and detect false sharing • Scheduling • False sharing is detected more easily than genuine sharing • A default of no sharing makes no sense with out-of-order tests • Upon detection, stop sharing • In CM, associate the affected flows with different macroflows • Relatively small confidence intervals can be used • An incorrect decision carries no significant penalty
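Here is a minimal sketch of the response policy. Flows to the same destination share a macroflow by default, and a flow flagged by a detection test is moved into a macroflow of its own. `MacroflowTable` and its methods are hypothetical names, and this is not the API of CM (the Congestion Manager).

```python
from typing import Dict, List


class MacroflowTable:
    """Hypothetical flow-to-macroflow map: share by default, split on detection.

    Names and structure are illustrative, not the Congestion Manager's API.
    """

    def __init__(self) -> None:
        self._macroflow_of: Dict[str, str] = {}
        self._splits = 0

    def add_flow(self, flow_id: str, dst: str) -> None:
        # Default behavior: flows to the same destination share congestion state.
        self._macroflow_of[flow_id] = f"mf:{dst}"

    def on_false_sharing_detected(self, flow_id: str) -> None:
        # Stop sharing: move the flow into a macroflow of its own.
        self._splits += 1
        self._macroflow_of[flow_id] = f"{self._macroflow_of[flow_id]}#split{self._splits}"

    def macroflow(self, flow_id: str) -> str:
        return self._macroflow_of[flow_id]

    def peers(self, flow_id: str) -> List[str]:
        """Flows (including flow_id) that currently share congestion state."""
        mf = self._macroflow_of[flow_id]
        return sorted(f for f, m in self._macroflow_of.items() if m == mf)


table = MacroflowTable()
table.add_flow("web-1", "10.0.1.7")
table.add_flow("web-2", "10.0.1.7")
print(table.peers("web-1"))  # ['web-1', 'web-2']
table.on_false_sharing_detected("web-2")
print(table.peers("web-1"), table.macroflow("web-2"))  # ['web-1'] mf:10.0.1.7#split1
```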
Performance • How good can restoration possibly be? • False sharing may penalize flows significantly • It might take time to restore performance • However, the greater the penalty, the easier it is to detect • Approach to performance evaluation: multiple, de-randomized, offline runs • Performance is restored in less than 3 times the time taken to detect false sharing