Sampling and Stability in TCP/IP Workloads
Lisa Hsu, Ali Saidi, Nathan Binkert, Prof. Steven Reinhardt
University of Michigan
MoBS 2005
Background
• During networking experiments, some runs would inexplicably get no bandwidth
• Searched high and low for what was "wrong"
• Simulator bug?
• Benchmark bug?
• OS bug?
• Answer: none of the above
The Real Answer
• Simulation methodology!?
• Tension between speed and accuracy in simulation
• Want to capture representative portions of execution WITHOUT running the entire application
• Solution: fast functional simulation
• So what's the problem here?
TCP Tuning
• TCP tunes itself to the performance of the underlying system
• Sets its send rate based on perceived end-to-end bandwidth:
• Performance of the network
• Performance of the receiver
• During checkpointing simulation, TCP had tuned itself to the performance of a meaningless system
• After switching to detailed simulation, the dramatic change in underlying system performance disrupted the flow
Timing Dependence
• The degree to which an application's performance depends upon execution timing (e.g., memory latencies)
• Three classes:
• Non-timing dependent (like SPEC2000)
• Weakly timing dependent (like multithreaded apps)
• Strongly timing dependent
Strongly Timing Dependent
• Application execution depends on stored feedback state from the underlying system (like TCP/IP workloads)
[Execution path diagram: a packet arrives from the application; if perceived bandwidth is high, send it now; if perceived bandwidth is low, wait until later]
Correctness Issue
[Same execution path diagram: under functional simulation the perceived bandwidth is MEANINGLESS, so the send-now vs. wait-until-later decision carried into detailed simulation is wrong]
Need to…
[Same execution path diagram: once the perceived bandwidth reflects that of the configuration under test, it is safe to take data!]
Goals
• More rigorous characterization of this phenomenon
• Determine the severity of this tuning problem across a variety of networking workloads:
• Network link latency sensitivity?
• Benchmark type sensitivity?
• Functional CPU performance sensitivity?
M5 Simulator
• Network-targeted full-system simulator
• Real NIC model: National Semiconductor DP83820 Gigabit Ethernet controller
• Boots Linux 2.6
• Uses the Linux 2.6 driver for the DP83820
• All systems (and the link) modeled in a single process
• Synchronization between systems managed by a global tick frequency
Operating Modes (fastest to slowest)
• FASTEST: 1 or 8 IPC, 1-cycle memory
• FASTER: 1 IPC + blocking caches
• SLOWEST: OoO superscalar, non-blocking caches (<< 1 IPC)
Benchmarks
• 2-system client/server configuration:
• Netperf
• Stream: a transmit microbenchmark
• Maerts: a receive microbenchmark
• SPECWeb99
• NAT configuration (3-system config):
• Netperf maerts with a NAT gateway between client and server
Experimental Configuration
[Diagram: System Under Test and Drive System (x2 drive systems if NAT) connected by a link; roles: receiver/sender, or sender/NAT/receiver]
• CHECKPOINTING: PF1/PF8
• CACHE WARMUP: FC1 or PF8
• MEASUREMENT: D
"Graph Theory"
• Tuning periods after CPU model changes?
• How long do they last?
• Which graph minimizes the detailed modeling time necessary?
• Effects of checkpointing PF width?
Netperf Maerts
[Bandwidth graphs, with the known achievable bandwidth of each system configuration marked: COV 0.5% and COV 1.66%; annotations: "FC cache warmup ends / transition to D", "PF checkpoints loaded / transition to D or FC", "No tuning!", "Tuning period", "Tuning period bears brunt of tuning time"]
• Takeaways:
• A shift from a "high performance" CPU to a lower-performing one causes more drastic tuning periods
• A shift from lower performance to higher has a gentler transition
Netperf Stream
• Why no tuning periods? Because it is SENDER limited!
• The change in performance is local: no feedback from the network or receiver is required
• Thus changes in send rate can be immediate
NAT
[Diagram: sender, NAT, receiver; the NAT box is the System Under Test, and CPU changes are applied there]
• Netperf maerts: the "pipe" itself is changing
• This feedback takes longer to receive in TCP because it is not explicit, and it may ruin the simulation
TCP Kernel Parameters
• pouts: unACKed packets in flight
• cwnd: congestion window (in packets); reflects the state of the network pipe
• sndwnd: available receiver buffer space (in bytes); reflects the receiver's ability to receive
• TCP RULES: pouts may NOT exceed cwnd; bytes(pouts) may NOT exceed sndwnd
• Deadlock? Solved in the real world by TCP timeouts, but those would take much too long to simulate
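The two rules above can be expressed as a single send-eligibility predicate. This is a minimal sketch, not M5 or Linux kernel code; the function name and the packet/window values are hypothetical illustrations.

```python
# Hedged sketch (not actual kernel/M5 code): the two TCP send rules
# from the slide, expressed as a predicate over the kernel parameters.

def can_send(pouts: int, bytes_in_flight: int, pkt_bytes: int,
             cwnd: int, sndwnd: int) -> bool:
    """Return True if TCP may transmit another packet.

    Rule 1: pouts (unACKed packets in flight) may not exceed cwnd.
    Rule 2: unACKed bytes in flight may not exceed sndwnd.
    """
    return (pouts + 1 <= cwnd) and (bytes_in_flight + pkt_bytes <= sndwnd)

# Illustrative values: if cwnd or sndwnd reflects a stale (functionally
# simulated) system, this predicate can stay False indefinitely -- the
# zero-bandwidth "deadlock" that real TCP escapes only via timeouts.
print(can_send(pouts=9, bytes_in_flight=13500, pkt_bytes=1500,
               cwnd=10, sndwnd=16384))   # True
print(can_send(pouts=10, bytes_in_flight=15000, pkt_bytes=1500,
               cwnd=10, sndwnd=16384))   # False: cwnd exhausted
```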
SPECWeb99
• Much more complex than Netperf
• Harder to understand the fundamental interactions
• Speculations in the paper; understanding this more deeply is definitely future work
What About Link Delay?
• TCP algorithm: cwnd can only increase upon every receipt of an ACK packet
• Ramp-up of cwnd is therefore limited by RTT
• KEY POINT: tuning time is sensitive to RTT
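The RTT sensitivity can be made concrete with a back-of-the-envelope estimate: in slow start cwnd roughly doubles once per RTT, so ramping from one segment to a target window costs about log2(target) round trips. A sketch under that textbook assumption (the function and its parameters are illustrative, not from the paper):

```python
import math

# Hedged sketch: slow start doubles cwnd about once per RTT, so the
# ramp from initial_cwnd to target_cwnd takes ~log2(target/initial)
# round trips -- i.e., tuning time scales directly with link RTT.

def slow_start_ramp_time(target_cwnd: int, rtt_s: float,
                         initial_cwnd: int = 1) -> float:
    """Approximate (simulated) seconds for cwnd to reach target_cwnd."""
    rtts = math.ceil(math.log2(target_cwnd / initial_cwnd))
    return rtts * rtt_s

# Doubling the link latency doubles the ramp-up (tuning) time:
print(slow_start_ramp_time(target_cwnd=64, rtt_s=50e-6))   # 6 RTTs at 50 us
print(slow_start_ramp_time(target_cwnd=64, rtt_s=100e-6))  # 6 RTTs at 100 us
```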
Conclusions
• TCP/IP workloads require a tuning period relative to the network RTT when receiver limited
• Sender-limited workloads are generally not problematic
• Some cases lead to unstable system behavior
• Tips for minimizing tuning time:
• Use a "slow" fast-forwarding CPU
• Try different switchover points
• Use a fast-ish cache warmup period to bear the brunt of the transition
Future Work
• Identify other strongly timing dependent workloads (feedback-directed optimization?)
• Examine SPECWeb behavior further
• Further investigate the protocol interactions that cause zero-bandwidth periods
• Hopefully leading to a more rigorous avoidance method
Questions?
Non-Timing Dependent
• Single-threaded, application-only execution (like SPEC2000)
[Execution path diagram: a memory access may MISS in L1 or HIT in a perfect cache; the execution path is the same either way]
Weakly Timing Dependent
• Application execution tied to OS decisions (like multi-threaded apps)
[Execution path diagram: a memory access that hits in a perfect cache continues the current thread; an L1 miss to RAM may enter the idle loop or schedule a different thread]
Basic TCP Overview
• Congestion control algorithm: match the send rate to the network's ability to receive it
• Flow control algorithm: match the send rate to the receiver's ability to receive it
• Overall goal: send data as fast as possible without overwhelming the system, which would effectively cause slowdown
Congestion Control
• Feedback in the form of:
• Timeouts
• Duplicate ACKs
• Feedback dictates the congestion window parameter
• Limits the number of unACKed packets out at a given time (i.e., the send rate)
Congestion Control cont.
• Slow start: the congestion window starts at 1; each ACK received grows the window by one segment, doubling it every RTT (exponential increase)
• Additive Increase, Multiplicative Decrease (AIMD): the window grows by one segment per RTT; losses perceived via duplicate ACKs halve the window
• Timeout recovery: upon a timeout, go back to slow start
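These three behaviors can be sketched as a per-RTT window update, textbook TCP Reno style. This is an illustrative simplification (ssthresh handling and event names are assumptions, not the kernel's actual state machine):

```python
# Hedged sketch of the window dynamics above: slow start, AIMD, and
# timeout recovery, advanced at one-RTT granularity.

def step_cwnd(cwnd: float, ssthresh: float, event: str):
    """Advance cwnd by one RTT. event is 'ack', 'dupack', or 'timeout'."""
    if event == "timeout":             # timeout recovery: back to slow start
        return 1.0, max(cwnd / 2, 2.0)
    if event == "dupack":              # multiplicative decrease: halve window
        new = max(cwnd / 2, 2.0)
        return new, new
    if cwnd < ssthresh:                # slow start: double once per RTT
        return min(cwnd * 2, ssthresh), ssthresh
    return cwnd + 1, ssthresh          # additive increase: +1 segment per RTT

cwnd, ssthresh = 1.0, 16.0
trace = []
for ev in ["ack"] * 5 + ["dupack"] + ["ack"] * 3:
    cwnd, ssthresh = step_cwnd(cwnd, ssthresh, ev)
    trace.append(cwnd)
print(trace)  # exponential ramp, then a halving, then linear growth
```

The sawtooth in the trace is exactly the slow feedback loop the paper worries about: each ramp step consumes a full RTT of simulated time.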
Flow Control
• Feedback in the form of explicit TCP header notifications
• The receiver tells the sender how much kernel buffer space it has available
• Feedback dictates the send window parameter
• Limits the amount of unACKed data out at any given time
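Putting congestion and flow control together: the sender's effective window is the tighter of the two limits. A minimal sketch (the function, the MSS constant, and the window values are illustrative assumptions):

```python
# Hedged sketch: the effective window is the minimum of the congestion
# control limit (cwnd, from the network) and the flow control limit
# (the receiver's advertised buffer space).

MSS = 1460  # assumed maximum segment size in bytes (typical for Ethernet)

def effective_window_bytes(cwnd_pkts: int, rcv_window_bytes: int) -> int:
    """Unacknowledged data the sender may have in flight, in bytes."""
    return min(cwnd_pkts * MSS, rcv_window_bytes)

# Network-limited: cwnd is the bottleneck
print(effective_window_bytes(cwnd_pkts=4, rcv_window_bytes=65535))   # 5840
# Receiver-limited: the advertised window is the bottleneck
print(effective_window_bytes(cwnd_pkts=100, rcv_window_bytes=8192))  # 8192
```

This min() is why the paper distinguishes receiver-limited from sender-limited workloads: whichever limit binds determines which stale feedback state must re-tune after a CPU model switch.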
Results
• Zero Link Delay
Non-Timing Dependent
• Single-threaded, application-only simulation (like SPEC2000)
• The execution timing does not affect the commit order of instructions
• Architectural state generated by a fast functional simulator would be the same as that of a detailed simulator
Weakly Timing Dependent
• Applications whose performance is tied to OS decisions
• Multi-threaded (CMP, SMT, etc.)
• Execution timing effects like cache hits and misses, memory latencies, etc. can affect scheduling decisions
• However, these execution path variations are all valid and do not pose a correctness problem
Strongly Timing Dependent
• Workloads that explicitly tune themselves to the performance of the underlying system
• Tuning to an artificially fast system affects system performance
• When switching to detailed simulation, you may get meaningless results