PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs. Harish Patil, Cristiano Pereira, Mack Stallcup, Gregory Lueck, James Cownie. Intel Corporation. CGO 2010, Toronto, Canada.
Non-Determinism
• Program execution is not repeatable across runs
  • Interactions with the environment (single-threaded)
  • Shared-memory interleaving (multi-threaded)
• Source of many problems
  • Behaviors are hard to predict and test, which leads to bugs
  • Very hard and unpleasant to debug
  • Breaks program analyses that rely on repeatability
  • Obstacle to the adoption of parallel programming
Dealing with Non-Determinism
• Eliminate it
  • Deterministic program execution enforced by the runtime (e.g. constrained execution [ISCA’09])
• Deterministic replay
  • Let it be, but capture and reproduce the execution if needed
  • Every instruction gets the same input as in the original run
• This paper: user-level deterministic replay
  • Implementation, challenges and usage examples
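The idea that every instruction gets the same input on replay as in the original run can be sketched in a few lines. This is only an illustration, not PinPlay code: `Recorder`, `Replayer`, `nondet` and `program` are hypothetical names, and a timer and a random number stand in for instruction outputs and system call results.

```python
import random
import time


class Recorder:
    """Runs each nondeterministic operation and appends its result to a log."""

    def __init__(self):
        self.log = []

    def nondet(self, fn):
        value = fn()
        self.log.append(value)
        return value


class Replayer:
    """Returns the logged results in order instead of re-running the operation."""

    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, fn):
        # fn is ignored on purpose: the logged value stands in for it.
        return next(self._values)


def program(env):
    """A toy program whose result depends on two nondeterministic inputs."""
    stamp = env.nondet(time.perf_counter_ns)             # stand-in for a timestamp instruction
    noise = env.nondet(lambda: random.randrange(1000))   # stand-in for a system call result
    return (stamp % 97) * 1000 + noise


recorder = Recorder()
original = program(recorder)                 # capture run
replayed = program(Replayer(recorder.log))   # replay run sees the same inputs
```

Because the replay run never consults the clock or the random generator, `replayed` equals `original` no matter when or where the replay happens.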
Requirements
• No OS or hardware changes
• No changes in user environment
• Manageable log sizes for long runs
• Reasonable run-time overhead
• Multi-threaded and multi-processed applications
• Integration with other existing analysis tools (e.g. dynamic analyzers, debuggers, profilers)
• No assumptions about synchronization APIs
Rest of the Talk
• Motivation & Requirements
• PinPlay Overview
• Usage Examples
• Results
• Summary
PinPlay
• User-level deterministic replay and analysis
• Runs in the application’s native environment
• Replays user code
• OS independent: cross-OS replay!
• Easily integrates w/ other tools and debuggers
[Diagram: capture: binary + input run under PinPlay on the OS (Linux® or Windows®), producing the normal program output plus logs (pinballs). Replay: logs (pinballs) + PinPlay drive analysis tools and debuggers on the OS (Linux® or Windows®).]
Replay Models
• Parallel-capture and parallel-replay
  [Diagram: threads T0, T1 and T2 are captured together by PinPlay into logs (pinballs) and replayed together by PinPlay]
• Parallel-capture and isolated-replay
  [Diagram: threads T0, T1 and T2 are captured together by PinPlay; each thread is then replayed from logs (pinballs) by its own PinPlay instance]
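For parallel replay, the threads run together again, so the recorded order of conflicting shared-memory accesses has to be enforced. The following is a minimal sketch of that idea under simplifying assumptions: it logs a total order of accesses rather than a reduced subset, and `OrderEnforcer` and `replay` are hypothetical names, not part of PinPlay.

```python
import threading


class OrderEnforcer:
    """Lets threads perform shared accesses only in a previously logged order."""

    def __init__(self, order):
        self._order = list(order)   # logged sequence of thread ids
        self._pos = 0
        self._cond = threading.Condition()

    def _my_turn(self, thread_id):
        return self._pos < len(self._order) and self._order[self._pos] == thread_id

    def access(self, thread_id, action):
        with self._cond:
            # Block until the log says this thread performs the next access.
            self._cond.wait_for(lambda: self._my_turn(thread_id))
            result = action()
            self._pos += 1
            self._cond.notify_all()
            return result


def replay(order):
    """Re-runs a toy workload so that its shared accesses follow `order`."""
    shared = []
    enforcer = OrderEnforcer(order)

    def worker(tid, count):
        for _ in range(count):
            enforcer.access(tid, lambda: shared.append(tid))

    threads = [
        threading.Thread(target=worker, args=(tid, order.count(tid)))
        for tid in sorted(set(order))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


result = replay([1, 0, 0, 1])
```

However the scheduler starts the threads, `result` follows the logged order. Isolated replay avoids this synchronization by replaying one thread at a time and injecting the values that the other threads produced.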
Information Captured For Replay
• Subset of memory values
  • Shadow memory to capture first reads without prior writes and OS side-effects automatically [Sigmetrics’06]
  • Values changed by remote threads
• Initial registers and OS register side-effects
  • Signals/Exceptions/APCs/system calls
• Code executed (user and libraries)
• Position of code and stack
• Output of some instructions (e.g. RDTSC)
• Subset of shared-memory access interleaving (transitive opt. - FDR [ISCA’03])
[Diagram: all memory values, divided into reads without prior writes, OS side-effects used by the app, values from remote threads, and all other values (not captured)]
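One way to picture the shadow-memory bullet: keep a private copy of every value the thread has read or written, and log a read only when real memory disagrees with that copy. The sketch below assumes this compare-on-read scheme and uses a dictionary as memory; `LoggingMemory`, `ReplayMemory` and the log layout are illustrative assumptions, not the PinPlay implementation.

```python
class LoggingMemory:
    """Logs a read only when its value is not already known to the thread."""

    def __init__(self, backing):
        self.backing = backing   # real memory: address -> value
        self.shadow = {}         # values this thread has read or written
        self.log = []            # (access index, address, value)
        self.accesses = 0

    def read(self, addr):
        value = self.backing[addr]
        if addr not in self.shadow or self.shadow[addr] != value:
            # First read without a prior write, or a change made by the OS
            # or by another thread: the value must be captured.
            self.log.append((self.accesses, addr, value))
            self.shadow[addr] = value
        self.accesses += 1
        return value

    def write(self, addr, value):
        self.backing[addr] = value
        self.shadow[addr] = value
        self.accesses += 1


class ReplayMemory:
    """Starts empty and injects logged values just before the matching access."""

    def __init__(self, log):
        self.memory = {}
        self.pending = list(log)
        self.next = 0
        self.accesses = 0

    def read(self, addr):
        if self.next < len(self.pending) and self.pending[self.next][0] == self.accesses:
            _, logged_addr, value = self.pending[self.next]
            self.memory[logged_addr] = value
            self.next += 1
        self.accesses += 1
        return self.memory[addr]

    def write(self, addr, value):
        self.memory[addr] = value
        self.accesses += 1


def program(mem, syscall):
    a = mem.read(0x10)       # first read without a prior write: logged
    mem.write(0x20, a + 2)
    b = mem.read(0x20)       # written by the thread itself: not logged
    syscall()                # the "OS" may change memory behind the thread's back
    c = mem.read(0x10)       # value changed since the last access: logged
    return a, b, c


backing = {0x10: 5}
logger = LoggingMemory(backing)
original = program(logger, lambda: backing.__setitem__(0x10, 9))
replayed = program(ReplayMemory(logger.log), lambda: None)   # no OS activity on replay
```

Only two of the three reads end up in the log, and the replay run reproduces the same values without the system call ever executing.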
PinPlay Architecture
• Capable of logging, replaying and relogging execution (recapture from a replaying run)
[Diagram: in user land, the application code and data run on top of your Pin-based tool and the PinPlay library, which reads and writes the pinball. The PinPlay library contains a logger (instrumentation and analysis to capture logs) and a replayer (instrumentation and analysis to inject side-effects). Below them sit Intel’s Pin (JIT compiler and instrumentor)* and the OS (Linux® or Windows®).]
* http://www.pintool.org/
Cross-OS Replay and Challenges
• Log on one OS and replay on another
• System call translations
  • Most OS activity does not happen on replay (only side-effects are restored)
  • Semantics are translated across OSes (e.g. create thread)
• Memory mapping
  • Problem: the address space differs across OSes
  • Solution: use Pin’s Fetch API to redirect code, and memory operand rewriting to redirect data
[Diagram: code and data are remapped between the address space on Windows® and the address space on Linux®]
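The memory-mapping step amounts to translating every logged address into wherever the same region lives in the replay process. A minimal sketch of such a translation table follows; it does not use Pin's API, and `Region`, `AddressMap` and the example addresses are made up for illustration.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    logged_base: int   # where the region lived during capture
    size: int
    replay_base: int   # where the same region lives during replay


class AddressMap:
    """Translates capture-time addresses to replay-time addresses."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.logged_base)
        self._bases = [r.logged_base for r in self._regions]

    def translate(self, logged_addr):
        # Find the last region that starts at or below the address.
        i = bisect.bisect_right(self._bases, logged_addr) - 1
        if i >= 0:
            region = self._regions[i]
            offset = logged_addr - region.logged_base
            if offset < region.size:
                return region.replay_base + offset
        raise KeyError(f"address {logged_addr:#x} is not in any logged region")


# Hypothetical layout: one code region and one data region.
amap = AddressMap([
    Region(logged_base=0x400000, size=0x1000, replay_base=0x7F0000000000),
    Region(logged_base=0x600000, size=0x2000, replay_base=0x7F0000100000),
])
code_addr = amap.translate(0x400010)
data_addr = amap.translate(0x601FFF)
```

In a real replayer the code lookup would feed instruction fetch and the data lookup would rewrite memory operands; here both simply return the translated address.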
Usage Example: Program Analysis
• Sampling and checkpointing for simulation
  • One run for profiling and finding representative regions, another for checkpointing
  • Requirement: both runs must be identical
• Pinballs are used to share workloads for Pin-based analyses among architects
[Diagram: a multi-process MPI program is captured by PinPlay into per-process pinballs; PinPlay + profiler replays the pinballs to find representative regions; PinPlay + checkpointer replays them to produce checkpoints for simulation]
Usage Example: Replay for Debugging
• Capture a buggy run and replay it under a debugger
  • Guaranteed to reproduce the bug, which helps root-causing
  • Works w/ off-the-shelf unmodified debuggers (e.g. GDB)
  • A PinPlay-based tool can extend GDB commands w/ your own
  • Limitation: the debugger can’t change control flow
• Used to debug various multi-threaded applications
  • Also using it for in-house debugging of concurrency issues with a major database vendor
[Diagram: GDB (unmodified) talks over the remote protocol to a PinPlay-enabled debugger tool running on Intel’s Pin, which replays the binary from logs (pinballs)]
Results
[Chart: isolated replay]
Sources of Slowdown
• Instrumentation of every memory operation to identify system call side-effects and log data
  • Could be done by the OS, at the cost of OS modification or OS-specific analysis (doesn’t work on Windows®)
• Locks for shadow-memory accesses
  • Could be eliminated by using a shadow copy per thread, at the cost of a significant increase in log sizes
• Other optimizations are possible (please see the paper)
Summary
• User-level deterministic capture and replay
  • No OS changes, special hardware, or virtualization
  • Integrates w/ other Pin tools for repeatable analysis and debugging
• Replay occurs on any machine and works across OSes (Windows to Linux)
• Pinballs are OS-independent and self-contained
  • Ideal for sharing workloads among researchers, for Pin-based analyses
• We will release the PinPlay libraries in the future