PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs

Harish Patil, Cristiano Pereira, Mack Stallcup, Gregory Lueck, James Cownie
Intel Corporation
CGO 2010, Toronto, Canada

Non-Determinism
• Program execution is not repeatable across runs
  • Interactions with environment (single-threaded)
  • Shared-memory interleaving (multi-threaded)
• Source of many problems
  • Hard to predict and test behaviors, which leads to bugs
  • Very hard and unpleasant to debug
  • Breaks program analyses that rely on repeatability
  • Obstacle to adoption of parallel programming

Dealing with Non-Determinism
• Eliminate it
  • Deterministic program execution enforced by runtime (e.g. constrained execution [ISCA’09])
• Deterministic Replay
  • Let it be, but capture and reproduce execution if needed
  • Every instruction gets the same input as in the original run
• This paper: User-level Deterministic Replay
  • Implementation, challenges and usage examples
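The idea that "every instruction gets the same input as in the original run" can be illustrated with a minimal sketch. This is not PinPlay code: the toy `program`, `capture` and `replay` functions are hypothetical names, and a random number generator stands in for any nondeterministic input (a system call result, a timestamp, a value written by another thread).

```python
import random

def program(read_input):
    """A toy 'program' whose result depends on three nondeterministic inputs."""
    return [read_input() * 2 for _ in range(3)]

def capture(source):
    """Run with the real source and log every value it hands to the program."""
    log = []

    def logged():
        value = source()
        log.append(value)
        return value

    return program(logged), log

def replay(log):
    """Re-run the program, feeding values from the log instead of the real source."""
    values = iter(log)
    return program(lambda: next(values))

# Capture one run, then replay it: the outputs match even though the
# original input source would return different values on a second run.
original_output, log = capture(lambda: random.randint(0, 1_000_000))
replayed_output = replay(log)
```

The log only has to hold the values the program could not have computed by itself, which is why log size depends on how much nondeterministic input a run consumes rather than on how many instructions it executes.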

Requirements
• No OS or hardware changes
• No changes in user environment
• Manageable log sizes for long runs
• Reasonable run-time overhead
• Multi-threaded and multi-processed applications
• Integration with other existing analysis tools (e.g. dynamic analyzers, debuggers, profilers)
• No assumptions about synchronization APIs

Rest of the Talk
• Motivation & Requirements
• PinPlay Overview
• Usage Examples
• Results
• Summary

PinPlay: User-level deterministic replay and analysis
• Run in application’s native environment
• Replays user code
• OS independent: cross-OS replay!
• Easily integrates w/ other tools and debuggers
[Figure: capture runs Binary + Input under PinPlay on the OS (Linux® or Windows®) and produces the normal program output plus logs (pinballs); replay runs logs (pinballs) + PinPlay on the OS (Linux® or Windows®) with analysis tools and debuggers attached.]

Replay Models
• Parallel-capture and parallel-replay
• Parallel-capture and isolated-replay
[Figure: in both models threads T0, T1 and T2 are captured together by PinPlay into logs (pinballs); parallel-replay replays the threads together under one PinPlay, while isolated-replay replays each thread from its own log under its own PinPlay.]
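For the parallel-replay model, the replayer has to make threads repeat the captured ordering of their shared accesses. The sketch below shows one simple way to do that, under loud assumptions: it enforces a total order of accesses given as a list of thread ids, whereas the talk says PinPlay logs only a subset of the interleaving. `OrderEnforcer` and `replay` are hypothetical names, not part of PinPlay.

```python
import threading

class OrderEnforcer:
    """Lets threads perform shared accesses only in the logged order."""

    def __init__(self, order):
        self.order = order      # list of thread ids, one entry per shared access
        self.pos = 0            # index of the next access allowed to proceed
        self.cv = threading.Condition()

    def run_in_turn(self, tid, action):
        with self.cv:
            # Block until the log says it is this thread's turn.
            self.cv.wait_for(lambda: self.order[self.pos] == tid)
            action()
            self.pos += 1
            self.cv.notify_all()

def replay(order, num_threads):
    """Replay: each thread performs its accesses, gated by the logged order."""
    enforcer = OrderEnforcer(order)
    trace = []

    def worker(tid):
        for _ in range(order.count(tid)):
            enforcer.run_in_turn(tid, lambda: trace.append(tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace

logged_order = [0, 2, 1, 1, 0, 2]   # hypothetical interleaving from a captured run
trace = replay(logged_order, num_threads=3)
```

Isolated replay needs no such gating: if the values a thread observed from remote threads are stored in its own log, the thread can be replayed alone by injecting those values.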

Information Captured For Replay
• Subset of memory values
  • Shadow-memory to capture first reads without prior writes and OS side-effects automatically [Sigmetrics’06]
  • Values changed by remote threads
• Initial registers and OS register side-effects
  • Signals/Exceptions/APCs/system calls
• Code executed (user and libraries)
• Position of code and stack
• Output of some instructions (e.g. RDTSC)
• Subset of shared-memory access interleaving (transitive opt. - FDR [ISCA’03])
[Figure: of all memory values, the captured ones are reads without prior writes, OS side-effects used by the app, and values from remote threads; all other values are not captured.]
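The shadow-memory idea can be sketched as follows. This is an illustrative model, not PinPlay's implementation: it assumes the logger keeps a shadow copy of what the thread's own execution can account for, and logs a value only when a read disagrees with the shadow (a first read without a prior write, or a location changed by the OS or a remote thread). `Logger`, `Replayer` and `program` are hypothetical names, and memory is modeled as a dictionary.

```python
_MISSING = object()

class Logger:
    def __init__(self, memory):
        self.memory = memory    # real memory; may change behind the thread's back
        self.shadow = {}        # values the thread's own execution can account for
        self.log = []           # (access_index, address, value) to inject on replay
        self.count = 0          # number of memory accesses so far

    def read(self, addr):
        value = self.memory[addr]
        if self.shadow.get(addr, _MISSING) != value:
            # Not reproducible from the thread's own writes: must be logged.
            self.log.append((self.count, addr, value))
            self.shadow[addr] = value
        self.count += 1
        return value

    def write(self, addr, value):
        self.memory[addr] = value
        self.shadow[addr] = value
        self.count += 1

class Replayer:
    def __init__(self, log):
        self.memory = {}        # starts empty; only logged values are injected
        self.pending = list(log)
        self.count = 0

    def read(self, addr):
        if self.pending and self.pending[0][0] == self.count:
            _, logged_addr, value = self.pending.pop(0)
            self.memory[logged_addr] = value
        self.count += 1
        return self.memory[addr]

    def write(self, addr, value):
        self.memory[addr] = value
        self.count += 1

def program(mem, outside_world=lambda: None):
    a = mem.read(0x10)          # first read, no prior write: logged
    mem.write(0x20, a + 1)
    b = mem.read(0x20)          # produced by the program itself: not logged
    outside_world()             # e.g. a system call or remote thread changes 0x10
    c = mem.read(0x10)          # changed externally: logged
    return a, b, c

real_memory = {0x10: 5}
logger = Logger(real_memory)
captured = program(logger, lambda: real_memory.__setitem__(0x10, 99))
replayed = program(Replayer(logger.log))
```

Of the four accesses, only two produce log entries; the value the program wrote itself is regenerated during replay, which is what keeps the log to a subset of memory values.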

PinPlay Architecture
• Capable of logging, replaying and relogging execution (recapture from a replaying run)
• Logger: instrumentation and analysis to capture logs
• Replayer: instrumentation and analysis to inject side-effects
[Figure: in user land, the application code and data run under your Pin-based tool, which links the PinPlay library (logger and replayer) and reads or writes the pinball; underneath are Intel’s Pin (JIT compiler and instrumentor)* and the OS (Linux® or Windows®).]
* http://www.pintool.org/

Cross-OS Replay and Challenges
• Log on one OS and replay on another
• System call translations
  • Most OS activity does not happen on replay (only side-effects restored)
  • Semantics are translated across OSes (e.g. create thread)
• Memory mapping
  • Problem: address space different across OSes
  • Solution: use Pin’s Fetch API to redirect code, and memory operand rewriting to redirect data
[Figure: code and data in the address space on Windows® are remapped to code and data in the address space on Linux®.]
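The remapping step amounts to translating an address recorded on one OS into the place where the same code or data lives during replay. The sketch below shows only that translation table; it does not use or model Pin's Fetch API or operand rewriting, and the `Remapper` class, the region tuples and all addresses are made up for illustration.

```python
from bisect import bisect_right

class Remapper:
    """Translates recorded addresses to replay addresses, region by region."""

    def __init__(self, regions):
        # Each region is (recorded_start, size, replay_start).
        self.regions = sorted(regions)
        self.starts = [region[0] for region in self.regions]

    def translate(self, addr):
        # Find the last region that starts at or before addr.
        i = bisect_right(self.starts, addr) - 1
        if i >= 0:
            start, size, replay_start = self.regions[i]
            if addr < start + size:
                return replay_start + (addr - start)
        raise KeyError(hex(addr))   # address was not part of the captured run

# Hypothetical layout: one code region and one data region.
remap = Remapper([
    (0x400000, 0x1000, 0x7F0000),   # code
    (0x600000, 0x2000, 0x900000),   # data
])
code_addr = remap.translate(0x400010)
data_addr = remap.translate(0x601FFF)
```

An instruction fetch would go through the code mapping and a memory operand through the data mapping, so the replayed program never needs the original addresses to be free in the new address space.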

Usage Example: Program Analysis
• Sampling and checkpointing for simulation
  • One run for profiling and finding representative regions, another for checkpointing
  • Requirement: both runs must be identical
• Pinballs are used to share workloads for Pin-based analyses among architects
[Figure: a multi-process MPI program is captured into per-process pinballs; PinPlay + profiler replays the logs (pinballs) to find representative regions, and PinPlay + checkpointer replays them again to produce checkpoints for simulation.]

Usage Example: Replay for Debugging
• Capture a buggy run and replay under debugger
  • Guaranteed to reproduce the bug and helps root causing
• Works w/ off-the-shelf unmodified debuggers (e.g. GDB)
  • PinPlay based tool extends GDB commands w/ your own
  • Limitation: debugger can’t change control-flow
• Used to debug various multi-threaded applications
  • Also using it for in-house debugging of concurrency issues with a major database vendor
[Figure: GDB (unmodified) talks over the remote protocol to a PinPlay-enabled debugger tool running on Intel’s Pin, which replays the binary from the logs (pinballs).]

Results
[Figure: results for isolated replay.]

Sources of Slowdown
• Instrumentation of every memory operation to identify system call side-effects and log data
  • Could be done by OS at the cost of OS modification or OS-specific analysis (doesn’t work on Windows®)
• Locks for shadow-memory accesses
  • Could be eliminated by using a shadow-copy per thread at the cost of significant increase in log sizes
• Other optimizations possible (please look at the paper)

Summary
• User-level deterministic capture and replay
  • No OS changes, special hardware, or virtualization
  • Integrates w/ other Pin-tools for repeatable analysis and debugging
• Replay occurs on any machine and works across OSes (Windows to Linux)
  • Pinballs are OS-independent and self-contained
  • Ideal for sharing workloads among researchers, for Pin-based analyses
• We will release PinPlay libraries in the future

Q&A
