
Deterministic Execution of Nondeterministic Shared-Memory Programs


Presentation Transcript


  1. Deterministic Execution of Nondeterministic Shared-Memory Programs Dan Grossman University of Washington Dagstuhl Seminar on Design and Validation of Concurrent Systems August 2009

2. What if…
What if you could run the same multithreaded program on the same inputs twice and know you would get the same results?
• What exactly does that mean?
• Why might you want that?
• How can we do that (semi-efficiently)?
But first:
• Some background on me and "the talks I'm not giving"
• Key terminology and perspectives
  • More important than technical details at this event

3. Biography / group names
Me:
• "Programming-languages person"
• Type systems, compilers for a memory-safe C dialect, 2000-2004
• 30% → 80% focus on multithreading, 2005-
• Co-advising 3-4 students with computer architect Luis Ceze, 2007-
Two groups for "marketing purposes":
• WASP, wasp.cs.washington.edu
• SAMPA, sampa.cs.washington.edu

4. The talk you won't see

    void transferFrom(int amt, Acct other) {
      atomic {
        other.withdraw(amt);
        this.deposit(amt);
      }
    }

"Transactions are to shared-memory concurrency as garbage collection is to memory management" [OOPSLA07]
Semantic problems with nontransactional accesses: worse than locks!
• Fix with stronger guarantees and compiler optimizations [PLDI07]
• Or a static type system, formal semantics, and proof [POPL08]
• Or a more dynamic approach, adapted to Haskell [submitted]
• …
Prototypes for OCaml, Java, Scheme, and Haskell

5. This talk…
Take an arbitrary C/C++ program with POSIX threads
• Locks, barriers, condition variables, data races, whatever
Compile it funny. Link it against a funny run-time system. Get deterministic behavior
• Well, as deterministic as a sequential C program
Joint work: Luis Ceze, Tom Bergan, Joe Devietti, Owen Anderson

6. Terminology
Essential perspectives, not just definitions:
• Parallelism vs. concurrency
  • Or different terms if you prefer
• Sequential semantics vs. determinism vs. nondeterminism
  • What is an input?
• Level of abstraction
  • Which one do you care about?

7. Concurrency
Working "definition": Software is concurrent if a primary intellectual challenge is responding to external events from multiple sources in a timely manner.
Examples: operating system, shared hashtable, version control
Key challenge is responsiveness
• Often leads to threads or asynchrony
Correctness usually requires synchronization (e.g., locks)

8. Parallelism
Working "definition": Software is parallel if a primary intellectual challenge is using extra computational resources to do more useful work per unit time.
Examples: scientific computing, most graphics, a lot of servers
Key challenge is Amdahl's Law
• No sequential bottlenecks, no imbalanced load
When pure fork-join isn't correct, need synchronization

9. The confusion
• First, this use of terms isn't standard
• Many systems are both
  • And it's really a matter of degree
• Similar lower-level mechanisms, such as threads and locks
  • And similar errors (race conditions, deadlocks, etc.)
• Our work determinizes these lower-level mechanisms, so we determinize concurrent and parallel applications
  • But purely parallel ones probably benefit less

10. Terminology
Essential perspectives, not just definitions:
• Parallelism vs. concurrency
  • Or different terms if you prefer
• Sequential semantics vs. determinism vs. nondeterminism
  • What is an input?
• Level of abstraction
  • Which one do you care about?

11. Sequential semantics
• Some languages can have results defined purely sequentially, but are designed to have better parallel-performance guarantees (thanks to a cost model)
• Examples: DPJ, Cilk, NESL, …
• For correctness, reason sequentially
• For performance, reason in parallel
• Really designed for parallelism, not concurrency
• Not our work

  12. Sequential isn’t always deterministic [Surprisingly easy to forget this] int f1(){ print(“A”); print(“B”); return 0; } int f2(){ print(“C”); print(“D”); return 0; } int g() { return f1() + f2(); } Must g() print ABCD? • Java: yes • C/C++: no, CDAB allowed, but not ACBD, ACDB, etc. Dan Grossman: Determinism

13. Another example
Dijkstra's guarded-command conditionals:

    if x % 2 == 1 -> y := x - 1
    [] x < 10     -> y := 7
    [] x >= 10    -> y := 0
    fi

We might still expect a particular language implementation (compiler) to be deterministic:
• It may choose any deterministic result consistent with the nondeterministic semantics
• It presumably doesn't change its choice across executions, but may across compiles (including "butterfly effects")
• Our work does this

14. Why helpful?
The programmer gets a deterministic executable, but doesn't know which one
• Key degree of freedom for automated performance
Still helpful for:
• Whole-program testing and debugging
• Automated replicas
• In general, repeatability and reducing the set of possible executions

15. Define deterministic, part 1
Deterministic: "outputs depend only on inputs"
• That's right, but it means we must clearly specify what is an input (and an output)
• You can define away anything you want
  • Example: if all syscall results are inputs, then seeding the pseudorandom number generator with time-of-day is "deterministic"
• We mean what you think we mean
  • Inputs: command line, I/O, syscalls
  • Not inputs: cache state, hardware timing, thread scheduler

16. Terminology
Essential perspectives, not just definitions:
• Parallelism vs. concurrency
  • Or different terms if you prefer
• Sequential semantics vs. determinism vs. nondeterminism
  • What is an input?
• Level of abstraction
  • Which one do you care about?

17. Define deterministic, part 2
"Is it deterministic?" depends crucially on your abstraction level
• Another obvious, easy-to-forget thing
Examples:
• File systems
• Memory allocation (Java vs. C)
• Set implemented as a list
• Quantum mechanics
Our work:
• The "language level": state of logical memory, program output
• The application may care only about a higher level (future work)

18. Okay… how?
A trade-off between complexity and performance.
[Figure: complexity vs. performance trade-off]
Performance:
• Overhead (single-thread slowdown)
• Scalability (minimize extra synchronization and waiting)

19. Starting serial
Determinization is easy!
• Run one thread at a time in round-robin order
• Context-switch after N basic blocks, for a deterministic N
• Cannot use a timer; use the compiler and run-time instead
• Races in the source program are irrelevant; locks are still respected
[Figure: three threads T1, T2, T3; time moves downward; each thread runs one quantum of loads and stores per round]
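To make the mechanism concrete, here is a minimal C sketch of the compiler-inserted quantum counting; all names (QUANTUM_SIZE, det_quantum_left, det_end_quantum, det_basic_block) are hypothetical illustrations, not the actual system's API:

    #include <sched.h>

    #define QUANTUM_SIZE 1000   /* N basic blocks per quantum (illustrative value) */

    /* Hypothetical per-thread countdown; in the real system the run-time
       manages this, and "ending a quantum" hands control to the next
       thread in round-robin order. Here we just yield as a placeholder. */
    static __thread int det_quantum_left = QUANTUM_SIZE;

    static void det_end_quantum(void) {
      sched_yield();                      /* placeholder: the real run-time blocks
                                             until this thread's next turn */
    }

    /* The compiler inserts a call like this at the top of every basic block. */
    static inline void det_basic_block(void) {
      if (--det_quantum_left == 0) {
        det_quantum_left = QUANTUM_SIZE;  /* reset for the next quantum */
        det_end_quantum();                /* deterministic context switch */
      }
    }

Because the counter depends only on the instructions executed, not on wall-clock time, every run switches threads at exactly the same program points.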

20. Parallel quanta
• The quanta in a round can start to run in parallel, provided they stop before any communication occurs (see how next)
• So each round has two stages: parallel, then serial
[Figure: T1, T2, T3 run quanta in parallel; the parallel stage ends with a global barrier; the serial stage ends and the next round starts]
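A sketch of the resulting two-stage round structure using POSIX threads; the quantum-running hooks and the det_round driver are hypothetical names for illustration, and the real run-time is considerably more involved:

    #include <pthread.h>

    /* Hypothetical hooks for the work done inside one quantum. */
    extern void run_parallel_quantum(int my_id);  /* stops before any communication */
    extern void run_serial_quantum(int my_id);    /* may touch shared state, locks  */

    static pthread_barrier_t round_barrier;  /* init elsewhere with count = num_threads */
    static pthread_mutex_t serial_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  serial_cond = PTHREAD_COND_INITIALIZER;
    static int serial_turn = 0;              /* deterministic round-robin token */

    void det_round(int my_id, int num_threads) {
      run_parallel_quantum(my_id);

      /* Parallel stage ends with a global barrier. */
      pthread_barrier_wait(&round_barrier);

      /* Serial stage: threads run one at a time, in a fixed order. */
      pthread_mutex_lock(&serial_lock);
      while (serial_turn != my_id)
        pthread_cond_wait(&serial_cond, &serial_lock);
      run_serial_quantum(my_id);
      serial_turn = (serial_turn + 1) % num_threads;
      pthread_cond_broadcast(&serial_cond);
      pthread_mutex_unlock(&serial_lock);

      /* Serial stage ends; all threads align before the next round. */
      pthread_barrier_wait(&round_barrier);
    }

The second barrier matters: without it, a thread finishing its serial turn early could start the next parallel stage while another thread's serial quantum is still writing shared data.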

21. Is that legal?
• This can produce a different result than the serial execution
• In fact, the execution is not necessarily equivalent to any serialization of quanta
But it doesn't matter, as long as we are deterministic! We just need:
• Parallel stages do no communication
• Parallel stages end at deterministic points

22. Performance
Keys to scalability:
1. Run almost everything in the parallel stage
2. Keep quanta balanced
3. Assuming (1), use rough instruction costs

23. Memory ownership
To avoid communication during the parallel stage:
• Every memory location is "shared" or "owned by one thread T"
• A dynamic table is checked and updated during execution
• A thread can read only memory that is shared or owned by it
• A thread can write only memory it owns
• Locks: just like memory locations, plus blocking ends the quantum
In our example, perhaps A is shared while B and C are owned by T2.
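A sketch of what the checked loads and stores might look like, assuming a hypothetical det_owner_of table lookup and the det_end_quantum hook from the earlier sketch:

    /* Hypothetical ownership table: each (suitably granular) location is
       either SHARED or owned by exactly one thread. */
    #define SHARED (-1)
    extern int  det_owner_of(const void *addr);
    extern void det_end_quantum(void);  /* blocks until this thread's serial turn */

    /* The compiler rewrites each load and store into a checked version. */
    int det_checked_load(const int *addr, int my_id) {
      int owner = det_owner_of(addr);
      if (owner != SHARED && owner != my_id)
        det_end_quantum();   /* communication: defer the access to the serial stage */
      return *addr;          /* safe: shared, owned by us, or now in the serial stage */
    }

    void det_checked_store(int *addr, int val, int my_id) {
      if (det_owner_of(addr) != my_id)
        det_end_quantum();   /* writes require exclusive ownership */
      *addr = val;
    }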

24. Changing ownership
Policy, for each location (any deterministic granularity is correct):
• The first owner is the first thread to allocate the location
• On a read in the serial stage, if owned by another thread, set to shared
• On a write in the serial stage, set to owned-by-self
Correctness:
• Ownership is immutable in parallel stages (so no communication)
• Serial-stage changes are deterministic
So many, many policies are correct:
• We chose the obvious one for temporal locality + read-sharing
• Must have good locality for scalability!
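Continuing the same hypothetical sketch, the serial-stage ownership transitions could look like this (det_set_owner is an assumed helper):

    /* These transitions run only in the serial stage, one thread at a
       time, so they are deterministic; ownership never changes during
       a parallel stage. */
    extern void det_set_owner(const void *addr, int new_owner);

    void det_serial_read(const int *addr, int my_id) {
      int owner = det_owner_of(addr);
      if (owner != SHARED && owner != my_id)
        det_set_owner(addr, SHARED);   /* read of other-owned data: make it shared */
    }

    void det_serial_write(int *addr, int my_id) {
      det_set_owner(addr, my_id);      /* serial-stage write: take ownership */
    }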

25. Overhead
Significant overhead:
• All reads/writes consult ownership information
• All basic blocks subtract from a thread-local quantum counter
Reduce it via:
• Lots of run-time engineering and data structures (not too much magic, but most important)
• Obvious compiler optimizations, like escape analysis and hoisting counter subtractions
• Specialized compiler optimizations, like the Subsequent Access Optimization: don't recheck the same ownership unless a quantum boundary might intervene
  • Correctness of this is a subtle argument and slightly affects the ownership-change policy (deterministically!)
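A hypothetical before/after for the Subsequent Access Optimization, assuming the two fields fall in the same ownership granule:

    extern void det_checked_store(int *addr, int val, int my_id);  /* earlier sketch */

    struct point { int x, y; };

    /* Before the optimization: both stores pay a full ownership check. */
    void move_checked(struct point *p, int my_id) {
      det_checked_store(&p->x, 1, my_id);
      det_checked_store(&p->y, 2, my_id);
    }

    /* After: if no quantum boundary (counter decrement reaching zero, or a
       blocking operation) can intervene between the two accesses, the
       second check is provably redundant and can be elided. */
    void move_optimized(struct point *p, int my_id) {
      det_checked_store(&p->x, 1, my_id);
      p->y = 2;   /* ownership already established within this quantum */
    }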

26. Brittle
Change any line of code, command-line argument, environment variable, etc., and you can get a different deterministic program.
We are mostly robust to memory-safety errors, except:
• Bounds errors that corrupt ownership information
• Bounds errors that write to another thread's allegedly thread-local data

27. Results
Overhead: varies a lot, but about 3x at 8 threads
Scalability: varies a lot, but on average with the PARSEC suite (*):
• nondet 8 threads vs. nondet 2 threads = 2.4 (linear = 4)
• det 8 threads vs. det 2 threads = 2.0
• det 8 threads vs. nondet 2 threads = 0.91 (range 0.41 - 2.75)
"How do you want to spend Moore's Dividend?"
(*) subset runnable: no MPI, no C++ exceptions, no 32-bit assumptions

28. Buffering
Actually, ownership is only one approach.
A second approach relies on buffering and a commit stage:
• Even higher overhead (to consult buffers)
• Even better scalability (block only for synchronization and commits)
And there is a third, hybrid approach. Hopefully more details soon.

29. Conclusion
The fundamental assumption that nondeterministic shared-memory programs must be run nondeterministically is false.
A fun problem to throw principled compiler and run-time optimizations at.
Could dramatically change how we test and debug parallel and concurrent programs.
Most-related work:
• Kendo from MIT: done concurrently (in parallel?), requires knowing about data races statically, different approach
• Colleagues in ASPLOS09: hardware support for ownership
• Record & replay systems: we can replay without the record
