Programming-Language Approaches to Improving Shared-Memory Multithreading: Work-In-Progress

Dan Grossman
University of Washington

Microsoft Research, RiSE
July 28, 2009
Today

• A little history / organization: how I got here
• Informal, broad-not-deep overview of 4 ongoing projects
  • Better semantics / languages for transactional memory x 2:
    • Dynamic separation for Haskell
    • Semantics / abstraction for “escape actions”
  • Deterministic Multiprocessing
  • Code-centric communication graphs
• Hopefully time for discussion

Dan Grossman: Multithreading Work-In-Progress
Biography / group names

Me:
• PLDI, ICFP, POPL “feel like home”, 1998-
• PhD for Cyclone
  • Type system, compiler for memory-safe C dialect
• UW faculty, 2003-
• 30% → 80% focus on multithreading, 2005-
• Co-advising 3-4 students with computer architect Luis Ceze, 2007-

Two groups for “marketing purposes”:
• WASP, wasp.cs.washington.edu
• SAMPA, sampa.cs.washington.edu
People / other projects

Ask me later about:
• Progress estimation for Pig Latin (Hadoop) queries [Kristi]
• Composable browser extensions [Ben L.]
Today

• A little history / organization: how I got here
• Informal, broad-not-deep overview of 4 ongoing projects
  • Better semantics / languages for transactional memory x 2:
    • Dynamic separation for Haskell
    • Semantics / abstraction for “escape actions”
  • Deterministic Multiprocessing
  • Code-centric communication graphs
• Hopefully time for discussion
Atomic blocks

An easier-to-use and harder-to-implement synchronization primitive:

  void transferFrom(int amt, Acct other) {
    atomic {
      other.withdraw(amt);
      this.deposit(amt);
    }
  }

“Transactions are to shared-memory concurrency as garbage collection is to memory management” [OOPSLA 07]

GC also has key semantic questions most programmers can ignore:
• Resurrection, serialization, dead assignments, etc.
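The all-at-once behavior of the atomic block can be sketched by emulating transactions with a single global lock. This is a hypothetical Python sketch, not how a real TM works (real implementations use optimistic logging and conflict detection); only the programmer-visible semantics is modeled, and all names (`Acct`, `transfer_from`, `_global_tm_lock`) are illustrative.

```python
import threading

# Hypothetical sketch: emulate `atomic { ... }` with one global lock.
# A real TM runs transactions optimistically; this only models the
# "all-at-once" semantics the programmer is supposed to rely on.
_global_tm_lock = threading.Lock()

class Acct:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amt):
        self.balance -= amt

    def deposit(self, amt):
        self.balance += amt

    def transfer_from(self, other, amt):
        # atomic { other.withdraw(amt); this.deposit(amt); }
        with _global_tm_lock:
            other.withdraw(amt)
            self.deposit(amt)
```

Because every block acquires the same lock, no thread can observe the withdraw without the matching deposit, which is exactly the guarantee the slide's `atomic` keyword promises.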
“Weak” isolation

initially y == 0

Thread 1:
  atomic {
    y = 1;
    x = 3;
    y = x;
  }

Thread 2:
  x = 2;
  print(y); // 1? 2? 666?

Widespread misconception: “weak” isolation violates the “all-at-once” property only if the corresponding lock code has a race. (May still be a bad thing, but smart people disagree.)
It’s worse

Privatization: one of several examples where lock code works and weak-isolation transactions do not.

initially ptr.f == ptr.g

Thread 1:
  atomic {
    r = ptr;
    ptr = new C();
  }
  assert(r.f == r.g);

Thread 2:
  atomic {
    ++ptr.f;
    ++ptr.g;
  }

(Example adapted from [Rajwar/Larus] and [Hudson et al.])
It’s worse

Most weak-isolation systems let the assertion in the previous example fail!
• Eager-update or lazy-update
The need for semantics

• Which is wrong: the privatization code or the language implementation?
• What other “gotchas” exist?
• Can programmers correctly use transactions without understanding their implementation?

Only rigorous programming-language semantics can answer.
Separation

Static separation: each thread-shared, mutable object is accessed-inside-transactions xor accessed-outside-transactions throughout its lifetime
• Natural in STM Haskell (but not other settings)
• Proved sound for eager update [POPL 08 x 2]

Dynamic separation: each thread-shared, mutable object has dynamic metastate explicitly set by programmers to determine its “side of the partition”
• Designed, proven, and implemented by Abadi et al. for Bartok
Example redux

initially ptr.f == ptr.g

Thread 1:
  atomic {
    r = ptr;
    ptr = new C();
  }
  unprotect(r);
  assert(r.f == r.g);

Thread 2:
  atomic {
    ++ptr.f;
    ++ptr.g;
  }
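The unprotect step can be sketched with explicit per-object metastate: transactional accesses require the object to be on the protected side, non-transactional accesses require the unprotected side. This is a hypothetical Python sketch; the names (`DynObj`, `protect`/`unprotect`, the state strings) are illustrative, not the Bartok or Haskell API.

```python
# Hypothetical sketch of dynamic separation: each shared object carries
# explicit metastate saying which side of the partition it is on, and
# every access is checked against that state.
PROTECTED, UNPROTECTED = "protected", "unprotected"

class DynObj:
    def __init__(self, value):
        self.value = value
        self.state = PROTECTED  # starts on the transactional side

    def read(self, in_transaction):
        expected = PROTECTED if in_transaction else UNPROTECTED
        if self.state != expected:
            raise RuntimeError("access violates dynamic separation: " + self.state)
        return self.value

def unprotect(obj):
    # move the object to the non-transactional side,
    # e.g. after privatizing it as in the example above
    obj.state = UNPROTECTED

def protect(obj):
    obj.state = PROTECTED
```

The check is what rules out the privatization anomaly: the non-transactional `assert(r.f == r.g)` is only legal once `unprotect(r)` has moved the object across the partition.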
Laura’s work

Design, semantics, implementation, and benchmarks for dynamic separation in Haskell.

Primary contributions:
• Regions: change the protection state of entire data structures in O(1) time
  • Cool idioms/benchmarks where this gives 2-6x speedup
• Lazy-update implementation
  • Allows protection-state changes from within transactions
• Interface allowing composable libraries that can be used inside or outside transactions, without breaking Haskell’s types
• Formal semantics in the style of STM Haskell
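The O(1) region idea can be sketched as one level of indirection: objects do not carry their own protection state but point to a shared region record, so flipping the region's flag changes the state of every member at once. A hypothetical Python sketch (`Region`, `Node`, `build_list` are illustrative names, not the actual Haskell interface):

```python
# Hypothetical sketch of regions: many objects share one mutable region
# record, so changing the protection state of an entire data structure is
# a single field write, O(1) regardless of the structure's size.
class Region:
    def __init__(self):
        self.protected = True

class Node:
    def __init__(self, value, region):
        self.value = value
        self.region = region  # membership is a pointer, not per-object state

def build_list(values, region):
    return [Node(v, region) for v in values]

r = Region()
xs = build_list(range(1000), r)
r.protected = False   # the entire 1000-node structure changes sides in O(1)
```

Without regions, privatizing a large structure would require touching every object; with them, the idioms in the benchmarks pay a constant cost per protection-state change.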
Today

• A little history / organization: how I got here
• Informal, broad-not-deep overview of 4 ongoing projects
  • Better semantics / languages for transactional memory x 2:
    • Dynamic separation for Haskell
    • Semantics / abstraction for “escape actions”
  • Deterministic Multiprocessing
  • Code-centric communication graphs
• Hopefully time for discussion
Escape actions

  atomic {
    s1;
    escape { s2; } // perhaps in a callee
    s3;
  }

Escape actions:
• Do not count for memory conflicts
• Are not undone if the transaction aborts
• Possible “strange results” if they race with transactional accesses

Essentially an unchecked back-door.

Note: open nesting is just escape { atomic { s } }
• So escaping is the essential primitive
Canonical example

If escape actions are hidden behind strong abstractions, we can improve parallelism without affecting program behavior:
• Clients cannot observe the escaping

Unique-id generation:

  type id;
  id   new_id();
  bool compare_ids(id, id);

Transactions generating ids need not conflict with each other.
If a transaction aborts, there is no need to undo the id generation.
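A minimal sketch of the unique-id abstraction, assuming Python: the counter is "escaped" state guarded by its own lock, outside any transactional logging, so generating an id neither causes transactional conflicts nor gets rolled back on abort (an aborted-and-retried transaction simply gets a fresh id, which clients who only compare ids cannot observe).

```python
import itertools
import threading

# Hypothetical sketch: the id counter lives in escaped state. It is
# protected by its own lock, never logged by the surrounding transaction,
# so new_id() causes no transactional conflict and is never undone.
_counter = itertools.count()
_counter_lock = threading.Lock()

def new_id():
    # escape { ... } : runs outside transactional conflict detection
    with _counter_lock:
        return next(_counter)

def compare_ids(a, b):
    return a == b
```

The abstraction boundary is doing the work: as long as clients can only call `new_id` and `compare_ids`, gaps in the underlying sequence are unobservable, which is exactly the property the proof on the next slide establishes.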
Matt’s work

• Formal semantics for escape actions
• Used to prove the unique-id example is correct
  • Two implementations, one using escape
  • Show no client is affected by the choice of implementation
  • Fundamentally similar to proving that ADTs actually work
• Gotcha: the theorem is false if a client abuses other escape actions to “leak ids”
  • Discovered by attempting the proof!

  atomic {
    id x = new_id();
    if (compare_ids(x, glbl)) …
    escape { glbl = x; } …
  }
Today

• A little history / organization: how I got here
• Informal, broad-not-deep overview of 4 ongoing projects
  • Better semantics / languages for transactional memory x 2:
    • Dynamic separation for Haskell
    • Semantics / abstraction for “escape actions”
  • Deterministic Multiprocessing
  • Code-centric communication graphs
• Hopefully time for discussion
Deterministic C

Take arbitrary C + POSIX Threads and make behavior depend only on inputs (not on nondeterministic scheduling)
• Helps testing, debugging, reproducibility, replication

It’s easy!
• Run one thread at a time with a deterministic context switch
• Example: run for N instructions or until blocking

It’s hard!
• Need to recover scalability with reasonable overhead
• Amdahl’s Law is one tough cookie!
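The "easy" scheme can be sketched with a round-robin token that forces a context switch after a fixed quantum of abstract "instructions" (here, loop iterations). A hypothetical Python sketch, not the actual system: `DetScheduler` and the per-step callback are illustrative, and real instruction counting is done by the compiler/runtime, not by the program.

```python
import threading

# Hypothetical sketch: serialize threads with a deterministic round-robin
# token, switching after `quantum` abstract instructions. Scheduling no
# longer depends on the OS, so the interleaving is reproducible.
class DetScheduler:
    def __init__(self, nthreads, quantum):
        self.n = nthreads
        self.quantum = quantum
        self.turn = 0
        self.cv = threading.Condition()

    def step(self, tid, count, work):
        with self.cv:
            while self.turn % self.n != tid:
                self.cv.wait()
            work()  # runs while holding the turn: order is deterministic
            if (count + 1) % self.quantum == 0:
                self.turn += 1          # deterministic context switch
                self.cv.notify_all()

log = []
sched = DetScheduler(nthreads=2, quantum=3)

def worker(tid):
    for i in range(6):
        sched.step(tid, i, lambda: log.append((tid, i)))

ts = [threading.Thread(target=worker, args=(t,)) for t in range(2)]
for t in ts:
    t.start()
for t in ts:
    t.join()
# `log` is identical on every run, regardless of OS scheduling
```

This also makes the hard part visible: only one thread ever makes progress, which is why the real system needs ownership tracking and buffering to win back parallelism.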
How to do it

It’s a long and interesting compiler, run-time, and correctness story
• Invite Luis over for an hour

Key techniques:
• Dynamic ownership of memory (run in parallel while threads access what they own)
• Buffering (publish buffers deterministically while not violating the language’s memory-consistency model)
• No promise about which deterministic execution the programmer will get (a tiny change to the source code can affect behavior)

Performance:
• Depends on the application
• Buffering has better scalability but worse per-thread overhead, so hybrid approaches are sometimes needed
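The buffering technique can be sketched as a two-phase protocol: threads run fully in parallel against private write buffers, then publish the buffers in a fixed thread-id order at a deterministic commit point. A hypothetical Python sketch (the dict-based "memory", barrier placement, and commit order are illustrative simplifications of the real compiler/runtime machinery):

```python
import threading

# Hypothetical sketch of buffering: parallel phase writes go to private
# buffers; a barrier marks the deterministic commit point; buffers are
# then published in thread-id order, so the final memory state does not
# depend on how the OS interleaved the threads.
shared = {}
buffers = [dict() for _ in range(3)]
barrier = threading.Barrier(3)
committed = [False] * 3
cv = threading.Condition()

def worker(tid):
    # parallel phase: writes go only to this thread's private buffer
    buffers[tid]["x"] = tid            # all three threads "write x"
    buffers[tid]["y%d" % tid] = tid * 10
    barrier.wait()                     # deterministic commit point
    with cv:
        while tid > 0 and not committed[tid - 1]:
            cv.wait()
        shared.update(buffers[tid])    # publish in thread-id order
        committed[tid] = True
        cv.notify_all()

ts = [threading.Thread(target=worker, args=(t,)) for t in range(3)]
for t in ts:
    t.start()
for t in ts:
    t.join()
# the racing writes to "x" are resolved deterministically: thread 2 wins
```

The write-write race on `x` is the interesting case: a nondeterministic scheduler could leave any value, but the fixed commit order always resolves it the same way.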
Today

• A little history / organization: how I got here
• Informal, broad-not-deep overview of 4 ongoing projects
  • Better semantics / languages for transactional memory x 2:
    • Dynamic separation for Haskell
    • Semantics / abstraction for “escape actions”
  • Deterministic Multiprocessing
  • Code-centric communication graphs
• Hopefully time for discussion
Code-centric

In a shared-memory C/C#/Java program, any heap access might be inter-thread communication
• But very few actually are

Most prior work to detect/exploit this sparseness is data-centric:
• What objects are thread-local?
• What locks protect what memory?

Answers can find bugs, optimize programs, define code metrics, etc.

We provide a complementary code-centric view…
Graph

Nodes: code units (e.g., functions)

Directed edges:
• Source did a write in thread T1
• Target read that write in thread T2
• T1 != T2

Current tool:
• Automatically builds the graph for a (slower) dynamic execution
• Easy manual clean-up by the programmer
• Relies heavily on state-of-the-art dynamic instrumentation (PIN) and graph visualization (Prefuse)
A toy example

  queue q; // global, mutable
  void enqueue(T* obj) { … }
  T*   dequeue()       { … }

  void consumer() { … T* t = dequeue(); … }
  void producer() { … T* t = …; t->f = …; enqueue(t); … }

Program: multiple threads call producer and consumer.

[Graph: nodes producer, enqueue, dequeue, consumer]

The tool supports “conceptual inlining” to allow multiple abstraction levels.
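The edge-detection rule can be sketched by instrumenting each access with its code unit and thread: record the last writer of every address, and emit an edge whenever a later read comes from a different thread. A hypothetical Python sketch (the real tool uses PIN binary instrumentation; `record_write`/`record_read` and the string "addresses" are illustrative stand-ins):

```python
import threading

# Hypothetical sketch of building the code-centric graph: track the last
# write to each shared address as (function, thread), and add an edge
# writer-function -> reader-function when a read of that address comes
# from a different thread.
last_write = {}     # address -> (function name, writer thread id)
edges = set()       # (writer function, reader function)
lock = threading.Lock()

def record_write(addr, fn):
    with lock:
        last_write[addr] = (fn, threading.get_ident())

def record_read(addr, fn):
    with lock:
        if addr in last_write:
            wfn, wtid = last_write[addr]
            if wtid != threading.get_ident():
                edges.add((wfn, fn))   # inter-thread communication
```

In the toy example, producer's write to `t->f` followed by consumer's read in another thread yields exactly one edge, producer → consumer; same-thread accesses (the common case, hence the sparse graphs) contribute nothing.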
Not just for toys

• Works on small and large applications
  • Example: MySQL (940 KLOC); graph clean-up by one grad student in < 1 day without prior source-code knowledge
• I truly believe it is a great “first day of internship” tool
  • The interactive graph is essential (and not our contribution)
• Useful way to measure multithreaded behavior
  • Example: graphs are (thankfully) very sparse. MySQL: >11,000 functions, but only 423 nodes and 802 edges
  • Example: a graph diff across runs with the same input measures the nondeterminism of the program

But this is hard-to-evaluate tool work – your thoughts?
• Future work: specification of graphs checked during execution
Summary

• Better semantics / languages for transactional memory x 2
  • Dynamic separation for Haskell
  • Semantics / abstraction for “escape actions”
• Deterministic Multiprocessing
• Code-centric communication graphs

Very little published yet, but all Real Soon Now.

Microsoft has been essential:
• Transactions (Harris, Abadi, Peyton Jones, many more)
• Funding (Scalable Multicore RFP, New Faculty Fellows)

Hopefully opportunities to collaborate
• Particularly on the (unproven) SE applications of this work