Concurrent Revisions: A strong alternative to sequential consistency. Daan Leijen, Alexandro Baldassin, and Sebastian Burckhardt, Microsoft Research (appearing at OOPSLA 2010 and ESOP 2011).
Sequential consistency (SC)
int x = 0; int y = 0;
One task: if (y==0) x++;
The other task: if (x==0) y++;
What are the possible values for x and y?
assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1));
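The three outcomes in the assert can be checked by brute force. The following is a minimal Python sketch (not from the talk) that assumes each task is split into two steps, a read of the other variable followed by a conditional increment, and enumerates every sequentially consistent interleaving of those steps. The names `run`, `schedules`, and `outcomes` are illustrative.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving; each task is a read step, then a conditional write step."""
    mem = {"x": 0, "y": 0}
    seen = {}             # value each task read in its first step
    step = {1: 0, 2: 0}   # next step index per task
    for t in schedule:
        if step[t] == 0:
            # Task 1 tests y, task 2 tests x.
            seen[t] = mem["y"] if t == 1 else mem["x"]
        elif seen[t] == 0:
            # Task 1 increments x, task 2 increments y.
            mem["x" if t == 1 else "y"] += 1
        step[t] += 1
    return mem["x"], mem["y"]


# All distinct orderings of two steps from task 1 and two steps from task 2.
schedules = set(permutations([1, 1, 2, 2]))
outcomes = {run(s) for s in schedules}
print(sorted(outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome (1, 1) appears whenever both reads happen before either write, which is exactly the interleaving that makes direct reasoning about SC programs awkward.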
Is SC a good memory model? • Hardware architects don’t like it • Costly to guarantee SC ordering for all memory accesses • Too low-level for most programmers • Direct reasoning about interleavings is difficult • How can we raise the abstraction level? • Programming-Model-Level, not Language-Level • What programming model?
Consider a common application scenario: shared data and tasks [Diagram: reader and mutator tasks around the shared data] • Office application • Shared Document • Editing, Spellcheck, Background Save, Paginate • Games • Shared Game Objects • Physics, Collision Detection, AI, Render, Network, … • How to parallelize tasks? (Note: tasks do conflict.)
Our solution: Concurrent Revisions • Revision: a parallel task that gets its own logical copy of the state • Isolation type: a type that supports automatic copying and merging of versions on a write-write conflict • Reminiscent of source control systems • Conceptually, each revision gets its own private copy of all the shared data • On join, all effects of the joined revision are applied to the joining revision
Concurrent revisions
Versioned<int> x = 0; Versioned<int> y = 0;
Forked revision: if (x==0) y++;   (isolation: x is always 0 here)
Main revision: if (y==0) x++;   (isolation: y is always 0 here)
Determinism: only 1 possible result
assert(x==1 && y==1);
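To make the isolation argument concrete, here is a small Python model of fork and join. It is a sketch under stated assumptions, not the talk's C# library: the `Revision` class, its `read`/`write`/`fork`/`join` methods, and the helper `bump_if_zero` are all hypothetical names, and tasks are run one after the other in both possible orders rather than on real threads.

```python
class Revision:
    """Hypothetical model: a revision reads its fork-time snapshot plus its own writes."""

    def __init__(self, snapshot):
        self.snapshot = dict(snapshot)
        self.writes = {}

    def read(self, name):
        return self.writes.get(name, self.snapshot[name])

    def write(self, name, value):
        self.writes[name] = value

    def fork(self):
        # The child's snapshot is the forker's current state.
        return Revision({**self.snapshot, **self.writes})

    def join(self, child):
        # Apply the joinee's writes to the joining revision.
        self.writes.update(child.writes)


def bump_if_zero(rev, test, target):
    if rev.read(test) == 0:
        rev.write(target, rev.read(target) + 1)


def run(child_first):
    main = Revision({"x": 0, "y": 0})
    child = main.fork()
    steps = [
        lambda: bump_if_zero(child, "x", "y"),  # child: if (x==0) y++;
        lambda: bump_if_zero(main, "y", "x"),   # main:  if (y==0) x++;
    ]
    for step in (steps if child_first else reversed(steps)):
        step()
    main.join(child)
    return main.read("x"), main.read("y")


print(run(child_first=True), run(child_first=False))  # (1, 1) (1, 1)
```

Because neither side can observe the other's write before the join, the order in which the two bodies run does not change the result.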
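Conflict resolution: on a write-write conflict, the modification in the child revision wins. Versioned<int> x; [Three revision diagrams, each starting from x = 0: branches writing x = 1 and x = 2 join to assert(x==2); branches writing x = 1 and x = 0 join to assert(x==0); a single write x = 1 joins to assert(x==1)]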
Custom conflict resolution
Cumulative<int, merge> x;
int merge(int current, int join, int original) {
    return current + (join - original);
}
[Revision diagram: x = 0; one revision does x += 1 (value 1), the other does x += 2 (value 2); on join, merge(1,2,0) → 3]
assert(x==3)
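The arithmetic on this slide can be replayed directly. The Python sketch below transcribes the slide's three-argument merge function and applies it to the values in the diagram; the variable names `current`, `joinee`, and `original` are illustrative.

```python
def merge(current, join, original):
    """Slide's cumulative merge: keep the current value and add the joinee's delta."""
    return current + (join - original)


original = 0
current = original + 1   # joining revision: x += 1
joinee = original + 2    # joined revision:  x += 2
x = merge(current, joinee, original)
print(x)  # 3
```

With the default rule (the joinee's write wins) the result would be 2 and the increment of the joining revision would be lost; the cumulative merge keeps both increments.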
Revisions are not Transactions • Determinism • Resolves conflicts deterministically • Never rolls back • No restrictions on revisions • can be long-running • can do I/O • Conflicts don’t kill parallel performance • Conflicts do not force serialization
Spacewars! About 15k lines of C# code, using DirectX. The original game is sequential
Sequential Game Loop [Diagram of the loop stages, annotated: one stage updates all positions, two write some positions, two read all positions] How to parallelize, given conflicts on object positions?
Revision Diagram for Parallelized Game Loop [Diagram: revisions for Coll. Det. 1 to 4, Render, Physics, network, and autosave (long running)]
Example 1: read-write conflict. • Render task reads position of all game objects • Physics task updates position of all game objects • No interference!
Example 2: write-write conflict. • Physics task updates position of all game objects • Network task updates position of some objects • Network updates have priority over physics updates • Order of joins establishes precedence!
Results [Chart series: Physics task, Render, Collision detection] • Autosave now perfectly unnoticeable in background • Overall speed-up: 3.03x on four cores (almost completely limited by the graphics card)
Implementation: C# library • For each versioned object, maintain multiple copies • Map revision ids to versions • synchronization free access to local copy (on x86) • New copies are allocated lazily • Don’t copy on fork… copy on first write after fork • Old copies are released on join • No space leak
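The bullets above (a map from revision ids to versions, no copy on fork, copy on first write) can be sketched in Python as follows. This is an assumption-laden illustration, not the library's code: the `Segment`, `Revision`, and `Versioned` classes are hypothetical, the idea of giving both the forker and the child a fresh segment at fork time is an assumption made here so that later writes stay isolated, and join and release of old copies are omitted.

```python
import itertools

_ids = itertools.count()


class Segment:
    """Hypothetical unit of versioning: one id, linked to the segment it branched from."""

    def __init__(self, parent=None):
        self.id = next(_ids)
        self.parent = parent


class Revision:
    def __init__(self, segment):
        self.current = segment

    def fork(self):
        frozen = self.current               # fork-time state, never written again
        self.current = Segment(frozen)      # forker continues in a fresh segment
        return Revision(Segment(frozen))    # child starts in its own segment


class Versioned:
    def __init__(self, value, rev):
        self.versions = {rev.current.id: value}   # revision (segment) id -> version

    def get(self, rev):
        seg = rev.current
        while seg.id not in self.versions:        # nearest ancestor that owns a copy
            seg = seg.parent
        return self.versions[seg.id]

    def set(self, rev, value):
        self.versions[rev.current.id] = value     # the copy appears on first write


main = Revision(Segment())
x = Versioned(0, main)
child = main.fork()
copies_after_fork = len(x.versions)   # fork itself allocated nothing
x.set(main, 1)
x.set(child, 2)
print(copies_after_fork, x.get(main), x.get(child), len(x.versions))  # 1 1 2 3
```

Reads that find no local copy walk up to the fork-time segment, which is why forking is cheap and only written objects pay for a copy.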
Example API for C# library w/ rewriter
class Sample
{
    [Versioned] int x = 0;

    public void Run()
    {
        var r = CurrentRevision.Fork(() => { x = 2; });
        x = 1;
        CurrentRevision.Join(r);
        Assert(x == 2);
    }
}
Semantics of Concurrent Revisions [ESOP’11] • Calculus (similar to AME-calculus by Abadi et al.) • Untyped lambda-calculus with mutable references and fork/join primitives • Determinacy Proof • Merge functions • Discussion of ADTs and merge functions • Relate to snapshot isolation (- nesting, - resolution) • Revision Diagrams • Prove: semilattice structure
By construction, there is no ‘global’ state: just local state for each revision State is simply a (partial) function from a location to a value
Operational Semantics: s →r s′ means that revision r takes a step, transforming s to s′
Interested? http://research.microsoft.com/revisions Videos • Channel 9 interview w/ Erik Meijer Papers • OOPSLA ‘10 on implementation and game • ESOP ‘11 on semantics The implementation • Try it online at www.rise4fun.com • Download binary release by end of January
Demo
class Sample
{
    [Versioned] int i = 0;

    public void Run()
    {
        var r = CurrentRevision.Fork(() => { i += 1; });
        i += 2;
        CurrentRevision.Join(r);
        Console.WriteLine("i = " + i);
    }
}
Overhead: How much does all the copying and the indirection cost? Only a 5% slowdown in the sequential case. Some individual tasks slow down much more (e.g. physics simulation).
A software engineering perspective • Transactional memory: • Code centric: put “atomic” in the code • Granularity: • too broad: too many conflicts and no parallel speedup • too small: potential races and incorrect code • Concurrent revisions: • Data centric: put annotations on the data • Granularity: group data that have mutual constraints together, e.g. if (x + y > 0) should hold, then x and y should be versioned together.
Current Implementation: C# library • For each versioned object, maintain multiple copies • Map revision ids to versions • 'mostly' lock-free array • New copies are allocated lazily • Don’t copy on fork… copy on first write after fork • Old copies are released on join • No space leak
Examples from SpaceWars Game Example 1: read-write conflict • Render task reads position of all game objects • Physics task updates position of all game objects => Render task needs to see consistent snapshot Example 2: write-write conflict • Physics task updates position of all game objects • Network task updates position of some objects => Network has priority over physics updates
Conventional Concurrency Control Conflicting tasks cannot efficiently execute in parallel. • pessimistic concurrency control (e.g. locks) • use locks to avoid parallelism where there are (real or potential) conflicts • optimistic concurrency control (e.g. TM) • speculate on absence of conflicts; roll back if there are real conflicts Either way: true conflicts kill parallelism.
What’s new
Traditional Task:
int x = 0;
Task t = fork { x = 1; }
assert(x==0 || x==1);   // no isolation: we see either 0 or 1 depending on the schedule
join t;
assert(x==1);
Concurrent Revisions:
Versioned<int> x = 0;   // isolation types: declare shared data
Revision r = rfork { x = 1; }   // fork revision: forks off a private copy of the shared state
assert(x==0);   // isolation: concurrent modifications are not seen by others
join r;   // join revision: waits for the revision to terminate and writes back changes into the main revision
assert(x==1);
• Isolation: side effects are only visible when the revision is joined.
• Deterministic execution!
Puzzle time…
int x = 0; int y = 0;
Task t = fork { if (x==0) y++; }
if (y==0) x++;
join t;
Hard to read: let’s use a diagram instead…
Demo: Sandbox
Fork a revision without forking an associated task/thread
class Sandbox
{
    [Versioned] int i = 0;

    public void Run()
    {
        var r = CurrentRevision.Branch("FlakyCode");
        try
        {
            r.Run(() =>                       // run code in a certain revision
            {
                i = 1;
                throw new Exception("Oops");
            });
            CurrentRevision.Merge(r);         // merge changes in a revision into the main one
        }
        catch
        {
            CurrentRevision.Abandon(r);       // abandon a revision and don't merge its changes
        }
        Console.WriteLine("\n i = " + i);
    }
}
SpaceWars Game: Sequential Game Loop [Diagram of the game loop around the shared state. Stage labels: Process Inputs, Receive, Simulate Physics, Parallel Collision Detection (three tasks), Play Sounds, Send, Render Screen, Autosave. Device labels: keyboard, network connection, graphics card, disk]
Sequential Consistency:
int x = 0; int y = 0;
task t = fork { if (x==0) y++; }
if (y==0) x++;
join t;
assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1));
Transactional Memory:
int x = 0; int y = 0;
task t = fork { atomic { if (x==0) y++; } }
atomic { if (y==0) x++; }
join t;
assert( (x==0 && y==1) || (x==1 && y==0));
Concurrent Revisions:
versioned<int> x = 0; versioned<int> y = 0;
revision r = rfork { if (x==0) y++; }
if (y==0) x++;
join r;
assert(x==1 && y==1);
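[Revision diagram, cumulative merge with nesting] x = 0. One revision does x += 1 (value 1). A second does x += 2 (value 2) and then forks a nested revision that does x += 3 (value 5). Joining the second gives merge(1,2,0) → 3; joining the nested one gives merge(3,5,2) → 6. assert( x==6 )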
Operational Semantics • The state is a composition of the root snapshot and local modifications • For some revision r, with snapshot σ and local modifications τ, and an expression context with a hole (holding, e.g., the redex (λx.e) v) • On a join, the writes of the joinee r′ take priority over the writes of the current revision: τ::τ′ • On a fork, the snapshot of the new revision r′ is the current state: σ::τ
Custom merge: per location (type) On a join, using a merge function: • No conflict if a location was not written in the joinee: keep the value of the current revision • No conflict if a location was unmodified in the current revision: use the value of the joinee • Conflict otherwise: use a location/type specific merge function Standard merges:
What is a conflict? Cumulative<int> x = 0 • Merge is only called if: (1) write in child, and (2) modification in main revision [Revision diagram: x += 2 is joined with no conflict (merge function is not called), giving 2; x += 3 is then joined with no conflict (merge function is not called), giving 5] assert(x == 5)
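The rule for when a merge function runs can be sketched as a per-location join. The Python below is a simplified model under assumptions: `join`, `cumulative`, and the `calls` log are hypothetical names, writes are tracked as plain dictionaries of locations written since the fork, and the fallback for a conflict without a custom merge is the "joinee wins" rule from the conflict resolution slide.

```python
calls = []  # log of merge invocations, to show when the merge function runs


def cumulative(current, joinee, original):
    calls.append((current, joinee, original))
    return current + (joinee - original)


def join(snapshot, current_writes, joinee_writes, merge_fns):
    """Return the joining revision's writes after the join.

    snapshot: state at fork time (the 'original' values)
    merge_fns: optional per-location merge(current, joinee, original)
    """
    result = dict(current_writes)  # locations not written in the joinee are kept
    for loc, joinee_val in joinee_writes.items():
        if loc not in current_writes:
            result[loc] = joinee_val  # no conflict: take the joinee's value
        elif loc in merge_fns:
            result[loc] = merge_fns[loc](current_writes[loc], joinee_val, snapshot[loc])
        else:
            result[loc] = joinee_val  # default resolution: joinee wins
    return result


# Step 1: only the child writes (x += 2), so merge is not called.
state = {"x": 0}
state = {**state, **join(state, {}, {"x": 2}, {"x": cumulative})}

# Step 2: only the main revision writes (x += 3), so merge is still not called.
state = {**state, **join(state, {"x": 5}, {}, {"x": cumulative})}
no_conflict_calls = len(calls)

# A true conflict: both sides write, so the merge function runs once.
conflict = join({"x": 0}, {"x": 1}, {"x": 2}, {"x": cumulative})
print(state["x"], no_conflict_calls, conflict["x"], calls)  # 5 0 3 [(1, 2, 0)]
```

The first two joins reproduce the slide's result of 5 without ever invoking the merge function; only the last join, where both sides wrote x, reaches it.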
Merging with failure On fail, we just ignore any writes in the joinee
Snapshot isolation • Widely used in databases, for example Oracle and Microsoft SQL • In essence, in snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts. • Our calculus generalizes snapshot isolation: • We support arbitrary nesting • We allow custom merge functions to resolve write-write conflicts deterministically
Snapshot isolation We can succinctly model snapshot isolation as: • Disallow nesting • Use the default merge: Some versions of snapshot isolation do not treat silent writes in a transaction as a conflict:
Sequential merges • We can view each location as an abstract data type (i.e. an object) with certain operations (i.e. methods). • If a merge function always behaves as if concurrent operations on those objects are sequential, we call it a sequential merge. • Such objects always behave as if the operations in the joinee are all done sequentially at the join point.
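Sequential merges x = o • A merge is sequential if: merge(uw1(o), uw2(o), u(o)) = uw1w2(o) • And uw1w2(o) [Diagram: starting from x = o, operations u run first, then w1 and w2 run in parallel revisions, and the join computes merge(uw1(o), uw2(o), u(o))]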
Abelian merges • For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation ⊕, the following merge is sequential: merge(v, v′, v0) = v ⊕ v′ ⊖ v0 (where a ⊖ b combines a with the inverse of b) • This holds for example for additive integers and additive sets.
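For additive integers the claim can be spot-checked numerically. The Python sketch below assumes that the operations u, w1, and w2 are additions of fixed integers, so that the sequential-merge condition merge(uw1(o), uw2(o), u(o)) = uw1w2(o) becomes an identity over sums; it checks that identity over a small range and is an illustration, not a proof.

```python
from itertools import product


def merge(v, v_joinee, v0):
    """Abelian merge for integers under addition: v + v' - v0."""
    return v + v_joinee - v0


# u: value at fork time; w1, w2: deltas added in the two revisions after the fork.
ok = all(
    merge(u + w1, u + w2, u) == u + w1 + w2
    for u, w1, w2 in product(range(-3, 4), repeat=3)
)
print(ok)  # True
```

Algebraically, (u + w1) + (u + w2) - u = u + w1 + w2, which needs only associativity, commutativity, and the existence of inverses.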