360 likes | 498 Views
Concurrent Revisions: A deterministic concurrency model. Daan Leijen & Sebastian Burckhardt Microsoft Research (invited talk at FOOL 2010). Algorithmic/ Scientific. Special-Purpose. Interactive/ Reactive. Operating System. Data Mining Biology Chemistry Physics. Desktop Applications.
E N D
Concurrent Revisions:A deterministic concurrency model. Daan Leijen & Sebastian Burckhardt Microsoft Research (invited talk at FOOL 2010)
Algorithmic/ Scientific Special-Purpose Interactive/ Reactive Operating System Data Mining Biology Chemistry Physics Desktop Applications Database Games Multimedia Signal Processing Parallel Programming TPL, Fortran, MPI, Cilk, StreamIt, X10, Cuda, ... Concurrent Programming Threads & Locks, Futures, Promises, Transactions, ... ? Our Focus.
Application = Shared Data and Tasks Reader Reader Mutator R R R R Shared Data Mutator Mutator Reader Mutator Example: Office application • Save the document • React to keyboard input by the user • Perform a spellcheck in the background • Exchange updates with collaborating remote users
Examples from SpaceWars Game Example 1:read-write conflict • Render taskreads position of all game objects • Physics task updates position of all game objects => Render task needs to see consistent snapshot Example 2: write-write conflict • Physics task updates position of all game objects • Network task updates position of some objects => Network has priority over physics updates
Conventional Concurrency Control Conflicting tasks can not efficiently execute in parallel. • pessimistic concurrency control • use locks to avoid parallelism where there are (real or potential) conflicts • optimistic concurrency control • speculate on absence of conflictsrollback if there are real conflicts either way: true conflicts kill parallel performance.
Our Proposed Programming Model: Revisions and Isolation Types Revision A logical unit of work that is forked and joined Isolation Type A type which implements automatic copying/merging of versions on write-write conflict • Deterministic Conflict Resolution, never roll-back • Full concurrent reading and writing of shared data • No restrictions on tasks (can be long-running, do I/O) • Clean semantics • Fast and space-efficient runtime implementation
What’s new Isolation types: declares shared data fork revision: forks off a private copy of the shared state • Isolation: side effects are only visible when the revision is joined. • Deterministic execution! Traditional Task Concurrent Revisions int x = 0; task t = fork { x = 1; } assert(x==0 || x==1); join t; assert(x==1); versioned<int> x = 0; revision r = rfork { x = 1; } assert(x==0); join r; assert(x==1); isolation: Concurrent modifications are not seen by others join revision: waits for the revision to terminate and writes back changes into the main revision
Sequential Consistency int x = 0; int y = 0; task t = fork { if (x==0) y++; } if (y==0) x++; join t; assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); Concurrent Revisions Transactional Memory int x = 0; int y = 0; task t = fork { atomic { if (x==0) y++; } } atomic { if (y==0) x++; } join t; versioned<int> x = 0; versioned<int> y = 0; revision r = rfork { if (x==0) y++; } if (y==0) x++; join r; assert( (x==0 && y==1) || (x==1 && y==0)); assert(x==1 && y==1);
Conflict resolution Versioned<int> x; By default, on a write-write conflict (only), the modification in the child revision wins. x = 0 x = 0 x = 0 x = 1 x = 2 x = 1 x = 0 x = 1 assert( x==2 ) assert( x==0 ) assert( x==1 )
Custom conflict resolution Cumulative<int,(main,join,orig).main + join – orig> x; x = 0 0 x += 1 x += 2 merge(1,2,0) 3 1 2 assert(x==3)
x = 0 0 x += 1 x += 2 2 x += 3 merge(1,2,0) 3 1 2 3 5 merge(3,5,2) 6 assert( x==6 )
A Software engineering perspective • Transactional memory: • Code centric: put “atomic” in the code • Granularity: • too broad: too many conflicts and no parallel speedup • too small: potential races and incorrect code • Concurrent revisions: • Data centric: put annotations on the data • Granularity: group data that have mutual constraints together, i.e. if (x + y > 0) should hold, then x and y should be versioned together.
Current Implementation: C# library • For each versioned object, maintain multiple copies • Map revision ids to versions • `mostly’ lock-free array • New copies are allocated lazily • Don’t copy on fork… copy on first write after fork • Old copies are released on join • No space leak
Demo classSample { [Versioned] inti = 0; publicvoid Run() { vart1 = CurrentRevision.Fork(() => { i += 1; }); vart2 = CurrentRevision.Fork(() => { i += 2; }); i+=3; CurrentRevision.Join(t1); CurrentRevision.Join(t2); Console.WriteLine("i = " + i); } }
Demo: Sandbox Fork a revision without forking an associated task/thread classSandbox { [Versioned] inti = 0; public void Run() { var r = CurrentRevision.Branch("FlakyCode"); try { r.Run(() => { i = 1; throw newException("Oops"); }); CurrentRevision.Merge(r); } catch { CurrentRevision.Abandon(r); } Console.WriteLine("\n i = "+ i); } } Run code in a certain revision Merge changes in a revision into the main one Abandon a revision and don’t merge its changes.
By construction, there is no ‘global’ state: just local state for each revision State is simply a (partial) function from a location to a value
Operational Semantics the state is a composition of the root snapshot and local modifications For some revision r, with snapshot and local modifications and an expression context with hole(x.e) v On a join, the writes of the joineer’ take priority over the writes of the current revision: ::’ On a fork, the snapshot of the new revision r’ is the current state: ::
Custom merge: per location (type) No conflict if a location was not written in the joinee On a join, using a merge function. No conflict if a location was unmodified in the current revision, use the value of the joinee Conflict otherwise, use a location/type specific merge function Standard merges:
What is a conflict? Cumulative<int> x = 0 • Merge is only called if: (1) write in child, and (2) modification in main revision: 0 x += 2 2 No conflict (merge function is not called) x += 3 0 2 2 5 No conflict (merge function is not called) assert( x = 5 )
Merging with failure On fail, we just ignore any writes in the joinee
Snapshot isolation • Widely used in databases, for example Oracle and Microsoft SQL • In essence, in snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts. • Our calculus generalizes snapshot isolation: • We support arbitrary nesting • We allow custom merge functions to resolve write-write conflicts deterministically
Snapshot isolation We can succinctly model snapshot isolation as: • Disallow nesting • Use the default merge: Some versions of snapshot isolation do not treat silent writes in a transaction as a conflict:
Sequential merges • We can view each location as an abstract data types (i.e. object) with certain operations (i.e. methods). • If a merge function always behaves as if concurrent operations for those objects are sequential, we call it a sequential merge. • Such objects always behave as if the operations in the joinee are all done sequentially at the join point.
Sequential merges x = o • A merge is sequential if: merge(uw1(o), uw2(o), u(o))= uw1w2(o) • Anduw1w2(o) u w1 w2 merge(uw1(o), uw2(o), u(o))
Abelian merges • For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation , the following merge is sequential: merge(v,v’,v0) = v v’ v0 • This holds for example for additive integers and additive sets.
SpaceWars Game Sequential Game Loop: Parallel Collision Detection Render Screen Parallel Collision Detection Parallel Collision Detection Graphics Card Simulate Physics Shared State Play Sounds Send Process Inputs Receive Autosave Key- board Network Connection Disk
Revision Diagram for Parallelized Game Loop Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 Render network Physics autosave (long running)
“Problem Example 1” is solved Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 Render network Physics autosave (long running) • Render task reads position of all game objects • Physics task updates position of all game objects • No interference!
“Problem Example 2” is solved. Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 Render network Physics autosave (long running) • Physics taskupdates position of all game objects • Network task updates position of some objects • Network updates have priority over physics updates • Order of joins establishes precedence!
Results • Autosave now perfectly unnoticeable in background • Overall Speed-Up:3.03x on four-core (almost completely limited by graphics card)
Overhead:How much does all the copying and the indirection cost?
Conclusion Revisions and Isolation Types simplify the parallelization of applications with tasks that • Exhibit conflicting accesses to shared data • Have unpredictable latency • Have unpredictable data access pattern • May perform I/O that can not be rolled back Revisions and Isolation Types are • easy to reason about (determinism, isolation) • have low-enough overhead for many applications
Questions? • daan@microsoft.com • sburckha@microsoft.com • Download available soon on CodePlex