410 likes | 506 Views
Concurrent Revisions: A deterministic concurrency model. Daan Leijen, Alexandro Baldassin, and Sebastian Burckhardt Microsoft Research (OOPSLA 2010). The concurrency elephant. Task/Data Parallel : TPL, X10, Cilk , StreamIt , Cuda , OpenMP , etc.
E N D
Concurrent Revisions:A deterministic concurrency model. Daan Leijen, Alexandro Baldassin, and Sebastian Burckhardt Microsoft Research (OOPSLA 2010)
The concurrency elephant • Task/Data Parallel: TPL, X10, Cilk, StreamIt, Cuda, OpenMP, etc. • Concurrent: Thread, Locks, Promises, Transactions, etc. • Our focus: Concurrent interactive applications with large shared data structures.
Application = Shared Data and Tasks Reader Mutator Reader Shared Data Reader Mutator Example: Office application • Save the document • React to keyboard input by the user • Perform a spellcheck in the background • Exchange updates with remote users
Spacewars! About 15k lines of C# code, using DirectX. The original game is sequential
Examples from SpaceWars Game Example 1:read-write conflict • Render taskreads position of all game objects • Physics task updates position of all game objects => Render task needs to see consistent snapshot Example 2: write-write conflict • Physics task updates position of all game objects • Network task updates position of some objects => Network has priority over physics updates
Conventional Concurrency Control Conflicting tasks can not efficiently execute in parallel. • pessimistic concurrency control (i.e. locks) • use locks to avoid parallelism where there are (real or potential) conflicts • optimistic concurrency control (i.e. TM) • speculate on absence of conflictsrollback if there are real conflicts either way: true conflicts kill parallelism.
Our Proposed Programming Model: Revisions and Isolation Types Revision A logical unit of work that is forked and joined Isolation Type A type which implements automatic copying/merging of versions on write-write conflict • Deterministic Conflict Resolution, never roll-back • No restrictions on tasks (can be long-running, do I/O) • Full concurrent reading and writing of shared data • Clean semantics (see technical report) • Fast and space-efficient runtime implementation
What’s new No isolation: We see either 0 or 1 depending on the schedule fork revision: forks off a private copy of the shared state Isolation types: declares shared data • Isolation: side effects are only visible when the revision is joined. • Deterministic execution! Concurrent Revisions Traditional Task int x = 0; Taskt = fork { x = 1; } assert(x==0 || x==1); join t; assert(x==1); Versioned<int> x = 0; Revisionr = rfork { x = 1; } assert(x==0); joinr; assert(x==1); isolation: Concurrent modifications are not seen by others join revision: waits for the revision to terminate and writes back changes into the main revision
Puzzle time… int x = 0; int y = 0; Taskt = fork { if (x==0) y++; } if (y==0) x++; joint; Hard to read: let’s use a diagram instead…
Sequential consistency int x = 0 int y = 0 if (y==0) x++; if (x==0) y++; ?? assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); What are the possible values for x and y?
Transactional memory int x = 0 int y = 0 atomic { if (y==0) x++; } atomic { if(x==0) y++; } assert( (x==0 && y==1) || (x==1 && y==0)); ??
Concurrent revisions Versioned<int> x = 0 Versioned<int> y = 0 Isolation and x is always 0 Isolation y is always 0 if (y==0) x++; if (x==0) y++; Determinism only 1 possible result assert(x==1 && y==1); ??
Conflict resolution By default, on a write-write conflict (only), the modification in the child revision wins. Versioned<int> x; x = 0 x = 0 x = 0 x = 1 x = 2 x = 1 x = 0 x = 1 assert(x==2) assert(x==0) assert(x==1)
Custom conflict resolution Cumulative<int,(main,join,orig).main + join – orig> x; x = 0 0 x += 1 x += 2 merge(1,2,0) 3 1 2 assert(x==3)
Demo classSample { [Versioned] inti = 0; publicvoid Run() { varr = CurrentRevision.Fork(() => { i += 1; }); i += 2; CurrentRevision.Join(r); Console.WriteLine("i = " + i); } }
Demo: Sandbox Fork a revision without forking an associated task/thread classSandbox { [Versioned] inti = 0; public void Run() { var r = CurrentRevision.Branch("FlakyCode"); try { r.Run(() => { i = 1; throw newException("Oops"); }); CurrentRevision.Merge(r); } catch { CurrentRevision.Abandon(r); } Console.WriteLine("\n i = "+ i); } } Run code in a certain revision Merge changes in a revision into the main one Abandon a revision and don’t merge its changes.
A Software engineering perspective • Transactional memory: • Code centric: put “atomic” in the code • Granularity: • too broad: too many conflicts and no parallel speedup • too small: potential races and incorrect code • Concurrent revisions: • Data centric: put annotations on the data • Granularity: group data that have mutual constraints together, i.e. if (x + y > 0) should hold, then x and y should be versioned together.
Current Implementation: C# library • For each versioned object, maintain multiple copies • Map revision ids to versions • `mostly’ lock-free array • New copies are allocated lazily • Don’t copy on fork… copy on first write after fork • Old copies are released on join • No space leak
SpaceWars Game Sequential Game Loop: Parallel Collision Detection Render Screen Parallel Collision Detection Parallel Collision Detection Graphics Card Simulate Physics Shared State Play Sounds Send Process Inputs Receive Autosave Key- board Network Connection Disk
Revision Diagram for Parallelized Game Loop Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 Render network Physics autosave (long running)
“Problem Example 1” is solved Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 Render network Physics autosave (long running) • Render task reads position of all game objects • Physics task updates position of all game objects • No interference!
“Problem Example 2” is solved. Coll. Det. 4 Coll. Det. 1 Coll. Det. 2 Coll. Det. 3 Render network Physics autosave (long running) • Physics taskupdates position of all game objects • Network task updates position of some objects • Network updates have priority over physics updates • Order of joins establishes precedence!
Results Physics task Render Collision detection • Autosave now perfectly unnoticeable in background • Overall Speed-Up: 3.03x on four-core (almost completely limited by graphics card)
Overhead:How much does all the copying and the indirection cost? Only a 5% slowdown in the sequential case Some individual tasks slow down much more (i.e. physics simulation)
Conclusion Revisions and Isolation Types simplify the parallelization of applications with tasks that • Exhibit conflicting accesses to shared data • Have unpredictable latency • Have unpredictable data access pattern • May perform I/O that can not be rolled back Revisions and Isolation Types are • easy to reason about (determinism, isolation) • have low-enough overhead for many applications
Questions? • daan@microsoft.com • sburckha@microsoft.com • External download available soon
Sequential Consistency int x = 0; int y = 0; task t = fork { if (x==0) y++; } if (y==0) x++; join t; assert( (x==0 && y==1) || (x==1 && y==0) || (x==1 && y==1)); Concurrent Revisions Transactional Memory int x = 0; int y = 0; task t = fork { atomic { if (x==0) y++; } } atomic { if (y==0) x++; } join t; versioned<int> x = 0; versioned<int> y = 0; revision r = rfork { if (x==0) y++; } if (y==0) x++; joinr; assert( (x==0 && y==1) || (x==1 && y==0)); assert(x==1 && y==1);
x = 0 0 x += 1 x += 2 2 x += 3 merge(1,2,0) 3 1 2 3 5 merge(3,5,2) 6 assert( x==6 )
By construction, there is no ‘global’ state: just local state for each revision State is simply a (partial) function from a location to a value
Operational Semantics the state is a composition of the root snapshot and local modifications For some revision r, with snapshot and local modifications and an expression context with hole(x.e) v On a join, the writes of the joineer’ take priority over the writes of the current revision: ::’ On a fork, the snapshot of the new revision r’ is the current state: ::
Custom merge: per location (type) No conflict if a location was not written in the joinee On a join, using a merge function. No conflict if a location was unmodified in the current revision, use the value of the joinee Conflict otherwise, use a location/type specific merge function Standard merges:
What is a conflict? Cumulative<int> x = 0 • Merge is only called if: (1) write in child, and (2) modification in main revision: 0 x += 2 2 No conflict (merge function is not called) x += 3 0 2 2 5 No conflict (merge function is not called) assert( x = 5 )
Merging with failure On fail, we just ignore any writes in the joinee
Snapshot isolation • Widely used in databases, for example Oracle and Microsoft SQL • In essence, in snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts. • Our calculus generalizes snapshot isolation: • We support arbitrary nesting • We allow custom merge functions to resolve write-write conflicts deterministically
Snapshot isolation We can succinctly model snapshot isolation as: • Disallow nesting • Use the default merge: Some versions of snapshot isolation do not treat silent writes in a transaction as a conflict:
Sequential merges • We can view each location as an abstract data types (i.e. object) with certain operations (i.e. methods). • If a merge function always behaves as if concurrent operations for those objects are sequential, we call it a sequential merge. • Such objects always behave as if the operations in the joinee are all done sequentially at the join point.
Sequential merges x = o • A merge is sequential if: merge(uw1(o), uw2(o), u(o))= uw1w2(o) • Anduw1w2(o) u w1 w2 merge(uw1(o), uw2(o), u(o))
Abelian merges • For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation , the following merge is sequential: merge(v,v’,v0) = v v’ v0 • This holds for example for additive integers and additive sets.