FT07: The State of Parallel Programming
Burton Smith
Technical Fellow, Microsoft Corporation
Parallel Computing is Now Mainstream
• Cores are reaching performance limits
  • More transistors per core just makes it hot
• New processors are multi-core
  • and maybe multithreaded as well
• Uniform shared memory within a socket
  • Multi-socket may be pretty non-uniform
• Logic cost ($ per gate-Hz) keeps falling
• New “killer apps” will doubtless need more performance
• How should we write parallel programs?
Parallel Programming Practice Today
• Threads and locks
• SPMD languages
  • OpenMP
  • Co-array Fortran, UPC, and Titanium
• Message passing languages
  • MPI, Erlang
• Data-parallel languages
  • CUDA, OpenCL
• Most of these are pretty low level
Higher Level Parallel Languages
• Allow higher level data-parallel operations
  • E.g. programmer-defined reductions and scans
• Exploit architectural support for parallelism
  • SIMD instructions, inexpensive synchronization
• Provide for abstract specification of locality
• Present a transparent performance model
• Make data races impossible

For the last item, something must be done about the unrestricted use of variables.
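As an illustration of the first bullet, here is a minimal sketch of a programmer-defined parallel reduction in Haskell (chosen only as a convenient vehicle, not the language the talk proposes); the name parReduce, the chunk size 1024, and the use of the "parallel" package are all assumptions of the sketch:

    import Control.DeepSeq (NFData)
    import Control.Parallel.Strategies (parMap, rdeepseq)
    import Data.List (foldl')

    -- Split a list into fixed-size pieces.
    chunks :: Int -> [a] -> [[a]]
    chunks _ [] = []
    chunks n xs = let (h, t) = splitAt n xs in h : chunks n t

    -- Reduce with a programmer-supplied associative operator 'op'
    -- and its identity 'z': fold each chunk in parallel, then
    -- combine the partial results serially. Associativity is what
    -- makes regrouping the work across cores legal.
    parReduce :: NFData a => (a -> a -> a) -> a -> [a] -> a
    parReduce op z xs =
      foldl' op z (parMap rdeepseq (foldl' op z) (chunks 1024 xs))

    main :: IO ()
    main = print (parReduce (+) 0 [1 .. 1000000 :: Int])

Compiled with -threaded and run with +RTS -N, the per-chunk folds run on separate cores; a user-defined scan can be built the same way from per-chunk scans plus a combining pass.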
Shared Memory is not the Problem
• Shared memory has some benefits:
  • Forms a delivery vehicle for high bandwidth
  • Permits unpredictable, data-dependent sharing
  • Provides a large synchronization namespace
  • Facilitates high level language implementations
    • Language implementers like it as a target
  • Non-uniform memory can even scale
• But shared variables are an issue: stores do not commute with other loads or stores
• Shared memory isn’t a programming model
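To make the shared-variable point concrete, here is a small Haskell sketch (an illustration assumed here, not from the talk) in which two unsynchronized read-modify-write loops interleave and lose updates:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (replicateM_)
    import Data.IORef (newIORef, readIORef, writeIORef)

    main :: IO ()
    main = do
      counter <- newIORef (0 :: Int)
      done    <- newEmptyMVar
      let bump = replicateM_ 100000 $ do
            n <- readIORef counter      -- load ...
            writeIORef counter (n + 1)  -- ... store: the pair is not atomic
      _ <- forkIO (bump >> putMVar done ())
      bump
      takeMVar done
      readIORef counter >>= print
      -- On multiple cores (compile with -threaded, run with +RTS -N2)
      -- this can print well under 200000: interleaved updates are lost.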
Pure Functional Languages
• Imperative languages do computations by scheduling values into variables
  • Their parallel dialects are prone to data races
  • There are far too many parallel schedules
• Pure functional languages avoid data races simply by avoiding variables entirely
  • They compute new constants from old
  • Loads commute, so data races can’t happen
  • Dead constants can be reclaimed efficiently
• But no variables implies no mutable state
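A small Haskell illustration of computing new constants from old (the names here are the sketch's own): "updating" an immutable map yields a new value, the old one is untouched, and the two share most of their structure internally, so nothing close to a full copy is made.

    import qualified Data.Map as Map

    main :: IO ()
    main = do
      let m0 = Map.fromList [("a", 1), ("b", 2 :: Int)]
          m1 = Map.insert "c" 3 m0   -- a new map; m0 is unchanged
      print (Map.lookup "c" m0)      -- Nothing
      print (Map.lookup "c" m1)      -- Just 3
      -- Once m0 is dead, the garbage collector reclaims whatever
      -- parts of it m1 does not share.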
Mutable State is Crucial for Efficiency
• To let data structures inexpensively evolve
  • To avoid always copying nearly all of them
• Monads were added to pure functional languages to allow mutable state (and I/O)
  • Plain monadic updates may still have data races
• The problem is maintaining state invariants
  • These are just a program’s “conservation laws”
  • They describe the legal attributes of the state
  • As with physics, they are associated with a certain generalized type of commutativity
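A hedged Haskell sketch of the second bullet: monadic state permits mutation, but a plain read-then-write in IO is still a race; an atomic read-modify-write fixes the single-variable case yet still cannot cover an invariant that spans several variables (for that, see the transaction sketches that follow).

    import Data.IORef (IORef, atomicModifyIORef', readIORef, writeIORef)

    -- A plain monadic update: another thread can run between the
    -- read and the write, so updates can be lost (a data race).
    racyIncr :: IORef Int -> IO ()
    racyIncr r = do
      n <- readIORef r
      writeIORef r (n + 1)

    -- A single-variable atomic read-modify-write: no low-level race,
    -- but an invariant relating several IORefs remains unprotected.
    safeIncr :: IORef Int -> IO ()
    safeIncr r = atomicModifyIORef' r (\n -> (n + 1, ()))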
Maintaining Invariants
• Updates perturb, then restore an invariant
  • Program composability depends on this
  • It’s automatic for us once we learn to program
• How can we maintain invariants in parallel?
• Two requirements must be met:
  • Updates must not interfere with each other
    • That is, they must be isolated in some fashion
  • Updates must finish once they start
    • …lest the next update see the invariant false
    • We say the state updates must be atomic
• Updates that are both isolated and atomic are called transactions
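A sketch using GHC's STM (one concrete realization of transactions, not necessarily the talk's): the transfer perturbs and then restores the invariant "the total balance is constant" inside one isolated, atomic update, so no other thread ever observes the invariant false.

    import Control.Concurrent.STM

    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to amount = do
      a <- readTVar from
      writeTVar from (a - amount)  -- invariant temporarily false here...
      b <- readTVar to
      writeTVar to (b + amount)    -- ...and restored before commit

    main :: IO ()
    main = do
      alice <- newTVarIO 100
      bob   <- newTVarIO 50
      atomically (transfer alice bob 30)
      total <- (+) <$> readTVarIO alice <*> readTVarIO bob
      print total  -- always 150, no matter what runs concurrently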
Commutativity and Non-Determinism
• If p and q preserve invariant I and do not interfere, their parallel execution { p || q } also preserves I†
• If p and q are performed in isolation and atomically, i.e. as transactions, then they will not interfere‡
• Operations may not commute with respect to state
  • But we always get commutativity with respect to the invariant
• This leads to a weaker form of determinism
  • Long ago some of us called it “good non-determinism”
  • It’s the non-determinism operating systems rely on

† Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. CACM 19(5), pp. 279–285, May 1976.
‡ Leslie Lamport and Fred Schneider. The “Hoare Logic” of CSP, and All That. ACM TOPLAS 6(2), pp. 281–296, April 1984.
Example: Hash Tables
• Hash tables implement sets of items
• The key invariant is that an item is in the set iff its insertion followed all removals
• There are also storage structure invariants, e.g. hash buckets must be well-formed linked lists
• Parallel insertions and removals need only maintain the logical AND of these invariants
• This may not result in deterministic state
  • The order of items in a bucket is unspecified
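A toy Haskell/STM version of such a table (the bucket count, the names, and the restriction to Int keys are the sketch's assumptions): one TVar per bucket gives isolation by partitioning, same-bucket operations serialize through STM, and the order of items within a bucket is deliberately left unspecified, exactly the "good non-determinism" of the previous slide.

    import Control.Concurrent.STM
    import Control.Monad (replicateM)

    type HashSet = [TVar [Int]]          -- a fixed number of buckets

    newSet :: Int -> IO HashSet
    newSet n = replicateM n (newTVarIO [])

    bucket :: HashSet -> Int -> TVar [Int]
    bucket set x = set !! (x `mod` length set)

    -- Each operation is one transaction, so both the set invariant
    -- and the bucket well-formedness invariant hold at every commit.
    insert :: HashSet -> Int -> STM ()
    insert set x = do
      xs <- readTVar (bucket set x)
      if x `elem` xs then return () else writeTVar (bucket set x) (x : xs)

    remove :: HashSet -> Int -> STM ()
    remove set x = modifyTVar' (bucket set x) (filter (/= x))

    member :: HashSet -> Int -> STM Bool
    member set x = (x `elem`) <$> readTVar (bucket set x)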
High Level Data Races
• Some loads and stores can be isolated and atomic but cover only a part of the invariant
  • E.g. copying data from one structure to another
  • If atomicity is violated, the data can be lost
• Another example is isolating a graph node while deleting it, but then decrementing neighbors’ reference counts with LOCK DEC
  • Some of the neighbors may no longer exist
• It is challenging to see how to automate data race detection for examples like these
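The copying example can be sketched in Haskell/STM (illustrative names): each step below is individually atomic, yet together they cover only part of the invariant "every item is in exactly one container", so an observer between the steps sees the item nowhere. Widening the transaction to cover both steps removes the high-level race.

    import Control.Concurrent.STM

    -- Racy: two separate transactions leave a window in which the
    -- item has left 'from' but not yet reached 'to'.
    moveRacy :: Eq a => TVar [a] -> TVar [a] -> a -> IO ()
    moveRacy from to x = do
      atomically (modifyTVar' from (filter (/= x)))  -- atomic step 1
      -- another thread running here sees x in neither container
      atomically (modifyTVar' to (x :))              -- atomic step 2

    -- Safe: one transaction covers the whole invariant.
    moveSafe :: Eq a => TVar [a] -> TVar [a] -> a -> IO ()
    moveSafe from to x = atomically $ do
      modifyTVar' from (filter (/= x))
      modifyTVar' to (x :)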
Other Examples
• Databases and operating systems commonly mutate state in parallel
• Databases use transactions to achieve consistency via atomicity and isolation
  • SQL programming is pretty simple
  • SQL is arguably not general-purpose
• Operating systems use locks for isolation
  • Atomicity is left to the OS developer
  • Lock ordering is used to prevent deadlock
• A general purpose parallel language should easily handle applications like these
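A minimal Haskell sketch of the OS-style lock-ordering discipline (Resource, rid, and rlock are hypothetical names): locks are always acquired in increasing id order, so a cycle of waiting threads, and hence deadlock, cannot form.

    import Control.Concurrent.MVar (MVar, putMVar, takeMVar)
    import Control.Exception (bracket)

    data Resource = Resource { rid :: Int, rlock :: MVar () }

    -- Acquire both locks in increasing id order; every thread follows
    -- the same global order, so circular wait is impossible.
    withBoth :: Resource -> Resource -> IO a -> IO a
    withBoth a b action = withLock first (withLock second action)
      where
        (first, second) = if rid a <= rid b then (a, b) else (b, a)
        withLock r act  = bracket (takeMVar (rlock r))
                                  (\_ -> putMVar (rlock r) ())
                                  (\_ -> act)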
Implementing Isolation
• Analysis
  • Proving concurrent state updates are isolated
• Locking
  • Deadlock must be handled somehow
• Buffering
  • Often used for wait-free updates
• Partitioning
  • Partitions can be dynamic, e.g. as in quicksort
• Serializing
• These schemes can be nested
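A sketch of dynamic partitioning in Haskell, using the "parallel" package: quicksort's recursive calls touch disjoint data, so they can run in parallel with no locks at all. This is the bare idea; a realistic version would force the sublists fully (e.g. with rdeepseq) and fall back to a serial sort below a size threshold.

    import Control.Parallel.Strategies (rpar, rseq, runEval)

    quicksort :: Ord a => [a] -> [a]
    quicksort []       = []
    quicksort (p : xs) = runEval $ do
      lo <- rpar (quicksort [x | x <- xs, x < p])   -- disjoint partition
      hi <- rpar (quicksort [x | x <- xs, x >= p])  -- disjoint partition
      _  <- rseq lo
      _  <- rseq hi
      return (lo ++ [p] ++ hi)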
Isolation in Existing Languages
• Static in space: MPI, Erlang
• Dynamic in space: Refined C, Jade
• Static in time: Serial execution
• Dynamic in time: Single global lock
• Static in both: Dependence analysis
• Semi-static in both: Inspector-executor
• Dynamic in both: Multiple locks
Atomicity
• Atomicity means “all or nothing” execution
  • State changes must be all done or all undone
• Isolation without atomicity has little value
  • But atomicity is vital even in the serial case
• Implementation techniques:
  • Compensating, i.e. reversing a computation “in place”
  • Logging, i.e. remembering and restoring the original state values
• Atomicity is challenging for distributed computing and I/O
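A minimal sketch of the logging technique in Haskell (withUndoLog is the sketch's own name): remember the original value and restore it if the update throws, making the state change all-or-nothing. Note this supplies atomicity only; isolation must come separately, from one of the schemes two slides back.

    import Control.Exception (onException)
    import Data.IORef (IORef, readIORef, writeIORef)

    -- Run a multi-step update that mutates 'ref' in place; if any
    -- step throws, restore the remembered original, so the update
    -- is all done or all undone.
    withUndoLog :: IORef a -> IO b -> IO b
    withUndoLog ref action = do
      old <- readIORef ref                      -- the log: original value
      action `onException` writeIORef ref old   -- compensate on failure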
Exceptions
• Exceptions can threaten atomicity
  • An aborted state update must be undone
• What if a state update depends on querying a remote service and the query fails?
  • The message from the remote service should send exception information in lieu of the data
  • Message arrival can then throw as usual, and the partial update can be undone
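In GHC's STM this behavior comes for free, as the hedged sketch below shows: an exception thrown inside atomically aborts the transaction, so the partial update is discarded automatically. Here a plain throw stands in for the failed remote query.

    import Control.Concurrent.STM
    import Control.Exception (SomeException, try)

    main :: IO ()
    main = do
      balance <- newTVarIO (100 :: Int)
      r <- try $ atomically $ do
             modifyTVar' balance (subtract 30)           -- partial update...
             throwSTM (userError "remote query failed")  -- ...then the query fails
      print (r :: Either SomeException ())
      readTVarIO balance >>= print  -- still 100: the update was undone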
Transactional Memory
• “Transactional memory” means transaction semantics within lexically scoped blocks
  • TM has been a hot topic of late
  • As usual, lexical scope seems a virtue here
• Adding TM to existing languages has problems
• There is a lot of optimization work to do
  • to make atomicity and isolation highly efficient
• Meanwhile, we shouldn’t ignore traditional ways to get transactional semantics
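One strength of lexically scoped transactions worth illustrating (a GHC STM sketch; takeEither is a name assumed here): transactions compose, with retry blocking until relevant state changes and orElse joining two alternatives into a single larger transaction.

    import Control.Concurrent.STM

    -- Take an item from whichever queue has one, atomically.
    takeEither :: TVar [a] -> TVar [a] -> STM a
    takeEither q1 q2 = takeFrom q1 `orElse` takeFrom q2
      where
        takeFrom q = do
          xs <- readTVar q
          case xs of
            []      -> retry               -- abort; retried when q changes
            (x : t) -> writeTVar q t >> return x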
Whence Invariants?
• Can we generate invariants from code?
  • Only sometimes, and it is difficult even then
• Can we generate code from invariants?
  • Is this the same as intentional programming?
• Can we write invariants plus code and let the compiler check invariant preservation?
  • This is much easier, but may be less attractive
• Can languages make it more likely that a transaction covers the invariant’s domain?
  • E.g. by leveraging objects with encapsulated state
• Can we at least debug our mistakes?
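The "invariants plus code" option can at least be prototyped at run time, as in this hedged Haskell sketch (withInvariant is the sketch's invention, not a library facility): run the update, check the declared predicate before commit, and abort the whole transaction on violation so the bad state never becomes visible.

    import Control.Concurrent.STM

    -- Pair a transaction with a declared invariant: the update runs,
    -- the predicate is checked before commit, and a violation aborts
    -- the entire transaction.
    withInvariant :: STM Bool -> STM a -> STM a
    withInvariant inv txn = do
      r  <- txn
      ok <- inv
      if ok
        then return r
        else throwSTM (userError "invariant violated")

    -- Example: a withdrawal that must keep the balance non-negative.
    withdraw :: TVar Int -> Int -> IO ()
    withdraw bal n =
      atomically (withInvariant ((>= 0) <$> readTVar bal)
                                (modifyTVar' bal (subtract n)))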
Conclusions
• Functional languages with transactions enable higher level parallel programming
  • Microsoft is heading in this general direction
• Efficient implementations of isolation and atomicity are important
  • We trust architecture will ultimately help support these things
• The von Neumann model needs replacing, and soon