The State of Parallel Programming


Presentation Transcript


  1. FT07: The State of Parallel Programming. Burton Smith, Technical Fellow, Microsoft Corporation

  2. Parallel Computing is Now Mainstream • Cores are reaching performance limits • More transistors per core just makes it hot • New processors are multi-core, and maybe multithreaded as well • Uniform shared memory within a socket • Multi-socket may be pretty non-uniform • Logic cost ($ per gate-Hz) keeps falling • New “killer apps” will doubtless need more performance • How should we write parallel programs?

  3. Parallel Programming Practice Today • Threads and locks • SPMD languages • OpenMP • Co-array Fortran, UPC, and Titanium • Message passing languages • MPI, Erlang • Data-parallel languages • CUDA, OpenCL • Most of these are pretty low level

  4. Higher Level Parallel Languages • Allow higher level data-parallel operations • E.g. programmer-defined reductions and scans • Exploit architectural support for parallelism • SIMD instructions, inexpensive synchronization • Provide for abstract specification of locality • Present a transparent performance model • Make data races impossible. For the last item, something must be done about unrestricted use of variables
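A sketch of what a programmer-defined reduction and scan can look like, in Haskell purely for illustration (the slide names no language): the programmer supplies an associative combining operator, which is what would let an implementation evaluate the reduction as a balanced parallel tree rather than a sequential chain.

```haskell
-- A programmer-defined reduction: any associative operator can, in
-- principle, be evaluated as a balanced parallel tree rather than a
-- left-to-right chain. This one keeps a running (min, max) pair.
minMax :: Ord a => (a, a) -> (a, a) -> (a, a)
minMax (lo1, hi1) (lo2, hi2) = (min lo1 lo2, max hi1 hi2)

reduceMinMax :: Ord a => [a] -> (a, a)
reduceMinMax = foldr1 minMax . map (\x -> (x, x))  -- errors on []

-- A programmer-defined scan: running prefixes under the same operator.
prefixSums :: Num a => [a] -> [a]
prefixSums = scanl1 (+)

main :: IO ()
main = do
  print (reduceMinMax [3, 1, 4, 1, 5, 9, 2, 6])  -- (1,9)
  print (prefixSums [1, 2, 3, 4])                -- [1,3,6,10]
```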

  5. Shared Memory is not the Problem • Shared memory has some benefits: • Forms a delivery vehicle for high bandwidth • Permits unpredictable data-dependent sharing • Provides a large synchronization namespace • Facilitates high level language implementations • Language implementers like it as a target • Non-uniform memory can even scale • But shared variables are an issue: stores do not commute with other loads or stores • Shared memory isn’t a programming model

  6. Pure Functional Languages • Imperative languages do computations by scheduling values into variables • Their parallel dialects are prone to data races • There are far too many parallel schedules • Pure functional languages avoid data races simply by avoiding variables entirely • They compute new constants from old • Loads commute so data races can’t happen • Dead constants can be reclaimed efficiently • But no variables implies no mutable state
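A small illustration, in Haskell, of "computing new constants from old": inserting into a persistent map does not modify the original, so both values coexist and loads of either commute freely.

```haskell
import qualified Data.Map as Map  -- from the 'containers' package

-- No variable is ever overwritten: 'inserted' is a new constant that
-- shares most of its structure with 'original', which lives on
-- unchanged. Loads from either can be reordered freely, so no races.
main :: IO ()
main = do
  let original = Map.fromList [(1, "one"), (2, "two")]
      inserted = Map.insert 3 "three" original
  print (Map.size original)  -- 2: the old value is intact
  print (Map.size inserted)  -- 3: the new value coexists with it
```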

  7. Mutable State is Crucial for Efficiency • To let data structures inexpensively evolve • To avoid always copying nearly all of them • Monads were added to pure functional languages to allow mutable state (and I/O) • Plain monadic updates may still have data races • The problem is maintaining state invariants • These are just a program’s “conservation laws” • They describe the legal attributes of the state • As with physics, they are associated with a certain generalized type of commutativity
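A minimal illustration of "plain monadic updates may still have data races", using Haskell's IORef: each increment is a perfectly legal monadic read-then-write, but two threads can interleave between the read and the write and lose updates. (A sketch only; compile with GHC's -threaded runtime to make the losses easy to observe.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)
import Data.IORef

-- A plain monadic update: read, compute, write. Two threads interleaving
-- between the read and the write lose increments, a data race, even
-- though every individual step is a legal monadic action.
racyIncrement :: IORef Int -> IO ()
racyIncrement ref = do
  n <- readIORef ref
  writeIORef ref (n + 1)  -- n may be stale by the time this runs

main :: IO ()
main = do
  ref   <- newIORef (0 :: Int)
  dones <- mapM (const newEmptyMVar) [1 :: Int, 2]
  mapM_ (\d -> forkIO $ replicateM_ 100000 (racyIncrement ref)
                          >> putMVar d ()) dones
  mapM_ takeMVar dones      -- wait for both threads
  readIORef ref >>= print   -- frequently far less than 200000
```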

  8. Maintaining Invariants • Updates perturb, then restore an invariant • Program composability depends on this • It’s automatic for us once we learn to program • How can we maintain invariants in parallel? • Two requirements must be met: • Updates must not interfere with each other • That is, they must be isolated in some fashion • Updates must finish once they start • …lest the next update see the invariant false • We say the state updates must be atomic • Updates that are both isolated and atomic are called transactions
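A minimal sketch of an isolated, atomic update using GHC's stm library (the talk is language-neutral; this is one concrete rendering): the invariant is that the two balances always sum to the same total, and `atomically` makes the perturb-then-restore sequence appear as one indivisible step.

```haskell
import Control.Concurrent.STM

-- Invariant: the two balances always sum to the same total. A transfer
-- perturbs the invariant (the debit) and then restores it (the credit);
-- 'atomically' makes the pair isolated (no other transaction sees the
-- intermediate state) and atomic (it commits entirely or not at all).
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  a <- readTVar from
  writeTVar from (a - amount)  -- invariant briefly false here...
  b <- readTVar to
  writeTVar to (b + amount)    -- ...restored before anyone can look

main :: IO ()
main = do
  alice <- newTVarIO 100
  bob   <- newTVarIO 50
  transfer alice bob 30
  total <- atomically ((+) <$> readTVar alice <*> readTVar bob)
  print total  -- always 150
```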

  9. Commutativity and Non-Determinism • If p and q preserve invariant I and do not interfere, their parallel execution { p || q } also preserves I† • If p and q are performed in isolation and atomically, i.e. as transactions, then they will not interfere‡ • Operations may not commute with respect to state • But we always get commutativity with respect to the invariant • This leads to a weaker form of determinism • Long ago some of us called it “good non-determinism” • It’s the non-determinism operating systems rely on † Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. CACM 19(5), pp. 279−285, May 1976. ‡ Leslie Lamport and Fred Schneider. The “Hoare Logic” of CSP, And All That. ACM TOPLAS 6(2), pp. 281−296, April 1984.
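One concrete rendering of "good non-determinism", again with GHC's STM, assuming an account-style invariant (balance never negative): the two transactions may commit in either order, and `retry` blocks the withdrawal until it can succeed; whatever the schedule, the invariant holds.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM

-- Invariant: the balance never goes negative. 'retry' blocks the
-- withdrawal until it can commit without breaking the invariant, so
-- the schedule is non-deterministic but every schedule preserves I.
deposit, withdraw :: TVar Int -> Int -> STM ()
deposit  acct n = modifyTVar' acct (+ n)
withdraw acct n = do
  b <- readTVar acct
  if b >= n then writeTVar acct (b - n) else retry

main :: IO ()
main = do
  acct <- newTVarIO 0
  done <- newEmptyMVar
  _ <- forkIO $ atomically (withdraw acct 10) >> putMVar done ()
  atomically (deposit acct 25)  -- wakes the blocked withdrawal
  takeMVar done
  atomically (readTVar acct) >>= print  -- 15, under any interleaving
```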

  10. Example: Hash Tables • Hash tables implement sets of items • The key invariant is that an item is in the set iff its insertion followed all removals • There are also storage structure invariants, e.g. hash buckets must be well-formed linked lists • Parallel insertions and removals need only maintain the logical AND of these invariants • This may not result in deterministic state • The order of items in a bucket is unspecified
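A sketch of such a hash table as a concurrent set in Haskell STM (the bucket count and the hashable package's `hash` are illustrative choices): operations on different buckets are isolated automatically, operations on the same bucket serialize, and the order of items within a bucket is left unspecified, exactly as the slide says.

```haskell
import Control.Concurrent.STM
import Data.Array (Array, listArray, (!))
import Data.Hashable (Hashable, hash)  -- from the 'hashable' package

-- Fixed buckets, each an independently updatable transactional variable.
-- Updates to different buckets are isolated for free; updates to the
-- same bucket serialize. Order within a bucket is unspecified.
data HashSet a = HashSet Int (Array Int (TVar [a]))

newSet :: Int -> IO (HashSet a)
newSet n = HashSet n . listArray (0, n - 1)
             <$> mapM (const (newTVarIO [])) [1 .. n]

bucket :: Hashable a => HashSet a -> a -> TVar [a]
bucket (HashSet n bs) x = bs ! (hash x `mod` n)

insert, remove :: (Eq a, Hashable a) => HashSet a -> a -> STM ()
insert s x = modifyTVar' (bucket s x)
               (\xs -> if x `elem` xs then xs else x : xs)
remove s x = modifyTVar' (bucket s x) (filter (/= x))

member :: (Eq a, Hashable a) => HashSet a -> a -> STM Bool
member s x = elem x <$> readTVar (bucket s x)

main :: IO ()
main = do
  s <- newSet 16
  atomically (insert s "apple" >> insert s "pear")
  atomically (member s "apple") >>= print  -- True
```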

  11. High Level Data Races • Some loads and stores can be isolated and atomic but cover only a part of the invariant • E.g. copying data from one structure to another • If atomicity is violated, the data can be lost • Another example is isolating a graph node while deleting it but then decrementing neighbors’ reference counts with LOCK DEC • Some of the neighbors may no longer exist • It is challenging to see how to automate data race detection for examples like these
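The copy-between-structures example, sketched in Haskell STM: each of the two steps below is individually atomic, yet together they violate the invariant "the item is in exactly one container", a high-level data race that no single load or store exhibits. The fix is one transaction covering the invariant's whole domain.

```haskell
import Control.Concurrent.STM

-- Invariant: an item lives in exactly one of the two containers.
-- Each 'atomically' below is fine on its own, but between them the
-- item is in neither container: the invariant's domain was only
-- partially covered, and a concurrent observer (or a crash) sees it.
moveRacy :: TVar [a] -> TVar [a] -> IO ()
moveRacy src dst = do
  mx <- atomically $ do
          xs <- readTVar src
          case xs of
            []      -> return Nothing
            (x : r) -> writeTVar src r >> return (Just x)
  -- window: the item is in neither structure here
  case mx of
    Just x  -> atomically (modifyTVar' dst (x :))
    Nothing -> return ()

-- The fix: one transaction spanning the invariant's whole domain.
moveAtomic :: TVar [a] -> TVar [a] -> STM ()
moveAtomic src dst = do
  xs <- readTVar src
  case xs of
    []      -> return ()
    (x : r) -> writeTVar src r >> modifyTVar' dst (x :)

main :: IO ()
main = do
  src <- newTVarIO "ab"
  dst <- newTVarIO ""
  atomically (moveAtomic src dst)
  atomically (readTVar dst) >>= print  -- "a"
```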

  12. Other Examples • Databases and operating systems commonly mutate state in parallel • Databases use transactions to achieve consistency via atomicity and isolation • SQL programming is pretty simple • SQL is arguably not general-purpose • Operating systems use locks for isolation • Atomicity is left to the OS developer • Lock ordering is used to prevent deadlock • A general-purpose parallel language should easily handle applications like these
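A sketch of OS-style lock ordering in Haskell (the resource ids and MVar locks are illustrative): every thread acquires locks in one fixed global order, here by ascending resource id, so a cycle of waiting threads, and hence deadlock, cannot form. Atomicity remains the caller's problem, as the slide notes.

```haskell
import Control.Concurrent.MVar

-- OS-style isolation with locks: each resource carries a lock, and all
-- threads acquire locks in one fixed global order (ascending resource
-- id, ids assumed distinct), so no cycle of waiting threads can form.
data Resource = Resource { resId :: Int, lock :: MVar () }

withBoth :: Resource -> Resource -> IO a -> IO a
withBoth a b action =
  let (first, second) = if resId a <= resId b then (a, b) else (b, a)
  in withMVar (lock first) $ \_ ->
       withMVar (lock second) $ \_ ->
         action  -- runs isolated; atomicity is still the caller's job

main :: IO ()
main = do
  r1 <- Resource 1 <$> newMVar ()
  r2 <- Resource 2 <$> newMVar ()
  withBoth r2 r1 (putStrLn "both locks held, in id order")
```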

  13. Implementing Isolation • Analysis • Proving concurrent state updates are isolated • Locking • Deadlock must be handled somehow • Buffering • Often used for wait-free updates • Partitioning • Partitions can be dynamic, e.g. as in quicksort • Serializing • These schemes can be nested
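A sketch of isolation by dynamic partitioning, the quicksort case the slide mentions, using the parallel package's `par`/`pseq`: the pivot splits the data into disjoint parts, each recursive call owns its part outright, and the two calls can run in parallel with no locks or transactions at all.

```haskell
import Control.Parallel (par, pseq)  -- from the 'parallel' package

-- Isolation by dynamic partitioning: the pivot splits the input into
-- disjoint parts, each recursive call owns its part outright, and the
-- two calls run in parallel with no locks, because nothing is shared.
quicksort :: Ord a => [a] -> [a]
quicksort []       = []
quicksort (p : xs) = lo `par` (hi `pseq` (lo ++ p : hi))
  where
    lo = quicksort [x | x <- xs, x < p]
    hi = quicksort [x | x <- xs, x >= p]

main :: IO ()
main = print (quicksort [3, 1, 4, 1, 5, 9, 2, 6])
```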

  14. Isolation in Existing Languages • Static in space: MPI, Erlang • Dynamic in space: Refined C, Jade • Static in time: Serial execution • Dynamic in time: Single global lock • Static in both: Dependence analysis • Semi-static in both: Inspector-executor • Dynamic in both: Multiple locks

  15. Atomicity • Atomicity means “all or nothing” execution • State changes must be all done or undone • Isolation without atomicity has little value • But atomicity is vital even in the serial case • Implementation techniques: • Compensating, i.e. reversing a computation “in place” • Logging, i.e. remembering and restoring the original state values • Atomicity is challenging for distributed computing and I/O
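Logging in miniature, sketched with a Haskell IORef (`withUndoLog` is an illustrative helper, not a library function): save the original state, run the possibly multi-step update, and restore the saved state if any step throws. This buys "all or nothing" against failures; it does not by itself provide isolation from other threads.

```haskell
import Control.Exception (IOException, catch, onException)
import Data.IORef

-- Atomicity by logging: save the original state, run the (possibly
-- multi-step) update, and restore the saved state if any step throws.
withUndoLog :: IORef a -> IO () -> IO ()
withUndoLog ref update = do
  saved <- readIORef ref                     -- the one-entry undo log
  update `onException` writeIORef ref saved  -- roll back on failure

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  withUndoLog ref (modifyIORef' ref (+ 1) >> ioError (userError "boom"))
    `catch` \e -> putStrLn ("undone: " ++ show (e :: IOException))
  readIORef ref >>= print  -- 0: the half-done update was rolled back
```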

  16. Exceptions • Exceptions can threaten atomicity • An aborted state update must be undone • What if a state update depends on querying a remote service and the query fails? • The message from the remote service should send exception information in lieu of the data • Message arrival can then throw as usual and the partial update can be undone
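In GHC's STM this undo is automatic: an exception thrown inside `atomically` aborts the transaction, and its tentative writes are simply discarded. The sketch below stands a thrown exception in for the failing remote query; there is no real remote service here.

```haskell
import Control.Concurrent.STM
import Control.Exception (catch)

-- An exception inside 'atomically' aborts the transaction: tentative
-- writes are discarded, so the partial update is undone for free.
-- 'query' stands in for a remote service that sends exception
-- information in lieu of the data.
updateFromService :: TVar Int -> STM Int -> IO ()
updateFromService acct query = atomically $ do
  modifyTVar' acct (+ 100)  -- tentative partial update
  extra <- query            -- may throw, aborting the whole transaction
  modifyTVar' acct (+ extra)

main :: IO ()
main = do
  acct <- newTVarIO 0
  updateFromService acct (throwSTM (userError "remote query failed"))
    `catch` \e -> putStrLn ("aborted: " ++ show (e :: IOError))
  atomically (readTVar acct) >>= print  -- 0: the +100 never committed
```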

  17. Transactional Memory • “Transactional memory” means transaction semantics within lexically scoped blocks • TM has been a hot topic of late • As usual, lexical scope seems a virtue here • Adding TM to existing languages has problems • There is a lot of optimization work to do • to make atomicity and isolation highly efficient • Meanwhile, we shouldn’t ignore traditional ways to get transactional semantics
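One illustration of lexically scoped transactional semantics, GHC's `orElse`: two blocking alternatives compose into a single transaction without either exposing its synchronization protocol, the composability that locks famously lack.

```haskell
import Control.Concurrent.STM

-- A blocking 'take' from a transactional queue: 'retry' abandons the
-- transaction and waits until the queue changes.
takeItem :: TVar [a] -> STM a
takeItem q = do
  xs <- readTVar q
  case xs of
    []      -> retry
    (x : r) -> writeTVar q r >> return x

-- Composition inside one lexically scoped transaction: try one queue,
-- and if it would block, fall back to the other.
takeEither :: TVar [a] -> TVar [a] -> STM a
takeEither q1 q2 = takeItem q1 `orElse` takeItem q2

main :: IO ()
main = do
  q1 <- newTVarIO ([] :: [Int])
  q2 <- newTVarIO [42]
  atomically (takeEither q1 q2) >>= print  -- 42, from the fallback
```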

  18. Whence Invariants? • Can we generate invariants from code? • Only sometimes, and it is difficult even then • Can we generate code from invariants? • Is this the same as intentional programming? • Can we write invariants plus code and let the compiler check invariant preservation? • This is much easier, but may be less attractive • Can languages make it more likely that a transaction covers the invariant’s domain? • E.g. leveraging objects with encapsulated state • Can we at least debug our mistakes?
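A sketch of the "invariants plus code" option in library form (all names here are illustrative, not an existing API): run the transaction body, re-check a programmer-supplied invariant before committing, and abort, discarding the writes, if the check fails, so a broken invariant is caught at the exact update that broke it.

```haskell
import Control.Concurrent.STM
import Control.Exception (Exception, catch)

newtype InvariantViolated = InvariantViolated String deriving Show
instance Exception InvariantViolated

-- "Invariants plus code": run the transaction body, then re-check a
-- programmer-supplied invariant before committing. A failed check
-- aborts the transaction, so its writes are never published.
checkedAtomically :: String -> STM Bool -> STM a -> IO a
checkedAtomically name invariant body = atomically $ do
  r  <- body
  ok <- invariant
  if ok then return r else throwSTM (InvariantViolated name)

main :: IO ()
main = do
  acct <- newTVarIO (5 :: Int)
  let nonNegative = (>= 0) <$> readTVar acct
  -- This update would drive the balance negative; it is rolled back
  -- and reported instead of being committed.
  checkedAtomically "balance >= 0" nonNegative
      (modifyTVar' acct (subtract 10))
    `catch` \(InvariantViolated i) -> putStrLn ("violated: " ++ i)
  atomically (readTVar acct) >>= print  -- 5: the bad update was discarded
```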

  19. Conclusions • Functional languages with transactions enable higher level parallel programming • Microsoft is heading in this general direction • Efficient implementations of isolation and atomicity are important • We trust architecture will ultimately help support these things • The von Neumann model needs replacing, and soon

  20. YOUR FEEDBACK IS IMPORTANT TO US! Please fill out session evaluation forms online at MicrosoftPDC.com

  21. Learn More On Channel 9 • Expand your PDC experience through Channel 9. • Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses. channel9.msdn.com/learn Built by Developers for Developers.
