Many-Core Software

Burton Smith, Microsoft


  1. Many-Core Software
     • Burton Smith, Microsoft

  2. Computing is at a Crossroads
     • Continual performance improvement is our field’s lifeblood
         • It encourages people to buy new hardware
         • It opens up new software possibilities
     • Single-thread performance is nearing the end of the line
         • But Moore’s Law will continue for some time to come
         • What can we do with all those transistors?
     • Computation needs to become as parallel as possible
         • Henceforth, serial means slow
     • Systems must support general purpose parallel computing
         • The alternative is commoditization
     • New many-core chips will need new software
         • Our programming models will have to change
         • The von Neumann premise is broken

  3. The von Neumann Premise
     • Simply put, “instruction instances are totally ordered”
     • This notion has created artifacts:
         • Variables
         • Interrupts
         • Demand paging
     • And caused major problems:
         • The ILP wall
         • The power wall
         • The memory wall
     • What software changes will we need for many-core?
         • New languages?
         • New approaches for compilers, runtimes, tools?
         • New (or perhaps old) operating system ideas?

  4. Do We Really Need New Languages?
     • Mainstream languages schedule values into variables
         • To orchestrate the flow of values in the program
         • To incrementally but consistently update state
     • Introducing parallelism exposes weaknesses in:
         • Passing values between unordered instructions
         • Updating state consistently
     • Our “adhesive bandage” attempts have proven insufficient
         • Not general enough
         • Not productive enough
     • So my answer is “Absolutely!”

  5. Parallel Programming Languages
     • There are (at least) two promising approaches:
         • Functional programming
         • Atomic memory transactions
     • Neither is completely satisfactory by itself
         • Functional programs don’t allow mutable state
         • Transactional programs implement data flows awkwardly
     • Database applications show synergy of these two ideas
         • SQL is a “mostly functional” language
         • Transactions allow Consistency via Atomicity and Isolation
     • Many people think functional languages must be inefficient
         • Sisal and NESL are excellent counterexamples
         • Both competed strongly with Fortran on Cray systems
     • Others think memory transactions must be inefficient also
         • This remains to be seen; we have only just begun to optimize
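As a rough illustration of the functional half of this slide, the sketch below applies a pure function to every element in parallel and then combines the results. Because `square` touches no mutable state, the order in which workers run cannot change the answer. The names `square` and `parallel_sum_of_squares` are illustrative assumptions, not anything from the talk, and Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def square(x: int) -> int:
    """Pure function: the result depends only on the argument."""
    return x * x


def parallel_sum_of_squares(values, workers: int = 4) -> int:
    """Map a pure function over the inputs in parallel, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order regardless of
        # which worker finishes first.
        squares = list(pool.map(square, values))
    # Addition is associative, so the reduction could also be parallelized.
    return reduce(operator.add, squares, 0)


if __name__ == "__main__":
    print(parallel_sum_of_squares(range(10)))
```

The cost of this style is the one the slide names: there is no place to keep mutable state, so anything that must be updated in place needs a second mechanism such as transactions.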

  6. Transactions and Invariants
     • Invariants are a program’s conservation laws
         • Relationships among values in iteration and recursion
         • Rules of data structure (state) integrity
     • If statements p and q preserve the invariant I and they do not “interfere”, their parallel composition { p || q } also preserves I†
     • If p and q are performed atomically, i.e. as transactions, then they will not interfere‡
     • Although operations seldom commute with respect to state, transactions give us commutativity with respect to the invariant
     • It would help if the invariants were available to the compiler
         • Can we ask programmers to supply them?
     † Susan Owicki and David Gries. Verifying properties of parallel programs: An axiomatic approach. CACM 19(5):279-285, May 1976.
     ‡ Leslie Lamport and Fred Schneider. The “Hoare Logic” of CSP, And All That. ACM TOPLAS 6(2):281-296, Apr. 1984.
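The point about commuting with respect to the invariant rather than the state can be sketched with a toy example. Here the invariant I is "the sum of all balances is constant". The two operations p and q do not commute on state (whichever runs first drains more of account `a`), but each preserves I, and running them atomically keeps them from interfering. A single global lock stands in for a transaction mechanism; that substitution, and every name below (`Accounts`, `transfer_up_to`, `parallel_compose`), is an assumption made for illustration.

```python
import threading


class Accounts:
    """Shared state whose invariant is: the total of all balances is constant."""

    def __init__(self, balances):
        self._balances = dict(balances)
        # Assumption: one global lock emulates atomic transactions.
        self._lock = threading.Lock()

    def transfer_up_to(self, src, dst, limit):
        """Atomically move min(limit, balance of src) from src to dst."""
        with self._lock:
            amount = min(limit, self._balances[src])
            self._balances[src] -= amount
            self._balances[dst] += amount

    def snapshot(self):
        """Return a consistent copy of all balances."""
        with self._lock:
            return dict(self._balances)


def parallel_compose(*actions):
    """Run zero-argument callables concurrently, i.e. { p || q }."""
    threads = [threading.Thread(target=action) for action in actions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo():
    accounts = Accounts({"a": 100, "b": 0, "c": 0})
    p = lambda: accounts.transfer_up_to("a", "b", 80)
    q = lambda: accounts.transfer_up_to("a", "c", 80)
    parallel_compose(p, q)
    return accounts.snapshot()


if __name__ == "__main__":
    final = demo()
    # The final state depends on scheduling; the total does not.
    print(final, "total =", sum(final.values()))
```

Either final state is acceptable to a program that relies only on the invariant, which is the sense in which the two transactions commute.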

  7. Styles of Parallelism
     • We probably need to support multiple programming styles
         • Both functional and transactional
         • Both data parallel and task parallel
         • Both message passing and shared memory
         • Both declarative and imperative
         • Both implicit and explicit
     • We may need several languages to accomplish this
         • After all, we do use multiple languages today
         • Language interoperability (e.g. .NET) will help greatly
     • It is essential that parallelism be exposed to the compiler
         • So that the compiler can adapt it to the target system
     • It is also essential that locality be exposed to the compiler
         • For the same reason

  8. Compiler Optimization for Parallelism
     • Some say automatic parallelization is a demonstrated failure
     • Vectorizing and parallelizing compilers (especially for the right architecture) have been a tremendous success
         • They have enabled machine-independent languages
         • What they do can be termed parallelism packaging
         • Even manifestly parallel programs need it
     • What failed is parallelism discovery, especially in-the-large
         • Dependence analysis is chiefly a local success
     • Locality discovery in-the-large has also been a non-starter
         • Locality analysis is another word for dependence analysis
     • The jury is still out on in-the-large locality packaging
         • Local locality packaging works pretty well
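One simple reading of "parallelism packaging" is: the program already states that all iterations are independent, and something must still decide how to group them for the machine at hand. The sketch below does that grouping by hand for a given worker count. It is only an illustration of the idea under that assumption; the function name `package` and the contiguous-chunk policy are not from the talk, and a real compiler would weigh vector width, cache size and more.

```python
def package(n_items: int, workers: int):
    """Split n_items independent iterations into contiguous chunks.

    Contiguous chunks keep neighbouring iterations on the same worker,
    which also tends to preserve locality.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, extra = divmod(n_items, workers)
    chunks = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional iteration.
        size = base + (1 if w < extra else 0)
        if size:
            chunks.append(range(start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # The same manifestly parallel loop, packaged for two targets.
    print(package(10, 4))
    print(package(10, 16))
```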

  9. Fine-grain Parallelism
     • Exploitable parallelism grows as task granularity shrinks
         • But dependences among tasks become more numerous
     • Inter-task dependence enforcement demands scheduling
         • A task needing a value from elsewhere must wait for it
     • User-level work scheduling is needed
         • No privilege change to stop or restart a task
         • Locality (e.g. cache content) can be better preserved
     • Today’s OSes and hardware don’t encourage waiting
         • OS thread preemption makes blocking dangerous
         • Instruction sets encourage non-blocking approaches
         • Busy-waiting wastes instruction issue opportunities
     • We need better support for blocking synchronization
         • In both instruction set and operating system
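The idea of a task that waits for a value without involving the OS can be sketched with a toy user-level scheduler. Tasks are Python generators (an assumption standing in for lightweight tasks); a task yields the `Cell` it needs, and the scheduler parks it until some other task fills that cell. Nothing busy-waits and no OS thread blocks. `Cell`, `Scheduler` and `demo` are hypothetical names, and the scheduler is single-threaded to keep the sketch short.

```python
from collections import deque

_EMPTY = object()


class Cell:
    """Single-assignment value that tasks can wait on."""

    def __init__(self):
        self.value = _EMPTY
        self.waiters = []

    @property
    def full(self):
        return self.value is not _EMPTY


class Scheduler:
    """Runs generator-based tasks entirely in user code."""

    def __init__(self):
        self.ready = deque()  # (task, value to resume it with)

    def spawn(self, task):
        self.ready.append((task, None))

    def put(self, cell, value):
        """Fill a cell and make every task waiting on it runnable."""
        cell.value = value
        for task in cell.waiters:
            self.ready.append((task, value))
        cell.waiters.clear()

    def run(self):
        while self.ready:
            task, arg = self.ready.popleft()
            try:
                cell = task.send(arg)  # run until the task needs a value
            except StopIteration:
                continue  # task finished
            if cell.full:
                self.ready.append((task, cell.value))
            else:
                cell.waiters.append(task)  # park the task, no spinning


def demo():
    sched = Scheduler()
    a, b, total = Cell(), Cell(), Cell()

    def consumer():
        x = yield a  # wait for a
        y = yield b  # wait for b
        sched.put(total, x + y)

    def producer(cell, value):
        sched.put(cell, value)
        return
        yield  # unreachable; makes this function a generator

    sched.spawn(consumer())  # spawned first, so it must wait
    sched.spawn(producer(a, 2))
    sched.spawn(producer(b, 40))
    sched.run()
    return total.value


if __name__ == "__main__":
    print(demo())
```

Stopping and restarting `consumer` here costs a generator switch rather than a privilege change, which is the property the slide asks for.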

  10. Resource Management Consequences
     • Since the user runtime is scheduling work on processors, the OS should not attempt to do the same
         • An asynchronous OS API is a necessary corollary
         • The user-exposed API should be synchronous
     • Scheduling memory via demand paging is also problematic
     • Instead, the application and OS should negotiate
         • The application tells the OS its resource needs & desires
         • The OS makes decisions based on the big picture:
             • Requirements for quality of service
             • Availability of resources
             • Appropriateness of power level
     • The OS can preempt resources to reclaim them
         • But with notification, so the application can rearrange work
     • Resources should be time- and space-shared in chunks
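A minimal sketch of what such a negotiation might look like, under heavy assumptions: the only resource is a count of cores, the policy is first come first served, and the OS never reclaims below an application's stated need. `Request`, `ToyOS`, `negotiate` and `reclaim` are all hypothetical names invented for this illustration; no real operating system interface is being described.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Request:
    need: int    # minimum cores required to make progress
    desire: int  # cores the application could use productively


class ToyOS:
    """Hypothetical resource manager for a fixed pool of cores."""

    def __init__(self, cores: int):
        self.free = cores
        # app name -> (granted cores, stated need, revoke callback)
        self.grants: Dict[str, Tuple[int, int, Callable[[int], None]]] = {}

    def negotiate(self, app: str, req: Request,
                  on_revoke: Callable[[int], None]) -> int:
        """Grant between need and desire, or nothing if need cannot be met."""
        if req.need > self.free:
            return 0
        granted = min(req.desire, self.free)
        self.free -= granted
        self.grants[app] = (granted, req.need, on_revoke)
        return granted

    def reclaim(self, app: str, cores: int) -> int:
        """Preempt cores from an app, notifying it so it can rearrange work."""
        granted, need, on_revoke = self.grants[app]
        cores = min(cores, granted - need)  # never cut below the stated need
        on_revoke(cores)                    # notification precedes the loss
        self.grants[app] = (granted - cores, need, on_revoke)
        self.free += cores
        return cores


if __name__ == "__main__":
    toy_os = ToyOS(cores=8)
    notices = []
    print(toy_os.negotiate("render", Request(need=2, desire=6), notices.append))
    print(toy_os.negotiate("audio", Request(need=4, desire=4), notices.append))
    print(toy_os.reclaim("render", 3), notices)
    print(toy_os.negotiate("audio", Request(need=4, desire=4), notices.append))
```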

  11. Bin Packing
     • The more resources allocated, the more swapping overhead
         • It would be nice to amortize it
         • The more resources you get, the longer you may keep them
     • Roughly, this means scheduling = packing squarish blocks
         • QOS applications might need long rectangles instead
     • When the blocks don’t fit, the OS can morph them a little
         • Or cut corners when absolutely necessary
     [Figure: axes labeled “Quantity of resource” and “Time”]
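To see why amortizing swap cost leads to "squarish" blocks, suppose (as an assumption, not something the slide states) that swapping an allocation in and out costs time proportional to its size. Keeping the overhead below a fixed fraction then requires a holding time proportional to the quantity held, so width grows with height. The function name `hold_time` and its parameters are hypothetical.

```python
def hold_time(quantity: float, swap_cost_per_unit: float,
              max_overhead: float) -> float:
    """Shortest holding time that keeps swap overhead within max_overhead.

    Assumed model: swap_time = quantity * swap_cost_per_unit, and
    overhead = swap_time / (swap_time + hold).
    """
    if not 0.0 < max_overhead < 1.0:
        raise ValueError("max_overhead must be strictly between 0 and 1")
    swap_time = quantity * swap_cost_per_unit
    # Solve swap_time / (swap_time + hold) <= max_overhead for hold.
    return swap_time * (1.0 - max_overhead) / max_overhead


if __name__ == "__main__":
    for quantity in (1, 2, 4, 8):
        # Holding time scales linearly with the quantity allocated.
        print(quantity, hold_time(quantity, swap_cost_per_unit=1.0,
                                  max_overhead=0.1))
```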

  12. Parallel Debugging and Tuning
     • Today, debugging relies on single-stepping and printf()
         • Single-stepping a parallel program is a bit less effective
     • Conditional program and data breakpoints are helpful
         • To stop when an invariant fails to be true
     • Support for ad-hoc data perusal is also very important
         • Debugging is data mining
     • Serial program tuning tries to discover where the program counter spends its time
         • The answer is usually found by sampling the PC
     • In contrast, parallel program tuning tries to discover where there is insufficient parallelism
         • A good way is to log perf counters and a timestamp at events
     • Visualization is a big deal for both debugging and tuning
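As a small illustration of tuning from an event log, the sketch below takes timestamped task start and finish events and reports the intervals in which fewer tasks were running than there were cores. The log format (a timestamp and a +1 or -1 delta) and the name `low_parallelism_intervals` are assumptions made for this example; a real log would also carry the performance counter values the slide mentions.

```python
def low_parallelism_intervals(events, cores: int):
    """Find intervals where the number of running tasks is below `cores`.

    events: iterable of (timestamp, delta), with delta = +1 when a task
    starts and -1 when a task finishes.
    Returns a list of (start, end, active_tasks) tuples.
    """
    intervals = []
    active = 0
    prev = None
    for timestamp, delta in sorted(events):
        # `active` describes the interval from `prev` up to `timestamp`.
        if prev is not None and timestamp > prev and active < cores:
            intervals.append((prev, timestamp, active))
        active += delta
        prev = timestamp
    return intervals


if __name__ == "__main__":
    log = [(0, +1), (0, +1), (4, -1), (6, -1)]
    # On two cores, only one task is running between t=4 and t=6.
    print(low_parallelism_intervals(log, cores=2))
```

Feeding the resulting intervals to a timeline plot is one way to get the visualization the last bullet asks for.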

  13. Conclusions
     • It is time to rethink some of the basics
     • There is lots of work for everyone to do
         • I’ve left out lots of things, e.g. applications
     • We need basic research as well as industrial development
         • Research in computer systems is deprecated these days
         • In the USA, NSF and DOD need to take the initiative