The Future of Parallel Programming in Visual Studio
Stephen Toub
Parallel Computing Platform, Microsoft Corporation
DISCLAIMER • This is a talk about the future… • All content is highly subject to change • Some of the technology being discussed… • …is available in CTP form now • …will likely be available in CTP form soon • …may be available in CTP form some time in the future • …may never actually ship (but hopefully it all will)
Agenda • Present • Visual Studio 2010 Recap • Future • Visual Studio • .NET • C++
Visual Studio 2010: Tools, Programming Models, Runtimes
[Layer diagram, summarized:]
• Tools: Parallel Tasks window, Parallel Stacks window, Concurrency Visualizer
• Programming models (managed): Parallel LINQ, Task Parallel Library, F#, data structures
• Programming models (native): Parallel Pattern Library, Asynchronous Agents Library, data structures
• Runtime (managed): ThreadPool with Task Scheduler and Resource Manager
• Runtime (native): Concurrency Runtime with Task Scheduler and Resource Manager
• Operating system: Windows
Future: Tools, Programming Models, Runtimes
[Layer diagram, summarized:]
• Tools: Parallel Tasks, Parallel Stacks, Parallel Watch, Concurrency Visualizer, MPI/GPU profiler, MPI debugger, GPU debugger
• Programming models (managed): async in C#, VB, & F#; Parallel LINQ; “DryadLINQ”; Task Parallel Library; TPL Dataflow; data structures
• Programming models (native): Parallel Pattern Library, Asynchronous Agents Library, Data Parallel Extensions to C++, data structures
• Runtime (managed): ThreadPool with Task Scheduler and Resource Manager
• Runtime (native): Concurrency Runtime with Task Scheduler and Resource Manager
• Platform: Windows, DirectX, HPC Server
This is your synchronous code with .NET 4…

public void CopyStreamToStream(Stream source, Stream destination)
{
    byte[] buffer = new byte[0x1000];
    int numRead;
    while ((numRead = source.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, numRead);
    }
}
This is an expert’s asynchronous code with .NET 4…

public IAsyncResult BeginCopyStreamToStream(Stream source, Stream destination)
{
    var tcs = new TaskCompletionSource<object>();
    byte[] buffer = new byte[0x1000];
    Action<IAsyncResult> readWriteLoop = null;
    readWriteLoop = iar =>
    {
        try
        {
            for (bool isRead = iar == null; ; isRead = !isRead)
            {
                switch (isRead)
                {
                    case true:
                        iar = source.BeginRead(buffer, 0, buffer.Length, readResult =>
                        {
                            if (readResult.CompletedSynchronously) return;
                            readWriteLoop(readResult);
                        }, null);
                        if (!iar.CompletedSynchronously) return;
                        break;
                    case false:
                        int numRead = source.EndRead(iar);
                        if (numRead == 0) { tcs.TrySetResult(null); return; }
                        iar = destination.BeginWrite(buffer, 0, numRead, writeResult =>
                        {
                            if (writeResult.CompletedSynchronously) return;
                            destination.EndWrite(writeResult);
                            readWriteLoop(null);
                        }, null);
                        if (!iar.CompletedSynchronously) return;
                        destination.EndWrite(iar);
                        break;
                }
            }
        }
        catch (Exception e) { tcs.TrySetException(e); }
    };
    readWriteLoop(null);
    return tcs.Task;
}

public void EndCopyStreamToStream(IAsyncResult asyncResult)
{
    ((Task)asyncResult).Wait();
}
This is your asynchronous code with the Visual Studio Async CTP…

public void CopyStreamToStream(Stream source, Stream destination)
{
    byte[] buffer = new byte[0x1000];
    int numRead;
    while ((numRead = source.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, numRead);
    }
}

public async Task CopyStreamToStreamAsync(Stream source, Stream destination)
{
    byte[] buffer = new byte[0x1000];
    int numRead;
    while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length)) != 0)
    {
        await destination.WriteAsync(buffer, 0, numRead);
    }
}
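A minimal caller sketch (not from the deck; the method name and file-opening code are illustrative) shows that consuming the async version reads like the synchronous call, just awaited:

// requires: using System.IO; using System.Threading.Tasks;
// and the CopyStreamToStreamAsync method from the slide above.
public async Task CopyFileToFileAsync(string sourcePath, string destinationPath)
{
    // Hypothetical caller: open two streams and reuse the async copy above.
    using (FileStream source = File.OpenRead(sourcePath))
    using (FileStream destination = File.Create(destinationPath))
    {
        await CopyStreamToStreamAsync(source, destination);
    }
}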
Asynchronous Methods: “Just Like Synchronous Programming” • Language • “async” modifier: marks method or lambda as asynchronous • “await” operator: yields control until awaited task completes • Framework • Use Task and Task<T> to represent “ongoing operations” • Could be async I/O, background work, anything... • Single object for status, result, and exceptions • Composable callback model • “await” rewrites to use continuations
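To make the last bullet concrete, here is a rough sketch (not the actual compiler output, which builds a state machine and flows through awaiters) of how an await conceptually becomes a continuation on the awaited task; the class and method names are illustrative:

// A conceptual sketch only; the real transform also captures the current context.
using System;
using System.IO;
using System.Threading.Tasks;

static class AwaitRewriteSketch
{
    // With await:
    //     int numRead = await source.ReadAsync(buffer, 0, buffer.Length);
    //     Console.WriteLine(numRead);
    //
    // Conceptually, the code after the await runs as a continuation on the task:
    static Task ReadThenPrint(Stream source, byte[] buffer)
    {
        return source.ReadAsync(buffer, 0, buffer.Length)
                     .ContinueWith(t => Console.WriteLine(t.Result));
    }
}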
Responsive User Interfaces (demo): Asian Option Pricing in Excel with Async and PLINQ — http://msdn.com/vstudio/async
Related Additions • Combinators • Task.WhenAll, Task.WhenAny, … • Timer integration • Task.Delay(ms), new CancellationTokenSource(ms) • Task Scheduling • ConcurrentExclusiveSchedulerPair • Fine-Grained Control • TaskCreationOptions.DenyChildAttach, EnumerablePartitionerOptions
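A hedged sketch of how these combinators compose (the method and variable names are illustrative; the Task.WhenAny/Task.Delay shapes shown on the slide match what later shipped in .NET 4.5, while the Async CTP itself exposed them as TaskEx.*):

// requires: using System; using System.Threading.Tasks;
public static async Task<string> FirstResultOrTimeoutAsync(
    Task<string> t1, Task<string> t2, int timeoutMs)
{
    // Race the two operations against a timer.
    Task completed = await Task.WhenAny(t1, t2, Task.Delay(timeoutMs));
    if (completed == t1) return await t1;   // first operation finished first
    if (completed == t2) return await t2;   // second operation finished first
    throw new TimeoutException("Neither operation completed in time.");
}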
TPL Dataflow • Primitives for in-process message passing • Think “buffering + processing” • Built on Tasks, concurrent collections, … • “Dataflow blocks” can be linked together to create networks • Based on concepts / designs from • Decades of computer science research / history • Related Microsoft implementations • Asynchronous Agents library in Visual C++ 2010 • CCR from Microsoft Robotics
Example: Async Posting

var c = new ActionBlock<int>(i => { Process(i); });

for (int i = 0; i < 5; i++)
{
    c.Post(i);
}

[Diagram: the five posted messages (0–4) land in the block’s message queue and are processed in turn: Process(0), Process(1), Process(2), Process(3), Process(4).]
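The ActionBlock above stands alone; as the previous slide notes, dataflow blocks can also be linked into networks. A minimal sketch, assuming the TransformBlock/ActionBlock/LinkTo surface of System.Threading.Tasks.Dataflow (the parse/print pipeline is illustrative, not from the deck):

// requires: using System; using System.Threading.Tasks.Dataflow;
var parse = new TransformBlock<string, int>(s => int.Parse(s));
var print = new ActionBlock<int>(n => Console.WriteLine(n));
parse.LinkTo(print, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 5; i++) parse.Post(i.ToString());
parse.Complete();          // no more input
print.Completion.Wait();   // wait for the pipeline to drain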
Asynchronous Agents (demo): Real Estate Simulation with Async and TPL Dataflow — http://msdn.com/vstudio/async
“DryadLINQ” • PLINQ • “Just add .AsParallel()” • “DryadLINQ” • “Just add .AsDistributed()” • Enable developers to scale out to process large, partitioned data and computations • Using existing tools and skills • Built on a runtime that provides • Support for Windows HPC Server and (eventually) Azure • Robustness and fault tolerance • Runtime-level optimizations behind the scenes
LINQ Query

var babies = (IEnumerable<BabyInfo>)...;
var results = from baby in babies
              where baby.Name == queryName &&
                    baby.State == GetState(stateCode) &&
                    baby.Year >= yearStart &&
                    baby.Year <= yearEnd
              orderby baby.Year ascending
              select baby;
foreach (var result in results)
{
    ...
}
Parallel LINQ Query

var babies = ... .AsParallel();
var results = from baby in babies
              where baby.Name == queryName &&
                    baby.State == GetState(stateCode) &&
                    baby.Year >= yearStart &&
                    baby.Year <= yearEnd
              orderby baby.Year ascending
              select baby;
foreach (var result in results)
{
    ...
}
DryadLINQ* Query With Local Data

var babies = ... .AsDistributed();
var results = from baby in babies
              where baby.Name == queryName &&
                    baby.State == GetState(stateCode) &&
                    baby.Year >= yearStart &&
                    baby.Year <= yearEnd
              orderby baby.Year ascending
              select baby;
foreach (var result in results)
{
    ...
}
DryadLINQ* Query With Partitioned Data

var babies = DistributedData.Open<BabyInfo>("hpcdsc://MyCluster/BabyInfo");
var results = from baby in babies
              where baby.Name == queryName &&
                    baby.State == GetState(stateCode) &&
                    baby.Year >= yearStart &&
                    baby.Year <= yearEnd
              orderby baby.Year ascending
              select baby;
await results.ExecuteAsync("hpcdsc://MyCluster/MyBabiesResults");

* Research download available from: research.microsoft.com/projects/dryadLINQ/
Parallel Pattern Library: New and Improved

• More container templates: concurrent_unordered_map, concurrent_unordered_multimap, concurrent_unordered_set, concurrent_unordered_multiset, + R-value support
• More algorithm templates: parallel_transform, parallel_reduce, parallel_sort, parallel_radixsort, parallel_buffered_sort
• First-class tasks:

// a task started from a lambda
int n = 40;
task<int> t([=]() { return fib(n); });
... // do other work, come back later
printf("%d\n", t.get());

// a task completed later via a task_completion_event
task_completion_event<int> tce;
task<int> t(tce);
foo(tce); // eventually sets tce
... // do other work, come back later
printf("%d\n", t.get());

// a continuation
task<int> t([]() { return fib(40); });
t.continue_with([](int n) { printf("%d\n", n); });

// when_any: continue when the first of several tasks completes
auto result = when_any(t1, t2, t3).continue_with([&](string result) -> int {
    printf("Got one element '%s'\n", result.c_str());
    return g();
});
result.wait();

// when_all: continue when all of the tasks complete
auto taskResult = when_all(t1, t2, t3).continue_with([&](vector<string> results) -> int {
    printf("Got all %d elements\n", results.size());
    for each (auto result in results)
        printf("'%s'\n", result.c_str());
    return f();
});
taskResult.wait();
Parallel Sorting with C++ (demo): parallel_sort, parallel_buffered_sort, parallel_radixsort — http://code.msdn.microsoft.com/ConcRTExtras
Data Parallel and GPU Acceleration • Opportunity for extreme performance for certain kinds of computation • But… • Difficult to program today • Often hardware vendor specific
[Images: ray tracing and computed tomography workloads; CPU vs. GPU comparison showing over a 32x increase, > 1 TFlops at 5% CPU.]
Data Parallel Extensions to C++ • Small set of extensions to C++ • Data parallel algorithms • Subset of C++ language used within kernel functions • C++ abstraction features and compilation model • Type checking • Separate compilation units supported • Built on DirectX 11 • Device portability through DX • Integrated debugging and profiling tools
Data Parallel Programming Example: C = A + B

[Diagram: arrays A and B added element-wise into C; CPU (few ALUs, large control logic and cache) vs. GPU (many ALUs, small control); slide callouts: Where, Shape, Device, Copy, What.]

Serial CPU loop:

float* A = new float[size];
float* B = new float[size];
float* C = new float[size];
...
for (int i = 0; i < n; i++)
{
    C[i] = A[i] + B[i];
}

PPL parallel_for on the CPU:

float* A = new float[size];
float* B = new float[size];
float* C = new float[size];
...
parallel_for(0, n, [&](int i) {
    C[i] = A[i] + B[i];
});

Data Parallel Extensions to C++ on the GPU:

float* A = new float[size];
float* B = new float[size];
float* C = new float[size];
...
auto rv = get_default_resource_view();   // Device
extent<1> e(size);                       // Shape
array<1, float> fA(e, A, rv);            // Copy in
array<1, float> fB(e, B, rv);
array<1, float> fC(e, C, rv);
forall([=](index<1> i) {                 // Where / What
    fC[i] = fA[i] + fB[i];
}, fC, fA, fB);
fC.copy_out(C, size);                    // Copy out
release_resource_view(rv);
Summary • Visual Studio 2010 • A great environment for writing parallel applications • The Future is very bright • Visual Studio • Parallel Watch, cluster and GPU profiling and debugging • .NET • Async, TPL Dataflow, “DryadLINQ”, … • C++ • PPL vNext, Data Parallel Extensions to C++, ... • … // stay tuned
Session Evaluations Tell us what you think, and you could win! All evaluations submitted are automatically entered into a daily prize draw* Sign-in to the Schedule Builder at http://europe.msteched.com/topic/list/ * Details of prize draw rules can be obtained from the Information Desk.
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.