Overview: Let’s get started!
• Outline
  • Quick Introduction
  • PLINQ Hands-On
  • Performance Tips
• Prerequisites
  • .NET and C#
  • LINQ
  • Threading Basics
Multi-Core and .NET 4: In the words of developers
• “Getting an hour-long computation done in 10 minutes changes how we work.” - Carl Kadie, Microsoft’s eScience Research Group
• “.NET 4 has made it practical and cost-effective to implement parallelism where it may have been hard to justify in the past.” - Kieran Mockford, MSBuild
• “I do believe the .NET Framework 4 will change the way developers think about parallel programming.” - Gastón C. Hillar, independent IT consultant and freelance author
Visual Studio 2010: Tools, programming models and runtimes
[Diagram: Tools (Visual Studio IDE: Parallel Debugger Tool Windows, Concurrency Visualizer); Programming Models (.NET Framework 4: Parallel LINQ, Task Parallel Library, Data Structures; Visual C++ 10: Parallel Pattern Library, Agents Library, Data Structures); Runtimes (ThreadPool and Concurrency Runtime, each with a Task Scheduler and Resource Manager); Operating System (Windows Threads, UMS Threads). Key: Managed, Native, Tooling.]
From LINQ to Objects to PLINQ: An easy change
• LINQ to Objects query:
      int[] output = arr
          .Select(x => Foo(x))
          .ToArray();
• PLINQ query:
      int[] output = arr.AsParallel()
          .Select(x => Foo(x))
          .ToArray();
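A minimal runnable sketch of the two queries above, written as a small top-level C# program (modern syntax, not the C# 4 of the talk). `Foo` is a hypothetical placeholder because the slide does not define it, and the input array is made up. PLINQ does not guarantee source order by default, so the sketch adds `AsOrdered()` to a third query in order to compare it element by element with the sequential result.

```csharp
using System;
using System.Linq;

// Made-up input for illustration.
int[] arr = Enumerable.Range(1, 1_000).ToArray();

// LINQ to Objects: runs on the calling thread.
int[] sequential = arr
    .Select(x => Foo(x))
    .ToArray();

// PLINQ: the only change is AsParallel(); output order is not guaranteed.
int[] parallel = arr.AsParallel()
    .Select(x => Foo(x))
    .ToArray();

// AsOrdered() asks PLINQ to keep results in source order.
int[] parallelOrdered = arr.AsParallel().AsOrdered()
    .Select(x => Foo(x))
    .ToArray();

Console.WriteLine(sequential.SequenceEqual(parallelOrdered)); // True

// Hypothetical stand-in for the slide's Foo: any pure function of x works here.
static int Foo(int x) => x * x % 97;
```

Because `Foo` is pure, the unordered `parallel` array holds the same values as `sequential`, possibly in a different order.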
PLINQ hands-on coding walkthrough
Array Mapping
• int[] input = ...
      bool[] output = input.AsParallel()
          .Select(x => IsPrime(x))
          .ToArray();
[Diagram: the input array (1, 6, 3, 8, …, 2, 7) is divided among Thread 1 … Thread N; each thread runs Select on its elements and writes the results (F, F, T, F, …, T, T) into the output array.]
Array to array mapping is simple and efficient.
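A runnable version of the array mapping above as a small top-level C# program. The slide elides the input and never defines `IsPrime`, so the range 1 to 100 and the trial-division `IsPrime` below are assumptions made for illustration. The check counts `true` values, which does not depend on result order.

```csharp
using System;
using System.Linq;

// Illustrative input: the slide elides it as "...".
int[] input = Enumerable.Range(1, 100).ToArray();

// Array-to-array mapping: PLINQ splits the array among worker threads,
// and each thread applies IsPrime to the elements of its partition.
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

int primeCount = output.Count(b => b);
Console.WriteLine(primeCount); // 25

// Hypothetical helper (not defined on the slide): simple trial division.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
    {
        if (n % d == 0) return false;
    }
    return true;
}
```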
Sequence Mapping
• IEnumerable<int> input = Enumerable.Range(1, 100);
      bool[] output = input.AsParallel()
          .Select(x => IsPrime(x))
          .ToArray();
[Diagram: Threads 1 … N take a lock on the shared input enumerator to pull elements, run Select, and store results in their own buffers (Results 1 … N), which are then combined into the output array.]
Each thread processes a partition of the inputs and stores its results in a buffer. The buffers are then combined into one array.
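To make the mechanism in the diagram concrete, here is a hand-rolled model of it as a small top-level C# program. It is NOT PLINQ's real implementation: the helper `MapWithSharedEnumerator` is hypothetical, it pulls one element per lock acquisition (rather than in chunks) to keep the code short, and it also records each element's source index so the combined array lines up with the input; that bookkeeping is an assumption of the sketch. `IsPrime` is the same hypothetical trial-division helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

IEnumerable<int> input = Enumerable.Range(1, 100);
bool[] output = MapWithSharedEnumerator(input, IsPrime, Environment.ProcessorCount);
Console.WriteLine(output.Count(b => b)); // 25

// Hypothetical helper modelling the slide's diagram (not PLINQ's real code).
static TResult[] MapWithSharedEnumerator<TSource, TResult>(
    IEnumerable<TSource> source, Func<TSource, TResult> selector, int workers)
{
    using IEnumerator<TSource> enumerator = source.GetEnumerator();
    object gate = new object();
    int nextIndex = 0;
    var buffers = new List<(int Index, TResult Value)>[workers];
    var threads = new Thread[workers];

    for (int w = 0; w < workers; w++)
    {
        // Each worker owns one results buffer, so adding to it needs no lock.
        var buffer = new List<(int Index, TResult Value)>();
        buffers[w] = buffer;
        threads[w] = new Thread(() =>
        {
            while (true)
            {
                TSource item;
                int index;
                lock (gate) // only one thread at a time may advance the shared enumerator
                {
                    if (!enumerator.MoveNext()) break;
                    item = enumerator.Current;
                    index = nextIndex++;
                }
                // The selector runs outside the lock, in parallel with other workers.
                buffer.Add((index, selector(item)));
            }
        });
        threads[w].Start();
    }
    foreach (Thread t in threads) t.Join();

    // Combine the per-thread buffers into one array, placing each value at its source index.
    var result = new TResult[nextIndex];
    foreach (var buffer in buffers)
        foreach (var (index, value) in buffer)
            result[index] = value;
    return result;
}

// Hypothetical helper (not defined on the slides): simple trial division.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```

The lock only protects `MoveNext`/`Current`; keeping the expensive `selector` call outside it is what lets the workers overlap.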
Asynchronous Mapping
• var q = input.AsParallel()
      .Select(x => IsPrime(x));
  foreach (var x in q) { ... }
[Diagram: worker Threads 1 … N pull elements from the input enumerator under a lock, run Select, and write into per-thread buffers (Results 1 … N); on the main thread, foreach calls MoveNext on the output enumerator, which polls those buffers.]
In this query, the foreach loop starts consuming results while they are still being computed.
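A small top-level C# program sketching the streaming behavior described above. The artificial 5 ms delay in the hypothetical `SlowIsPrime` stands in for an expensive delegate, and `WithMergeOptions(ParallelMergeOptions.NotBuffered)` is an addition that is not on the slide: it asks PLINQ to hand each result to the consumer as soon as it is produced. The printed timings vary by machine, so no expected output is given for them.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

var input = Enumerable.Range(1, 100);

// No ToArray(): the query stays lazy and is consumed by foreach on the main thread.
var q = input.AsParallel()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .Select(x => SlowIsPrime(x));

var stopwatch = Stopwatch.StartNew();
TimeSpan? firstResultAt = null;
int seen = 0, primeCount = 0;
foreach (var isPrime in q)
{
    firstResultAt ??= stopwatch.Elapsed; // time at which the first result arrived
    seen++;
    if (isPrime) primeCount++;
}
Console.WriteLine($"first result after {firstResultAt.Value.TotalMilliseconds:F0} ms; " +
                  $"all {seen} results after {stopwatch.Elapsed.TotalMilliseconds:F0} ms");

// Hypothetical expensive delegate: IsPrime plus an artificial 5 ms delay.
static bool SlowIsPrime(int n)
{
    Thread.Sleep(5);
    return IsPrime(n);
}

// Hypothetical helper (not defined on the slides): simple trial division.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```

On a multi-core machine the first result should typically arrive long before the last one, which is the overlap the slide describes.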
Async Ordered Mapping or Filter
• var q = input.AsParallel().AsOrdered()
      .Select(x => IsPrime(x));
  foreach (var x in q) { ... }
[Diagram: worker Threads 1 … N pull from the input enumerator under a lock and run the operator (Op) into per-thread buffers (Results 1 … N); the output enumerator polls these buffers through an ordering buffer, and foreach on the main thread calls MoveNext on it.]
When ordering is turned on, PLINQ orders elements in a reordering buffer before yielding them to the foreach loop.
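The slide covers both mapping and filtering; this small top-level C# program shows the filter case with `Where`. The input range and the trial-division `IsPrime` are assumptions for illustration. With `AsOrdered()`, the foreach loop receives the primes in ascending source order even though worker threads may finish out of order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var input = Enumerable.Range(1, 100);

// AsOrdered(): results reach the foreach loop in source order.
var q = input.AsParallel().AsOrdered()
    .Where(x => IsPrime(x));

var primes = new List<int>();
foreach (var x in q)
{
    primes.Add(x);
}
Console.WriteLine(string.Join(", ", primes.Take(5))); // 2, 3, 5, 7, 11

// Hypothetical helper (not defined on the slides): simple trial division.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```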
Aggregation
• int result = input.AsParallel()
      .Aggregate(
          0,
          (a, e) => a + Foo(e),
          (a1, a2) => a1 + a2,
          r => r); // result selector, required by this overload
[Diagram: Threads 1 … N pull from the input enumerator under a lock and each run Aggregate into a local result (res1 … resN); the local results are combined into result.]
Each thread computes a local result. The local results are then combined into a final result.
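A runnable sketch of the parallel aggregation as a small top-level C# program. It calls the overload of `ParallelEnumerable.Aggregate` that takes a seed, an update function, a combine function, and a result selector; the seed-plus-combiner overloads require that fourth argument, so the sketch passes an identity selector. `Foo` is a hypothetical stand-in (here, squaring). A seed of 0 is safe because it is the identity for `+`, so it does not matter how many partitions start from it.

```csharp
using System;
using System.Linq;

int[] input = Enumerable.Range(1, 100).ToArray();

int result = input.AsParallel()
    .Aggregate(
        0,                    // seed: starting value for each thread-local result
        (a, e) => a + Foo(e), // update: fold one element into a thread-local result
        (a1, a2) => a1 + a2,  // combine: merge two thread-local results
        r => r);              // result selector: final projection (identity here)

Console.WriteLine(result); // 338350 (sum of squares of 1..100)

// Hypothetical stand-in for the slide's Foo.
static int Foo(int x) => x * x;
```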
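Search
• int result =
      input.AsParallel().AsOrdered()
          .Where(x => IsPrime(x))
          .First();
[Diagram: Threads 1 … N pull from the input enumerator under a lock and run First; a shared resultFound flag (initially F) is set by the thread that finds a match and polled by the other threads, and the match is stored in result.]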
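A runnable version of the search query as a small top-level C# program; the input range and the trial-division `IsPrime` are assumptions. Because of `AsOrdered()`, `First()` returns the earliest prime in source order rather than whichever match a worker thread happens to find first.

```csharp
using System;
using System.Linq;

var input = Enumerable.Range(1_000, 100);

int result =
    input.AsParallel().AsOrdered()
        .Where(x => IsPrime(x))
        .First();

Console.WriteLine(result); // 1009

// Hypothetical helper (not defined on the slides): simple trial division.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```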
More complex query
• int[] output = input.AsParallel()
      .Where(x => IsPrime(x))
      .GroupBy(x => x % 5)
      .Select(g => ProcessGroup(g))
      .ToArray();
[Diagram: Threads 1 … N pull from the input enumerator under a lock and run Where and GroupBy, producing Groups 1 … N; the threads then run Select over the groups, producing Results 1 … N, which are combined into output.]
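A runnable version of the multi-operator query as a small top-level C# program. `ProcessGroup` is not defined on the slide; here it is a hypothetical helper that sums each group, and `IsPrime` is the same hypothetical trial-division helper. The query has no `AsOrdered()`, so the order of the groups in `output` is not guaranteed and the sketch sorts before printing.

```csharp
using System;
using System.Linq;

var input = Enumerable.Range(1, 100);

// Where filters in parallel, GroupBy buckets the primes by remainder mod 5,
// and Select runs ProcessGroup once per group.
int[] output = input.AsParallel()
    .Where(x => IsPrime(x))
    .GroupBy(x => x % 5)
    .Select(g => ProcessGroup(g))
    .ToArray();

Console.WriteLine(string.Join(", ", output.OrderBy(v => v)));

// Hypothetical stand-in for the slide's ProcessGroup: sums one group.
static int ProcessGroup(IGrouping<int, int> group) => group.Sum();

// Hypothetical helper (not defined on the slides): simple trial division.
static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}
```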
Performance Tip #1: Avoid memory allocations
• When the delegate allocates memory:
  • GC and memory allocations can become the bottleneck
  • Then, your algorithm is only as scalable as the GC
• Mitigations:
  • Reduce memory allocations
  • Turn on server GC
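A small top-level C# program contrasting an allocating delegate with an allocation-free one that computes the same digit sums; the digit-sum workload and the `DigitSum` helper are made up for illustration. Server GC is not switched on in code: on .NET Framework it is enabled with `<gcServer enabled="true"/>` in the `<runtime>` section of app.config, and on .NET Core / .NET 5+ with `<ServerGarbageCollection>true</ServerGarbageCollection>` in the project file. The sketch only reports the current mode via `GCSettings.IsServerGC`.

```csharp
using System;
using System.Linq;
using System.Runtime;

int[] input = Enumerable.Range(1, 1_000_000).ToArray();

// Allocating delegate: x.ToString() creates a new string for every element,
// so all worker threads compete for allocations and garbage collections.
long allocating = input.AsParallel()
    .Select(x => (long)x.ToString().Sum(c => c - '0'))
    .Sum();

// Allocation-free delegate: the same digit sum computed with arithmetic only.
long allocationFree = input.AsParallel()
    .Select(x => (long)DigitSum(x))
    .Sum();

Console.WriteLine(allocating == allocationFree); // True
Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");

// Hypothetical helper: sum of the decimal digits of a non-negative int.
static int DigitSum(int x)
{
    int sum = 0;
    while (x > 0) { sum += x % 10; x /= 10; }
    return sum;
}
```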
Performance Tip #2: Avoid true and false sharing
• Modern CPUs exploit locality
  • Recently accessed memory locations are stored in a fast cache
• Multiple cores
  • Each core has its own cache
  • When a memory location is modified, it is invalidated in the caches of all other cores
  • In fact, the entire cache line is invalidated
  • A cache line is usually 64 or 128 bytes
• True sharing: threads write the same variable; false sharing: threads write different variables that happen to share a cache line
Performance Tip #2: Avoid true and false sharing
[Diagram: four cores, each running one thread with its own cache; all four caches hold a copy of the same cache line (5 2 7 3) from memory, and a write by one core invalidates the copies in the other caches.]
If cores continue stomping on each other’s caches, most reads and writes will go to the main memory!
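A small top-level C# program sketching false sharing and two common mitigations. The worker count, iteration count, and padding stride of 16 `long`s (128 bytes, chosen to cover both 64- and 128-byte lines) are assumptions. Timings depend heavily on the hardware, and `Parallel.For` does not guarantee that each iteration gets its own core, so the sketch prints times without expected values.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

const int Workers = 4;
const int Iterations = 10_000_000;
const int Stride = 16; // assumed padding: 16 longs = 128 bytes

// False sharing: the counters sit next to each other, so they likely share a
// cache line and each write invalidates that line in the other cores' caches.
long[] adjacent = new long[Workers];
var sw = Stopwatch.StartNew();
Parallel.For(0, Workers, w =>
{
    for (int i = 0; i < Iterations; i++) adjacent[w]++;
});
Console.WriteLine($"adjacent counters: {sw.ElapsedMilliseconds} ms");

// Mitigation 1: pad each counter onto its own cache line.
long[] padded = new long[Workers * Stride];
sw.Restart();
Parallel.For(0, Workers, w =>
{
    for (int i = 0; i < Iterations; i++) padded[w * Stride]++;
});
Console.WriteLine($"padded counters:   {sw.ElapsedMilliseconds} ms");

// Mitigation 2: count in a local variable and write to shared memory once.
long[] localThenWrite = new long[Workers];
sw.Restart();
Parallel.For(0, Workers, w =>
{
    long local = 0;
    for (int i = 0; i < Iterations; i++) local++;
    localThenWrite[w] = local;
});
Console.WriteLine($"local counters:    {sw.ElapsedMilliseconds} ms");
```

The second mitigation is usually preferable: it avoids both true and false sharing without relying on a guessed cache-line size.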
Performance Tip #3: Use expensive delegates
• A computationally expensive delegate is the best case for PLINQ
• A cheap delegate over a long sequence may also scale, but:
  • Overheads reduce the benefit of scaling
    • MoveNext and Current virtual method calls on the enumerator
    • Virtual method calls to execute delegates
  • Reading a long input sequence may be limited by memory throughput
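A small top-level C# program for comparing a cheap delegate with an expensive one, sequentially and with PLINQ. The hypothetical `Spin` function simulates CPU-bound work; the input size and the 200 rounds are assumptions. The sketch checks that both versions agree and prints timings, which vary by machine, so no expected numbers are given.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

int[] input = Enumerable.Range(0, 1_000_000).ToArray();

// Cheap delegate: one addition per element, so per-element overheads dominate.
Compare("cheap", x => x + 1);

// Expensive delegate: a few hundred operations per element, so real work dominates.
Compare("expensive", x => Spin(x, 200));

void Compare(string label, Func<int, int> selector)
{
    var sw = Stopwatch.StartNew();
    long sequential = input.Select(selector).Sum(v => (long)v);
    TimeSpan seqTime = sw.Elapsed;

    sw.Restart();
    long parallel = input.AsParallel().Select(selector).Sum(v => (long)v);
    TimeSpan parTime = sw.Elapsed;

    if (sequential != parallel) throw new InvalidOperationException("results differ");
    Console.WriteLine($"{label}: sequential {seqTime.TotalMilliseconds:F0} ms, " +
                      $"PLINQ {parTime.TotalMilliseconds:F0} ms");
}

// Hypothetical CPU-bound work: apply a cheap integer hash `rounds` times.
static int Spin(int x, int rounds)
{
    unchecked
    {
        for (int i = 0; i < rounds; i++) x = x * 31 + 7;
    }
    return x;
}
```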
Performance Tip #4: Write simple PLINQ queries
• PLINQ can execute all LINQ queries
• Simple queries are easier to reason about
• Break up complex queries so that only the expensive data-parallel part is in PLINQ:
      src.Select(x => Foo(x))
         .TakeWhile(x => Filter(x))
         .AsParallel()
         .Select(x => Bar(x))
         .ToArray();
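A runnable version of the split query above as a small top-level C# program. `Foo`, `Filter`, and `Bar` are hypothetical stand-ins, since the slide does not define them. The order-sensitive `TakeWhile` stays in sequential LINQ to Objects, and only the final `Select(Bar)` runs under PLINQ.

```csharp
using System;
using System.Linq;

var src = Enumerable.Range(1, 1_000);

int[] output = src.Select(x => Foo(x))   // sequential
    .TakeWhile(x => Filter(x))           // sequential: depends on element order
    .AsParallel()                        // everything after this runs under PLINQ
    .Select(x => Bar(x))
    .ToArray();

Console.WriteLine(output.Length); // 99

// Hypothetical stand-ins for the slide's Foo, Filter and Bar.
static int Foo(int x) => 2 * x;
static bool Filter(int x) => x < 200;
static int Bar(int x) => x + 1;
```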
Performance Tip #5: Choose appropriate partitioning
• Partitioning algorithms vary in:
  • Overhead
  • Load balancing
  • The required input representation
• By default:
  • Array and IList<> inputs are partitioned statically
  • Other IEnumerable<> types are partitioned on demand, in chunks
• Custom partitioning is supported via Partitioner
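A small top-level C# program sketching three ways to partition the same array with PLINQ: the default for arrays, a load-balancing partitioner from `Partitioner.Create(array, loadBalance: true)`, and range partitioning over indices with `Partitioner.Create(from, to, rangeSize)`. The range size of 10,000 and the hypothetical `Work` function (whose cost is deliberately uneven) are assumptions. All three must produce the same sum; which is fastest depends on the workload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

int[] data = Enumerable.Range(1, 100_000).ToArray();

// Default for arrays: static partitioning.
long defaultSum = data.AsParallel().Select(x => (long)Work(x)).Sum();

// Load-balancing partitioner: workers take elements on demand,
// which helps when per-element cost is uneven.
long balancedSum = Partitioner.Create(data, loadBalance: true)
    .AsParallel()
    .Select(x => (long)Work(x))
    .Sum();

// Range partitioning: each element is a Tuple<int, int> of [from, to) indices,
// so the delegate runs once per range instead of once per element.
long rangeSum = Partitioner.Create(0, data.Length, 10_000)
    .AsParallel()
    .Select(range =>
    {
        long local = 0;
        for (int i = range.Item1; i < range.Item2; i++) local += Work(data[i]);
        return local;
    })
    .Sum();

Console.WriteLine(defaultSum == balancedSum && balancedSum == rangeSum); // True

// Hypothetical per-element work whose cost grows with x % 100 (uneven on purpose).
static int Work(int x)
{
    int acc = 0;
    for (int i = 0; i < x % 100; i++) acc += i;
    return acc;
}
```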
Performance Tip #6: Use PLINQ with thought and care
• Measure, measure, measure!
• Find the bottleneck in your code
• If the bottleneck fits a data-parallel pattern, try PLINQ
• Measure again to validate the improvement
• If there is no improvement, revisit performance tips #1 to #5
More Information
• Parallel Computing Dev Center: http://msdn.com/concurrency
• Code samples: http://code.msdn.microsoft.com/ParExtSamples
• Team Blogs
  • Managed: http://blogs.msdn.com/pfxteam
  • Tools: http://blogs.msdn.com/visualizeparallel
• Forums: http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing
• My blog: http://igoro.com/
YOUR FEEDBACK IS IMPORTANT TO US! Please fill out session evaluation forms online at MicrosoftPDC.com
Learn More On Channel 9
• Expand your PDC experience through Channel 9
• Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses: channel9.msdn.com/learn
Built by Developers for Developers…