Programming with Concurrency in .NET: Concepts, Patterns and Best Practices
Joel Pobar, DEV318
Developer Consultant, Microsoft Corporation
Agenda • Why the need for concurrency? • Concurrency programming today • Threads • Locks & Shared Memory • Memory Models • Future models
Why Concurrency? • For responsiveness • For performance • Why not concurrency? • It brings non-determinism • Specific knowledge and discipline needed • Tough “Heisenbugs”
Concurrency for Responsiveness • “All apps are network apps” • Use web services, network files, slow disks, … • Latency, variance, timeouts, partial failure • How are we doing? • User is locked out, can’t cancel, no repainting • Hangs as prevalent and disruptive as crashes
Let’s Get Responsive! • Decouple UI and work • Show internal states and partial results • Provide ‘cancel’ • Exploit pre- and post-computing • Define async states and behaviors • So each is a feature, not a bug • Apply same ideas to responsive libraries • Document thread-safe synchronous usage • Watch out for thread issues in libraries you call
Threads, UI and the ThreadPool
[diagram: the UI thread runs the UI message queue (and pumping); BackgroundWorker.RunWorkerAsync queues DoWork onto a ThreadPool thread; ProgressChanged and RunWorkerCompleted are marshaled back to the UI thread]
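A minimal sketch of this pattern with System.ComponentModel.BackgroundWorker. The event names are the ones in the diagram above; the form, the simulated work loop, and the timings are illustrative assumptions:

    using System.ComponentModel;
    using System.Threading;
    using System.Windows.Forms;

    public class MainForm : Form
    {
        BackgroundWorker worker = new BackgroundWorker();

        public MainForm()
        {
            worker.WorkerReportsProgress = true;
            worker.WorkerSupportsCancellation = true;  // enables worker.CancelAsync()
            worker.DoWork += OnDoWork;                  // runs on a ThreadPool thread
            worker.ProgressChanged += OnProgress;       // marshaled to the UI thread
            worker.RunWorkerCompleted += OnCompleted;   // marshaled to the UI thread
            worker.RunWorkerAsync();                    // kick off the background work
        }

        void OnDoWork(object sender, DoWorkEventArgs e)
        {
            for (int i = 0; i < 100; i++)
            {
                if (worker.CancellationPending) { e.Cancel = true; return; }
                Thread.Sleep(50);                       // simulated work (assumption)
                worker.ReportProgress(i + 1);
            }
        }

        void OnProgress(object sender, ProgressChangedEventArgs e)
        {
            // Safe to touch UI state here: we are back on the UI thread.
            Text = "Working: " + e.ProgressPercentage + "%";
        }

        void OnCompleted(object sender, RunWorkerCompletedEventArgs e)
        {
            Text = e.Cancelled ? "Cancelled" : "Done";
        }
    }

The UI stays responsive because the message queue keeps pumping while DoWork runs elsewhere; cancel is just a flag the worker polls.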
Why Concurrency? Performance: The Multi-Core Era
[chart: log transistors/die and log CPU clock frequency, 1975–2015 — transistors/die keep growing >30%/yr (10,000 → 100 M → ~5 B), while clock frequency scaling slows to <10%/yr (1 MHz → 3 GHz → <10 GHz)]
• Processors don’t get way faster anymore – you just get a lot more slow ones!
2006->2010 Microprocessor Trends • 90→65→45 nm lithography advances • Twice, twice again as many (faster) transistors • “Slower” wires + maxed out thermal envelope→ slower CPU frequency scaling • Same freq + more cores + more cache RAM→ ~same cache/core and compute/core • Architecture advances • (Hardware) multithreading • Optimizing for throughput, power • System-on-a-Chip integration: interconnects, shared caches, I/O and DRAM controllers • Proliferation of PC multiprocessor topologies
System-on-a-Chip Example • The “Killer” network interface card [image]
Change and Opportunity • Windows Vista era: 1-4 cores, 1-8 hw threads? • Soon cheap servers with many cores • Great opportunities for new killer apps • Chip vendors can and will provide as many cores and threads as developers can harness If you come, they will build it
Concurrency for Performance • Many of today’s apps are not CPU bound • Greater use of concurrency for responsiveness enables increased CPU utilization • If CPU bound, consider parallelism • Tune your algorithms and data first! • Let perf goals and measurements guide you • Every ~2 years, twice as many cores... • (See also) • Improving .NET App. Perf. and Scalability: http://msdn.microsoft.com/library/en-us/dnpag/html/scalenet.asp • Rico Mariani’s blog: http://blogs.msdn.com/ricom
Agenda • Why the need for concurrency? • Concurrency programming today • Threads • Locks & Shared Memory • Memory Models • Future models
Some Concurrent Programming Models • Server per-client work-unit parallelism • Implicit data parallelism: SQL, Direct3D • Induced concurrency: UI events, CLR finalizers, web services • Loop parallelism: OpenMP • Message passing: MPI, Cω, CCR • Implicit parallelism in functional languages • Threads, shared memory, and locks
Threads, Shared Memory and Locks: Concepts (1 of 2)
• “Simultaneous” multiple threads of control
• Shared memory changes under you
• Shared: global data, static fields, heap objects
• Private: PC, registers, locals, stack, unshared heap objects, thread-local storage
• The three isolation techniques (sketched in code below)
• Confinement: sharing less of your state
• Immutability: sharing read-only state
• Synchronization: using locks to serialize access to writeable shared state
[diagram: (1) Thread 1 executes lock(a) and locks a; (2) Thread 2 executes lock(a) and blocks until a is unlocked]
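A minimal sketch contrasting the three isolation techniques; the type and member names are illustrative assumptions:

    using System.Threading;

    class IsolationDemo
    {
        // Confinement: each thread gets its own copy; nothing is shared.
        [ThreadStatic] static int confinedCounter;

        // Immutability: shared, but read-only after construction — no lock needed.
        static readonly string[] Weekdays = { "Mon", "Tue", "Wed", "Thu", "Fri" };

        // Synchronization: writeable shared state, serialized by a private lock.
        static int sharedCounter;
        static readonly object counterLock = new object();

        static void Increment()
        {
            confinedCounter++;        // safe: per-thread storage
            lock (counterLock)
            {
                sharedCounter++;      // safe: all writers take the same lock
            }
        }
    }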
Concepts (2 of 2)
• Invariants: logical correctness properties
• Safety vs. Liveness vs. Efficiency
• Safety: program is “correct”: invariants hold
• Race conditions → violated invariants → hard bugs
• Liveness: program makes progress: no deadlocks
• Efficiency: as parallel as possible
• Waiting (for another thread to do something)
• Consumer awaits producer
• Manager awaits workers
[diagram: deadlock — Thread 1 runs lock(a) { … lock(b) { … } … } while Thread 2 runs lock(b) { … lock(a) { … } … }; (1) Thread 1 holds a and waits for b, (2) Thread 2 holds b and waits for a]
Key Managed Threading APIs
• Threading: creating concurrent activities
• ThreadPool.QueueUserWorkItem(delegate)
• new Thread(threadStart), Thread.Start(), Thread.Join()
• Locking: synchronizing (serializing) your shared memory accesses
• lock(obj) statement [SyncLock in VB]
• Monitor.Enter(obj); try { statement; } finally { Monitor.Exit(obj); }
• Waiting: scheduling work
• Monitor.Wait(obj)
• Monitor.Pulse(obj), Monitor.PulseAll(obj)
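A minimal sketch exercising these three API groups together; the field names and the work item body are illustrative assumptions:

    using System;
    using System.Threading;

    class ApiDemo
    {
        static readonly object gate = new object();
        static bool done;

        static void Main()
        {
            // Threading: queue work to the ThreadPool.
            ThreadPool.QueueUserWorkItem(delegate
            {
                // Locking: serialize access to the shared flag.
                lock (gate)
                {
                    done = true;
                    Monitor.Pulse(gate);   // wake one waiter
                }
            });

            // Waiting: block until the worker signals completion.
            lock (gate)
            {
                while (!done)              // re-test the condition under the lock
                    Monitor.Wait(gate);
            }
            Console.WriteLine("worker finished");
        }
    }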
Locking Rules and Best Practices
• Lock over all writeable shared state
• And always use the same lock for the given state

// Broken: unsynchronized writes to shared state
class MyList<T> {
  T[] items;
  int n;
  void Add(T item) {
    items[n] = item;
    n++;
  }
  …
}

// Fixed: all writes serialized under one lock
class MyList<T> {
  T[] items;
  int n;
  void Add(T item) {
    lock (this) {
      items[n] = item;
      n++;
    }
  }
  …
}
Locking Rules and Best Practices
• Lock over all writeable shared state
• And always use the same lock for the given state
• Lock over entire invariant

// Broken: two separate locks leave the invariant unprotected between them
class MyList<T> {
  T[] items;
  int n;
  void Add(T t) {
    lock (this) items[n] = t;
    lock (this) n++;
  }
  …
}

// Fixed: one lock spans the whole invariant
class MyList<T> {
  T[] items;
  int n;
  // invariant: n is count of valid items in list, and items[n] == null
  void Add(T t) {
    lock (this) {
      items[n] = t;
      n++;
    }
  }
  …
}
Locking Rules and Best Practices
• Lock over all writeable shared state
• And always use the same lock for the given state
• Lock over entire invariant
• Use private lock objects
• And don’t lock on Types or strings

// Risky: lock(this) — any caller can take your object’s lock and interfere
class MyList<T> {
  T[] items;
  int n;
  // invariant: n is count of valid items in list, and items[n] == null
  void Add(T t) {
    lock (this) { items[n] = t; n++; }
  }
  …
}

// Better: a private lock object nobody else can take
class MyList<T> {
  T[] items;
  int n;
  object lk = new object();
  void Add(T t) {
    lock (lk) { items[n] = t; n++; }
  }
  …
}

// Risky: locking on a Type — shared far beyond this class
class MyList<T> {
  …
  static void ResetStats() {
    lock (typeof(MyList<T>)) { … }
  }
  …
}

// Better: a private static lock object
class MyList<T> {
  …
  static object slk = new object();
  static void ResetStats() {
    lock (slk) { … }
  }
  …
}
Locking Rules and Best Practices
• Lock over all writeable shared state
• And always use the same lock for the given state
• Lock over entire invariant
• Use private lock objects
• And don’t lock on Types or strings
• Don’t call others’ code while you hold locks

// Risky: foreign code runs under our lock
class MyList<T> {
  T[] items;
  int n;
  object lk = new object();
  void Add(T t) {
    lock (lk) {
      items[n] = t;
      n++;
      Listener.Notify(this);   // callee might block or take other locks!
    }
  }
  …
}

// Better: release the lock before calling out
class MyList<T> {
  T[] items;
  int n;
  object lk = new object();
  void Add(T t) {
    lock (lk) {
      items[n] = t;
      n++;
    }
    Listener.Notify(this);
  }
  …
}
Locking Rules and Best Practices
• Lock over all writeable shared state
• And always use the same lock for the given state
• Lock over entire invariant
• Use private lock objects
• And don’t lock on Types or strings
• Don’t call others’ code while you hold locks
• Use appropriate lock granularities

// Too coarse: one static lock serializes all instances and all operations
class MyService {
  static object lk_all = …;
  static void StaticDo() { lock (lk_all) { … } }
  void Do1() { lock (lk_all) { … } }
  void Do2() { lock (lk_all) { … } }
}

// Finer: a per-instance lock for instance state
class MyService {
  static object lk_all = …;
  object lk_inst = …;
  static void StaticDo() { lock (lk_all) { … } }
  void Do1() { lock (lk_inst) { … } }
  void Do2() { lock (lk_inst) { … } }
}

// Finer still: a separate lock per independent piece of state
class MyService {
  static object lk_all = …;
  object lk_inst = …;
  object lk_op2 = …;
  static void StaticDo() { lock (lk_all) { … } }
  void Do1() { lock (lk_inst) { … } }
  void Do2() { lock (lk_op2) { … } }
}
Locking Rules and Best Practices
• Lock over all writeable shared state
• And always use the same lock for the given state
• Lock over entire invariant
• Use private lock objects
• And don’t lock on Types or strings
• Don’t call others’ code while you hold locks
• Use appropriate lock granularities
• Order locks to avoid deadlock

// Deadlock-prone: the two paths take the locks in opposite orders
class MyService {
  A a;
  B b;
  …
  void DoAB() { lock(a) lock(b) { a.Do(); b.Do(); } }
  void DoBA() { lock(b) lock(a) { b.Do(); a.Do(); } }
}

// Fixed: document a global lock order and obey it on every path
class MyService {
  A a; // lock: lkA
  B b; // lock: lkB
  // order! lock(a) < lock(b)
  …
  void DoAB() { lock(a) lock(b) { a.Do(); b.Do(); } }
  void DoBA() { lock(a) lock(b) { b.Do(); a.Do(); } }
}
Locking Rules and Best Practices (summary)
• Lock over all writeable shared state
• And always use the same lock for the given state
• Lock over entire invariant
• Use private lock objects
• And don’t lock on Types or strings
• Don’t call others’ code while you hold locks
• Use appropriate lock granularities
• Order locks to avoid deadlock
Waiting Rules and Best Practices
• Use only the one true wait pattern: test and re-test the condition while holding the lock

// Broken: waiting outside the lock
while (!full)
  lock (mon) { Monitor.Wait(mon); }

// Broken: ‘if’ instead of ‘while’ — the condition may be false again on wakeup
lock (mon) {
  if (!full) Monitor.Wait(mon);
}

// The one true wait pattern
lock (mon) {
  while (!full) Monitor.Wait(mon);
}

• Complete example below — correct (?) but could be more efficient (Pulse vs. PulseAll)

public class Channel<T> {
  T t;
  bool full;
  object mon = new object();

  public T Get() {
    T t = default(T);
    lock (mon) {
      while (!full) Monitor.Wait(mon);
      t = this.t;
      full = false;
      Monitor.PulseAll(mon);
    }
    return t;
  }

  public void Put(T t) {
    lock (mon) {
      while (full) Monitor.Wait(mon);
      this.t = t;
      full = true;
      Monitor.PulseAll(mon);
    }
  }
}
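A minimal producer/consumer sketch driving the Channel<T> above; the thread body and the item count are illustrative assumptions:

    using System;
    using System.Threading;

    class ChannelDemo
    {
        static void Main()
        {
            Channel<int> chan = new Channel<int>();

            // Producer: puts ten items, blocking whenever the channel is full.
            Thread producer = new Thread(delegate()
            {
                for (int i = 0; i < 10; i++)
                    chan.Put(i);
            });
            producer.Start();

            // Consumer: the main thread gets ten items, blocking when empty.
            for (int i = 0; i < 10; i++)
                Console.WriteLine("got " + chan.Get());

            producer.Join();
        }
    }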
Other Concurrency Facilities
• WaitHandle, AutoResetEvent, ManualResetEvent, Mutex, Semaphore
• Async pattern, IO (maybe later)
• Interlocked.*
• ReaderWriterLock
• Thread.Interrupt
• Async delegates
Guidance:
• Waiting on events is more flexible than Monitor.Wait
• Prefer ThreadPool over explicit thread mgmt
• Prefer ThreadPool.QUWI over async delegates
• Prefer polling for cancel over Thread.Interrupt (see the sketch below)
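A minimal sketch of the polling-for-cancel guidance; the volatile flag, names, and timings are illustrative assumptions:

    using System;
    using System.Threading;

    class CancelDemo
    {
        // volatile so the worker always sees the latest value of the flag
        static volatile bool cancelRequested;

        static void Main()
        {
            Thread worker = new Thread(delegate()
            {
                while (!cancelRequested)   // poll for cancel instead of Thread.Interrupt
                {
                    Thread.Sleep(100);     // simulated unit of work (assumption)
                }
                Console.WriteLine("worker exiting cleanly");
            });
            worker.Start();

            Thread.Sleep(500);
            cancelRequested = true;        // request cancellation
            worker.Join();                 // wait for a clean shutdown
        }
    }

Polling lets the worker stop at a safe point of its own choosing, where Thread.Interrupt can surface an exception at an arbitrary wait.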
Memory Models: Constructor Race Condition
• Can inst refer to an uninitialized Foo?

class Foo {
  static Foo inst;
  static object lk = new object();
  string state;
  bool initialized;

  private Foo() {
    state = “I’m happy”;
    initialized = true;
  }

  public static Foo Instance {
    get {
      if (inst == null)
        lock (lk) {
          if (inst == null)
            inst = new Foo();
        }
      return inst;
    }
  }
}

// Two threads concurrently:
Foo i = Foo.Instance;

Might look something like this (pseudo-jitted code):

Foo tmp = GCAlloc(typeof(Foo));
tmp->state = “I’m happy”;
tmp->initialized = 1;
inst = tmp;

But what if it turned into this?

inst = GCAlloc(typeof(Foo));
inst->initialized = 1;
inst->state = “I’m happy”;

• Thread 2 could see non-null inst, yet:
• initialized == 0, or
• initialized == 1, but state == null
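A hedged sketch of one conventional fix: marking the static field volatile, so (per the ECMA semantics discussed below) the publishing store has release semantics and readers’ loads have acquire semantics. The lock-object name is an assumption:

    class Foo
    {
        // volatile: the store to inst cannot move before the constructor's
        // writes, and a reader that sees non-null inst also sees those writes
        static volatile Foo inst;
        static readonly object lk = new object();

        string state;
        bool initialized;

        private Foo()
        {
            state = "I'm happy";
            initialized = true;
        }

        public static Foo Instance
        {
            get
            {
                if (inst == null)
                {
                    lock (lk)
                    {
                        if (inst == null)
                            inst = new Foo();
                    }
                }
                return inst;
            }
        }
    }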
Read/Write Reordering
• Compilers (JIT) and processors want to execute reads and/or writes out of order, e.g.

// source code
static int x, y;
void Foo() {
  y = 1;
  x = 1;
  y = 2;
  // …
}

// can become (swap and delete one write to y)
static int x, y;
void Foo() {
  y = 2;
  x = 1;
  // …
}

• We say the write of x passed the 2nd write to y
• Code motion: JIT optimizations
• Out-of-order execution: CPU pipelining, predictive execution
• Cache coherency: hardware threads use several memories
• Writes by one processor can move later (in time) due to buffering
• Reads can occur earlier due to locality and cache lines
• Not legal to impact sequential execution, but can be visible to concurrent programs
• Memory models define which observed orderings are permissible
Memory Models: Controlling Reordering • Load acquire: won’t move after future instrs • Store release: other instrs won’t move after it • Fence: no instructions will “pass” the fence in either direction • A.k.a. barrier
Memory Models: On the CLR • Strongest model is sequential/program order • Seldom implemented (x86); we are a bit weaker • Reordering is for performance; limiting it limits the processor’s ability to effectively execute code • Lock acquisitions and releases are fences • Makes code using locking simple[r] • Lock-free is costly and difficult – just avoid it! • Notice this didn’t solve the ctor race, however • ECMA specification • Volatile loads have acquire semantics • Volatile stores have release semantics • v2.0 implements a much stronger model • All stores have release semantics • Summary: 2.0 prevents the ctor race we saw earlier • But on strong-model machines, it won’t cause problems on 1.x either
Memory Models: Why volatile and Thread.MemoryBarrier()?
• Reorderings are still possible
• Non-acquire loads can still pass each other
• A st.rel followed by a ld.acq can still swap
• volatile can fix #1, Thread.MemoryBarrier() can fix #2
• Q: In the example below, can x > y (i.e., can the assert fail)?
• A: Yes, unless a (or b) is marked volatile

static int a;
static int b;

// Thread #1
while (true) {
  int x = a;
  int y = b;
  Debug.Assert(y >= x);
}

// Thread #2
while (true) {
  b++;
  a++;
}
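A minimal sketch of the fix; the loop bounds and thread structure are illustrative assumptions. With both fields volatile, the writer’s release stores keep b++ before a++ and the reader’s acquire load already keeps the two loads in order; the explicit Thread.MemoryBarrier() is shown to illustrate the full-fence alternative:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class ReorderDemo
    {
        static volatile int a;              // volatile: loads acquire, stores release
        static volatile int b;

        static void Main()
        {
            Thread writer = new Thread(delegate()
            {
                for (int i = 0; i < 1000000; i++)
                {
                    b++;                    // release: this store stays before...
                    a++;                    // ...this one
                }
            });
            writer.Start();

            for (int i = 0; i < 1000000; i++)
            {
                int x = a;
                Thread.MemoryBarrier();     // full fence: the two loads cannot swap
                int y = b;
                Debug.Assert(y >= x);       // holds: b is always incremented before a
            }
            writer.Join();
        }
    }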
Agenda • Why the need for concurrency? • Concurrency programming today • Threads • Shared Memory • Locks • Memory Models • Future models
Looking Way Ahead • Remember “old SW runs faster on new PCs and faster still on next year’s PCs”? • Scaling up server apps – client apps? • Vision: what we did for memory management(CLR v1), we aim to do for concurrency: • Using our evolved languages, libs, tools, you’ll build & ship an app with lots of latent parallelism • On new HW, automatically apply extra HW resources to realize more latent parallelism • More tasks, more loops run in parallel • All without exposing latent races or deadlocks
Looking Way Ahead • Developer Tools • OpenMP (Visual C++ in Visual Studio 2005, MSDN) • Concur (see TLN309) • “Language Integrated Query” (see TLN306) • Microsoft Research (see FUN323) • RaceTrack: detecting inconsistent lock usage • Spec#: disciplined locking • Cω: synchronization using chords (message joins) • Software transactional memory • Parallelism in Haskell • Microsoft Incubation • Concurrency and Coordination Runtime
Looking Way AheadSoftware Transactional Memory • Shared memory + locks = extra complexity • Simple mistakes → data races, deadlocks • Concurrent-component composition issues • STM: atomic { a(); b(); c(); …} • You think “I’m the only code on the machine!” • STM observes your reads and writes, does concurrency control, abort/retry for you • Automatic error clean up (back out) too! • Status: ‘research’
Conclusions • Concurrency is increasingly important for responsiveness and parallelism • Multi-core era brings change and opportunity • Our platforms, languages, libraries, and tools will evolve to simplify concurrency • You now know key concepts and rules for programming with shared memory and locks • TRY IT!
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.