1 / 40

“Going Parallel with C++11” Supercomputing 2012

“Going Parallel with C++11” Supercomputing 2012. Joe Hummel, PhD UC-Irvine hummelj@ics.uci.edu http://www.joehummel.net/downloads.html. C++ 11. New standard of C++ has been ratified “C++0x” ==> “C++11” Lots of new features Workshop will focus on concurrency features.

allene
Download Presentation

“Going Parallel with C++11” Supercomputing 2012

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. “Going Parallel with C++11”Supercomputing 2012 Joe Hummel, PhD UC-Irvine hummelj@ics.uci.edu http://www.joehummel.net/downloads.html

  2. C++ 11 • New standard of C++ has been ratified • “C++0x” ==> “C++11” • Lots of new features • Workshop will focus on concurrency features Going Parallel with C++11

  3. Why are we here? • Asyncprogramming: • Better responsiveness… • GUIs (desktop, web, mobile) • Cloud • Windows 8 • Parallel programming: • Better performance… • Financials • Scientific • Big data C C C C C C C C Going Parallel with C++11

  4. Hello World --- of threading #include <thread> #include <iostream> void func() { std::cout << "**Inside thread" << std::this_thread::get_id() << "!" << std::endl; } intmain() { std::thread t; t = std::thread( func); t.join(); return 0; } A simple function for threadto do… Create and schedule thread… Wait for thread to finish… Going Parallel with C++11

  5. Demo #1 • Hello world… Going Parallel with C++11

  6. Avoiding early program termination… #include <thread> #include <iostream> void func() { std::cout << "**Hello world...\n"; } intmain() { std::thread t; t = std::thread( func); t.join(); return 0; } (1) Thread function must do exception handling; unhandled exceptions ==> termination… void func() { try { // computation: } catch(...) { // do something: } } (2) Must join, otherwise termination… (avoid use of detach( ), difficult to use safely) Going Parallel with C++11

  7. Pick your style… • Old school: • distinct thread functions (what we just saw) • New school: • lambda expressions (aka anonymous functions) Going Parallel with C++11

  8. Demo #2 • A thread that loops until we tell it to stop… When user presses ENTER, we’ll tell thread to stop… Going Parallel with C++11

  9. (1) via thread function void loopUntil(bool*stop) { auto duration = chrono::seconds(2); while(!(*stop)) { cout<< "**Inside thread...\n"; this_thread::sleep_for(duration); } } #include <thread> #include <iostream> #include <string> using namespace std; . . . intmain() { boolstop(false); thread t(loopUntil, &stop); getchar(); // wait for user to press enter: stop= true; // stop thread: t.join(); return 0; } Going Parallel with C++11

  10. (2) via lambda expression Closure semantics: [ ]: none, [&]: by ref, [=]: by val, … intmain() { boolstop(false); thread t = thread([&]() { auto duration = chrono::seconds(2); while (!stop) { cout<< "**Inside thread...\n"; this_thread::sleep_for(duration); } } ); getchar(); // wait for user to press enter: stop = true; // stop thread: t.join(); return 0; } lambda arguments • thread t([&] () • { • auto duration = chrono::seconds(2); • while (!stop) • { • cout<< "**Inside thread...\n"; • this_thread::sleep_for(duration); • } • } • ); lambda expression

  11. Trade-offs • Lambdas: • Easier and more readable -- code remains inline • Potentially more dangerous ([&] captures everything by ref) • Functions: • More efficient -- lambdas involve class, function objects • Potentially safer -- requires explicit variable scoping • More cumbersomeand illegible Going Parallel with C++11

  12. Demo #3 • Multiple threads looping… When user presses ENTER, all threads stop… Going Parallel with C++11

  13. vector<thread> workers; for(inti = 1; i <= 3; i++) { workers.push_back( thread([i, &stop]() { while(!stop) { cout<< "**Inside thread " << i << "...\n"; this_thread::sleep_for(chrono::seconds(i)); } }) ); } Solution #include <thread> #include <iostream> #include <string> #include <vector> #include <algorithm> using namespace std; intmain() { cout<< "** Main Starting **\n\n"; boolstop = false; . . . getchar(); cout << "** Main Done **\n\n"; . . . return 0; } stop = true; // stop threads: // wait for threads to complete: for ( thread& t : workers) t.join(); Going Parallel with C++11

  14. A Real Example • Matrix multiply… Going Parallel with C++11

  15. Multi-threaded solution // 1 thread per core: numthreads= thread::hardware_concurrency(); introws = N / numthreads; intextra= N % numthreads; intstart= 0; // each thread does [start..end) intend = rows; vector<thread> workers; for(int t = 1; t <= numthreads; t++) { if (t == numthreads) // last thread does extra rows: end += extra; workers.push_back( thread([start, end, N, &C, &A, &B]() { for (inti = start; i < end; i++) for (int j = 0; j < N; j++) { C[i][j] = 0.0; for (int k = 0; k < N; k++) C[i][j] += (A[i][k] * B[k][j]); } })); start = end; end = start + rows; } for (thread& t : workers) t.join();

  16. High-Performance Computing • Parallelism alone is not enough… HPC == Parallelism + Memory Hierarchy─Contention Expose parallelism • Minimize interaction: • false sharing • locking • synchronization • Maximize data locality: • network • disk • RAM • cache • core Going Parallel with C++11

  17. Cache-friendly matrix multiply X Going Parallel with C++11

  18. Cache-friendly solution • Loop interchangeis first step… workers.push_back( thread([start, end, N, &C, &A, &B]() { for (inti = start; i < end; i++) for (int j = 0; j < N; j++) C[i][j] = 0.0; for (inti = start; i < end; i++) for (intk = 0; k < N; k++) for (intj = 0; j < N; j++) C[i][j] += (A[i][k] * B[k][j]); })); Next step is to block multiply… Going Parallel with C++11

  19. C++11 Features and Status Going Parallel with C++11

  20. Compilers… • No compiler as yet fully implements C++11 • Visual C++ 2012 has best concurrency support • Part of Visual Studio 2012 • gcc 4.7 has best overall support • http://gcc.gnu.org/projects/cxx0x.html • clang 3.1 appears very good as well • I did not test • http://clang.llvm.org/cxx_status.html Going Parallel with C++11

  21. Compiling with gcc # makefile # threading library: one of these should work # tlib=thread tlib=pthread # gcc 4.6: ver=c++0x # gcc 4.7: # ver=c++11 build: g++ -std=$(ver) -Wall main.cpp -l$(tlib) Going Parallel with C++11

  22. Executive Summary Going Parallel with C++11

  23. Locking • Use mutexto protect against concurrent access… • #include <mutex> • mutexm; • intsum; thread t1([&]() { m.lock(); sum += compute(); m.unlock(); } ); thread t2([&]() { m.lock(); sum += compute(); m.unlock(); } ); Going Parallel with C++11

  24. RAII • “Resource Acquisition Is Initialization” • Advocated by B. Stroustrup for resource management • Uses constructor & destructor to properly manage resources (files, threads, locks, …) in presence of exceptions, etc. Locks m in constructor thread t([&]() { lock_guard<mutex> lg(m); sum += compute(); } ); thread t([&]() { m.lock(); sum += compute(); m.unlock(); }); Unlocks m in destructor should be written as… Going Parallel with C++11

  25. Atomics • Use atomicto protect shared variables… • Lighter-weight than locking, but much more limited in applicability • #include <atomic> • atomic<int> count; • count = 0; thread t1([&]() { count++; }); thread t2([&]() { count++; }); X thread t3([&]() { count = count + 1; }); not safe… Going Parallel with C++11

  26. Lock-free programming • Atomics enable safe, lock-free programming • “Safe” is a relative word… • int x; • atomic<bool> initd; • initd = false; • int x; • atomic<bool> done; • done = false; thread t1([&]() { if (!initd) { lock_guard<mutex> _(m); x = 42; initd = true; } << consume x, … >> }); thread t1([&]() { x = 42; done = true; }); lazy init done flag thread t2([&]() { if (!initd) { lock_guard<mutex> _(m); x = 42; initd = true; } << consume x, … >> }); thread t2([&]() { while (!done) ; assert(x==42); });

  27. Demo • Prime numbers… Going Parallel with C++11

  28. Futures • Futures provide a higher-level of abstraction • Starts an asynchronous operation on some thread, await result… return type… #include <future> . . future<int>fut = async( []() -> int { int result = PerformLongRunningOperation(); return result; } ); . . try { int x = fut.get(); // join, harvest result: cout << x << endl; } catch(exception &e) { cout << "**Exception: " << e.what() << endl; } Going Parallel with C++11

  29. Execution of futures • May run on the current thread • May run on a new thread • Often better to let system decide… // run on current thread when someone asks for value (“lazy”): future<T> fut1 = async( launch::sync, []() -> ... ); future<T> fut2= async( launch::deferred, []() -> ... ); // run on a new thread: future<T> fut3= async( launch::async, []() -> ... ); // let system decide: future<T> fut4= async( launch::any, []() -> ... ); future<T> fut5= async( []() ... ); Going Parallel with C++11

  30. Demo Computes average review for a movie… • Netflix data-mining… Netflix Movie Reviews (.txt) Netflix Data Mining App Going Parallel with C++11

  31. Memory Model • C++ committee thought long and hard on memory model semantics… • “You Don’t Know Jack About Shared Variables or Memory Models”, Boehm and Adve, CACM, Feb 2012 • Conclusion: • No suitable definition in presence of race conditions • Solution: • Predictable memory model *only* in data-race-free codes • Computer may “catch fire” in presence of data races Going Parallel with C++11

  32. Example… • intx, y, r1, r2; • x = y = r1 = r2 = 0; thread t1([&]() { x = 1; r1 = y; } ); t1.join(); thread t2([&]() { y = 1; r2 = x; } ); t2.join(); What can we say about r1 and r2? Going Parallel with C++11

  33. Dekker’s example… If we think in terms of all possible thread interleavings(aka “sequential consistency”), then we know r1 = 1, r2 = 1, or both In C++ 11? Not only are the values of x, y, r1 and r2 undefined, but the program may crash! Going Parallel with C++11

  34. C++ 11 Memory Model • Def: two memory accesses conflict if they • access the same scalar object or contiguous sequence of bit fields, and • at least one access is a store. • A program is data-race-free(DRF) if no sequentially-consistent execution results in a data race. Avoid anything else. • Def: two memory accesses participate in a data race if they • conflict, and • can occur simultaneously. via independent threads, locks, atomics, … Going Parallel with C++11

  35. Beyond Threads Going Parallel with C++11

  36. Tasks vs. Threads • Tasks are a higher-level abstraction • Idea: • developers identify work • run-time system deals with execution details Task: a unit of work; an object denoting an ongoing operation or computation. Going Parallel with C++11

  37. Example • Microsoft PPL: Parallel Patterns Library #include <ppl.h> for (inti = 0; i < N; i++) for (int j = 0; j < N; j++) C[i][j] = 0.0; // for(inti= 0; i< N; i++) Concurrency::parallel_for(0, N, [&](inti) { for (intk = 0; k < N; k++) for (intj = 0; j < N; j++) C[i][j] += (A[i][k] * B[k][j]); }); Matrix Multiply Going Parallel with C++11

  38. Execution Model parallel_for( ... ); task task task task Windows Process C C C C C C Parallel Patterns Library Thread Pool C C Task Scheduler worker thread worker thread worker thread worker thread Resource Manager Windows

  39. That’s it! Going Parallel with C++11

  40. Thank you for attending • Presenter: Joe Hummel • Email: hummelj@ics.uci.edu • Materials:http://www.joehummel.net/downloads.html • References: • Book: “C++ Concurrency in Action”, by Anthony Williams • Talks: Bjarne and friends at MSFT’s “Going Native 2012” • http://channel9.msdn.com/Events/GoingNative/GoingNative-2012 • Tutorials: really nice series by BartoszMilewski • http://bartoszmilewski.com/2011/08/29/c11-concurrency-tutorial/ • FAQ: BjarneStroustrup’s extensive FAQ • http://www.stroustrup.com/C++11FAQ.html Going Parallel with C++11

More Related