By Jon Nosacek Multi-core Software Development with examples in C++
Why should you care? • Multi-core systems are becoming the standard for all devices • Less heat • Two cores at half the frequency can do the work of one core using about ¼ of the power! • (P = C × V² × F, and a lower frequency allows a lower voltage) • Designing a new system around a multi-core architecture can be quite difficult.
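The ¼ figure can be checked with a short worked equation. This is an idealized sketch: it assumes the supply voltage can be lowered in proportion to the clock frequency, which real chips only approximate, and the subscripts are labels introduced here for illustration.

```latex
\begin{align*}
P_{\text{one core}}  &= C \times V^2 \times F \\
P_{\text{two cores}} &= 2 \times C \times \left(\tfrac{V}{2}\right)^2 \times \tfrac{F}{2}
                      = \tfrac{1}{4}\, C \times V^2 \times F
\end{align*}
```

Each half-speed core does half the work per second, so the pair matches the original throughput only when the workload splits cleanly across both cores.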
Why should you care? (cont) • Hardware isn’t speeding up our software the way it used to • Performance gains are no longer automatic • We want fast! • Our users deserve the same
Multi-threaded vs. Multi-core • Same basic principle, but can yield very different results • Multi-threaded assumes no knowledge of the release environment and can make the program slower on a single-core platform • Multi-core means specifically designing your system for a platform that you know has two or more cores. It can yield significant performance boosts if done correctly
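One way to avoid slowing down a single-core platform is to ask the runtime how many hardware threads exist before deciding how many workers to start. The sketch below uses the C++11 standard library rather than any platform API; `choose_worker_count` is a hypothetical helper name, not something from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: decide how many worker threads to start.
// `requested` is the number of threads the design would like to use.
unsigned choose_worker_count(unsigned requested)
{
    assert(requested > 0);

    // hardware_concurrency() is only a hint and may return 0
    // when the value cannot be determined.
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) {
        return 1;  // unknown platform: fall back to a single worker
    }

    // Never start more workers than there are hardware threads.
    return std::min(requested, cores);
}
```

On a single-core machine this returns 1, so the program can skip thread creation entirely and run the work inline.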
Hardware • To understand how the software works, you must first understand how the hardware works • Very much a hardware-oriented evolution (Hardware could not keep up with our increasing demands)
Why transition to multi-core? • Higher processor frequencies necessitated better cooling • There is a limit based on materials and methods • Computers are replacing us • Brain is not sequential
Why multi-core (cont) • Traditional: [diagram not preserved] • Multi-core: [diagram not preserved]
Intel Core 2 Extreme Quad: http://www.techspot.com/articles-info/23/images/img2.jpg • Intel Core i7 965 quad core (8 threads): http://tinyurl.com/3tgfygn
Terminology • Thread • Smallest unit of execution that a program can be broken down into • Contains all the info that it needs to run • Atomic Statement • Single operation by the processor. It cannot be interrupted partway through execution
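Both terms can be seen together in a small sketch. It uses C++11 `std::thread` and `std::atomic` instead of the pthreads calls used elsewhere in these slides, and `count_with_threads` is a name made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `num_threads` threads that each increment one shared counter
// `increments` times, then returns the final count.
int count_with_threads(int num_threads, int increments)
{
    assert(num_threads > 0 && increments >= 0);

    std::atomic<int> counter(0);
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                // Atomic read-modify-write: no other thread can observe
                // or interrupt a half-finished increment.
                counter.fetch_add(1);
            }
        });
    }

    // Wait for every thread to finish before reading the result.
    for (std::thread &th : threads) {
        th.join();
    }
    return counter.load();
}
```

With a plain `int` in place of the atomic, the increments from different threads could overlap and updates could be lost.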
Terminology (cont) • Hyper-Threading (SMT) • Intel’s approach of running 2 threads per core to simulate more cores and reduce CPU waste • Virtual processors are not necessarily tied to physical ones • Example of hardware helping software
How to design a multi-core system • Planning • Implementation • Testing • Deployment • Maintenance
Planning • A “code-and-fix” laissez-faire mentality WILL NOT WORK • Too many things can go wrong, and it is hard to pinpoint the problem after the fact • Single most important step • Problems here will cascade into other steps and become worse • Clear vision is a must • How deep into threading do you want to go?
Planning (cont.) • Opportunity comes during the decomposition phase • Need to model • the state of the threads and which combinations affect each other • Thread interaction • Number of threads • More threads => more problems • Balance performance with understandability, maintainability, time • Fairness and priority • More threads => more communication
Planning (cont.) • Error handling is more important • Who handles the errors? Other threads might take a while to respond and what if everyone responds? • Synchronization and semaphores should be used sparingly. • Threads should be as independent as possible • Need to make rules on memory access • Dataflow diagrams!
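The advice about independent threads and memory-access rules can be sketched as follows. The rule assumed here is that the input is read-only while the threads run and each thread writes only to its own result slot, so no locks or semaphores are needed. `parallel_sum` is a hypothetical function written for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `num_threads` threads that never touch each other's memory.
long long parallel_sum(const std::vector<int> &data, std::size_t num_threads)
{
    assert(num_threads > 0);

    // One private result slot per thread.
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> threads;

    // Size of each thread's slice, rounded up.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, &partial, t, begin, end] {
            // Reads only the slice [begin, end) and writes only partial[t].
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }

    for (std::thread &th : threads) {
        th.join();
    }

    // Combine the per-thread results after all threads have finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The only synchronization point is the join at the end, which is the kind of dependency a dataflow diagram makes visible.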
Concurrent Vs Parallel Design • Which do you think is better? http://blog.rednael.com/content/binary/parallel%20vs%20concurrent.jpg
Concurrent Parallel • Easy to design and implement • Works well for IO • Minimal interaction to plan and synchronize • Less CPU waste • Even more difficult to track • CPU has to keep track and time slice more (swap time)
Implementation • Languages are becoming more and more open to multi-core programming • There are libraries for C++ that help ease the workload • A lot of threading is tied to the OS, and Microsoft knows theirs better than anyone • Support usually arrives for Linux and Windows first, then Macs • Watch for CPU-specific instructions that can improve performance
Implementation (cont.) • Make sure resources are being managed • Update the models as the system changes • The IDE you choose during this phase can be very important and affects what you see your system doing • Using existing libraries usually reduces the workload, and they are often more efficient • Make sure all basic/shared initializations are done before the threads are created
Implementation (cont.) • Watch for evolving trends • If a lot of communication is going on between two threads, see if things can be merged/swapped • See which threads take up the most resources and what will increase program responsiveness • Keep the future in mind • More cores will always be added. • Think about the simplest case and expand into the complex • Also realize that more features are being added to C++ to help abstract multithreading
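As one example of the language abstracting threads away, C++11 added `std::async` and `std::future`. The sketch below is a minimal illustration, not code from the slides; `fib` and `fib_pair_sum` are hypothetical names, and the naive recursion is used only as a CPU-bound workload.

```cpp
#include <cassert>
#include <future>

// Deliberately naive recursive Fibonacci, used only to keep a core busy.
long long fib(int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Computes fib(a) + fib(b), with fib(a) running on another thread.
long long fib_pair_sum(int a, int b)
{
    assert(a >= 0 && b >= 0);

    // std::launch::async asks for the task to run on a separate thread.
    std::future<long long> fa = std::async(std::launch::async, fib, a);

    // Meanwhile, the calling thread computes the second value itself.
    long long fb = fib(b);

    // get() blocks until the asynchronous result is ready.
    return fa.get() + fb;
}
```

There is no explicit thread creation or join here; the future carries both the result and the synchronization.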
// Basic example:
#include <iostream>
#include <pthread.h>
using namespace std;

// Task to be executed by ThreadA
void *task1(void *X)
{
    cout << "Thread A complete" << endl;
    return NULL;
}

// Task to be executed by ThreadB
void *task2(void *X)
{
    cout << "Thread B complete" << endl;
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t ThreadA, ThreadB;                   // declare threads
    pthread_create(&ThreadA, NULL, task1, NULL);  // create threads
    pthread_create(&ThreadB, NULL, task2, NULL);
    pthread_join(ThreadA, NULL);                  // wait for threads to "join up"
    pthread_join(ThreadB, NULL);
    return 0;
}
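// Doing little things can make a big difference too:
// (parallel_for_each and concurrent_vector as provided by, e.g., Microsoft's
// Parallel Patterns Library; time_call and fibonacci are helpers not shown here)
array<int, 4> a = { 24, 26, 41, 42 };
vector<tuple<int,int>> results1;
concurrent_vector<tuple<int,int>> results2;

// Serial version
elapsed = time_call([&] {
    for_each (a.begin(), a.end(), [&](int n) {
        results1.push_back(make_tuple(n, fibonacci(n)));
    });
});

// Parallel version: same loop body, different algorithm and container
elapsed = time_call([&] {
    parallel_for_each (a.begin(), a.end(), [&](int n) {
        results2.push_back(make_tuple(n, fibonacci(n)));
    });
});
// a 4 core system outputs: 9250 ms, 5726 ms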
Testing • Race conditions are the most prevalent • Identify critical paths • Balance threads and tweak for performance • Non-determinism (for some initial state, the final state is ambiguously determined)
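Because a race condition may show up in only a small fraction of runs, one common tactic is to repeat the same concurrent scenario many times and check an invariant after each run. The sketch below assumes a mutex-guarded counter as the code under test; `GuardedCounter` and `count_failed_runs` are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Small shared object under test: a counter guarded by a mutex.
class GuardedCounter {
public:
    void increment()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }
    int value()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    std::mutex mutex_;
    int value_ = 0;
};

// Repeats the scenario `runs` times and returns how many runs
// ended with a count that breaks the invariant.
int count_failed_runs(int runs, int num_threads, int increments)
{
    assert(runs > 0 && num_threads > 0 && increments >= 0);
    int failures = 0;
    for (int r = 0; r < runs; ++r) {
        GuardedCounter counter;
        std::vector<std::thread> threads;
        for (int t = 0; t < num_threads; ++t) {
            threads.emplace_back([&counter, increments] {
                for (int i = 0; i < increments; ++i) {
                    counter.increment();
                }
            });
        }
        for (std::thread &th : threads) {
            th.join();
        }
        // Invariant: every increment must be counted exactly once.
        if (counter.value() != num_threads * increments) {
            ++failures;
        }
    }
    return failures;
}
```

A clean result does not prove the absence of races; repetition only raises the chance of exposing an unlucky interleaving.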
Deployment • Mostly the same • See what platforms are actually using your program and tune as necessary
Maintenance • Need to keep up with the changing tech (still pretty new) • Adding new functionality will be more difficult, especially when it’s very different from what already exists • Much more testing needed • Going back to the original plan to see how new features fit in and what is affected is much more important
Maintenance (cont.) • What about adding to an existing system? • Very difficult • Should focus on the largest time consumers (IO, disk, complex algorithms) • Applications with low coupling are the best candidates for adding parallelism
Challenges • Lots of planning needed • Thorough understanding of the environment • Very hard to debug • Built in support is hit-and-miss (language & IDE) • Security concerns (from other programs as well as your own) • A lot of life-critical embedded systems are sticking with single core platforms
What apps can help me out? • Intel’s Threading Building Blocks • OpenMP • Microsoft Visual Studio • MULTI - Green Hills • TotalView - Rogue Wave
Intel’s Threading Building Blocks • Template Library • Algorithms, containers, mutex, atomic statements, timing, scheduling • Implements “Task Stealing” • If one core is idle, it will take a scheduled task from another to reduce CPU waste • Automatically creates the threads for you to maximize performance • Much like parallel_for • Tries to be like the STL • ease of use, generality, but more aggressive
Intel’s Threading Building Blocks (cont.) • A bit more memory/cache oriented than STL • Intel knows their own cores and how to schedule on them • Adds a lot more concurrency-oriented data types (concurrent_queue, concurrent_vector, concurrent_hash_map) • Also geared for easy scalability • More atomic operations (also from knowing their own cores) • Follows a pipe-line architecture like graphics
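The goal behind task stealing, keeping idle cores busy, can be illustrated without the library. The sketch below is not TBB and not true task stealing: it uses one shared atomic index from which any idle worker claims the next item, which is a simpler form of dynamic load balancing. `square_all` is a hypothetical function written for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Squares every element of `input`. Work is handed out one item at a time,
// so a worker that finishes early simply claims more items.
std::vector<long long> square_all(const std::vector<int> &input,
                                  unsigned num_workers)
{
    assert(num_workers > 0);

    std::vector<long long> output(input.size());
    std::atomic<std::size_t> next(0);  // index of the next unclaimed item

    auto worker = [&input, &output, &next] {
        for (;;) {
            // Atomically claim one index; no two workers get the same one.
            const std::size_t i = next.fetch_add(1);
            if (i >= input.size()) {
                break;  // nothing left to do
            }
            output[i] = static_cast<long long>(input[i]) * input[i];
        }
    };

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < num_workers; ++w) {
        threads.emplace_back(worker);
    }
    for (std::thread &th : threads) {
        th.join();
    }
    return output;
}
```

A task-stealing scheduler goes further by giving each worker its own queue and letting idle workers take from others, which reduces contention on a single shared counter.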
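OpenMP
#include <iostream>
#include <omp.h>
using namespace std;

int main()
{
    int th_id, nthreads;
    #pragma omp parallel private(th_id) shared(nthreads)
    {
        th_id = omp_get_thread_num();

        // Only one thread at a time may write to cout
        #pragma omp critical
        {
            cout << "Hello World from thread " << th_id << '\n';
        }

        // Wait until every thread has printed its greeting
        #pragma omp barrier

        // Only the master thread reports the team size
        #pragma omp master
        {
            nthreads = omp_get_num_threads();
            cout << "There are " << nthreads << " threads" << '\n';
        }
    }
    return 0;
}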
Microsoft Visual Studio • Thread View
MULTI IDE – Green Hills • Cool debugging/recording features http://www.ghs.com/products/MULTI_IDE.html
TotalView - Rogue Wave • Thread viewer
Sources: • Buttari, Alfredo, Jack Dongarra, Jakub Kurzak, et al. The Impact of Multicore on Math Software. • Hughes, Cameron, and Tracey Hughes. Professional Multicore Programming: Design and Implementation for C++ Developers. Indianapolis, IN: Wiley Pub., 2008. • http://msdn.microsoft.com/en-us/concurrency/default.aspx • http://channel9.msdn.com/search?term=concurrency • http://www.cs.kent.edu/~farrell/amc09/lectures/
Any Questions? • This all sounds like a lot of work. Why should we bother when something easier might come along? • It’s very much a game of figuring out how much effort gets the largest returns. • True progress will take both EEs and SEs (and CSs too, if any showed up today) • It might be a long time before we see change