Cilk++, Cilk, Cilkscreen, and Cilk Arts are trademarks of Cilk Arts, Inc. Executive Briefing: Multicore-Enabling SaaS Applications September 3, 2008 www.cilk.com
Agenda • Emergence of multicore processors • Key challenges facing developers • When can multicore help? • Data races: a new type of bug • Questions to ask when going multicore • Programming tools & techniques
About Cilk Arts Mission: To provide the easiest, quickest, and most reliable way to optimize application performance on multicore processors. • Launched in March 2007. • Headquartered in Burlington, MA. • Funded by Stata Venture Partners, software industry executives, founders, and grants from the NSF and DARPA. • First product is Cilk++, based on 15 years of research at MIT
Moore’s Law • Transistor count is still rising, … but clock speed is bounded at ~5 GHz. [Figure: Intel CPU introductions] Source: Herb Sutter, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,” Dr. Dobb's Journal, 30(3), March 2005.
Power Density Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.
Vendor Solution [Figure: Intel 45nm quad-core processor] • To scale performance, put many processor cores on a chip. • Intel predicts 80+ cores by 2011!
SaaS Opportunity • Increase throughput • Quantitative finance: increase volume of portfolios analyzed overnight • Reduce response time • Engineering simulation: accelerate structural analysis of assembly • Improve user experience • Multiplayer games: increased galaxy size • Reduce data center power consumption
Multicore and SaaS • Application response time? • Processor utilization? [Figure: user work and computer operations 1 and 2 on processors P1 to P8]
Multicore and SaaS • For CPU-constrained applications, multi-threading improves response time and boosts utilization [Figure: user work and computer operations #1 and #2, each split across processors P1 to P8]
Multicore Challenges Application Performance • How can you minimize response time? • Will your solution scale as the number of processor cores increases? • Can you identify performance bottlenecks? Development Time • How will you get your product out in time? • Where will you find enough parallel-programming talent? • Will you be forced to redesign your application? Software Reliability • Can you debug your parallel application? • How will you test it effectively before release?
Work & Span • Work: total amount of time spent in all the instructions • Span: length of the critical path, the longest chain of instructions that must execute one after another • Parallelism: ratio of work to span • In this example: • Work = 18 • Span = 9 • Parallelism = 18/9 = 2 • i.e., little gain beyond 2 processors [Figure: computation DAG of 18 numbered instructions]
Can Multicore Help? • The more parallelism is available in an application, the more a multicore processor can help. • Work: T1 = 58 • Span: T∞ = 9 (same as previous example) • Parallelism: T1/T∞ = 58/9 ≈ 6.44
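Race Bugs
Definition. A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.
Example: A (int x = 0;) precedes two logically parallel strands, B (x++;) and C (x++;), which both precede D (assert(x == 2);).
Expanding each x++ into a load, an increment, and a store gives the steps below, numbered in one possible execution order:
1 x = 0;
2 r1 = x;    4 r2 = x;
3 r1++;      5 r2++;
7 x = r1;    6 x = r2;
8 assert(x == 2);
In this order both strands read x == 0, so both store 1 and the assertion fails.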
Coping with Race Bugs • Although locking can “solve” race bugs, lock contention can destroy all parallelism. • Making local copies of the nonlocal variables can remove contention, but at the cost of restructuring program logic. • Cilk++ provides hyperobjects to mitigate data races on nonlocal variables without the need for locks or code restructuring. IDEA: Different parallel branches may see different views of the hyperobject.
20 Questions to Ask http://www.cilk.com/resource-library/going-multicore-20-questions-to-ask/
Development Time • To multicore-enable my application, how much logical restructuring of my application must I do? • Can I easily train programmers to use the multicore software platform? • Can I maintain just one code base, or must I maintain serial and parallel versions? • Can I avoid rewriting my application every time a new processor generation increases the core count? • Can I easily multicore-enable ill-structured and irregular code, or is the multicore software platform limited to data-parallel applications? • Does the multicore software platform properly support modern programming paradigms, such as objects, templates, and exceptions? • What does it take to handle global variables in my application?
Application Performance • How can I tell if my application exhibits enough parallelism to exploit multiple processors? • Does the multicore software platform address response-time bottlenecks, or just offer more throughput? • Does application performance scale up linearly as cores are added, or does it quickly reach diminishing returns? • Is my multicore-enabled code just as fast as my original serial code when run on a single processor? • Does the multicore software platform's scheduler load-balance irregular applications efficiently to achieve full utilization? • Will my application "play nicely" with other jobs on the system, or do multiple jobs cause thrashing of resources? • What tools are available for detecting multicore performance bottlenecks?
Software Reliability • How much harder is it to debug my multicore-enabled application than to debug my original application? • Can I use my standard, familiar debugging tools? • Are there effective debugging tools to identify and localize parallel-programming errors, such as data-race bugs? • Must I use a parallel debugger even if I make an ordinary serial programming error? • What changes must I make to my release-engineering processes to ensure that my delivered software is reliable? • Can I use my existing unit tests and regression tests?
Parallel C++ Options Pthreads & WinAPI threads • An API for creating and manipulating O/S threads. • Programmer writes thread-interaction protocols. Intel’s Threading Building Blocks • A C++ template library with automatic scheduling of tasks. • Programmer writes explicit “continuations.” OpenMP • Open-standard language extensions to C++. • Programmer inserts pragmas into code. Cilk++ • Faithful extension of C++. • Programmer inserts keywords into code that do not destroy serial semantics. • Provably good scheduler and a race-detection tool.
Cilk++ Cilk++ is a remarkably simple set of extensions for C++ and a powerful runtime system for multicore applications. Cilk++ provides a smooth evolution from serial programming to parallel programming.
Cilk Arts Solution Application Performance • Best-in-class performance • Linear scaling as cores are added • Minimal overhead on a single core Development Time • Minimal application changes • Can be learned in days by programmers without multithreading expertise • Seamless path forward (and backward) Software Reliability • Multithreaded version as reliable as the original • No fundamental change to release engineering
Cilk Arts Solution [Figure: Cilk++ toolchain]
Cilk++ source:
int fib (int n) {
  if (n<2) return (n);
  else {
    int x,y;
    x = cilk_spawn fib(n-1);
    y = fib(n-2);
    cilk_sync;
    return (x+y);
  }
}
Serial code (the same program with the Cilk++ keywords elided):
int fib (int n) {
  if (n<2) return (n);
  else {
    int x,y;
    x = fib(n-1);
    y = fib(n-2);
    return (x+y);
  }
}
• Cilk++ source → Cilk++ compiler (with the Cilk++ hyperobject library) → linker → binary
• Binary → Cilk++ runtime system → exceptional performance
• Binary → Cilk++ race detector and parallel regression tests → reliable multi-threaded code
• Serial code → conventional compiler → conventional regression tests → reliable single-threaded code
Thank You! • Free e-Book www.cilk.com/multicore-e-book/ • We are currently accepting applications for our Early Visibility program • For more info about Cilk++ and resources for multicoders: • duncan@cilk.com • www.cilk.com