Implicitly Parallel Programming Models For Thousand-Core Microprocessors
Hwu, Ryoo, Ueng, Kelm, Gelado, Stone, Kidd, Baghsorkhi, Mahesri, Tsao, Navarro, Lumetta, Frank, Patel
University of Illinois, Urbana-Champaign
Universitat Politècnica de Catalunya
The Next Software Challenge
• Today, TLP models potentially make more effective use of area and power than large ILP CPUs
  • Multi-core, many-core, Cell BE™, GPGPU, PPU, etc.
• Scaling from 4-core to 1000-core chips could happen in the next 15 years
• All semiconductor market domains are converging to concurrent system platforms
  • PCs, game consoles, mobile handsets, servers, supercomputers, networking, etc.
We need to make these systems effectively execute valuable, demanding apps.
One Plausible View
• One might be able to stretch sequential programming models to small-scale multi-core systems
• For large-scale, many-core systems, explicitly parallel programming models must be used
I would like to convince you that it is the other way around.
Implicitly Parallel Programming Models
• Programmers choose or create algorithms that have the desired amount of parallelism
• The parallel algorithms are expressed in traditional sequential programming languages
  • With assertions of design properties and assumptions
• The tools, compilers, runtime, and HW jointly reconstruct parallelism and arrange for parallel execution of the program
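To make the model concrete, here is a minimal sketch (my example, not code from the talk) of what a parallel algorithm expressed in a traditional sequential language, with assertions of design properties, can look like in C99:

```c
#include <assert.h>
#include <stddef.h>

/* A data-parallel algorithm written as ordinary sequential C.  The
 * restrict qualifiers and the assertion state the design assumption --
 * the buffers do not overlap -- so analysis can prove the iterations
 * independent and generate parallel code from the sequential source. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x + n <= y || y + n <= x);  /* stated non-overlap property */

    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];  /* no loop-carried dependence */
}
```

Given the stated property, tools are free to distribute the iterations across cores while the source remains ordinary sequential C with sequential semantics.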
One Important Disclaimer
Implicitly Parallel Programming Model ≠ Legacy Sequential Code
• Code change is needed whether we use implicitly or explicitly parallel programming models
• Legacy code was not developed to give tools deep knowledge about the computation being performed
  • E.g., source code does not express high-level properties and assumptions
Another Important Disclaimer
Implicitly Parallel Programming Model ≠ Sequential Algorithms
• When there is no parallelism in the algorithm, all is lost
• The question is how to go from parallel algorithms to parallel machine code
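A small contrast makes the disclaimer concrete (illustrative example, not from the talk): both loops below are sequential C, but only the second embodies a parallel algorithm. The first has a true loop-carried dependence that no tool can remove:

```c
#include <stddef.h>

/* Serial by construction: each y[i] depends on y[i-1] (assumes n >= 1),
 * so no tool can recover parallelism the algorithm does not contain. */
void iir(size_t n, float c, const float *x, float *y)
{
    y[0] = x[0];
    for (size_t i = 1; i < n; i++)
        y[i] = c * y[i - 1] + x[i];  /* loop-carried true dependence */
}

/* A parallel algorithm expressed sequentially: treating floating-point
 * addition as associative (a typical relaxation), the reduction can be
 * reconstructed as a parallel tree across cores. */
float sum(size_t n, const float *x)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```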
Human vs. Machine-level Programming Models
[Figure: spectrum of machine-level targets, e.g., Pentium, Itanium, NVIDIA GPU]
Cost of Tuning Explicitly Parallel Programming
• Developing an explicitly parallel app is an expensive proposition
• Understanding and performing all optimizations needed is hard; transactional memory might help
• Optimized parallelization will likely involve finessing libraries and infrastructure software (lack of composability)
• The larger the number of cores, the less intuitive the optimization process will be
Cost of Explicitly Parallel Program Verification and Support
• Verification and support of a hand-parallelized program is even more expensive
  • Application-level testing needs to cover a much larger state space
• Should contain the verification complexity through guarantees at the interface level
• Reproducing bugs can be very challenging; deterministic execution semantics help
Some of these points appear in "Software and the Concurrency Revolution," Sutter and Larus, ACM Queue 3(7), Sep. 2005.
Cost of Scaling Explicitly Parallel Programs
• Programs developed in explicitly parallel form are unlikely to scale with Moore's Law
• Performance comes from optimizations that craft the computation and data accesses to best get around hardware limitations
• Every new generation or new platform will likely require app developers to redo their application
Implicitly Parallel Programming Flow
[Figure: tool-flow diagram with human feedback assistance at each stage]
• Input: stylized C/C++ with assertions, subjected to deep analysis with feedback assistance from the human
• Concurrency discovery (for increased composability) produces a visualizable concurrent form
• Code-gen space exploration (for increased scalability) systematically searches for the best correct code generation, producing visualizable sequential assembly code with parallel annotations
• Parallel HW with sequential state generation (for increased supportability) provides parallel execution with sequential semantics, supporting the debugger
Application Scaling
• Traditional general-purpose compilers cover legacy apps based on sequential algorithms (the pit of the peach)
• Explicitly parallel programming models, OpenMP, and MPI cover simpler, earlier parallel apps (the peach skin)
• Implicitly parallel programming models extend the coverage into more sophisticated parallel apps (the peach flesh): "application scaling"
Parallelism in Algorithms (H.263 motion estimation example)
• Different algorithms may expose different levels of parallelism while achieving the desired result
• In motion estimation, previous vectors (either from space or time) can be used as guess vectors
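A hedged sketch of the guess-vector idea (structure, names, and window size are illustrative; this is not the encoder's actual code): seeding each macroblock's search with the co-located vector from the previous frame shrinks the search window and removes the dependence between neighboring blocks within a frame, so the macroblock loop becomes a candidate for parallel execution:

```c
#include <limits.h>

#define MB 16  /* macroblock size */
#define W  2   /* refinement window radius around the guess (illustrative) */

typedef struct { int dx, dy; } MV;

/* Sum of absolute differences between a macroblock of cur at (x, y) and
 * the block of ref displaced by (dx, dy).  No edge handling, for brevity. */
static int sad(const unsigned char *cur, const unsigned char *ref,
               int width, int x, int y, int dx, int dy)
{
    int s = 0;
    for (int j = 0; j < MB; j++)
        for (int i = 0; i < MB; i++) {
            int d = cur[(y + j) * width + (x + i)]
                  - ref[(y + j + dy) * width + (x + i + dx)];
            s += d < 0 ? -d : d;
        }
    return s;
}

/* Seed each block's search with the co-located vector from the previous
 * frame.  Blocks no longer depend on their spatial neighbors, so the
 * macroblock loop nest can be executed in parallel. */
void estimate_frame(const unsigned char *cur, const unsigned char *ref,
                    int width, int nbx, int nby,
                    const MV *prev_mvs, MV *mvs)
{
    for (int by = 0; by < nby; by++)
        for (int bx = 0; bx < nbx; bx++) {
            MV g = prev_mvs[by * nbx + bx];  /* temporal guess vector */
            MV best = g;
            int best_cost = INT_MAX;
            for (int dy = g.dy - W; dy <= g.dy + W; dy++)
                for (int dx = g.dx - W; dx <= g.dx + W; dx++) {
                    int c = sad(cur, ref, width, bx * MB, by * MB, dx, dy);
                    if (c < best_cost) { best_cost = c; best = (MV){ dx, dy }; }
                }
            mvs[by * nbx + bx] = best;
        }
}
```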
Concurrency Discovery
• Reconstruction of concurrency in a parallel algorithm expressed in a traditional sequential language
• Program analysis techniques have advanced sufficiently in the past decade to do this job
• Programmers need to avoid cryptic programming styles (see the sketch below)
• Programmer annotations on high-level design properties and assumptions are helpful
  • This needs to be done anyway when composing an explicitly parallel program from third-party components
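As an illustration of the coding-style point (example mine, not from the talk): the two loops below are equivalent, but the pointer-walking version obscures the access pattern that dependence analysis must recover, while the array-indexed version exposes it directly:

```c
#include <stddef.h>

/* Cryptic style: the pointer walk hides the access pattern, forcing
 * the analysis to reason about pointer arithmetic and aliasing. */
void scale_cryptic(float *p, float *end, float a)
{
    while (p < end)
        *p++ *= a;
}

/* Analyzable style: the affine index i makes the independence of the
 * iterations (and the array's bounds) obvious to dependence analysis. */
void scale_clear(size_t n, float a, float *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}
```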
MPEG-4 H.263 Encoder Parallelism Rediscovery
[Figure: panels (a)–(e) showing stages of parallelism rediscovery in the encoder]
Code Gen Space Exploration
[Figure: execution timelines of operations performed on 16×16 macroblocks — motion estimation, motion compensation, frame subtraction, DCT & quantization, dequantization/IDCT/frame addition, and main memory access — under (a) loop partitioning and (b) loop fusion + memory privatization]
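A sketch of the two schedules in C (stage names follow the figure; the kernel bodies are hypothetical stubs): (a) keeps each pipeline stage as its own loop over all macroblocks, streaming intermediate data through a shared frame buffer, while (b) fuses the stages into one loop and privatizes the macroblock buffer, keeping intermediates local and making the per-macroblock iterations independent:

```c
#define NMB 396  /* macroblocks in a CIF frame (illustrative) */

typedef struct { short coef[16 * 16]; } Block;

/* Hypothetical per-macroblock stage kernels; stubs so the sketch compiles. */
static void motion_estimate(int mb)            { (void)mb; }
static void motion_compensate(int mb)          { (void)mb; }
static void subtract_and_dct(int mb, Block *b) { (void)mb; (void)b; }
static void dequant_idct_add(int mb, Block *b) { (void)mb; (void)b; }

static Block frame_blocks[NMB];  /* shared, frame-sized intermediate buffer */

/* (a) Loop partitioning: one loop per stage; every stage streams all
 * macroblocks through the shared frame buffer (more main-memory traffic). */
void encode_partitioned(void)
{
    for (int mb = 0; mb < NMB; mb++) motion_estimate(mb);
    for (int mb = 0; mb < NMB; mb++) motion_compensate(mb);
    for (int mb = 0; mb < NMB; mb++) subtract_and_dct(mb, &frame_blocks[mb]);
    for (int mb = 0; mb < NMB; mb++) dequant_idct_add(mb, &frame_blocks[mb]);
}

/* (b) Loop fusion + memory privatization: one loop carries each macroblock
 * through all stages in a private buffer, so intermediates stay local and
 * the per-macroblock iterations are independent of one another. */
void encode_fused(void)
{
    for (int mb = 0; mb < NMB; mb++) {
        Block priv;  /* privatized per-iteration storage */
        motion_estimate(mb);
        motion_compensate(mb);
        subtract_and_dct(mb, &priv);
        dequant_idct_add(mb, &priv);
    }
}
```

Which schedule wins depends on the memory hierarchy and core count, which is exactly why the flow explores the code-generation space rather than fixing one choice.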
Conclusion
• Implicitly parallel programming models
  • Increase code re-use through parallelism rediscovery
  • Reduce tedious programming chores through code-gen space exploration
  • Control software support cost through sequential verification and debugging interfaces
• Use of appropriate application algorithms, coding styles, and compiler analysis techniques is the key to success