Multicore Chips and Parallel Programming
Mary Hall
Dept. of Computer Science and Information Sciences Institute
SSE Meeting
The Multicore Paradigm Shift: Technology Drivers
Part 1: Technology Trends
What to do with all these transistors?
Key ideas:
• Movement away from increasingly complex processor designs and faster clocks
• Replicated functionality (i.e., parallel) is simpler to design
• Resources are more efficiently utilized
• Huge power-management advantages
The Architectural Continuum
• Supercomputer: IBM BG/L
• Commodity server: Sun Niagara
• Embedded: Xilinx Virtex-4
Multicore: Impact on Software
Consequences:
• Individual processors will no longer get faster; at first, they might even get a little slower.
• Today’s software, as written, may not perform as well on tomorrow’s hardware.
• And forget about adding capability!
The very future of the computing industry demands successful strategies for applications to exploit parallelism across cores!
The Multicore Paradigm Shift: Computing Industry Perspective
“We are at the cusp of a transition to multicore, multithreaded architectures, and we still have not demonstrated the ease of programming the move will require… I have talked with a few people at Microsoft Research who say this is also at or near the top of their list [of critical CS research problems].”
Justin Rattner, CTO, Intel Corporation
The Rest of This Talk
• Convergence of high-end, conventional, and embedded computing
• Application development and compilation strategies for the high end (supercomputers) are now becoming important for the masses
• Why? Technology trends (motivation)
• Looking to the future:
• Automatically generating parallel code is useful, but insufficient.
• Parallel computing for the masses demands better parallel programming paradigms.
• Compiler technology will become increasingly important for dealing with a diversity of optimization challenges… and must be engineered to manage complexity and adapt to new architectures.
• Potential to exploit vast machine resources to automatically compose applications and systematically tune application performance.
• New tunable library and component technology.
1. Automatic Parallelization
From Hall et al., “Maximizing Multiprocessor Performance with the SUIF Compiler”, IEEE Computer, Dec. 1996.
• Old approaches:
• Limited to loops and array computations
• Difficult to find sufficient granularity (parallel work between synchronizations)
• Success came from fragile, complex software
• New ideas in this area:
• Finer granularity of parallelism -- more plentiful
• Combine with hardware support (e.g., speculation and multithreading)
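As a concrete illustration (a minimal sketch, not code from the talk), the loop below is the kind an automatic parallelizer such as SUIF targets: the compiler must prove the iterations independent before distributing them across cores. A hand-written OpenMP pragma stands in for that transformation here.

```c
/* Minimal sketch of loop-level parallelism; compile with: gcc -fopenmp */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    /* Each iteration touches only a[i], b[i], c[i]: no loop-carried
     * dependence, so the iterations can safely run across cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```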
2. Parallel Programming: State of the Art
Three dominant classes of applications.
Takeaway: domain-specific, intellectually challenging, low-level programming models; not suitable for the masses.
2. New Parallel Programming Paradigms
• Transactional memory
• A section of code executes atomically, with subsequent commit or rollback
• Programming model + hardware support
• Streams and data-parallel models
• Data streams describe the flow of data
• Well-suited for certain applications and hardware (IBM Cell, GPUs)
• Domain-specific languages and libraries
• Parallelism implicit within the implementation
Different applications and users demand different solutions. Convergence unlikely. Architecture independence?
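A minimal transactional-memory sketch, assuming GCC’s experimental -fgnu-tm extension (this is an illustrative assumption, not an example from the talk): the atomic block commits as a unit or is rolled back and retried by the runtime, with no explicit locks in the source.

```c
/* Transactional-memory sketch; compile with: gcc -fgnu-tm */
#include <stdio.h>

static long balance_a = 100, balance_b = 0;

void transfer(long amount) {
    __transaction_atomic {
        balance_a -= amount;
        balance_b += amount;   /* either both updates commit, or neither */
    }
}

int main(void) {
    transfer(40);
    printf("a=%ld b=%ld\n", balance_a, balance_b);
    return 0;
}
```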
3. Engineering a Compiler
• Compiler research will play a crucial role in achieving the performance and programmability of multicore hardware.
• What is the state of compilers today?
• Roughly a 5-year lag between introducing a new architecture and having a robust compiler
• Many interesting new architectures fail in the marketplace due to inadequate software tools
• Today’s compilers are complex and monolithic
• SUIF has ~500K LOC; Open64 has ~12M LOC
The best research ideas do not always make it into practice.
3. A New Kind of “Compiler”
Traditional view:
[Diagram: code and input data feed a batch compiler.]
3 & 4. Performance Tuning “Compiler” transformation script(s) code Experiments Engine Code Translation input data (characteristics) search script(s) SSE Meeting
4. Auto-tuner
[Diagram: the same components as the previous slide: code, input data (characteristics), transformation script(s), and search script(s) feeding code translation and an experiments engine.]
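The sketch below (an illustrative assumption, not the system described on the slide) shows the core of such an experiments engine: run code variants, here blocked matrix-multiply tile sizes, time each one, and keep the best.

```c
/* Auto-tuning sketch: empirically search tile sizes for a blocked
 * matrix multiply. Tile sizes must divide N evenly in this toy. */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define N 256

static double A[N][N], B[N][N], C[N][N];

static void mm_tiled(int T) {
    memset(C, 0, sizeof C);
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < ii + T; i++)
                    for (int j = jj; j < jj + T; j++)
                        for (int k = kk; k < kk + T; k++)
                            C[i][j] += A[i][k] * B[k][j];
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 1.0; }

    int candidates[] = {8, 16, 32, 64};
    int best_tile = 0;
    double best_time = 1e30;

    for (size_t v = 0; v < sizeof candidates / sizeof *candidates; v++) {
        clock_t t0 = clock();
        mm_tiled(candidates[v]);                    /* run the experiment */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("tile=%2d  %.3fs\n", candidates[v], secs);
        if (secs < best_time) { best_time = secs; best_tile = candidates[v]; }
    }
    printf("selected tile size: %d\n", best_tile);
    return 0;
}
```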
Heterogeneous: Additional Complexity
• Managing data movement and synchronization
• Staging data to/from global memory
• Partitioning: where to execute?
• Other:
• Utilizing highly tuned libraries
• Differences in programming models (GPP + FPGA is an extreme example)
[Diagram: memory shared among four different device types.]
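The following sketch shows the stage-in/compute/stage-out pattern the slide alludes to. The device_* functions are hypothetical stand-ins, stubbed here with host malloc/memcpy so the example runs; a real heterogeneous runtime would supply its own equivalents.

```c
/* Staging data to and from a device's global memory (hedged sketch). */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

/* --- hypothetical device runtime, stubbed on the host --- */
static void *device_alloc(size_t n)                            { return malloc(n); }
static void  device_copy_in(void *d, const void *h, size_t n)  { memcpy(d, h, n); }
static void  device_copy_out(void *h, const void *d, size_t n) { memcpy(h, d, n); }
static void  device_free(void *d)                              { free(d); }

/* Kernel that would run on the accelerator. */
static void scale_kernel(float *x, size_t n, float s) {
    for (size_t i = 0; i < n; i++) x[i] *= s;
}

int main(void) {
    enum { N = 1024 };
    float host[N];
    for (int i = 0; i < N; i++) host[i] = (float)i;

    float *dev = device_alloc(N * sizeof *dev);
    device_copy_in(dev, host, N * sizeof *dev);   /* stage to global memory */
    scale_kernel(dev, N, 2.0f);                   /* partitioned work executes */
    device_copy_out(host, dev, N * sizeof *dev);  /* stage results back */
    device_free(dev);

    printf("host[10] = %f\n", host[10]);
    return 0;
}
```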
5. Libraries and Component Technology
Traditional view:
• Interface: provides/requires
• Code (source or binary)
• Data description: types, sizes
Expanded view:
• Interface: abstract provides/requires; device dependencies
• Partial code (source or tunable binary) with a code generator
• Data description: types, sizes; map features to optimization
• Performance: device, data features
Support for automatic selection, tuning, scheduling, etc.
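A minimal sketch of a tunable component (all names are illustrative, not from the talk): several code variants sit behind one abstract interface, and selection is driven by a simple data-feature description, here just the problem size.

```c
/* Tunable-component sketch: automatic variant selection by data features. */
#include <stdio.h>
#include <stddef.h>

typedef void (*sum_variant)(const double *, size_t, double *);

static void sum_simple(const double *x, size_t n, double *out) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    *out = s;
}

static void sum_unrolled(const double *x, size_t n, double *out) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) { s0 += x[i]; s1 += x[i + 1]; }
    if (i < n) s0 += x[i];
    *out = s0 + s1;
}

/* "Provides/requires" interface: pick an implementation from the data
 * description instead of hard-wiring one at the call site. */
static sum_variant select_variant(size_t n) {
    return (n < 64) ? sum_simple : sum_unrolled;
}

int main(void) {
    double x[1000];
    for (int i = 0; i < 1000; i++) x[i] = 1.0;

    double s;
    select_variant(1000)(x, 1000, &s);
    printf("sum = %f\n", s);
    return 0;
}
```

In a full component system, select_variant would consult the performance and data-feature descriptions from the expanded view above, and an auto-tuner could populate it automatically.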
Summary
• Parallel computing is everywhere! And we need software tools.
• Can we find some common ground?
• Strategies:
• Automatic parallelization
• Libraries and domain-specific tools that hide parallelism; component technology
• New programming languages
• Auto-tuners to “test” alternative solutions
• General approach to solving challenges:
• Education: CS503, Parallel Programming
• Organize the community to support incremental, LONG-TERM development.