Programming Languages/Models and Compiler Technologies. Moderator: John Mellor-Crummey, Department of Computer Science, Rice University. Microsoft Manycore Workshop, June 21, 2007.
Panelists
• David August - Princeton University
• Saman Amarasinghe - Massachusetts Institute of Technology
• Guy Blelloch - Carnegie Mellon University
• Charles Leiserson - Massachusetts Institute of Technology
• Uzi Vishkin - University of Maryland, College Park
Architectural Challenges
• Significant parallelism
• Multiple kinds of parallelism
  • cores
  • ILP
  • SIMD
• Diversity of cores
• Run-time throttling of cores for power management
• Memory hierarchy
  • bandwidth
    • near term: will continue to be a significant bottleneck
    • long term: 3D stacked memory?
  • long and often non-uniform memory latencies
  • scratch pads
Roles of Parallel Programming Models
• Enhance programmer productivity through abstraction
• Manage platform resources to deliver performance
• Provide standard interface for platform portability
The Goal
Simpler ways of conceptualizing, expressing, debugging, and tuning scalable parallel programs
• Multiple models will be necessary
• Models will necessarily trade off simplicity, expressivity, relevance to legacy code, and performance
To Succeed, Parallel Programming Models Must …
• Be ubiquitous
  • cross-platform
    • at a minimum: laptops, SMP servers
    • distributed-memory clusters?
• Be expressive
• Be productive
  • easy to write
  • easy to read and maintain
  • easy to reuse
• Have a promise of future availability and longevity
• Be efficient
• Be supported by tools
Simplifying Parallel Programming
A high-level parallel language should …
• Provide global address space
  • beware exposed buffering …
• Separate concerns: partitioning, mapping, and synchronization vs. algorithm specification
  • “viscosity” comes from premature mingling of these issues
• Enable programmer to manage locality at a high level
  • locality = performance
  • affinity between data and computation
    • e.g. HPF’s “ON HOME” declarations
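A minimal Python sketch of the affinity idea, loosely in the spirit of an owner-computes rule such as HPF's "ON HOME": each worker updates only the indices it owns. The names `owned_range`, `scale`, and `run_on_home` are hypothetical, and the block distribution is an assumption made for illustration, not something the slides specify. Note that the algorithm (`scale`) is kept apart from the partitioning and mapping (`owned_range`).

```python
from concurrent.futures import ThreadPoolExecutor

def owned_range(worker, n, workers):
    """Indices owned by `worker` under a block distribution (assumed here)."""
    block = -(-n // workers)          # ceiling division: elements per worker
    lo = min(n, worker * block)
    hi = min(n, lo + block)
    return range(lo, hi)

def scale(x, factor):
    """The algorithm itself: says nothing about where it runs."""
    return x * factor

def run_on_home(data, workers, factor):
    """Owner-computes: each worker touches only the elements it owns."""
    def task(worker):
        for i in owned_range(worker, len(data), workers):
            data[i] = scale(data[i], factor)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(workers)))   # wait for all workers
    return data
```

Changing the distribution means changing only `owned_range`; `scale` is untouched, which is the separation of concerns the slide asks for.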
Design Issues I
• Ultimate control vs. simplicity of use
  • “library developers” vs. “productivity users”
  • should it be the same language for both?
    • extensible language model (Sun’s Fortress)
    • kitchen sink model (X10)
• Implicit vs. explicit parallelism
  • implicit parallelism is often more malleable
  • better supports dynamic adaptation
• Compiler-assisted vs. compiler-centric
  • Co-array Fortran and UPC
    • user control over work decomposition, data movement, and synchronization
  • HPF: compiler must deliver or all is lost
• Lazy vs. eager parallelism
  • Cilk’s lazy parallelism provides a model for “scalable” binaries
  • eager parallelism adds unnecessary overhead
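The implicit vs. explicit contrast can be sketched roughly in Python. This is only an analogy under stated assumptions: `explicit_squares` fixes the thread count and the decomposition in the program text, while `implicit_squares` states only the independent operation and leaves worker count and scheduling to the runtime. Both function names are hypothetical, and CPython threads illustrate the structure rather than any speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def explicit_squares(data, n_threads):
    """Explicit: the programmer chooses thread count and chunk boundaries."""
    out = [None] * len(data)
    chunk = -(-len(data) // n_threads)   # ceiling division

    def worker(lo, hi):
        for i in range(lo, hi):
            out[i] = square(data[i])

    threads = []
    for w in range(n_threads):
        lo = min(len(data), w * chunk)
        hi = min(len(data), lo + chunk)
        threads.append(threading.Thread(target=worker, args=(lo, hi)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

def implicit_squares(data):
    """Closer to implicit: only the independent operation is stated."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, data))
```

The second form is the more malleable one in the slide's sense: the runtime could change the number of workers without any edit to the program.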
Design Issues II
• Deterministic vs. non-deterministic models
  • deterministic “clocked final model”
    • Saraswat et al. (www.saraswat.org/cf.pdf)
• Static vs. dynamic scheduling
  • dynamic scheduling will be increasingly important
    • irregular computations, task parallelism
    • adaptive scheduling in response to “core throttling”
• Cooperative vs. independent scheduling of work
  • does benefit of shared cache outweigh difficulty of using it?
  • tightly synchronous vs. more loosely synchronous
• Scalable to distributed-memory ensembles?
  • broad community probably only cares about tightly-coupled platforms
  • some government and industry clients will always have extreme needs
• Importance of managing affinity between cores and data
  • important for highest efficiency for library developers
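Why dynamic scheduling matters for irregular computations can be seen in a small simulation. The sketch below is an illustration, not a model from the slides: it assumes tasks with known costs, a static schedule that hands each worker a contiguous block up front, and a dynamic schedule that gives the next task to whichever worker becomes free first. The function names and the cost list are made up.

```python
import heapq

def static_makespan(costs, workers):
    """Static: contiguous blocks of tasks are assigned before execution."""
    block = -(-len(costs) // workers)   # ceiling division
    loads = [sum(costs[w * block:(w + 1) * block]) for w in range(workers)]
    return max(loads)

def dynamic_makespan(costs, workers):
    """Dynamic: each task goes to the worker that becomes free first."""
    free_at = [0] * workers             # min-heap of worker finish times
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)

# Irregular task costs: a few heavy tasks followed by many light ones.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
print(static_makespan(costs, 2))    # 26: one worker gets all the heavy tasks
print(dynamic_makespan(costs, 2))   # 15: work is balanced as it is handed out
```

A throttled core would appear here as a worker whose tasks take longer; the dynamic schedule adapts to that without any change, while the static one does not.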
Transactions are not “THE” Answer
• Transactions are a piece of the puzzle: atomicity
• Other aspects of the parallel programming problem
  • identifying concurrency
  • partitioning work
  • ordering actions
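A small Python sketch makes the division of labor concrete. A lock stands in for a transaction here (an assumption: the slides name no transactional API), and the `Account` class and `run_deposits` helper are hypothetical. The atomic section covers only the read-modify-write; deciding how many threads to use, how to split the deposits among them, and when to wait for completion all happen outside it.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Atomicity: the read-modify-write happens as one indivisible step.
        with self._lock:
            self.balance += amount

def run_deposits(account, n_threads, deposits_per_thread):
    # Identifying concurrency and partitioning work: decided here,
    # not by the atomic section.
    def worker():
        for _ in range(deposits_per_thread):
            account.deposit(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    # Ordering: the final balance is read only after every worker is done.
    for t in threads:
        t.join()
    return account.balance
```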
Autotuning
• Seductive idea
• Very successful as a library-based approach
  • FFTW, ATLAS, OSKI, …
• Much work needed to apply to applications rather than kernels
  • huge search space
    • progress in effective truncated search
    • model guidance can be effective
  • autotuning for parallelism
    • dangerously close to automatic parallelization
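The core loop of an empirical autotuner can be sketched in a few lines of Python. This is a toy under stated assumptions, not how FFTW, ATLAS, or OSKI work internally: the kernel is a blocked transpose of a square matrix, the tuning parameter is the block size, and the "truncated search" is simply a cap (`budget`) on how many candidates are timed. All names are hypothetical.

```python
import time

def blocked_transpose(a, block):
    """Transpose a square matrix (list of lists), one block at a time."""
    n = len(a)
    out = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for i in range(ii, min(ii + block, n)):
                row = a[i]
                for j in range(jj, min(jj + block, n)):
                    out[j][i] = row[j]
    return out

def autotune(kernel, arg, candidates, budget):
    """Time the kernel for at most `budget` candidates; return the fastest."""
    best, best_time = None, float("inf")
    for block in candidates[:budget]:       # truncated search
        start = time.perf_counter()
        kernel(arg, block)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = block, elapsed
    return best
```

With one parameter the search is trivial; an application exposes many interacting parameters, which is where the huge search space on the slide comes from.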
Rice Experience: Lessons from HPF
• Good data and computation partitionings are essential
  • without good partitionings, parallelism suffers
  • flexible user control is essential
• Excess communication undermines scalability
  • both frequency and volume must be right
  • embrace user hints to guide communication placement and optimization
    • e.g. HPF/JA directives: REFLECT, LOCAL, PIPELINE, etc.
• Single-processor efficiency is critical
  • must use caches effectively on microprocessors
    • Icache: beware of complex machine-generated code
    • Dcache: beware of communication footprint
• Optimizing tightly-coupled algorithms can be hard
  • if the compiler doesn’t optimize it, performance may be doomed!
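How much the partitioning affects communication volume can be seen with a simple count. The sketch below assumes a 1D three-point stencil (element i reads its neighbors i-1 and i+1) and compares a block distribution with a cyclic one by counting references to elements owned by another processor. The stencil, the two distributions, and the function names are illustrative assumptions, not taken from the dHPF work.

```python
def block_owner(i, n, p):
    """Block distribution: contiguous chunks of ceil(n / p) elements."""
    block = -(-n // p)
    return i // block

def cyclic_owner(i, n, p):
    """Cyclic distribution: elements dealt out round-robin."""
    return i % p

def remote_refs(n, p, owner):
    """Count neighbor reads that cross a processor boundary."""
    count = 0
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n and owner(j, n, p) != owner(i, n, p):
                count += 1
    return count

print(remote_refs(16, 4, block_owner))    # 6: only block edges communicate
print(remote_refs(16, 4, cyclic_owner))   # 30: every neighbor read is remote
```

For this access pattern the block distribution keeps communication proportional to the number of processors, while the cyclic one makes it proportional to the problem size.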
Rice Experience: HPF vs. Co-array Fortran
• Rice dHPF - a decade of investment in compiler technology
  • not quite: government cut funding here too, just like architecture
  • polyhedral code generation models (like Lethin described)
• Co-array Fortran for clusters
  • a few years’ effort by a pair of students
• Result: Co-array Fortran bests HPF
  • more expressive
  • higher performance
  • shorter time to solution
  • currently, can be HARDER to program than MPI
Principal Compiler and Runtime Challenges
• Exploiting multiple levels of heterogeneous parallelism
• Choreographing parallelism, data movement, synchronization
• Managing memory hierarchy
  • cache
  • scratch pad
Warning: Don’t try this at home.
Programming Model Ecosystem Issues
• Semantic mismatch between programming model and execution model
• Debugging: data races and non-determinism
• Performance analysis: why isn’t performance scaling?
  • insufficient parallelism
  • parallelism is too fine-grained to be efficient
  • architecture-level issues, e.g., false sharing
A Path Forward
• Kernel-, benchmark-, and application-driven studies
  • assess strengths and weaknesses of models
• Explore alternatives & evaluate their effects on
  • simplicity
  • expressiveness
  • correctness
  • performance