UPC at CRD/LBNL
Kathy Yelick
Dan Bonachea, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Christian Bell
What is UPC?
• UPC is an explicitly parallel language
  • Global address space; can read/write remote memory
  • Programmer control over layout and scheduling
  • From Split-C, AC, PCP
• Why a new language?
  • Easier to use than MPI, especially for programs with complicated data structures
  • Possibly faster on some machines, but current goal is comparable performance
[Figure labels: p0, p1, p2]
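To make "programmer control over layout" concrete: a UPC shared array declared with block size B is dealt out to threads block-cyclically, so element i has affinity to thread (i / B) mod THREADS. The sketch below reproduces that mapping in plain C rather than UPC. The function names `owner_thread` and `local_index` are hypothetical and chosen for illustration; they are not part of any UPC runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Thread with affinity to element i of an array that is distributed
 * block-cyclically: block_size elements per block, n_threads threads.
 * (Hypothetical helper, not a UPC library call.) */
static size_t owner_thread(size_t i, size_t block_size, size_t n_threads)
{
    assert(block_size > 0 && n_threads > 0);
    return (i / block_size) % n_threads;
}

/* Position of element i inside its owner's local storage, counted in
 * elements. (Hypothetical helper.) */
static size_t local_index(size_t i, size_t block_size, size_t n_threads)
{
    assert(block_size > 0 && n_threads > 0);
    size_t block = i / block_size;           /* global block number */
    size_t local_block = block / n_threads;  /* which of the owner's blocks */
    return local_block * block_size + i % block_size;
}
```

With three threads (p0, p1, p2) and block size 2, elements 0 and 1 land on p0, elements 2 and 3 on p1, elements 4 and 5 on p2, and element 6 wraps back to p0. Changing the block size changes which accesses are local, which is the layout control the slide refers to.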
Background
• UPC efforts elsewhere
  • IDA: Bill Carlson, UPC promoter
  • GMU (documentation) and UMC (benchmarking)
  • HP (Alpha cluster and C+MPI compiler (with MTU))
  • Cray (implementations)
  • Intrepid (SGI and T3E compiler)
• UPC Book:
  • T. El-Ghazawi, B. Carlson, T. Sterling, K. Yelick
  • 3 chapters in draft form; goal is to have proofs by SC03
• Three components of NERSC effort
  • Compilers for DOE machines (SP and PC clusters)
  • Runtime systems for ours and other compilers
  • Applications and benchmarks
UPC Funding
• Base program funding K52004
  • Compiler/translator work
  • Applications
  • Runtime for DOE machines
• Part of Pmodels Center K52018
  • Runtime support common to Titanium (and hopefully CoArray Fortran, at some point)
  • Collaboration with ARMCI group
• NSA funding
  • UPC for “clusters”
Compiler Status
• NERSC compiler/translator
  • Costin Iancu and Wei Chen
  • Translates UPC to C + “Berkeley UPC Runtime”
  • Based on Open64 compiler for C
• Status
  • Complete in prototype form
  • Debugging, tuning, extensions ongoing
  • Release planned for next month:
    • Quadrics, Myrinet, IBM/SP, and MPI
  • Shared memory/process implementation is next
• Investigating optimization opportunities
  • Communication optimizations
  • UPC language optimizations
UPC Compiler
• Compiler based on Open64
  • Multiple front-ends, including gcc
  • Intermediate form called WHIRL
• Leverage standard optimizations and analyses
  • Pointer analysis
  • Loop optimizations
• Current focus on C backend
  • IA64 possible in future
• UPC Runtime built on GASNet
  • Portable
  • Language-independent
[Figure: compiler pipeline with stages labeled UPC; Higher WHIRL; Optimizing transformations; C + Runtime; Lower WHIRL; Assembly: IA64, MIPS,… + Runtime]
Portable Runtime Support
• Developing a runtime layer that can be easily ported and tuned to multiple architectures.
[Figure: layered runtime stack, with the following labels]
• Runtime: Global pointers (opaque type with rich set of pointer operations), memory management, job startup, etc.
• Generic support for UPC, CAF, Titanium
• GASNet Extended API: Supports put, get, locks, barrier, bulk, scatter/gather
• GASNet Core API: Small interface based on “Active Messages”
• Direct implementations of parts of full GASNet
• Core sufficient for functional implementation
• GASNet released 1/03
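The note that the core is "sufficient for functional implementation" can be illustrated with a toy model: if every message names a handler that runs on arrival, then an extended operation such as put is just a message whose handler copies the payload into the destination's memory. The sketch below simulates this inside one process in plain C. Every name in it (`am_register`, `am_request`, `put_handler`, `toy_put`, the `segment` array) is hypothetical and is not the GASNet API; real delivery over a network is replaced by an immediate function call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8
#define SEG_BYTES 64
#define N_NODES 2

/* Toy stand-in for each node's remotely accessible memory. */
static unsigned char segment[N_NODES][SEG_BYTES];

/* A handler runs on the destination node when a message arrives. */
typedef void (*am_handler)(int node, size_t arg,
                           const void *payload, size_t nbytes);
static am_handler handler_table[MAX_HANDLERS];

/* Core-style operation 1: register a handler under a small index. */
static void am_register(int idx, am_handler h)
{
    assert(idx >= 0 && idx < MAX_HANDLERS);
    handler_table[idx] = h;
}

/* Core-style operation 2: send a request. A real network would carry
 * the message; this simulation dispatches the handler immediately. */
static void am_request(int dest, int idx, size_t arg,
                       const void *payload, size_t nbytes)
{
    assert(dest >= 0 && dest < N_NODES);
    assert(idx >= 0 && idx < MAX_HANDLERS && handler_table[idx] != NULL);
    handler_table[idx](dest, arg, payload, nbytes);
}

/* Handler that implements the receiving side of a put: copy the
 * payload into the destination segment at the given byte offset. */
static void put_handler(int node, size_t offset,
                        const void *payload, size_t nbytes)
{
    assert(offset <= SEG_BYTES && nbytes <= SEG_BYTES - offset);
    memcpy(&segment[node][offset], payload, nbytes);
}

enum { PUT_HANDLER = 0 };

/* Extended-style put, built only from the two core-style operations. */
static void toy_put(int dest, size_t offset, const void *src, size_t nbytes)
{
    am_request(dest, PUT_HANDLER, offset, src, nbytes);
}
```

After `am_register(PUT_HANDLER, put_handler)`, a call such as `toy_put(1, 4, "hi", 3)` writes three bytes into node 1's segment at offset 4. A tuned port could instead map put straight onto the network hardware, which is one reading of "direct implementations of parts of full GASNet".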
Communication Optimizations
• Characterizing performance of current machines
  • Latency, overlap (communication & computation)
• Plan to optimize automatically using a communication performance model
• Preliminary results: 10x improvement on Matmul
Costs of Pointer-to-Shared Arithmetic – Berkeley vs. HP
• HP is faster for most operations, since HP generates assembly code
• Both compilers optimize for “phaseless” pointers
• For some operations, Berkeley can beat HP (ptr comparison)
• Expect gap to narrow once the proper optimizations are built into Berkeley UPC
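Why pointer-to-shared arithmetic costs more than ordinary pointer arithmetic, and why "phaseless" pointers are worth a special case, can be sketched in plain C. The sketch assumes a pointer-to-shared is held as a (thread, phase, local offset) triple and that "phaseless" means the phase is always zero, as with a cyclic layout of block size 1. The `sptr` type and the function names are hypothetical; neither compiler's actual representation is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software representation of a pointer-to-shared. */
typedef struct {
    size_t thread; /* thread with affinity to the target element */
    size_t phase;  /* position inside the current block */
    size_t local;  /* element offset in the owner's local storage */
} sptr;

/* General case: advance by one element with block_size elements per
 * block and n_threads threads. Needs the phase and two wrap checks. */
static sptr sptr_inc(sptr p, size_t block_size, size_t n_threads)
{
    assert(block_size > 0 && n_threads > 0);
    if (p.phase + 1 < block_size) {  /* still inside the same block */
        p.phase++;
        p.local++;
        return p;
    }
    p.phase = 0;                     /* first element of the next block */
    p.local -= block_size - 1;       /* same block row on the next thread */
    if (++p.thread == n_threads) {   /* wrapped past the last thread */
        p.thread = 0;
        p.local += block_size;       /* move down to the next block row */
    }
    return p;
}

/* Phaseless cyclic case (block size 1): no phase bookkeeping at all. */
static sptr sptr_inc_cyclic(sptr p, size_t n_threads)
{
    assert(n_threads > 0);
    if (++p.thread == n_threads) {
        p.thread = 0;
        p.local++;
    }
    return p;
}

/* Equality needs only a field-by-field comparison. */
static int sptr_eq(sptr a, sptr b)
{
    return a.thread == b.thread && a.phase == b.phase && a.local == b.local;
}
```

With block size 2 and three threads, incrementing the pointer to element 5 (thread 2, phase 1, local 1) yields element 6 (thread 0, phase 0, local 2). The cyclic variant reaches the same answer as the general one for block size 1 with less work, which is the kind of saving a phaseless fast path targets.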
Applications
• NAS Parallel Benchmark Sized Apps
  • UPC MG complete
  • UPC CG complete
  • UPC GUPS
  • GWU has done IS, EP, and FT
• Planning on
  • Several Splash benchmarks
  • Sparse Cholesky
  • Possibly AMR
Mesh Generation
• Parallel Mesh Generation in UPC
• 2D Delaunay triangulation
• Based on Triangle software by Shewchuk (UCB)
• Parallel version from NERSC uses dynamic load balancing, software caching, and parallel sorting
Summary
• Lots of progress on
  • Compiler
  • Runtime
  • Portable communication layer (GASNet)
  • Applications
• Working on developing a large application that depends on UPC
  • Mesh generation
  • AMR (?), Sparse LU (?)
Future Plans
• Runtime support for Intrepid
  • Gcc-based open source compiler
• Performance tuning of runtime
  • Additional machines (InfiniBand, X1, Dolphin)
• Optimization of compiled code
  • Communication optimizations
  • Automatic search-based optimizations
• Application efforts