Understanding UPC: A Parallel Language for Programmer Control and Global Address Space

UPC at CRD/LBNL
Kathy Yelick, Dan Bonachea, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Mike Welcome, Christian Bell

What is UPC?
• UPC is an explicitly parallel language
  • Global address space; can read/write remote memory
  • Programmer control over layout and scheduling
  • From Split-C, AC, PCP
• Why a new language?
  • Easier to use than MPI, especially for programs with complicated data structures
  • Possibly faster on some machines, but current goal is comparable performance
[Figure: diagram labeled p0, p1, p2]
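
As a rough illustration of "programmer control over layout", the sketch below models in plain C (not UPC) how a blocked shared array is spread over threads. It assumes the standard UPC rule that, for a declaration such as shared [B] int a[N], element i has affinity to thread (i / B) mod THREADS. The helper names owner_thread and local_offset are hypothetical and were invented for this example; they are not part of any UPC compiler or runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that element i has affinity to, for block size `block`
   distributed round-robin over `threads` threads.
   Hypothetical helper; a plain C model, not UPC. */
size_t owner_thread(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Position of element i inside the owning thread's local portion.
   Hypothetical helper. */
size_t local_offset(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    size_t round = i / (block * threads); /* full rounds of blocks before i */
    return round * block + i % block;
}
```

With block = 2 and threads = 3, elements 0, 1, 6, 7 land on thread 0, so element 7 sits at local offset 3 there. Setting block = 1 gives the cyclic layout.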

Background
• UPC efforts elsewhere
  • IDA: Bill Carlson, UPC promoter
  • GMU (documentation) and UMC (benchmarking)
  • HP (Alpha cluster and C+MPI compiler (with MTU))
  • Cray (implementations)
  • Intrepid (SGI and T3E compiler)
• UPC Book:
  • T. El-Ghazawi, B. Carlson, T. Sterling, K. Yelick
  • 3 chapters in draft form; goal is to have proofs by SC03
• Three components of NERSC effort
  • Compilers for DOE machines (SP and PC clusters)
  • Runtime systems for ours and other compilers
  • Applications and benchmarks

UPC Funding
• Base program funding K52004
  • Compiler/translator work
  • Applications
  • Runtime for DOE machines
• Part of Pmodels Center K52018
  • Runtime support common to Titanium (and hopefully CoArray Fortran, at some point)
  • Collaboration with ARMCI group
• NSA funding
  • UPC for “clusters”

Compiler Status
• NERSC compiler/translator
  • Costin Iancu and Wei Chen
  • Translates UPC to C + “Berkeley UPC Runtime”
  • Based on Open64 compiler for C
• Status
  • Complete in prototype form
  • Debugging, tuning, extensions ongoing
  • Release planned for next month:
    • Quadrics, Myrinet, IBM/SP, and MPI
  • Shared memory/process implementation is next
• Investigating optimization opportunities
  • Communication optimizations
  • UPC language optimizations

UPC Compiler
• Compiler based on Open64
  • Multiple front-ends, including gcc
  • Intermediate form called WHIRL
• Leverage standard optimizations and analyses
  • Pointer analysis
  • Loop optimizations
• Current focus on C backend
  • IA64 possible in future
• UPC Runtime built on GASNet
  • Portable
  • Language-independent
[Figure: compiler pipeline with stages labeled UPC, Higher WHIRL, Optimizing transformations, C + Runtime, Lower WHIRL, Assembly: IA64, MIPS,… + Runtime]

Portable Runtime Support
• Developing a runtime layer that can be easily ported and tuned to multiple architectures
• Direct implementations of parts of full GASNet
• Runtime: global pointers (opaque type with rich set of pointer operations), memory management, job startup, etc.
• Generic support for UPC, CAF, Titanium
• GASNet Extended API: supports put, get, locks, barrier, bulk, scatter/gather
• GASNet Core API: small interface based on “Active Messages”
• Core sufficient for functional implementation
• GASNet released 1/03
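
To make the layering idea concrete, here is a minimal single-process sketch in plain C of how a put operation might be built on top of an active-message style primitive, which is one way a small core can be "sufficient for functional implementation". This is not the GASNet API: every name here (am_msg, am_deliver, sim_put, sim_peek, the handler table) is hypothetical, and message delivery is simulated by a direct function call rather than a network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODES 2
#define SEG_BYTES 64

/* Each simulated node owns one memory segment (zero-initialized). */
static unsigned char segment[NODES][SEG_BYTES];

/* An active message names a handler to run at the destination
   and carries the arguments that handler needs. Hypothetical layout. */
typedef struct {
    int handler;          /* index into the handler table */
    size_t offset;        /* destination offset within the segment */
    const void *payload;  /* data to store */
    size_t nbytes;        /* payload length */
} am_msg;

typedef void (*am_handler)(int node, const am_msg *m);

/* Handler run "on" the destination node: copy payload into its segment. */
static void put_handler(int node, const am_msg *m)
{
    assert(m->offset + m->nbytes <= SEG_BYTES);
    memcpy(&segment[node][m->offset], m->payload, m->nbytes);
}

enum { H_PUT = 0, H_COUNT };
static const am_handler handlers[H_COUNT] = { put_handler };

/* Core-level primitive: deliver a message and run its handler.
   In this simulation, delivery is just a function call. */
static void am_deliver(int dest, const am_msg *m)
{
    assert(dest >= 0 && dest < NODES);
    assert(m->handler >= 0 && m->handler < H_COUNT);
    handlers[m->handler](dest, m);
}

/* Extended-level operation expressed purely in terms of the core primitive. */
void sim_put(int dest, size_t offset, const void *src, size_t nbytes)
{
    am_msg m = { H_PUT, offset, src, nbytes };
    am_deliver(dest, &m);
}

/* Read one byte of a node's segment, for inspection. */
unsigned char sim_peek(int node, size_t offset)
{
    assert(node >= 0 && node < NODES && offset < SEG_BYTES);
    return segment[node][offset];
}
```

The design point the sketch tries to show is that sim_put never touches remote memory itself; it only sends a message, so porting to a new network would mean reimplementing am_deliver alone. A tuned port could then replace sim_put with a direct hardware put where one exists.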

Communication Optimizations
• Characterizing performance of current machines
  • Latency, overlap (communication & computation)
• Plan to optimize automatically using a communication performance model
• Preliminary results: 10x improvement on Matmul

Performance without Communication

Preliminary Parallel Performance

Costs of Pointer-to-Shared Arithmetic – Berkeley vs. HP
• HP is faster for most operations, since HP generates assembly code
• Both compilers optimize for “phaseless” pointers
• For some operations, Berkeley can beat HP (ptr comparison)
• Expect gap to narrow once the proper optimizations are built in for Berkeley UPC
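
The following plain C sketch suggests why pointer-to-shared arithmetic costs more than ordinary pointer arithmetic, and why "phaseless" pointers are worth a special case. It assumes a pointer-to-shared can be modeled as a (thread, phase, address) triple and that "phaseless" refers to pointers whose phase is always zero, as with the cyclic layout (block size 1). The sptr struct and both functions are hypothetical illustrations, not the representation used by either the Berkeley or the HP compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a pointer-to-shared. */
typedef struct {
    size_t thread;  /* thread the target element has affinity to */
    size_t phase;   /* position within the current block */
    size_t addr;    /* element offset in that thread's local portion */
} sptr;

/* General case: advance one element with block size `block`. */
sptr sptr_inc(sptr p, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    p.phase++;
    p.addr++;
    if (p.phase == block) {          /* stepped past the end of a block */
        p.phase = 0;
        p.thread++;
        p.addr -= block;             /* same block row on the next thread */
        if (p.thread == threads) {   /* wrapped around all threads */
            p.thread = 0;
            p.addr += block;         /* start of the next block row */
        }
    }
    return p;
}

/* Phaseless (cyclic) case: no phase to track, one fewer branch. */
sptr sptr_inc_cyclic(sptr p, size_t threads)
{
    assert(threads > 0);
    p.thread++;
    if (p.thread == threads) {
        p.thread = 0;
        p.addr++;
    }
    return p;
}
```

Even a single increment involves up to two conditional updates in the general case, whereas the cyclic version needs no phase bookkeeping at all.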

Applications
• NAS Parallel Benchmark Sized Apps
  • UPC MG complete
  • UPC CG complete
  • UPC GUPS
  • GWU has done IS, EP, and FT
• Planning on
  • Several Splash benchmarks
  • Sparse Cholesky
  • Possibly AMR

Mesh Generation
• Parallel Mesh Generation in UPC
• 2D Delaunay triangulation
• Based on Triangle software by Shewchuk (UCB)
• Parallel version from NERSC uses dynamic load balancing, software caching, and parallel sorting

Summary
• Lots of progress on
  • Compiler
  • Runtime
  • Portable communication layer (GASNet)
  • Applications
• Working on developing a large application that depends on UPC
  • Mesh generation
  • AMR (?), Sparse LU (?)

Future Plans
• Runtime support for Intrepid
• GCC-based open source compiler
• Performance tuning of runtime
• Additional machines (InfiniBand, X1, Dolphin)
• Optimization of compiled code
• Communication optimizations
• Automatic search-based optimizations
• Application efforts
