The Berkeley UPC Compiler
Wei Chen
The LBNL/Berkeley UPC Group
Unified Parallel C (UPC)
• UPC is a parallel extension to C for scientific computing
  • With distributed arrays, shared pointers, parallel loops, strict/relaxed memory model
• Global Address Space Abstraction
• SPMD parallelism
• There are vendor compilers on several machines
  • HP Alpha Server, Cray, Sun, SGI
• Open source compiler developed by LBNL/UCB (beta release 3/31)
Overview of Berkeley UPC Compiler
[Diagram: software stack, top to bottom]
• UPC Code
• Translator (Open64 based)
• Translator Generated C Code
• Berkeley UPC Runtime System
• GASNet Communication System
• Network Hardware
Labels beside the stack, in order: platform-independent, network-independent, compiler-independent, language-independent
Two Goals: Portability and High-Performance
Implementing the UPC to C Translator
• Source to source translation
• Ported to gcc 3.2 (done by Rice Open64)
• Supports both 32/64 bit platforms
• Designed to incorporate existing optimization framework (currently not enabled)
• Communicate with runtime via a standard API and configuration files
[Diagram: translation pipeline] Preprocessed File → UPC front end → VH Whirl w/ shared types → Backend lowering → High Whirl w/ runtime calls → Whirl2c → ANSI-compliant C Code
Components in the Translator
• Front end:
  • UPC extensions to C: shared qualifier, block size, forall loops, builtin functions and values (blocksizeof, localsizeof, etc.), strict/relaxed
  • Parses and type-checks UPC code, generates Whirl, with UPC-specific information available in the symbol table
• Backend:
  • Transforms shared reads and writes into calls to the runtime library (after LNO on H Whirl)
  • Calls can be blocking/non-blocking/bulk/register-based
• Whirl2c:
  • Shared variables are declared as opaque pointer-to-shared
  • Static shared variables are allocated and initialized dynamically
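To make the backend step concrete, here is a minimal sketch in plain C of what lowering a shared access into runtime calls could look like. Everything named `sketch_*`, the `segment` array, and the `elem` helper are hypothetical stand-ins invented for this illustration; they are not the Berkeley UPC runtime API, and the real pointer-to-shared representation is opaque. The sketch assumes a cyclic `shared int` array and blocking get/put calls only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { THREADS = 2, SEG_BYTES = 64 };   /* hypothetical sizes for the sketch */

/* Stand-in for the shared memory segment owned by each thread. */
static unsigned char segment[THREADS][SEG_BYTES];

/* Hypothetical pointer-to-shared: owning thread plus byte offset. */
typedef struct { int thread; size_t offset; } sketch_pshared_t;

/* Hypothetical blocking read from shared memory. */
static void sketch_get(void *dst, sketch_pshared_t src, size_t nbytes)
{
    assert(src.thread >= 0 && src.thread < THREADS);
    assert(src.offset + nbytes <= SEG_BYTES);
    memcpy(dst, &segment[src.thread][src.offset], nbytes);
}

/* Hypothetical blocking write to shared memory. */
static void sketch_put(sketch_pshared_t dst, const void *src, size_t nbytes)
{
    assert(dst.thread >= 0 && dst.thread < THREADS);
    assert(dst.offset + nbytes <= SEG_BYTES);
    memcpy(&segment[dst.thread][dst.offset], src, nbytes);
}

/* Element i of a cyclic shared int array that starts at byte offset `base`
   in every thread's segment. */
static sketch_pshared_t elem(size_t base, size_t i)
{
    sketch_pshared_t p;
    p.thread = (int)(i % THREADS);
    p.offset = base + (i / THREADS) * sizeof(int);
    return p;
}

/* Lowered form of the UPC statement  v3[i] = v2[i] + v1[i];
   each shared read becomes a get, the shared write becomes a put. */
static void lowered_add(size_t v1, size_t v2, size_t v3, size_t i)
{
    int a, b, sum;
    sketch_get(&a, elem(v1, i), sizeof a);
    sketch_get(&b, elem(v2, i), sizeof b);
    sum = a + b;
    sketch_put(elem(v3, i), &sum, sizeof sum);
}
```

In this sketch the generated C never dereferences shared data directly, which is the property that lets the same translator output run on top of different networks.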
Modifications
• Symbol Table
  • Add flags for shared, strict/relaxed, and block size for TY_TAB
• Intrinsics
  • Each UPC runtime function is represented by a new intrinsic (about 100 of them)
• Driver
  • Use sgiupc to compile UPC programs
  • New flags for passing config file, number of threads
• C front end
  • Modify gccfe/gnu to parse UPC extensions, also fixes for ANSI-compliance
  • Modify gccfe to support upc_forall loops (transformed to WHILE_DO, marked by pragma)
  • Name mangling for static variables
Modifications II
• Backend
  • Add new lowering phases for transforming shared accesses
  • Use some VH Whirl (e.g. comma to spill return value)
  • Adjust field offsets for structs that have shared pointers (also in front end for sizeof)
  • Symbol table not consistent till lowering finishes
  • Dynamic nesting of forall loops
• Whirl2c
  • Various UPC-specific changes and bug fixes
  • Access thread-local data through macros
  • Dynamically allocate static user data
Future Work
• Add UPC-specific optimizations
  • Possibly as a new phase
  • Likely will use/modify PREOPT and LNO (alias analysis, dependence analysis, prefetching)
  • Want WOPT too; is it possible to extend whirl2c to work for M Whirl?
• Coordination Among Releases
  • Our version has been merged with the Rice Open64 project
  • Would like to merge with either Open64 or ORC
  • One common CVS tree, with each team on different branches?
UPC Programming Model Features
• SPMD parallelism
  • fixed number of threads during execution
  • threads operate asynchronously
• Several kinds of array distributions
  • double a[n]: a private array on each thread
  • shared double a[n]: a shared array, with cyclic mapping
  • shared [4] double a[n]: a block-cyclic array with 4-element blocks
  • shared [0] double *a = (shared [0] double *) upc_alloc(n * sizeof(double)): a shared array with all elements local to the allocating thread
• Pointers for irregular data structures
  • shared double *sp: a pointer to shared data
  • double *lp: a pointer to private data
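The cyclic and block-cyclic mappings above come down to simple index arithmetic. The following plain C sketch shows one way to compute, for element `i` of `shared [block] T a[n]`, which thread owns it and where it sits in that thread's local portion. The function names are hypothetical, and the sketch assumes a positive block size (the default cyclic layout is `block == 1`; the indefinite `shared [0]` layout is not covered).

```c
#include <assert.h>
#include <stddef.h>

/* Thread with affinity to element i of `shared [block] T a[n]`
   when the program runs with `threads` UPC threads. */
static size_t owner_thread(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Position of element i inside its block (often called the phase). */
static size_t phase_in_block(size_t i, size_t block)
{
    assert(block > 0);
    return i % block;
}

/* Index of element i within the owning thread's local portion:
   one block per completed round over all threads, plus the phase. */
static size_t local_index(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / (block * threads)) * block + i % block;
}
```

For example, with `shared [4] double a[n]` on 3 threads, element 13 falls in block 3, which wraps around to thread 0, and it is the sixth element (local index 5) stored on that thread.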
Parallel Loops in UPC
• UPC has a “forall” construct for distributing computation
• Ex: Vector Addition
    shared int v1[N], v2[N], v3[N];
    upc_forall (i = 0; i < N; i++; &v3[i]) {
        v3[i] = v2[i] + v1[i];
    }
  The fourth clause, &v3[i], is the affinity expression
• Two kinds of affinity expressions:
  • Integer (compared with the thread id)
  • Shared address (checks the affinity of the address)
• Affinity tests are performed on every iteration
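The per-iteration affinity test can be illustrated by simulating the vector-addition loop in plain C. This is a sketch under stated assumptions, not the translator's actual output: `N`, `THREADS`, `owner_of`, `forall_on_thread`, and `run_all_threads` are hypothetical names, the arrays use the default cyclic layout, and the threads are run one after another instead of in parallel.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 8, THREADS = 3 };   /* hypothetical sizes for the sketch */

/* Owner of element i of a default (cyclic) shared array. */
static int owner_of(size_t i)
{
    return (int)(i % THREADS);
}

/* What a single thread does for
       upc_forall (i = 0; i < N; i++; &v3[i]) v3[i] = v2[i] + v1[i];
   Every thread walks the whole iteration space and skips iterations
   whose affinity expression points at another thread.
   Returns the number of iterations this thread executed. */
static int forall_on_thread(int mythread, const int *v1, const int *v2, int *v3)
{
    int executed = 0;
    assert(mythread >= 0 && mythread < THREADS);
    for (size_t i = 0; i < N; i++) {
        if (owner_of(i) != mythread)   /* affinity test on every iteration */
            continue;
        v3[i] = v2[i] + v1[i];
        executed++;
    }
    return executed;
}

/* Sequential stand-in for SPMD execution: run the loop as each thread. */
static int run_all_threads(const int *v1, const int *v2, int *v3)
{
    int total = 0;
    for (int t = 0; t < THREADS; t++)
        total += forall_on_thread(t, v1, v2, v3);
    return total;
}
```

With an integer affinity expression such as `i`, the test would instead compare `i % THREADS` with the thread id; for a cyclic array the two forms select the same iterations. Across all threads each iteration runs exactly once.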