Charm++: Data-driven Objects
L. V. Kale
Parallel Programming
• Decomposition
  • What to do in parallel
• Mapping
  • Which processor does each task
• Scheduling (sequencing)
  • On each processor
• Machine-dependent expression
  • Express the above decisions for the particular parallel machine
The parallel objects model of Charm++ automates mapping, scheduling, and machine-dependent expression.
Shared objects model
• Basic philosophy:
  • Let the programmer decide what to do in parallel
  • Let the system handle the rest:
    • Which processor executes what, and when
    • With some override control to the programmer, when needed
• Basic model:
  • The program is a set of communicating objects
  • Objects only know about other objects (not processors)
  • System maps objects to processors
    • And may remap the objects dynamically, e.g. for load balancing
• Shared objects, not shared memory
  • In between the “shared nothing” of message passing and the “shared everything” of SAS (shared address space)
  • Additional information sharing mechanisms
  • “Disciplined” sharing
Charm++
• Charm++ programs specify parallel computations consisting of a number of “objects”
• How do they communicate?
  • By invoking methods on each other, typically asynchronously
  • Also by sharing data using “specifically shared variables”
• What kinds of objects?
  • Chares: singleton objects
  • Chare arrays: generalized collections of objects
  • Advanced: chare groups (used by library writers and the system)
Data Driven Execution in Charm++
[Figure: objects, with a scheduler and a message queue (“Message Q”) shown twice, apparently once per processor]
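The data-driven idea on this slide can be sketched in plain C++. This is only an illustration, not Charm++ code: `Scheduler`, `Counter`, and `runCounterDemo` are hypothetical names, and a single in-process queue stands in for one processor's message queue. The point is that objects never call each other directly; work happens only when the scheduler picks a message off the queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for one processor's scheduler: it owns a queue of
// pending messages, each represented here as a callable.
class Scheduler {
public:
    // Enqueue a message; nothing runs until the scheduler picks it up.
    void enqueue(std::function<void()> msg) { queue_.push_back(std::move(msg)); }

    // Deliver messages one at a time until the queue is empty.
    // Returns the number of messages delivered.
    int run() {
        int delivered = 0;
        while (!queue_.empty()) {
            std::function<void()> msg = std::move(queue_.front());
            queue_.pop_front();
            msg();  // the handler may enqueue further messages
            ++delivered;
        }
        return delivered;
    }

private:
    std::deque<std::function<void()>> queue_;
};

// A toy object whose method is only ever invoked by the scheduler.
struct Counter {
    explicit Counter(Scheduler* s) : sched(s), value(0) {}

    // Each invocation bumps the counter and, until the limit is reached,
    // sends itself another message instead of looping.
    void step(int limit) {
        ++value;
        if (value < limit) sched->enqueue([this, limit] { step(limit); });
    }

    Scheduler* sched;
    int value;
};

// Drives the example: one initial message triggers a chain of steps.
int runCounterDemo(int limit) {
    Scheduler sched;
    Counter c(&sched);
    sched.enqueue([&c, limit] { c.step(limit); });
    sched.run();
    return c.value;
}
```

Calling `runCounterDemo(5)` delivers five messages in sequence. A real runtime would additionally handle priorities and messages arriving from other processors, which this sketch leaves out.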
Need for Proxies
• Consider:
  • Object x of class A wants to invoke method f of object y of class B
  • x and y are on different processors
  • What should the syntax be?
    • y->f(…)? Doesn’t work, because y is not a local pointer
• Needed:
  • Instead of “y” we must use an ID that is valid across processors
  • Method invocation should use this ID
  • Some part of the system must pack the parameters and send them
  • Some part of the system on the remote processor must invoke the right method on the right object with the parameters supplied
Charm++ solution: proxy classes
• Classes with remotely invokable methods
  • inherit from the “chare” class (system defined)
  • entry methods can only have one parameter: a subclass of message
• For each chare class D
  • which has methods that we want to remotely invoke
  • the system will automatically generate a proxy class CProxy_D
  • Proxy objects know where the real object is
  • Methods invoked on this class simply put the data in an “envelope” and send it out to the destination
• Each chare object has a proxy
  • CProxy_D thisProxy; // thisProxy inherited from “CBase_D”
• Also you can get a proxy for a chare when you create it:
  • CProxy_D myNewChare = CProxy_D::ckNew(arg);
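To make the “envelope” idea concrete, here is a minimal sketch in plain C++ of what a proxy does. It is not the code Charm++ generates: `Envelope`, `Runtime`, and `ProxyD` are hypothetical names, and a single in-process queue stands in for the network. The proxy has the same method name as the real class, but it only packs the parameters and enqueues them; a separate delivery step unpacks them and invokes the real method.

```cpp
#include <cassert>
#include <cstring>
#include <deque>
#include <map>
#include <vector>

// What travels between processors in this sketch: a destination object ID
// plus the packed parameters.
struct Envelope {
    int destId;
    std::vector<char> payload;
};

// The "real" object, reachable only through its ID.
struct D {
    D() : total(0) {}
    void f(int a, int b) { total += a * b; }
    int total;
};

// Hypothetical runtime: a registry of objects by ID and a queue of envelopes.
struct Runtime {
    std::map<int, D> objects;
    std::deque<Envelope> inbox;

    // Receiver side: unpack the parameters and invoke the method on the target.
    void deliverAll() {
        while (!inbox.empty()) {
            Envelope e = inbox.front();
            inbox.pop_front();
            int args[2];
            std::memcpy(args, e.payload.data(), sizeof(args));
            objects.at(e.destId).f(args[0], args[1]);
        }
    }
};

// Proxy: same method name as D, but it only packs and sends.
class ProxyD {
public:
    ProxyD(Runtime& rt, int id) : rt_(rt), id_(id) {}

    void f(int a, int b) {
        int args[2] = {a, b};
        Envelope e;
        e.destId = id_;
        e.payload.resize(sizeof(args));
        std::memcpy(e.payload.data(), args, sizeof(args));
        rt_.inbox.push_back(e);  // returns immediately; no result is awaited
    }

private:
    Runtime& rt_;
    int id_;
};
```

After `ProxyD x(rt, 7); x.f(5, 7);` the target object is unchanged until `rt.deliverAll()` runs, which mirrors the asynchronous invocation described on the slide.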
Chare creation and method invocation
    CProxy_D x = CProxy_D::ckNew(25);
    x.f(5,7);
Sequential equivalent:
    y = new D(25);
    y->f(5,7);
Chares (Data driven Objects)
• Regular C++ classes,
  • with some methods designated as remotely invokable (called entry methods)
• Creation of an instance of chare class C:
  • CProxy_C myChareProxy = CProxy_C::ckNew(args);
• Creates an instance of C on a specified processor “pe”:
  • CProxy_C::ckNew(args, pe);
• CProxy_C: a proxy class generated by Charm for chare class C declared by the user
Remote method invocation
• Proxy classes:
  • For each chare class C, the system generates a proxy class
    • (C : CProxy_C)
  • Global: in the sense of being valid on all processors
  • thisProxy (analogous to this) gets you your own proxy
  • You can send proxies in messages
  • Given a proxy p, you can invoke methods:
    • p.method(msg);
    CProxy_main mainProxy;

    // Execution begins here; m carries argc/argv
    main::main(CkArgMsg * m)
    {
      int i = 0;
      for (i=0; i<100; i++)
        CProxy_piPart::ckNew();
      responders = 100;
      count = 0;
      mainProxy = thisProxy; // readonly initialization
    }

    void main::results(int pcount)
    {
      count += pcount;
      if (0 == --responders) {
        cout << "pi=: " << 4.0*count/100000 << endl;
        CkExit(); // Exit the program
      }
    }
    piPart::piPart()
    {
      // declarations..
      srand48((long) this);
      mySamples = 100000/100;
      // Exactly mySamples iterations, so that 100 parts total 100000 samples
      for (i = 0; i < mySamples; i++) {
        x = drand48();
        y = drand48();
        if ((x*x + y*y) <= 1.0) localCount++;
      }
      mainProxy.results(localCount);
      delete this;
    }
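For readers who want to check the arithmetic, the same decomposition can be run sequentially in plain C++. This is a sketch and not the slide's program: `countHits` and `estimatePi` are hypothetical names, `std::mt19937` replaces `drand48`, and the per-part seeds are arbitrary. Each call to `countHits` plays the role of one `piPart`, and the summation in `estimatePi` plays the role of `main::results`. It assumes `parts` is positive and divides `totalSamples` evenly.

```cpp
#include <cassert>
#include <random>

// Count how many of `samples` random points in the unit square fall inside
// the quarter circle. `seed` plays the role of the per-object seed.
long countHits(long samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    long hits = 0;
    for (long i = 0; i < samples; ++i) {
        double x = unit(gen);
        double y = unit(gen);
        if (x * x + y * y <= 1.0) ++hits;
    }
    return hits;
}

// Split totalSamples evenly across `parts` workers and combine their counts.
// The ratio hits/samples approximates pi/4, hence the factor of 4.
double estimatePi(int parts, long totalSamples) {
    long perPart = totalSamples / parts;
    long count = 0;
    for (int p = 0; p < parts; ++p) {
        count += countHits(perPart, 1000u + p);  // arbitrary distinct seeds
    }
    return 4.0 * count / (perPart * parts);
}
```

`estimatePi(100, 100000)` uses the same 100 parts of 1000 samples each as the slides. With that many samples the estimate is typically accurate to about two decimal places.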
Generation of proxy classes
• How does charm generate the proxy classes?
  • Needs help from the programmer
  • name classes and methods that can be remotely invoked
  • declare this in a special “charm interface” file (pgm.ci)
  • Include the generated code in your program
pgm.ci (generates PiMod.decl.h and PiMod.def.h):
    mainmodule PiMod {
      mainchare main {
        entry main();
        entry results(int pc);
      };
      chare piPart {
        entry piPart(void);
      };
    };
pgm.h:
    #include "PiMod.decl.h"
    ..
pgm.C:
    …
    #include "PiMod.def.h"
Charm++
• Data Driven Objects
• Message classes
• Asynchronous method invocation
• Prioritized scheduling
• Object Arrays
• Object Groups:
  • global object with a “representative” on each PE
• Information sharing abstractions
  • readonly data
  • accumulators
  • distributed tables
Object Arrays
• A collection of chares,
  • with a single global name for the collection, and
  • each member addressed by an index
  • Mapping of element objects to processors handled by the system
[Figure: “User’s view” shows elements A[0], A[1], A[2], A[3], A[..]; “System view” shows A[0] and A[3]]
Introduction
• Elements are parallel objects like chares
• Elements are indexed by a user-defined data type: [sparse] 1D, 2D, 3D, tree, ...
• Send messages to index, receive messages at element
• Reductions and broadcasts across the array
• Dynamic insertion, deletion, migration, and everything still has to work!
• Interfaces with automatic load balancer
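One way to picture how messages can still reach an element after insertion, deletion, or migration is an index-to-processor table that the system keeps up to date. The sketch below is plain C++ under that assumption; `LocationTable` and its round-robin placement are hypothetical and say nothing about how the Charm++ runtime actually tracks elements.

```cpp
#include <cassert>
#include <map>

// Hypothetical location table: array index -> processor currently hosting it.
class LocationTable {
public:
    explicit LocationTable(int numPes) : numPes_(numPes) {}

    // Insert an element; the system (not the caller) picks its processor.
    // Round-robin placement is an arbitrary choice for this sketch.
    void insert(int index) { home_[index] = index % numPes_; }

    // Move an element, for example on behalf of a load balancer.
    void migrate(int index, int destPe) { home_.at(index) = destPe; }

    // Remove an element from the array.
    void destroy(int index) { home_.erase(index); }

    // Resolve an index to a processor; -1 means no such element.
    int lookup(int index) const {
        std::map<int, int>::const_iterator it = home_.find(index);
        return it == home_.end() ? -1 : it->second;
    }

private:
    int numPes_;
    std::map<int, int> home_;  // sparse: only inserted indices are present
};
```

Senders only ever name an index; the lookup decides where the message goes, so a migration changes the table entry without changing any sender code.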
1D Declare & Use
In the interface (.ci) file:
    module m {
      array [1D] Hello {
        entry Hello(void);
        entry void SayHi(int HiData);
      };
    };
In the .C file:
    // Create an array of Hello’s with 4 elements:
    int nElements=4;
    CProxy_Hello p = CProxy_Hello::ckNew(nElements);
    // Have element 2 say “hi”
    p[2].SayHi(12345);
1D Definition
    class Hello : public CBase_Hello {
    public:
      Hello(void) {
        // thisProxy and thisIndex are inherited from ArrayElement1D
        … thisProxy …
        … thisIndex …
      }
      void SayHi(int m) {
        if (m < 1000)
          thisProxy[thisIndex+1].SayHi(m+1);
      }
      Hello(CkMigrateMessage *m) {}
    };
3D Declare & Use
    module m {
      array [3D] Hello {
        entry Hello(void);
        entry void SayHi(int HiData);
      };
    };

    CProxy_Hello p = CProxy_Hello::ckNew();
    for (int i=0;i<800000;i++)
      p(x(i),y(i),z(i)).insert();
    p.doneInserting();
    p(12,23,7).SayHi(34);
3D Definition
    class Hello : public CBase_Hello {
    public:
      Hello(void) {
        ... thisProxy ...
        ... thisIndex.x, thisIndex.y, thisIndex.z ...
      }
      void SayHi(int HiData) { ... }
      Hello(CkMigrateMessage *m) {}
    };
Pup Routine
    void pup(PUP::er &p) {
      // Call our superclass’s pup routine:
      ArrayElement3D::pup(p);
      p|myVar1;
      p|myVar2;
      ...
    }
Generalized “arrays”: Declare & Use
    module m {
      array [Foo] Hello {
        entry Hello(void);
        entry void SayHi(int data);
      };
    };

    CProxy_Hello p = CProxy_Hello::ckNew();
    for (...)
      p[CkArrayIndexFoo(..)].insert();
    p.doneInserting();
    p[CkArrayIndexFoo(..)].SayHi(..);
General Definition
    class CkArrayIndexFoo : public CkArrayIndex {
      Bar b; // char b[8]; float b[2]; ..
    public:
      CkArrayIndexFoo(...) {
        ...
        nInts=sizeof(b)/sizeof(int);
      }
    };

    class Hello : public CBase_Hello {
    public:
      Hello(void) {
        ... thisIndex ...
Collective ops
• Broadcast message SayHi:
    p.SayHi(data);
• Reduce x across all elements:
    contribute(sizeof(x),&x,CkReduction::sum_int,cb);
• Where do reduction results go? To a “callback” function, named cb above:
    // Call some function foo with fooData when done:
    CkCallback cb(foo,fooData);
    // Broadcast the results to my method “bar” when done:
    CkCallback cb(CkIndex_MyArray::bar,thisProxy);
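The contribute-then-callback pattern can be illustrated with a small plain C++ sketch. It is not the Charm++ reduction machinery: `SumReduction` is a hypothetical class, everything runs in one process, and it assumes each of the expected contributors calls `contribute` exactly once. What it shows is that no contributor sees the result directly; the callback fires once, after the last contribution arrives.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sum reduction over a fixed number of contributors.
class SumReduction {
public:
    SumReduction(int expected, std::function<void(int)> callback)
        : remaining_(expected), sum_(0), callback_(std::move(callback)) {}

    // Each element calls this exactly once with its local value.
    void contribute(int x) {
        sum_ += x;
        if (--remaining_ == 0) callback_(sum_);  // fire once, when complete
    }

private:
    int remaining_;                       // contributions still outstanding
    int sum_;                             // running total
    std::function<void(int)> callback_;   // where the final result goes
};
```

With three contributors supplying 1, 2, and 4, the callback receives 7 only after the third call. In a distributed setting the partial sums would be combined across processors before the callback is triggered.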
Migration support
• Delete element i:
    p[i].destroy();
• Migrate to processor destPe:
    migrateMe(destPe);
• Enable load balancer: by creating a load balancing object
• Provide pack/unpack functions: each object that needs this provides a “pup” method
  • (pup is a single abstraction that allows data traversal for determining size, packing and unpacking)
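The “single abstraction” remark about pup can be illustrated in plain C++. The sketch below is not the PUP::er class from Charm++: `Pupper`, its `field` method, and `Particle` are hypothetical, and it handles only trivially copyable fields. The object lists its fields once in `pup`, and the mode of the traversal object decides whether that walk measures, packs, or unpacks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// One traversal object with three modes.
class Pupper {
public:
    enum Mode { SIZING, PACKING, UNPACKING };

    explicit Pupper(Mode m, std::vector<char>* buf = nullptr)
        : mode_(m), buf_(buf), pos_(0) {}

    // Visit one trivially copyable field.
    template <typename T>
    void field(T& v) {
        if (mode_ == PACKING) {
            buf_->resize(pos_ + sizeof(T));
            std::memcpy(buf_->data() + pos_, &v, sizeof(T));
        } else if (mode_ == UNPACKING) {
            std::memcpy(&v, buf_->data() + pos_, sizeof(T));
        }
        pos_ += sizeof(T);  // in SIZING mode this is the only effect
    }

    // Bytes visited so far.
    std::size_t size() const { return pos_; }

private:
    Mode mode_;
    std::vector<char>* buf_;  // unused in SIZING mode
    std::size_t pos_;
};

// An object describes its state once; the mode decides what happens.
struct Particle {
    int id;
    double mass;
    void pup(Pupper& p) {
        p.field(id);
        p.field(mass);
    }
};
```

Running `pup` with a sizing, then a packing, then an unpacking `Pupper` reproduces the object on the receiving side. Because the field list lives in one place, the three operations cannot drift out of sync.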
Object Groups
• A group of objects (chares)
  • with exactly one representative on each processor
  • A single proxy for the group as a whole
  • invoke methods in a branch (asynchronously), all branches (broadcast), or in the local branch
• creation:
  • agroup = CProxy_C::ckNew(msg)
• remote invocation:
  • p.methodName(msg); // p.methodName(msg, peNum);
  • p.ckLocalBranch()->f(…);
Information sharing abstractions
• Observation:
  • Information is shared in several specific modes in parallel programs
• Other models support only a limited set of modes:
  • Shared memory: everything is shared (sledgehammer approach)
  • Message passing: messages are the only method
• Charm++: identifies and supports several modes
  • Readonly / writeonce
  • Tables (hash tables)
  • Accumulators
  • Monotonic variables
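As an illustration of one of these modes, a monotonic variable can be sketched in plain C++ as a value that is only ever allowed to move in one direction. This is not the Charm++ abstraction itself: `MonotonicMin` is a hypothetical class, it uses `std::atomic` within a single process, and the choice of a decreasing integer (for example, the best cost found so far in a search) is an assumption made for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical monotonic variable: its value may only decrease.
class MonotonicMin {
public:
    explicit MonotonicMin(int initial) : value_(initial) {}

    // Offer a candidate; returns true if it improved the stored value.
    bool update(int candidate) {
        int current = value_.load();
        while (candidate < current) {
            // On failure, compare_exchange_weak reloads `current`, so the
            // loop re-checks against whatever another thread just stored.
            if (value_.compare_exchange_weak(current, candidate)) return true;
        }
        return false;  // candidate was not an improvement
    }

    int get() const { return value_.load(); }

private:
    std::atomic<int> value_;
};
```

Because updates that do not improve the value are dropped, readers may see a slightly stale value but never one that moves in the wrong direction, which is the “disciplined” restriction that makes such sharing cheap.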
Compiling Charm++ programs
• Need to define an interface specification file
  • mod.ci for each module mod
  • Contains declarations that the system uses to produce proxy classes
  • These produced classes must be included in your mod.C file
• See examples provided on the class web site
• More information:
  • Manuals, example programs, papers
  • http://charm.cs.uiuc.edu/
• These slides are currently at:
  • http://charm.cs.uiuc.edu/presentations/charmTutorial/
Fortran 90 version
• Quick implementation on top of Charm++
• How to use:
  • follow example program, with the same basic concepts
  • Only use object arrays, for now
    • Most useful construct
  • Object groups can be implemented in C++, if needed
Further Reading
• More information:
  • Manuals, example programs, papers
  • http://charm.cs.uiuc.edu
• These slides are currently at:
  • http://charm.cs.uiuc.edu/kale/cse320