Alternative and Experimental Parallel Programming Approaches
CS433, Spring 2001
Laxmikant Kale
Parallel Programming Models
• We studied:
  • MPI/message passing, shared memory, Charm++/shared objects
  • Loop parallelism: OpenMP
• Other languages/paradigms:
  • Loop parallelism on distributed-memory machines: HPF
  • Linda, Cid, Chant
  • Several others
• Acceptance barrier
• I will assign reading assignments:
  • Papers on the above languages, available on the web
  • Pointers on the course web page soon
Linda
• Shared tuple space:
  • A specialization of shared memory
• Operations:
  • read (rd), in, out [eval]
  • Pattern matching: in [2, x] matches a tuple whose first field is 2, reads the second field into x, and removes the tuple (see the sketch below)
• Tuple analysis
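To make the read/in/out semantics concrete, here is a minimal single-process sketch of a tuple space in C++. It is illustrative only, not a real Linda binding: the class and method names simply mirror the operation names above, and matching is done on a single tag field rather than full tuple patterns.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Minimal illustration of tuple-space semantics for ("tag", int) tuples:
//   out - deposit a tuple into the shared space
//   rd  - find a matching tuple and read it (tuple stays in the space)
//   in  - find a matching tuple, read it, and remove it
// Both rd and in block until a tuple with the requested tag appears.
class TupleSpace {
public:
  void out(const std::string& tag, int value) {
    { std::lock_guard<std::mutex> g(m_); tuples_.push_back({tag, value}); }
    cv_.notify_all();
  }
  int rd(const std::string& tag) { return match(tag, /*remove=*/false); }
  int in(const std::string& tag) { return match(tag, /*remove=*/true); }

private:
  int match(const std::string& tag, bool remove) {
    std::unique_lock<std::mutex> lk(m_);
    for (;;) {
      for (auto it = tuples_.begin(); it != tuples_.end(); ++it) {
        if (it->first == tag) {            // pattern match on the tag field
          int v = it->second;
          if (remove) tuples_.erase(it);   // "in" consumes the tuple
          return v;
        }
      }
      cv_.wait(lk);                        // no match yet: block until an out()
    }
  }
  std::vector<std::pair<std::string, int>> tuples_;
  std::mutex m_;
  std::condition_variable cv_;
};
```

For example, one thread can call out("count", 2) while another blocks in in("count") until that tuple appears.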
Data-driven execution
• Systems that support user-level threads:
  • Cid, Chant, AMPI, CC++
• Systems that support DDE without threads:
  • Charm/Charm++, Active Messages, Split-C
Cid
• Derived from Id, a data-flow language
• Basic constructs:
  • Threads:
    • Create new threads
    • Wait for data from other threads (fork/join idiom; see the sketch below)
• User-level vs. system-level threads
  • What is a thread? A stack, program counter, ...
  • Preemptive vs. non-preemptive
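The create-a-thread / wait-for-its-data pattern above is the familiar fork/join idiom. The sketch below expresses it with standard C++ threads rather than Cid syntax, so the function and variable names are purely illustrative.

```cpp
#include <iostream>
#include <thread>

// A child thread computes a value; the parent creates it, continues with
// other work, and then waits for the child's data by joining.
void sumOfSquares(int n, long &out) {
  long s = 0;
  for (int i = 1; i <= n; i++) s += 1L * i * i;
  out = s;
}

int main() {
  long result = 0;
  std::thread child(sumOfSquares, 1000, std::ref(result));  // create a thread
  // ... the parent could do other work here ...
  child.join();                       // wait for data from the other thread
  std::cout << "sum = " << result << '\n';
}
```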
Cid (continued)
• Multiple threads on each processor
  • Benefit: adaptive overlap (of communication and computation)
  • Need a scheduler: use the OS scheduler?
• All threads on one PE share an address space
• Thread mapping:
  • At creation time, one may ask the system to map a thread to a particular PE
  • No migration after a thread starts running
• Global pointers:
  • Threads on different processors can exchange data via these
  • (In addition to fork/join data exchange)
Cid (continued)
• Global pointers:
  • Register any C structure as a global object (to get a global ID)
  • A "get" operation fetches a local copy of a given object
    • In read or write mode
    • Asynchronous "get"s are also supported: get does not wait for the data to arrive
  • HPF-style global arrays
• Grainsize control:
  • Especially for tree-structured computations
  • Create a thread if other processors are idle, for example (see the sketch below)
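The grainsize-control point can be sketched as follows: spawn a new thread only near the top of a tree-structured computation and recurse sequentially below a cutoff. The depth cutoff used here stands in for Cid's "are other processors idle?" test and is an assumption of this sketch; the code is plain C++, not Cid.

```cpp
#include <future>

// Tree-structured computation with grainsize control: only the top few
// levels of the recursion create new threads; below the cutoff the work
// is done sequentially, so thread-creation cost does not dominate.
long fib(int n, int depth) {
  if (n < 2) return n;
  if (depth >= 3) {                        // below the cutoff: stay sequential
    return fib(n - 1, depth) + fib(n - 2, depth);
  }
  auto left  = std::async(std::launch::async, fib, n - 1, depth + 1);
  long right = fib(n - 2, depth + 1);      // parent handles the other subtree
  return left.get() + right;
}
```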
Chant
• Threads that send messages to each other (illustrated below)
  • Message passing can be MPI-style
• User-level threads
• A simple implementation in Charm++ is available
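Roughly, Chant's model is threads as message-passing endpoints addressed by a thread id. The sketch below conveys that shape with a per-thread mailbox in standard C++; the send/recv functions are illustrative stand-ins, not Chant's actual interface.

```cpp
#include <array>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Each thread id owns a mailbox; send() targets a thread id, recv() blocks
// until a message for that thread arrives.
struct Mailbox {
  std::queue<int> q;
  std::mutex m;
  std::condition_variable cv;
};

std::array<Mailbox, 2> box;   // two communicating threads

void send(int dest, int msg) {
  { std::lock_guard<std::mutex> g(box[dest].m); box[dest].q.push(msg); }
  box[dest].cv.notify_one();
}

int recv(int self) {
  std::unique_lock<std::mutex> lk(box[self].m);
  box[self].cv.wait(lk, [&] { return !box[self].q.empty(); });
  int msg = box[self].q.front();
  box[self].q.pop();
  return msg;
}

int main() {
  std::thread t0([] { send(1, 42); });       // thread 0 sends to thread 1
  std::thread t1([] { (void)recv(1); });     // thread 1 blocks until it arrives
  t0.join(); t1.join();
}
```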
CC++
• Allows parallelism within objects
• Machine model:
  • At one level: shared-address-space constructs
  • On top: non-shared-memory constructs
• SAS: creation of threads
  • par, parfor, spawn create threads
  • E.g., parfor: a for loop where a separate system-level thread is created to execute each iteration
  • par: each sub-construct is executed by a separate thread, in parallel
• Sync variables:
  • "sync" as a type modifier (see the fragment below)
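A rough fragment in CC++ notation (CC++ extends C++) showing the constructs named above: par, parfor, spawn, and a sync variable. The exact syntax is reconstructed from memory and may differ in detail; produce, consume, and example are made-up names for illustration.

```cpp
// CC++ notation (not standard C++): a sketch of the shared-address-space
// constructs listed on the slide.

sync int result;               // single-assignment: readers block until written

void produce() { result = 42; }            // the (only) write releases readers
void consume() { int v = result; (void)v; }  // blocks here until produce() writes

void example(int n, double *a) {
  par {                        // the two statements run as separate threads
    produce();
    consume();
  }
  parfor (int i = 0; i < n; i++)   // one thread per iteration
    a[i] = 2.0 * a[i];
  spawn produce();             // fire-and-forget thread creation
}
```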
CC++: non-SAS features
• Processor objects:
  • global class X ...
  • Encapsulates the idea of a processor
  • All global variables used inside it refer to the local copy
  • I.e., no sharing of globals between processor objects
• Global pointers:
  • int *global x;
  • Is valid on other processors
• Remote procedure calls (see the fragment below)
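A corresponding fragment, again in approximate CC++ notation, for a processor object and a global pointer through which a method call becomes a remote procedure call. Worker, compute, and driver are invented names for illustration, and the syntax is reconstructed from memory.

```cpp
// CC++ notation (not standard C++): processor object and global pointer.

global class Worker {          // processor object: one address space per instance
public:
  int compute(int x) { return x * x; }   // callable through a global pointer
};

void driver(Worker *global w) {
  // w may point to a processor object living on another processor;
  // the call below is then a remote procedure call on that object.
  int r = w->compute(7);
  (void)r;
}
```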
AMPI
• Adaptive MPI
• User-level threads, each running full-fledged MPI (example below)
• Built on top of Charm++
• Supports measurement-based load balancing
  • Thread migration
• How to migrate threads?
  • Challenges
  • Solutions
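Under AMPI the source code itself is ordinary MPI; each rank simply runs as a migratable user-level thread, so one typically launches many more ranks than physical processors. The program below is plain MPI, and the run line suggested afterwards is an assumption about typical AMPI usage rather than something taken from the slides.

```cpp
#include <mpi.h>
#include <cstdio>

// Ordinary MPI code: under AMPI each rank is a migratable user-level thread,
// so many ranks can share one physical processor.
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  std::printf("virtual rank %d of %d\n", rank, size);
  MPI_Finalize();
  return 0;
}
```

A typical launch might look something like charmrun ./pgm +p2 +vp8 (two physical processors, eight virtual ranks), with the runtime free to migrate the rank-threads for load balance; the exact flags are an assumption here.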
Multi-paradigm interoperability
• Which one of these paradigms is "the best"?
  • Depends on the application, algorithm, or module
  • Doesn't matter anyway, as we must use MPI (OpenMP)
    • Acceptance barrier
• Idea:
  • Allow multiple modules to be written in different paradigms
• Difficulty:
  • Each paradigm has its own view of how to schedule the processor
  • It comes down to the scheduler
• Solution: have a common scheduler (sketched below)
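The "common scheduler" idea can be pictured as a single per-processor loop that picks (handler, message) work items off a queue, whichever paradigm produced them. The sketch below shows only that shape in standard C++; it is not Converse's actual interface, and the names are illustrative.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Shape of a common scheduler: every paradigm (threads, message-driven
// objects, ...) enqueues work as (handler, message) pairs, and one loop
// per processor repeatedly picks the next item and runs its handler.
struct Message { int payload; };
using Handler = std::function<void(Message&)>;

std::deque<std::pair<Handler, Message>> workQueue;

void enqueue(Handler h, Message m) { workQueue.push_back({std::move(h), m}); }

void schedulerLoop() {
  while (!workQueue.empty()) {
    auto [handler, msg] = workQueue.front();
    workQueue.pop_front();
    handler(msg);            // run one entity's handler, then pick the next
  }
}
```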
Converse
• Common scheduler
• Components for easily implementing new paradigms:
  • User-level threads
    • Separates the three functions of a thread package
  • Message-passing support
  • "Futures" (origin: Halstead's MultiLisp)
    • What is a "future"? Data, ready or not; the caller blocks on access (see the sketch below)
• Several other features
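The future concept (data, ready or not, with the caller blocking on access) maps directly onto standard C++ futures, used below purely to illustrate the idea rather than Converse's own futures API.

```cpp
#include <future>
#include <iostream>
#include <thread>

// A future is a placeholder for data that may not be ready yet: a producer
// fills it in, and a consumer that touches it blocks until the value exists.
int main() {
  std::promise<int> slot;
  std::future<int> value = slot.get_future();

  std::thread producer([&slot] { slot.set_value(99); });

  // Consumer: ready or not, this access blocks until the data arrives.
  std::cout << "got " << value.get() << '\n';
  producer.join();
}
```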
CRL