Explore how pattern programming structures parallel programs using reusable solutions, guide best practices, and automatic code conversion. Learn about key patterns like workpool, pipeline, stencil, and map-reduce, and the Seeds Framework for distributed computing. Enhance your parallel computing skills with this innovative approach!
Pattern Programming ITCS 4/5145 Parallel Programming UNC-Charlotte, B. Wilkinson, 2013. August 29A, 2013 PatternProg-1
Problem Addressed • To make parallel programming more usable and scalable. • Parallel programming -- writing programs to use multiple computers and processors collectively to solve problems -- has a very long history but is still a challenge.
Traditional approach • Explicitly specifying message-passing (MPI) and • Explicitly using low-level threads APIs (Pthreads, Java threads, OpenMP, …). • Need a better structured approach.
Pattern Programming Concept Programmer begins by constructing the program using established computational or algorithmic “patterns” that provide a structure. “Design patterns” have been part of software engineering for many years: • Reusable solutions to commonly occurring problems * • Provide a guide to “best practices”, not a final implementation • Provide a good scalable design structure • Make it easier to reason about programs • Potential for automatic conversion into executable code, avoiding low-level programming – We do that here. • Particularly useful for the complexities of parallel/distributed computing * http://en.wikipedia.org/wiki/Design_pattern_(computer_science)
In Parallel/Distributed Computing What patterns are we talking about? • Low-level algorithmic patterns that might be embedded into a program, such as fork-join, broadcast/scatter/gather. • Higher-level algorithmic patterns for forming a complete program, such as workpool, pipeline, stencil, map-reduce. We concentrate upon higher-level “computational/algorithm” patterns rather than lower-level patterns.
Some Patterns Workpool [Figure: workpool pattern. Labels: Master, Workers. Legend: two-way connection, compute node, source/sink.]
Pipeline [Figure: pipeline pattern. Labels: Master, Workers (Stage 1, Stage 2, Stage 3). Legend: one-way connection, two-way connection, compute node, source/sink.]
Divide and Conquer [Figure: divide-and-conquer pattern. Labels: Divide, Merge. Legend: two-way connection, compute node, source/sink.]
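The divide and merge phases named on this slide can be sketched in ordinary Java with the standard fork/join classes. This is not Seeds code: the class name `DivideAndConquerSum`, the `THRESHOLD` cut-off, and the choice of summing an array are assumptions made only for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class DivideAndConquerSum {
    // Hypothetical cut-off: ranges this small are summed directly instead of divided.
    static final int THRESHOLD = 1_000;

    static class SumTask extends RecursiveTask<Long> {
        private static final long serialVersionUID = 1L;
        private final int[] data;
        private final int lo; // inclusive
        private final int hi; // exclusive

        SumTask(int[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                // Leaf node: compute directly.
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i];
                }
                return sum;
            }
            // Divide: split the range into two halves.
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                     // run the left half asynchronously
            long rightSum = right.compute(); // run the right half in this thread
            long leftSum = left.join();      // wait for the left half
            // Merge: combine the two partial results.
            return leftSum + rightSum;
        }
    }

    public static long parallelSum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println("Sum = " + parallelSum(data));
    }
}
```

Here the divide and merge steps are written by hand inside `compute()`; a pattern framework would instead supply that structure and leave only the leaf computation to the programmer.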
All-to-All • All compute nodes can communicate with all the other nodes • Usually a synchronous computation - performs a number of iterations to obtain a solution, e.g. the N-body problem [Figure: all-to-all pattern. Labels: Master. Legend: two-way connection, compute node, source/sink.]
Stencil • All compute nodes can communicate with only neighboring nodes • On each iteration, each node communicates with neighbors to get stored computed values • Usually a synchronous computation - performs a number of iterations to converge on a solution, e.g. solving Laplace’s/heat equation [Figure: stencil pattern. Legend: two-way connection, compute node, source/sink.]
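A minimal shared-memory sketch of the stencil idea, assuming a square grid, a fixed hot top edge, and Jacobi iteration for Laplace's equation. It is not Seeds code; the names `JacobiStencil`, `sweep`, and `solve` are hypothetical. Each interior point reads only its four neighbors from the previous iteration, and the end of each parallel sweep acts as the synchronization point between iterations.

```java
import java.util.stream.IntStream;

public class JacobiStencil {
    /**
     * One synchronous iteration: every interior point of next becomes the
     * average of its four neighbors in current. Rows are processed in parallel.
     * Returns the largest change made to any point.
     */
    public static double sweep(double[][] current, double[][] next) {
        int n = current.length;
        int m = current[0].length;
        return IntStream.range(1, n - 1).parallel().mapToDouble(i -> {
            double rowChange = 0.0;
            for (int j = 1; j < m - 1; j++) {
                next[i][j] = 0.25 * (current[i - 1][j] + current[i + 1][j]
                                   + current[i][j - 1] + current[i][j + 1]);
                rowChange = Math.max(rowChange, Math.abs(next[i][j] - current[i][j]));
            }
            return rowChange;
        }).max().orElse(0.0);
    }

    /** Iterates until the largest change drops below tolerance or maxIterations is reached. */
    public static double[][] solve(int size, double hotEdge, double tolerance, int maxIterations) {
        double[][] current = new double[size][size];
        double[][] next = new double[size][size];
        // Boundary condition (an assumption for this sketch): top edge hot, other edges 0.
        for (int j = 0; j < size; j++) {
            current[0][j] = hotEdge;
            next[0][j] = hotEdge;
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double change = sweep(current, next);
            double[][] tmp = current; // swap buffers so current holds the newest values
            current = next;
            next = tmp;
            if (change < tolerance) {
                break;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        double[][] grid = solve(50, 100.0, 1e-6, 10_000);
        System.out.println("Value at grid center = " + grid[25][25]);
    }
}
```

In a distributed stencil the rows would live on different compute nodes and the neighbor reads would become messages; the two-buffer scheme above plays the role of the synchronous exchange.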
Parallel Patterns -- Advantages • Abstracts/hides underlying computing environment • Generally avoids deadlocks and race conditions • Reduces source code size (lines of code) • Leads to automated conversion into parallel programs without the need to write low-level message-passing code such as MPI. • Hierarchical designs with patterns embedded into patterns, and pattern operators to combine patterns.
Disadvantages • New approach to learn • Takes away some of the freedom from the programmer • Performance reduced (cf. using high-level languages instead of assembly language)
Previous/Existing Work Patterns explored in several projects. • Industrial efforts • Intel Threading Building Blocks (TBB), Intel Cilk Plus, Intel Array Building Blocks (ArBB). Focus on very low-level patterns such as fork-join • Universities: • University of Illinois at Urbana-Champaign and University of California, Berkeley • University of Torino/Università di Pisa, Italy
Book by Intel authors “Structured Parallel Programming: Patterns for Efficient Computation,” Michael McCool, James Reinders, Arch Robison, Morgan Kaufmann, 2012 Focuses on Intel tools
Note on Terminology “Skeletons” Sometimes the term “skeleton” is used to describe “patterns”, especially directed acyclic graphs with a source, a computation, and a sink. We do not make that distinction and use the term “pattern” whether directed or undirected and whether acyclic or cyclic. This is done elsewhere.
Our approach (Jeremy Villalobos’ UNC-C PhD thesis) Focuses on a few patterns of wide applicability (e.g. workpool, synchronous all-to-all, pipelined, stencil), but Jeremy took it much further than UPCRC and Intel. He developed a higher-level framework called “Seeds”. It uses the pattern approach to automatically distribute code across processor cores, computers, or geographically distributed computers and execute the parallel code.
“Seeds” Parallel Grid Application Framework • Some Key Features • Pattern-programming • Java user interface • (C++ version in development) • Self-deploys on computers, clusters, and geographically distributed computers http://coit-grid01.uncc.edu/seeds/
Seeds Development Layers • Basic • Intended for programmers that have basic parallel computing background • Based on skeletons and patterns • Advanced: Used to add or extend functionality such as: • Create new patterns • Optimize existing patterns or • Adapt existing pattern to non-functional requirements specific to the application • Expert: Used to provide basic services: • Deployment • Security • Communication/Connectivity • Changes in the environment Derived from Jeremy Villalobos’s PhD thesis defense
Basic User Programmer Interface Programmer selects a pattern and implements three principal Java methods within a module class: • Diffuse method – to distribute pieces of data. • Compute method – the actual computation • Gather method – used to gather the results Programmer also has to fill in details in a “run module” bootstrap class that creates an instance of the module class and starts the framework. [Figure: “Module” class (Diffuse, Compute, Gather) and “Run module” bootstrap class.] Framework then self-deploys on a specified parallel/distributed computing platform and executes the pattern.
Example module class
Complete code (Monte Carlo pi in Assignment 1, see later for more details)

package edu.uncc.grid.example.workpool;

import java.util.Random;
import java.util.logging.Level;
import edu.uncc.grid.pgaf.datamodules.Data;
import edu.uncc.grid.pgaf.datamodules.DataMap;
import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
import edu.uncc.grid.pgaf.p2p.Node;

public class MonteCarloPiModule extends Workpool {
    private static final long serialVersionUID = 1L;
    private static final int DoubleDataSize = 1000;
    double total;
    int random_samples;
    Random R;

    public MonteCarloPiModule() {
        R = new Random();
    }

    public void initializeModule(String[] args) {
        total = 0;
        Node.getLog().setLevel(Level.WARNING); // reduce verbosity for logging
        random_samples = 3000; // set number of random samples
    }

    // Computation
    public Data Compute(Data data) {
        // input gets the data produced by DiffuseData()
        DataMap<String, Object> input = (DataMap<String, Object>) data;
        DataMap<String, Object> output = new DataMap<String, Object>();
        Long seed = (Long) input.get("seed"); // get random seed
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) {
                ++inside;
            }
        }
        output.put("inside", inside); // store partial answer to return to GatherData()
        return output; // output will emit the partial answers done by this method
    }

    public Data DiffuseData(int segment) {
        DataMap<String, Object> d = new DataMap<String, Object>();
        d.put("seed", R.nextLong());
        return d; // returns a random seed for each job unit
    }

    public void GatherData(int segment, Data dat) {
        DataMap<String, Object> out = (DataMap<String, Object>) dat;
        Long inside = (Long) out.get("inside");
        total += inside; // aggregate answer from all the worker nodes
    }

    public double getPi() {
        // returns value of pi based on the job done by all the workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }

    public int getDataCount() {
        return random_samples;
    }
}

Note: No explicit message passing
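For contrast with the module class, here is a rough sketch of the same diffuse/compute/gather split written against a plain `java.util.concurrent` thread pool, to show the task submission and result collection that the framework otherwise hides. It does not use Seeds at all; the class name `ThreadPoolMonteCarloPi`, the method names, and the master seed parameter are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPoolMonteCarloPi {
    static final int SAMPLES_PER_TASK = 1000;

    // "Diffuse": produce one random seed per job unit.
    public static long[] diffuse(int tasks, long masterSeed) {
        Random r = new Random(masterSeed);
        long[] seeds = new long[tasks];
        for (int i = 0; i < tasks; i++) {
            seeds[i] = r.nextLong();
        }
        return seeds;
    }

    // "Compute": count the samples that fall inside the unit quarter circle.
    public static long compute(long seed) {
        Random r = new Random(seed);
        long inside = 0;
        for (int i = 0; i < SAMPLES_PER_TASK; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // "Gather": collect the partial counts and turn them into an estimate of pi.
    public static double estimatePi(int tasks, int workers, long masterSeed) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long seed : diffuse(tasks, masterSeed)) {
                Callable<Long> job = () -> compute(seed);
                partials.add(pool.submit(job)); // hand the job unit to the workpool
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that job unit has finished
            }
            return 4.0 * total / ((double) tasks * SAMPLES_PER_TASK);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("The result is: " + estimatePi(3000, 4, 42L));
    }
}
```

The estimate follows the same formula as `getPi()`: four times the fraction of samples that land inside the quarter circle. Even in this single-machine form the pool set-up, futures, and shutdown are explicit, which is the bookkeeping the pattern approach aims to remove.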
Seeds Implementations • Three Java versions available (2013): • Full JXTA P2P version requiring an Internet connection • JXTA P2P version but not needing an external network, suitable for a single computer • Multicore (thread-based) version for operation on a single computer • The multicore version executes much faster on a single computer. The only difference is a minor change in the bootstrap class.
Bootstrap class: JXTA P2P version
This code deploys the framework and starts execution of the pattern.

package edu.uncc.grid.example.workpool;

import java.io.IOException;
import net.jxta.pipe.PipeID;
import edu.uncc.grid.pgaf.Anchor;
import edu.uncc.grid.pgaf.Operand;
import edu.uncc.grid.pgaf.Seeds;
import edu.uncc.grid.pgaf.p2p.Types;

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Seeds.start("/path/to/seeds/seed/folder", false);
            PipeID id = Seeds.startPattern(new Operand(
                (String[]) null,
                new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                pi));
            System.out.println(id.toString());
            Seeds.waitOnPattern(id);
            Seeds.stop();
            System.out.println("The result is: " + pi.getPi());
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Different patterns have similar code.
Bootstrap class: Multicore version

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Thread id = Seeds.startPatternMulticore(
                new Operand(
                    (String[]) null,
                    new Anchor(args[0], Types.DataFlowRole.SINK_SOURCE),
                    pi),
                4);
            id.join();
            System.out.println("The result is: " + pi.getPi());
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

• Multicore version • Much faster on a multicore platform • Thread based • Bootstrap class does not need to start and stop JXTA P2P. Seeds.start() and Seeds.stop() not needed. Otherwise user code similar.
Measuring Time
Can instrument code in the bootstrap class:

public class RunMyModule {
    public static void main(String[] args) {
        try {
            long start = System.currentTimeMillis();
            MyModule m = new MyModule();
            Seeds.start( … );
            PipeID id = ( … );
            Seeds.waitOnPattern(id);
            Seeds.stop();
            long stop = System.currentTimeMillis();
            double time = (double) (stop - start) / 1000.0;
            System.out.println("Execution time = " + time);
        } catch (SecurityException e) {
            …
Compiling/executing • Can be done on the command line (ant script provided) or through an IDE (Eclipse)
Tutorial page: http://coit-grid01.uncc.edu/seeds/
Acknowledgements Work initiated by Jeremy Villalobos in his PhD thesis “Running Parallel Applications on a Heterogeneous Environment with Accessible Development Practices and Automatic Scalability,” UNC-Charlotte, 2011. Jeremy developed “Seeds” pattern programming software. Extending work to teaching environment supported by the National Science Foundation under grant "Collaborative Research: Teaching Multicore and Many-Core Programming at a Higher Level of Abstraction" #1141005/1141006 (2012-2015). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
UNC-Charlotte Pattern Programming Research Group
http://coitweb.uncc.edu/~abw/PatternProgGroup/
Fall 2013
• Jeremy Villalobos (PhD awarded, continuing involvement)
PhD student
• Yasaman Kamyab Hessary (Course TA)
CS MS students
• Haoqi Zhao (MS thesis)
• Yawo Adibolo developed C++ version of framework software for interest.
CS BS student
• Matthew Edge (Senior project)
• Kevin Silliman (Senior project evaluating Yawo’s C++ framework)
Please contact B. Wilkinson if you would like to be involved in this work for academic credit
Next step • Assignment 1 – using the Seeds framework