Learn how to make parallel programming more usable and scalable by implementing high-level computational patterns, reducing source code size, and avoiding deadlocks and race conditions. Explore the advantages of pattern programming and its automated conversion into parallel programs. Discover how patterns provide a guide to best practices and improve the design structure of parallel programs.
Pattern Programming
ITCS 4/5145 Parallel Programming, UNC-Charlotte
B. Wilkinson, 2012. Aug 30, 2012. PatternProg-1
Acknowledgment This work was initiated by Jeremy Villalobos and described in his PhD thesis: “RUNNING PARALLEL APPLICATIONS ON A HETEROGENEOUS ENVIRONMENT WITH ACCESSIBLE DEVELOPMENT PRACTICES AND AUTOMATIC SCALABILITY,” UNC-Charlotte, 2011.
Pattern Programming Research Group
2011:
• Jeremy Villalobos (PhD awarded, continuing involvement)
• Saurav Bhattara (MS thesis, graduated)
Spring 2012:
• Yawo Adibolo (ITCS 6880 Individual Study)
• Ayay Ramesh (ITCS 6880 Individual Study)
Fall 2012:
• Haoqi Zhao (MS thesis)
• Pohua Lee (BS senior project)
Openings!
Problem Addressed
• To make parallel programming more usable and scalable.
• Parallel programming (writing programs to solve problems using multiple computers, processors, and cores) has a very long history but remains a challenge.
• The traditional approach involves explicitly specifying message passing (for clusters and distributed computers) and threads (for shared memory) with low-level APIs.
• A better-structured approach is needed.
Pattern Programming Concept
The programmer begins by constructing the program using established computational or algorithmic “patterns” that provide a structure. What patterns are we talking about?
• Low-level algorithmic patterns that might be embedded into a program, such as fork-join and broadcast/scatter/gather.
• Higher-level algorithmic patterns for forming a complete program, such as workpool, pipeline, stencil, and map-reduce.
We concentrate on the higher-level “computational/algorithm” patterns rather than the lower-level ones.
Some Patterns: Workpool
[Diagram: a master node connected by two-way links to worker compute nodes; the master is the source/sink. Derived from Jeremy Villalobos’s PhD thesis defense.]
Pipeline
[Diagram: worker compute nodes arranged in stages 1, 2, and 3, linked stage-to-stage by one-way connections, with a two-way connection to the master (source/sink).]
Divide and Conquer
[Diagram: compute nodes with two-way connections; work fans out in a divide phase and results fan in through a merge phase; the root is the source/sink.]
All-to-All
[Diagram: compute nodes with two-way connections between every pair of nodes; source/sink.]
Stencil
Usually a synchronous computation: performs a number of iterations to converge on a solution, e.g. for solving Laplace’s/heat equation. On each iteration, each node communicates with its neighbors to get stored computed values.
[Diagram: a grid of compute nodes, each with two-way connections to its neighbors; source/sink.]
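To make the stencil idea concrete, here is a minimal sequential sketch (illustration only, not Seeds framework code; the class name, grid size, tolerance, and boundary values are hypothetical) of the kind of computation this pattern parallelizes: Jacobi relaxation for Laplace’s equation on a 2-D grid. In the parallel stencil pattern, each compute node holds a block of such a grid and exchanges its edge values with neighboring nodes on every iteration.

// Illustrative sketch only -- not part of the Seeds framework.
public class JacobiSketch {
    public static void main(String[] args) {
        int n = 100;              // grid dimension (hypothetical)
        double tolerance = 1e-4;  // convergence threshold (hypothetical)
        double[][] grid = new double[n][n];
        double[][] next = new double[n][n];
        // Example boundary condition: hold the top edge at 100.0 in both buffers.
        for (int j = 0; j < n; j++) {
            grid[0][j] = 100.0;
            next[0][j] = 100.0;
        }
        double maxChange;
        int iterations = 0;
        do {
            maxChange = 0.0;
            // Each interior point becomes the average of its four neighbors.
            for (int i = 1; i < n - 1; i++) {
                for (int j = 1; j < n - 1; j++) {
                    next[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                       + grid[i][j - 1] + grid[i][j + 1]);
                    maxChange = Math.max(maxChange, Math.abs(next[i][j] - grid[i][j]));
                }
            }
            double[][] tmp = grid; grid = next; next = tmp;  // swap buffers
            iterations++;
        } while (maxChange > tolerance);  // iterate until converged
        System.out.println("Converged after " + iterations + " iterations");
    }
}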
Note on Terminology: “Skeletons”
The term “skeleton” is sometimes used instead of “pattern”, especially for directed acyclic graphs with a source, a computation, and a sink. We do not make that distinction, and use the term “pattern” whether directed or undirected, and whether acyclic or cyclic. This is also done elsewhere.
Patterns
Advantages:
• Possible to create parallel code from the pattern specification automatically (see later).
• Abstracts/hides the underlying computing environment.
• Generally avoids deadlocks and race conditions.
• Reduces source code size (lines of code).
Disadvantages:
• A new approach to learn.
• Takes away some freedom from the programmer.
• Performance reduced (cf. using high-level languages instead of assembly language).
More Advantages/Notes
• “Design patterns” have been part of software engineering for many years: reusable solutions to commonly occurring problems.*
• Patterns provide a guide to “best practices”, not a final implementation.
• Patterns provide a good, scalable design structure for parallel programs.
• One can reason more easily about programs.
• Hierarchical designs are possible, with patterns embedded within patterns and pattern operators to combine patterns.
• Leads to automated conversion into parallel programs without the need to write low-level message-passing routines such as MPI.
* http://en.wikipedia.org/wiki/Design_pattern_(computer_science)
Previous/Existing Work
Patterns/skeletons have been explored in several projects.
Universities:
• University of Illinois at Urbana-Champaign and University of California, Berkeley
• University of Torino / Università di Pisa, Italy
• ...
Industrial efforts:
• Intel
• Microsoft
• ...
Universal Parallel Computing Research Centers (UPCRC)
Established at the University of Illinois at Urbana-Champaign and the University of California, Berkeley, with Microsoft and Intel, in 2008 (with combined funding of at least $35 million). Co-developed OPL (Our Pattern Language). A group of twelve computational patterns in seven general application areas was identified, including:
• Finite State Machines
• Circuits
• Graph Algorithms
• Structured Grid
• Dense Matrix
• Sparse Matrix
Intel
Focused on very low-level patterns such as fork-join, and provides constructs for them in:
• Intel Threading Building Blocks (TBB): a template library for C++ to support parallelism.
• Intel Cilk Plus: compiler extensions for C/C++ to support parallelism.
• Intel Array Building Blocks (ArBB): a pure C++ library-based solution for vector parallelism.
The above are somewhat competing tools, obtained through takeovers of small companies and each implemented differently.
A new book from Intel authors (2012): “Structured Parallel Programming: Patterns for Efficient Computation,” Michael McCool, James Reinders, Arch Robison, Morgan Kaufmann, 2012. Focuses on Intel tools.
Using patterns with Microsoft C#
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=19222
Again very low-level, with patterns such as parallel for loops.
Closest to our work
http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about
University of Torino / Università di Pisa, Italy
Our Approach (Jeremy Villalobos’ UNC-Charlotte PhD thesis)
Focuses on a few patterns of wide applicability (e.g. workpool, synchronous all-to-all, pipeline, stencil), but takes the approach much further than UPCRC and Intel: Jeremy developed a higher-level framework called “Seeds” that uses the pattern approach to automatically distribute code across processor cores, computers, or geographically distributed computers and execute the parallel code.
“Seeds” Parallel Grid Application Framework
Some key features:
• Pattern programming
• (Java) user interface
• Self-deploys on computers, clusters, and geographically distributed computers
• Load balances
• Three levels of user interface
http://coit-grid01.uncc.edu/seeds/
Seeds Development Layers
• Basic: intended for programmers who have a basic parallel computing background; based on skeletons and patterns.
• Advanced: used to add or extend functionality, such as creating new patterns, optimizing existing patterns, or adapting an existing pattern to non-functional requirements specific to the application.
• Expert: used to provide basic services: deployment, security, communication/connectivity, and changes in the environment.
Derived from Jeremy Villalobos’s PhD thesis defense.
Deployment
Several deployment mechanisms were implemented during the PhD work; deployment with SSH is now the preferred method.
Basic User Programmer Interface
To create and execute parallel programs, the programmer selects a pattern and implements three principal Java methods:
• Diffuse method: distributes pieces of data to the workers.
• Compute method: performs the actual computation.
• Gather method: gathers the results.
The programmer also fills in details in a “bootstrap” class to deploy and start the framework. The framework then self-deploys on a geographically distributed platform and executes the pattern. (See the complete example below.)
Complete code (the computation for the Monte Carlo pi program in Assignment 1; see later for more details). Note: no explicit message passing.

package edu.uncc.grid.example.workpool;

import java.util.Random;
import java.util.logging.Level;
import edu.uncc.grid.pgaf.datamodules.Data;
import edu.uncc.grid.pgaf.datamodules.DataMap;
import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
import edu.uncc.grid.pgaf.p2p.Node;

public class MonteCarloPiModule extends Workpool {
    private static final long serialVersionUID = 1L;
    private static final int DoubleDataSize = 1000;
    double total;
    int random_samples;
    Random R;

    public MonteCarloPiModule() {
        R = new Random();
    }

    @Override
    public void initializeModule(String[] args) {
        total = 0;
        Node.getLog().setLevel(Level.WARNING);  // reduce verbosity for logging
        random_samples = 3000;                  // set number of random samples
    }

    public Data DiffuseData(int segment) {
        DataMap<String, Object> d = new DataMap<String, Object>();
        d.put("seed", R.nextLong());
        return d;  // returns a random seed for each job unit
    }

    public Data Compute(Data data) {
        // input gets the data produced by DiffuseData()
        DataMap<String, Object> input = (DataMap<String, Object>) data;
        // output will emit the partial answers done by this method
        DataMap<String, Object> output = new DataMap<String, Object>();
        Long seed = (Long) input.get("seed");  // get random seed
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) {
                ++inside;
            }
        }
        output.put("inside", inside);  // store partial answer to return to GatherData()
        return output;
    }

    public void GatherData(int segment, Data dat) {
        DataMap<String, Object> out = (DataMap<String, Object>) dat;
        Long inside = (Long) out.get("inside");
        total += inside;  // aggregate answers from all the worker nodes
    }

    public double getPi() {
        // returns value of pi based on the job done by all the workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }

    public int getDataCount() {
        return random_samples;
    }
}
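Why this estimates pi: each job unit generates DoubleDataSize random (x, y) points in the unit square and counts those with x*x + y*y <= 1, i.e. those falling inside the quarter circle of radius 1, whose area is pi/4. The fraction of points inside therefore approaches pi/4, so with random_samples job units the estimate returned by getPi() is pi ≈ 4 × total / (random_samples × DoubleDataSize).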
Bootstrap class (this code deploys the framework and starts execution of the pattern):

package edu.uncc.grid.example.workpool;

import java.io.IOException;
import net.jxta.pipe.PipeID;
import edu.uncc.grid.pgaf.Anchor;
import edu.uncc.grid.pgaf.Operand;
import edu.uncc.grid.pgaf.Seeds;
import edu.uncc.grid.pgaf.p2p.Types;

public class RunMonteCarloPiModule {
    public static void main(String[] args) {
        try {
            MonteCarloPiModule pi = new MonteCarloPiModule();
            Seeds.start("/path/to/seeds/seed/folder", false);
            PipeID id = Seeds.startPattern(new Operand(
                    (String[]) null,
                    new Anchor("hostname", Types.DataFlowRoll.SINK_SOURCE),
                    pi));
            System.out.println(id.toString());
            Seeds.waitOnPattern(id);
            System.out.println("The result is: " + pi.getPi());
            Seeds.stop();
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Different patterns have similar code.
Compiling/Executing
Can be done on the command line (an ant script is provided) or through an IDE (Eclipse).
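Purely as a hypothetical sketch (the supplied ant script handles this in practice, and the actual jar names and classpath depend on the Seeds distribution; see the tutorial page), a manual command-line build and run might look something like:

# Hypothetical commands; the jar name and paths are assumptions.
# Compile against the Seeds jar from the source root:
javac -cp seeds.jar edu/uncc/grid/example/workpool/*.java
# Run the bootstrap class:
java -cp .:seeds.jar edu.uncc.grid.example.workpool.RunMonteCarloPiModule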
Tutorial page: http://coit-grid01.uncc.edu/seeds/
Next step • Assignment 1 – using the Seeds framework