Pattern Programming Seeds Framework Notes on Assignment 1. ITCS 4/5145 Parallel Programming, UNC-Charlotte, B. Wilkinson, 2012. August 30, 2012. PatternProg-2
Seeds Workpool: DiffuseData, Compute, and GatherData Methods
[Figure: master and slaves. On the master, DiffuseData creates DataMap d and returns d to each slave. On each slave, Compute receives it as Data argument data, casts it to DataMap input, and creates DataMap output. On the master, GatherData receives the output as Data argument dat and accumulates it in the private variable total (answer).]
Note: the DiffuseData, Compute, and GatherData methods start with a capital letter, although method names should not!
Data and DataMap classes
For implementation convenience, there are two classes:
• Data class: used to pass data between master and slaves. Uses a "segment" number to keep track of packets as they go from one method to another.
• DataMap class: used inside the compute method. DataMap is a subtype of Data and so allows casting.
DataMap methods:
• put(String, data): puts data into the DataMap, identified by the string
• get(String): gets the stored data identified by the string
In the pi code, data is a Long. DataMap extends Java HashMap, which implements Map; see http://doc.java.sun.com/DocWeb/api/java.util.HashMap
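The put/get pattern can be tried without the framework. The sketch below uses a plain java.util.HashMap as a stand-in for DataMap (an assumption made only so it runs on its own; the class name DataMapSketch is hypothetical). It shows that a stored long comes back as an Object and must be cast to Long, as in the pi code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain HashMap plays the role of DataMap.
class DataMapSketch {
    // Store a value under a string key, then read it back with a cast.
    static long roundTrip(long seed) {
        Map<String, Object> d = new HashMap<String, Object>();
        d.put("seed", seed);              // the long is autoboxed to a Long
        Long out = (Long) d.get("seed");  // get returns Object, so a cast is needed
        return out;
    }

    // A key that was never stored yields null rather than an exception.
    static boolean missingKeyIsNull() {
        Map<String, Object> d = new HashMap<String, Object>();
        return d.get("not_there") == null;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(123L));
        System.out.println(missingKeyIsNull());
    }
}
```

A misspelled key on the get side therefore shows up as a null (and later a NullPointerException), not as a compile error.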
    // segment is used by the framework to keep track of where to put results
    public Data DiffuseData (int segment) {
        DataMap<String, Object> d = new DataMap<String, Object>();
        inputData = … ;
        d.put("name_of_inputdata", inputData);
        return d;
    }

    // The framework passes the Data returned by DiffuseData() to Compute(), where it is cast into a DataMap
    public Data Compute (Data data) {
        DataMap<String, Object> input = (DataMap<String,Object>) data;   // data produced by DiffuseData()
        DataMap<String, Object> output = new DataMap<String, Object>();  // output returned to GatherData()
        inputData = input.get("name_of_inputdata");
        … // computation
        output.put("name_of_results", results); // to return to GatherData()
        return output;
    }

    // The framework gives GatherData() back a Data object with a segment number
    public void GatherData (int segment, Data dat) {
        DataMap<String,Object> out = (DataMap<String,Object>) dat;
        outdata = out.get("name_of_results");
        result … // aggregate outdata from all the worker nodes; result is a private variable
    }
Other methods called by framework
    public void initializeModule(String[] args) {
        Node.getLog().setLevel(Level.WARNING); // reduce logging verbosity
        … // initialize private variables
        datacount = … ;
    }
    public int getDataCount() { // set to the number of data items to be processed
        return datacount;
    }
Framework methods used in Bootstrap class
Seeds methods:
• start: starts framework, deploys nodes on list of servers
• startPattern: starts Seeds pattern
• waitOnPattern: waits for pattern to complete
• stop: stops framework
User methods used in Bootstrap class
Additional methods can be specified by the programmer in the Workpool class and invoked in the Bootstrap class. Typically a method is invoked that produces the final result.
Example:
    public double getPi() { // returns value of pi based on all workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }
Question: Will a class field modified in the DiffuseData or GatherData methods be updated with the same values as in the Compute method?
Answer: NO. DiffuseData and GatherData run on the master, while Compute runs on the slaves, so they are running on different JVMs (and different nodes).
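The effect can be imitated in a single JVM. The sketch below assumes that an object reaches another JVM as a serialized copy (an assumption; the slides only say the methods run on different JVMs). ModuleState and shipCopy are hypothetical names, not Seeds classes. Changing a field on the copy leaves the original untouched, which is why results must travel back through the returned Data rather than through fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class FieldCopyDemo {
    // Hypothetical stand-in for a module object with one field.
    static class ModuleState implements Serializable {
        private static final long serialVersionUID = 1L;
        long total = 0;
    }

    // Copy an object through Java serialization, imitating shipment to another JVM.
    static ModuleState shipCopy(ModuleState original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ModuleState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {total on the original, total on the copy} after the copy is modified.
    static long[] run() {
        ModuleState master = new ModuleState();
        ModuleState slave = shipCopy(master);
        slave.total += 42;  // modification made on the "slave" copy only
        return new long[] { master.total, slave.total };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("master total = " + r[0] + ", slave total = " + r[1]);
    }
}
```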
Monte Carlo Methods
A so-called "embarrassingly parallel" computation, as it decomposes into obviously independent tasks that can be done in parallel without any inter-task communication during the computation. Monte Carlo methods use random selections. To parallelize Monte Carlo code, one must address the best way to generate random numbers in parallel.
Circle formed within a 2 x 2 square. Ratio of area of circle to square given by:
$\frac{\text{Area of circle}}{\text{Area of square}} = \frac{\pi (1)^2}{2 \times 2} = \frac{\pi}{4}$
Points within square chosen randomly. Score kept of how many points happen to lie within circle. Fraction of points within circle will be $\pi/4$, given a sufficient number of randomly selected samples.
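One quadrant can be described by integral:
$\int_0^1 \sqrt{1 - x^2}\,dx = \frac{\pi}{4}$
Random pairs of numbers, $(x_r, y_r)$, generated, each between 0 and 1. Counted as in circle if $y_r \le \sqrt{1 - x_r^2}$; that is, $x_r^2 + y_r^2 \le 1$.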
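A sequential sketch of the quadrant test, before any parallelization: random pairs between 0 and 1 are counted as inside when x*x + y*y <= 1, and four times the inside fraction estimates pi. The class name QuadrantPi, the sample count, and the fixed seed are illustrative choices, not taken from the Seeds code.

```java
import java.util.Random;

class QuadrantPi {
    // Estimate pi from the fraction of random points in the unit square
    // that fall inside the quarter circle of radius 1.
    static double estimate(long samples, long seed) {
        Random r = new Random(seed);  // fixed seed makes the run repeatable
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / samples;  // floating-point division
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000L, 42L));
    }
}
```

The error shrinks only as the inverse square root of the number of samples, so each extra digit of accuracy costs about 100 times more samples.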
Alternative (better) Monte Carlo Method
Generate random values of x to compute f(x). Sum values of f(x):
$\text{Area} = \int_{x_1}^{x_2} f(x)\,dx = \lim_{N \to \infty} \frac{1}{N} \sum_{r=1}^{N} f(x_r)\,(x_2 - x_1)$
where $x_r$ are randomly generated values of x between $x_1$ and $x_2$. Monte Carlo method very useful if the function cannot be integrated numerically (maybe having a large number of variables).
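The averaging method can be sketched for a finite N as follows: draw x uniformly between x1 and x2, average f(x), and multiply by the interval width. The class name MonteCarloIntegral and its parameters are hypothetical, chosen for illustration.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class MonteCarloIntegral {
    // Approximate the integral of f over [x1, x2] as
    // (average of f at n random points) * (x2 - x1).
    static double integrate(DoubleUnaryOperator f, double x1, double x2, int n, long seed) {
        Random r = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double x = x1 + (x2 - x1) * r.nextDouble();  // uniform in [x1, x2)
            sum += f.applyAsDouble(x);
        }
        return (sum / n) * (x2 - x1);
    }

    public static void main(String[] args) {
        // Quadrant integral of sqrt(1 - x^2) over [0, 1]; four times it estimates pi.
        double quadrant = integrate(x -> Math.sqrt(1.0 - x * x), 0.0, 1.0, 1_000_000, 1L);
        System.out.println(4.0 * quadrant);
    }
}
```

Each sample here needs one random number instead of two, and every sample contributes a value of f rather than a yes/no count.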
Seeds Monte Carlo code: MonteCarloPiModule.java
DiffuseData Method (Required to be implemented)
    public Data DiffuseData (int segment) {
        DataMap<String, Object> d = new DataMap<String, Object>();
        d.put("seed", R.nextLong());
        return d; // returns a random seed for each job unit
    }
Compute Method (Required to be implemented)
    public Data Compute (Data data) {
        DataMap<String, Object> input = (DataMap<String,Object>) data;
        DataMap<String, Object> output = new DataMap<String, Object>();
        Long seed = (Long) input.get("seed"); // get random seed
        Random r = new Random();
        r.setSeed(seed);
        Long inside = 0L;
        for (int i = 0; i < DoubleDataSize; i++) {
            double x = r.nextDouble();
            double y = r.nextDouble();
            double dist = x * x + y * y;
            if (dist <= 1.0) {
                ++inside;
            }
        }
        output.put("inside", inside); // to return to GatherData()
        return output;
    }
GatherData Method (Required to be implemented)
    public void GatherData (int segment, Data dat) {
        DataMap<String,Object> out = (DataMap<String,Object>) dat;
        Long inside = (Long) out.get("inside");
        total += inside; // aggregate answer from all the worker nodes
    }
getDataCount Method (Required to be implemented)
    public int getDataCount() {
        return random_samples;
    }
Method to compute pi result (used in bootstrap module)
    public double getPi() { // returns value of pi based on all workers
        double pi = (total / (random_samples * DoubleDataSize)) * 4;
        return pi;
    }
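The slides do not show how total is declared. If it is a double, the expression in getPi() is fine. If it were an integer type such as long (an assumption made here only to illustrate the pitfall), the division would truncate to 0 before the multiplication by 4. The sketch below contrasts the two cases; the class and parameter names are hypothetical.

```java
class PiDivisionPitfall {
    // Same expression shape as getPi(), with an integer total:
    // the division is integer division, so the fraction is lost.
    static double truncated(long total, int randomSamples, int doubleDataSize) {
        return (total / (randomSamples * doubleDataSize)) * 4;
    }

    // Forcing floating-point arithmetic keeps the fraction.
    static double exact(long total, int randomSamples, int doubleDataSize) {
        return 4.0 * total / ((double) randomSamples * doubleDataSize);
    }

    public static void main(String[] args) {
        // 785 of 10 * 100 = 1000 points inside the quadrant
        System.out.println(truncated(785L, 10, 100)); // 785 / 1000 is 0 in integer arithmetic
        System.out.println(exact(785L, 10, 100));
    }
}
```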
Bootstrap class: RunMonteCarloPiModule.java
    package edu.uncc.grid.example.workpool;
    import java.io.IOException;
    import net.jxta.pipe.PipeID;
    import edu.uncc.grid.pgaf.Anchor;
    import edu.uncc.grid.pgaf.Operand;
    import edu.uncc.grid.pgaf.Seeds;
    import edu.uncc.grid.pgaf.p2p.Types;
    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MonteCarloPiModule pi = new MonteCarloPiModule();
                Seeds.start( args[0] , false);
                PipeID id = Seeds.startPattern(
                    new Operand( (String[])null,
                        new Anchor( args[1] , Types.DataFlowRoll.SINK_SOURCE),
                        pi ) );
                System.out.println(id.toString() );
                Seeds.waitOnPattern(id);
                System.out.println( "The result is: " + pi.getPi() ) ;
                Seeds.stop();
            } catch (SecurityException e) {
                …
            }
        }
    }
Deploys framework and runs code.
Another Seeds Workpool example: Matrix Addition
C = A + B
Matrices (2-D arrays) A, B, and C. Given elements of A as $a_{i,j}$ and elements of B as $b_{i,j}$, each element of C is computed as:
$c_{i,j} = a_{i,j} + b_{i,j}$
(Assignment Task 4 asks you to do matrix multiplication)
MatrixAddModule.java (continues on several slides)
    package edu.uncc.grid.example.workpool;
    import …
    public class MatrixAddModule extends Workpool {
        private static final long serialVersionUID = 1L;
        int[][] matrixA;
        int[][] matrixB;
        int[][] matrixC;
        public MatrixAddModule() {
            matrixC = new int[3][3];
        }
        public void initMatrices() {
            matrixA = new int[][]{{2,5,8},{3,4,9},{1,5,2}};
            matrixB = new int[][]{{2,5,8},{3,4,9},{1,5,2}};
        }
        public void initializeModule(String[] args) {
            Node.getLog().setLevel(Level.WARNING);
        }
        public Data DiffuseData(int segment) {
            int[] rowA = new int[3];
            int[] rowB = new int[3];
            DataMap<String, int[]> d = new DataMap<String, int[]>();
            int k = segment;
            for (int i = 0; i < 3; i++) { // copy one row of A and one row of B into d
                rowA[i] = matrixA[k][i];
                rowB[i] = matrixB[k][i];
            }
            d.put("rowA", rowA);
            d.put("rowB", rowB);
            return d;
        }
Note use of segment variable: it selects which row is sent out.
        public Data Compute(Data data) {
            int[] rowC = new int[3];
            DataMap<String, int[]> input = (DataMap<String,int[]>) data;
            DataMap<String, int[]> output = new DataMap<String, int[]>();
            int[] rowA = (int[]) input.get("rowA");
            int[] rowB = (int[]) input.get("rowB");
            for (int i = 0; i < 3; i++) { // computation
                rowC[i] = rowA[i] + rowB[i];
            }
            output.put("rowC", rowC);
            return output;
        }
This is the part that will need altering in the assignment to achieve multiplication.
Note use of segment variable: it selects which row of C receives the result.
        public void GatherData(int segment, Data dat) {
            DataMap<String,int[]> out = (DataMap<String,int[]>) dat;
            int[] rowC = (int[]) out.get("rowC");
            for (int i = 0; i < 3; i++) {
                matrixC[segment][i] = rowC[i];
            }
        }
        public void printResult() {
            for (int i = 0; i < 3; i++) {
                System.out.println("");
                for (int j = 0; j < 3; j++) {
                    System.out.print(matrixC[i][j] + ",");
                }
            }
        }
        public int getDataCount() {
            return 3;
        }
    }
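The row-by-row flow can be traced without the framework. The sketch below is a single-threaded imitation, not Seeds code: plain HashMaps stand in for DataMap, the three steps are called in a loop where the framework would distribute them, and all names are hypothetical. B is set to the identity matrix so that a mix-up between rows of A and B would be visible in the output.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class RowWorkpoolSketch {
    static final int[][] A = {{2, 5, 8}, {3, 4, 9}, {1, 5, 2}};
    static final int[][] B = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};

    // "Master" side: package row number segment of A and of B.
    static Map<String, int[]> diffuse(int segment) {
        Map<String, int[]> d = new HashMap<String, int[]>();
        d.put("rowA", A[segment].clone());
        d.put("rowB", B[segment].clone());
        return d;
    }

    // "Slave" side: add the two rows element by element.
    static Map<String, int[]> compute(Map<String, int[]> input) {
        int[] rowA = input.get("rowA");
        int[] rowB = input.get("rowB");
        int[] rowC = new int[rowA.length];
        for (int i = 0; i < rowC.length; i++) {
            rowC[i] = rowA[i] + rowB[i];
        }
        Map<String, int[]> output = new HashMap<String, int[]>();
        output.put("rowC", rowC);
        return output;
    }

    // "Master" side: segment says which row of C the result belongs to.
    static void gather(int[][] c, int segment, Map<String, int[]> out) {
        c[segment] = out.get("rowC");
    }

    static int[][] run() {
        int[][] c = new int[3][3];
        for (int segment = 0; segment < 3; segment++) { // 3 plays the role of getDataCount()
            gather(c, segment, compute(diffuse(segment)));
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(run()));
    }
}
```

Because each result carries its segment number, the rows could be gathered in any order and still land in the right place.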
Bootstrap class - RunMatrixAddModule.java
    package edu.uncc.grid.example.workpool;
    import …
    public class RunMatrixAddModule {
        public static String localhost = "T5400";
        public static String seedslocation = "C:\\seeds_2.0\\pgaf";
        public static void main (String [] args ) {
            try {
                Seeds.start( seedslocation , false);
                MatrixAddModule m = new MatrixAddModule();
                m.initMatrices();
                PipeID id = Seeds.startPattern(new Operand ((String[])null,
                    new Anchor (localhost, Types.DataFlowRoll.SINK_SOURCE), m));
                Seeds.waitOnPattern(id);
                m.printResult();
                Seeds.stop();
                …
In this example, the host and the path to Seeds are hardcoded.
Measuring Time (Task 4 in Assignment 1)
Can instrument code in the bootstrap class:
    public class RunMyModule {
        public static void main (String [] args ) {
            try {
                long start = System.currentTimeMillis();
                MyModule m = new MyModule();
                Seeds.start( … );
                PipeID id = ( … );
                Seeds.waitOnPattern(id);
                Seeds.stop();
                long stop = System.currentTimeMillis();
                double time = (double) (stop - start) / 1000.0;
                System.out.println("Execution time = " + time);
            } catch (SecurityException e) {
                …
            …
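The same instrumentation can be wrapped in a small helper. The sketch below is an alternative, not part of the assignment code: it uses System.nanoTime(), which is intended for measuring elapsed time and is not affected by changes to the wall clock. The class and method names are hypothetical, and the loop stands in for the start/wait/stop sequence.

```java
class TimingSketch {
    // Run the given work once and return the elapsed time in seconds.
    static double timeSeconds(Runnable work) {
        long start = System.nanoTime();
        work.run();
        long stop = System.nanoTime();
        return (stop - start) / 1e9;
    }

    public static void main(String[] args) {
        double time = timeSeconds(() -> {
            double sum = 0.0;  // placeholder workload
            for (int i = 1; i <= 1_000_000; i++) {
                sum += 1.0 / i;
            }
            System.out.println("sum = " + sum);
        });
        System.out.println("Execution time = " + time);
    }
}
```

As in the slide, timing from before the framework starts to after it stops includes the framework start-up and shutdown cost, not just the computation.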
New version of Seeds (just done by Jeremy)
    public class RunMonteCarloPiModule {
        public static void main(String[] args) {
            try {
                MonteCarloPiModule pi = new MonteCarloPiModule();
                Thread id = Seeds.startPatternMulticore(
                    new Operand( (String[])null ,
                        new Anchor( args[0], Types.DataFlowRole.SINK_SOURCE) ,
                        pi ),
                    4 );
                id.join();
                System.out.println( "The result is: " + pi.getPi() ) ;
            } catch (SecurityException e) {
                …
            }
        }
    }
• Multicore version
• Apparently much faster on a multicore platform
• Thread based; does not use the JXTA P2P network to run cluster nodes
• Bootstrap class does not need to start and stop JXTA P2P: Seeds.start() and Seeds.stop() are not needed. Otherwise user code is similar.
I have not yet tested it, but hopefully will be able to get to class soon.