Grid Superscalar for Programming Grid Applications

Programming Grid Applications with GRID Superscalar
[ Journal of Grid Computing, Volume 1, Issue 2, 2003. ]
Presenter: Juan Carlos Martinez
Agnostic: Allen Lee
Authors: Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela and Rogeli Grima

Overview What’s Grid Superscalar and what is its behavior?

Overview
[Figure labels: grid, Globus Toolkit, Grid Superscalar]
• It promotes the ease of programming GRID applications
• Basic idea: ns -> seconds/minutes/hours
http://www.bsc.es/grid/grid_superscalar/documents/FIU_seminar.pdf

Overview: Grid Superscalar Objective
• Reduce the development complexity of Grid applications to the minimum
• Writing a Computational Grid application should be as easy as writing a sequential one
• Target applications: composed of tasks
• Granularity of the tasks: at the level of simulations or programs
• Data objects are files

Overview
Let’s see how it works…

    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst (referenceCFG, newBWd, newCFG);
        dimemas (newCFG, traceFile, DimemasOUT);
        post (newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n");
    present (fd);
    GS_Close(fd);

Slide annotation: Input/output files
http://www.bsc.es/grid/grid_superscalar/documents/FIU_seminar.pdf

How It Works
Let’s look at a specific example: the Java program named Matmul, which multiplies two matrices.
Matmul: sequential Java code that creates 2 hypermatrices (4 matrices inside each one) and multiplies the 4 blocks of one against the 4 blocks of the other, all at run time. With Grid Superscalar, this code is parallelized.

Let’s understand this better…

Looking at matmul

Getting Started!!! File Structure
C applications:
1. <myapplication>.idl
2. <myapplication>.c (main program)
3. <myapplication>-functions.c (functions to be executed on the grid)
Java applications:
1. <myapplication>.idl
2. <arbitraryname>.java (main program; its name can differ from the application name)
3. <myapplication>Impl.java (functions/methods to be executed on the grid)
http://www.bsc.es/grid/grid_superscalar/documents/ssh_gridsuperscalar_quick_tutorial.pdf

How It Works
For Matmul we basically have two folders and two XML files.
matmul_java_master
    App.java
    Matmul.idl
    project.gsdeploy@
matmul_java_worker
    Matmul.idl
    MatmulImpl.java
    Block.java
    MatmulAppException.java
Matmul_java (the XML file itself)
project.gsdeploy@ -> Matmul_java

The IDL file: Matmul.idl

    interface CHOLESKY {
        void multiply_accumulative ( inout File f3, in File f1, in File f2 );
    };

The XML file for Matmul_java

<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000" masterBuildScript=""
         masterInstallDir="/home/lion-e/globus2/matmul_java_master"
         masterName="la-blade-01.cs.fiu.edu"
         masterSourceDir="/a/lion.cs.fiu.edu./disk/216/e/globus2/matmul_java_master"
         name="Matmul" workerBuildScript=""
         workerSourceDir="/a/lion.cs.fiu.edu./disk/216/e/globus2/matmul_java_worker">
  <disks>
    <disk name="_MasterDisk_"/>
    <disk name="_WorkingDisk_la-blade-02_cs_fiu_edu_"/>
    <disk name="_WorkingDisk_la-blade-03_cs_fiu_edu_"/>
  </disks>
  <directories>
    <directory disk="_MasterDisk_" isWorkingPath="yes" path="/home/lion-e/globus2/matmul_java_master"/>
  </directories>
  <workers>
    <worker Arch="" GFlops="1.0" LimitOfJobs="1" Mem="16" NCPUs="1" NetKbps="100000"
            OpSys="" Queue="none" Quota="0" deploymentStatus="deployed"
            installDir="/home/lion-e/globus2/matmul_java_worker"
            name="la-blade-02.cs.fiu.edu">
      <directories>
        <directory disk="_WorkingDisk_la-blade-02_cs_fiu_edu_" isWorkingPath="yes" path="/home/lion-e/globus2/matmul_java_worker"/>
      </directories>
    </worker>
    ….
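To make the role of the worker attributes more concrete, here is a minimal Java sketch that reads a few of them (name, GFlops, LimitOfJobs) out of a deployment file using the standard JDK DOM parser. This is not how Grid Superscalar itself reads the file; the class and method names (DeployConfigSketch, WorkerInfo, readWorkers) are hypothetical, and the XML string in main is a shortened, well-formed stand-in for the file above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Grid Superscalar.
final class DeployConfigSketch {

    /** One <worker> entry; only three of its attributes are kept here. */
    static final class WorkerInfo {
        final String name;
        final double gflops;
        final int limitOfJobs;

        WorkerInfo(String name, double gflops, int limitOfJobs) {
            this.name = name;
            this.gflops = gflops;
            this.limitOfJobs = limitOfJobs;
        }
    }

    /** Collects every <worker> element found in the given XML text. */
    static List<WorkerInfo> readWorkers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("worker");
            List<WorkerInfo> workers = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element w = (Element) nodes.item(i);
                workers.add(new WorkerInfo(
                        w.getAttribute("name"),
                        Double.parseDouble(w.getAttribute("GFlops")),
                        Integer.parseInt(w.getAttribute("LimitOfJobs"))));
            }
            return workers;
        } catch (Exception e) {
            throw new IllegalStateException("could not read deployment XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the deployment file shown on the slide.
        String xml = "<project name=\"Matmul\"><workers>"
                + "<worker GFlops=\"1.0\" LimitOfJobs=\"1\" name=\"la-blade-02.cs.fiu.edu\"/>"
                + "</workers></project>";
        for (WorkerInfo w : readWorkers(xml)) {
            System.out.println(w.name + " GFlops=" + w.gflops + " LimitOfJobs=" + w.limitOfJobs);
        }
    }
}
```

These statically declared values are the ones the later answers refer to when they mention the computing power and job capacity of a worker.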

The Deployment Center Adding Hosts

Selecting hosts for a specific project

Deploying our application in the workers…
Build the master: go into the master folder and execute
    gsjavabuild master Matmul
Build the worker: go into the worker folder and execute
    gsjavabuild worker Matmul
After that, the application is ready to run (deployed).
Source files wanted? Execute
    gsstubgen -j Matmul.idl

Files created when deploying with gsjavabuild…
matmul_java_master
Original files:
• App.java
• Matmul.idl
• project.gsdeploy
Created files:
• App.class
• ConstraintsWrapper.class
• Matmul.class
• MatmulConstraints.class
• MatmulConstraintsInterface.class
• MatmulOps.class

Files created when deploying with gsjavabuild…
matmul_java_worker
Original files:
• Block.java
• MatmulAppException.java
• Matmul.idl
• MatmulImpl.java
Created files:
• workerGS.sh.in
• Block.class
• MatmulAppException.class
• MatmulImpl.class
• MatmulOps.class
• Worker.class
• workerGS.sh

Interaction

App.java

    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class App {
        private final int MSIZE = 2;   // hypermatrix is MSIZE x MSIZE blocks
        private final int BSIZE = 64;  // block size
        private String[][] _A;
        private String[][] _B;
        private String[][] _C;

        public void Run () {
            initialize_variables(); // initialize arrays holding the actual array names
            try {
                fill_matrices();
            } catch ( IOException ioe ) {
                ioe.printStackTrace();
                return;
            }
            GSMaster.On();
            // Each call is one task; its parameters are file names.
            for (int i = 0; i < MSIZE; i++)
                for (int j = 0; j < MSIZE; j++)
                    for (int k = 0; k < MSIZE; k++)
                        Matmul.multiply_accumulative( _C[i][j], _A[i][k], _B[k][j] );
            GSMaster.Off(0);
        }

        private void initialize_variables () { … }

        private void fill_matrices () throws FileNotFoundException, IOException { …. }

        public static void main(String args[]) {
            (new App()).Run();
        }
    }

What’s GSMaster.java?
The GSMaster class calls native functions in C, which are implemented in the file GS.cc:
GSMaster.On() -> GS_ON()
GSMaster.Off() -> GS_OFF()

GS_ON()
• Checks for environment variables.
• Activates Globus modules, like:
    globus_l_module_activate(GLOBUS_COMMON_MODULE);
    globus_l_module_activate(GLOBUS_XIO_MODULE);
    globus_l_module_activate(GLOBUS_FTP_CLIENT_MODULE);
    ….
• Creates folders for the debugging files that will be written if the GS_DEBUG*** environment variable is set. This creation job is done with:
    res = globus_gram_client_job_request(….);   pre_ws_gram (GT2)***
In other words, it leaves everything prepared in the Grid so that, when the execution comes, Globus will allow it.
GS_OFF()
Basically does the opposite of GS_ON(), that is, it frees the resources that were created by GS_ON(), like:
    resul = globus_module_deactivate(GLOBUS_COMMON_MODULE);
    resul = globus_module_deactivate(GLOBUS_XIO_MODULE);
    resul = globus_module_deactivate(GLOBUS_FTP_CLIENT_MODULE);
    …..
And to delete files it uses:
    res = globus_gram_client_job_request(….);   pre_ws_gram (GT2)***

Again on App.java

    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class App {
        private final int MSIZE = 2;
        private final int BSIZE = 64;
        private String[][] _A;
        private String[][] _B;
        private String[][] _C;

        public void Run () {
            initialize_variables(); // initialize arrays holding the actual array names
            try {
                fill_matrices();
            } catch ( IOException ioe ) {
                ioe.printStackTrace();
                return;
            }
            GSMaster.On();
            for (int i = 0; i < MSIZE; i++)
                for (int j = 0; j < MSIZE; j++)
                    for (int k = 0; k < MSIZE; k++)
                        Matmul.multiply_accumulative( _C[i][j], _A[i][k], _B[k][j] );
            GSMaster.Off(0);
        }

        private void initialize_variables () { … }

        private void fill_matrices () throws FileNotFoundException, IOException { …. }

        public static void main(String args[]) {
            (new App()).Run();
        }
    }

Matmul.java

    /* This file has been autogenerated from 'Matmul.idl'. */
    /* CHANGES TO THIS FILE WILL BE LOST */
    public class Matmul implements MatmulOps {
        public static void multiply_accumulative(String f3, String f1, String f2) {
            /* Marshalling/Demarshalling buffers */
            /* Parameter marshalling */
            String pars[] = new String[4];
            pars[0] = f3;
            pars[1] = f1;
            pars[2] = f2;
            pars[3] = f3;
            GSMaster.Execute(multiply_accumulativeOp, 3, 0, 1, 0, pars); // ws_gram GT4***
        }
    }
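The generated stub passes pars = {f3, f1, f2, f3} together with the integers 3, 0, 1, 0. One possible reading, which is an assumption and not something the slides confirm, is that the array lists the input files first and the output files second, so the inout file f3 appears twice, and that 3 and 1 are the input-file and output-file counts. The sketch below reproduces that layout from the parameter directions in the IDL; MarshalSketch and all its members are hypothetical names, and the file names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the stub's parameter layout; not generated by gsstubgen.
final class MarshalSketch {

    enum Direction { IN, OUT, INOUT }

    static final class FileParam {
        final Direction direction;
        final String fileName;

        FileParam(Direction direction, String fileName) {
            this.direction = direction;
            this.fileName = fileName;
        }
    }

    static final class Marshalled {
        final String[] pars;
        final int inputFiles;
        final int outputFiles;

        Marshalled(String[] pars, int inputFiles, int outputFiles) {
            this.pars = pars;
            this.inputFiles = inputFiles;
            this.outputFiles = outputFiles;
        }
    }

    /** Assumed layout: all readable files in declaration order, then all writable files. */
    static Marshalled marshal(List<FileParam> params) {
        List<String> inputs = new ArrayList<>();
        List<String> outputs = new ArrayList<>();
        for (FileParam p : params) {
            if (p.direction != Direction.OUT) inputs.add(p.fileName);  // IN and INOUT are read
            if (p.direction != Direction.IN) outputs.add(p.fileName);  // OUT and INOUT are written
        }
        List<String> all = new ArrayList<>(inputs);
        all.addAll(outputs);
        return new Marshalled(all.toArray(new String[0]), inputs.size(), outputs.size());
    }

    public static void main(String[] args) {
        // Same directions as the IDL: inout File f3, in File f1, in File f2.
        Marshalled m = marshal(Arrays.asList(
                new FileParam(Direction.INOUT, "C.0.0"),
                new FileParam(Direction.IN, "A.0.0"),
                new FileParam(Direction.IN, "B.0.0")));
        System.out.println(Arrays.toString(m.pars) + " in=" + m.inputFiles + " out=" + m.outputFiles);
    }
}
```

Under this reading, the two zeros in the Execute call would count non-file parameters, of which multiply_accumulative has none.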

Execution Itself…
Again GS.cc
GSMaster.Execute -> Execute (from GS.cc)
Execute -> SubmitShortcuts -> DoSubmit
“Execute function : Interface GS – GLOBUS”

DoSubmit
• Data dependencies (queue)
• Submit to list of running tasks.
• Instruction used for Task:
    res = globus_wsgram_job_submit(namehost[Task->Machine], rsl, &Task->input, &Task->monitor, &engine, globus_l_notify_cb);   ***GT4***

Interaction

MatmulOps.java

    /* This file has been autogenerated from 'Matmul.idl'. */
    /* CHANGES TO THIS FILE WILL BE LOST */
    public interface MatmulOps {
        int multiply_accumulativeOp = 0;
    }

Interaction

Worker.java

    /* This file has been autogenerated from 'Matmul.idl'. */
    /* CHANGES TO THIS FILE WILL BE LOST */
    public class Worker implements MatmulOps {
        public static void main(String args[]) {
            int opCod;
            if (args.length < 6) {
                System.out.println("ERROR: Wrong arguments list passed to the worker\n");
                System.exit(1);
            }
            opCod = Integer.parseInt(args[1]);
            GSWorker.IniWorker(args);
            switch (opCod) {
                case multiply_accumulativeOp:
                    MatmulImpl.multiply_accumulative(args[5], args[3], args[4]); // <- Local Call
                    break;
            }
            GSWorker.EndWorker(args);
        }
    }

MatmulImpl.java was in the worker folder from the start, so what we are doing now is a local call. If we remember:
matmul_java_worker
    Matmul.idl
    MatmulImpl.java
    Block.java
    MatmulAppException.java

    public class MatmulImpl {
        public static void multiply_accumulative( String f3, String f1, String f2 ) {
            Block a = new Block( f1 );
            Block b = new Block( f2 );
            Block c = new Block( f3 );
            c.multiplyAccum( a, b );
            try {
                c.blockToDisk( f3 );
            } catch ( MatmulAppException ce ) {
                System.err.println( ce.getMessage() );
                GSWorker.SetResult(-1);
                return;
            }
        }
    }
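Block.java is not shown in the slides, so here is a hedged, in-memory sketch of what a multiply-accumulate on one block could look like, plus the same i/j/k loop as App.java applied to blocks held in memory instead of files. BlockSketch, multiplyHyper and ofScalars are hypothetical names; the real Block class reads from and writes to disk and may be implemented differently.

```java
// Hypothetical in-memory stand-in for Block.java; the real class works on files.
final class BlockSketch {
    final double[][] data;

    BlockSketch(int size) {
        data = new double[size][size];
    }

    /** this += a * b, the operation each grid task performs on one block. */
    void multiplyAccum(BlockSketch a, BlockSketch b) {
        int n = data.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a.data[i][k] * b.data[k][j];
                }
                data[i][j] += sum;
            }
        }
    }

    /** Same triple loop as App.Run(), but on in-memory blocks instead of file names. */
    static void multiplyHyper(BlockSketch[][] c, BlockSketch[][] a, BlockSketch[][] b) {
        int m = c.length;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < m; k++)
                    c[i][j].multiplyAccum(a[i][k], b[k][j]);
    }

    /** Builds a hypermatrix whose blocks are 1x1, handy for small checks. */
    static BlockSketch[][] ofScalars(double[][] values) {
        int m = values.length;
        BlockSketch[][] h = new BlockSketch[m][m];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                h[i][j] = new BlockSketch(1);
                h[i][j].data[0][0] = values[i][j];
            }
        }
        return h;
    }

    public static void main(String[] args) {
        BlockSketch[][] a = ofScalars(new double[][] {{1, 2}, {3, 4}});
        BlockSketch[][] b = ofScalars(new double[][] {{5, 6}, {7, 8}});
        BlockSketch[][] c = ofScalars(new double[][] {{0, 0}, {0, 0}});
        multiplyHyper(c, a, b);
        System.out.println(c[0][0].data[0][0] + " " + c[0][1].data[0][0]);
        System.out.println(c[1][0].data[0][0] + " " + c[1][1].data[0][0]);
    }
}
```

With MSIZE = 2 the loop issues 8 multiplyAccum calls; the two calls that share the same c[i][j] have to run in order, while calls on different c blocks do not touch each other's output.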

So basically we have…
Grid Superscalar
Execute (inside GS.cc) -> Interface between GS and Globus
Globus (GRAM running locally in the worker)
Local Execution

However, we have GT2 and GT4 in GS. Remember…
GS.GS_ON() & GS.GS_OFF() -> GT2
GS.Execute -> GT4

GRAM in GT2 & GT4
GRAM Implementations
Pre-WS GRAM - GT2
• First implementation of GRAM
• GT2 - Globus-specific protocol
• Gatekeeper/jobmanager services
WS GRAM - GT4
• Web Service based implementations of GRAM
• GT3 OGSI based implementation
• GT4 WSRF based implementation

GT2
Remember the “res = globus_gram_client_job_request(….);”??? pre_ws_gram (GT2)***
GS_ON and GS_OFF
http://www-cse.ucsd.edu/classes/sp00/cse225/notes/shava/globus.html

GT4 • GSMaster.Execute(multiply_accumulativeOp, 3, 0, 1, 0, pars); ws_gram GT4*** http://www-unix.globus.org/toolkit/docs/development/3.9.5/execution/key/WS_GRAM_components.png

Agnostic Questions
1.- Do you believe that the GRID Superscalar would interfere with or benefit the concept of the economic model of the GRID as mentioned in a previous presentation (A Case for Economy Grid Architecture for Service Oriented Grid Computing)?
One of the problems that paper presented was the cost incurred by deploying a job in a Grid without knowing exactly which hosts would be best to execute each task. Grid Superscalar, in this sense, takes advantage of knowing the resources of each of its available workers. For example, it can tell whether a worker is able to receive and process two tasks at the same time (a 2-processor host, say), since Grid Superscalar has a configuration file for this kind of information.

Agnostic Questions
2.- Would the addition of web services on a GRID utilizing the GRID Superscalar cause issues with the way the GRID Superscalar tries to make sequential programs parallel?
First of all, GS is currently used as a dynamic library, and that library is responsible for the parallelization process. Now, if we add Web Services to a Grid, for example one in each host, and a program needs to call two of those web services, Grid Superscalar can make those 2 calls in parallel as long as they do not depend on each other.

Agnostic Questions
3.- Some of the applications that the GRID Superscalar is geared towards require large data files. Do you believe that the overhead of sending the same large files around to support parallel processing could be more harmful or wasteful than operating the process sequentially?
GS tries to exploit the data locality of the files. So if a large file is sent to a machine, or a large file is generated as a result in a machine, GS will consider that information when deciding where to run a job (to avoid transfers in future tasks and to minimize total execution time). There is also a shared disk mechanism (described in the manual) where you can specify the location of replicas of your files so that GS does not have to transfer them every time.
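The answer does not spell out the exact policy, so the following is only a sketch of one simple locality-aware choice: run the task on the worker that would need the fewest bytes transferred. LocalitySketch and its methods are hypothetical, the file names and sizes are made up, and the real GS scheduler may weigh other factors.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical locality policy; not the actual Grid Superscalar scheduler.
final class LocalitySketch {

    /** Bytes that would have to be copied to a worker before the task can start there. */
    static long bytesToTransfer(Set<String> filesOnWorker, Map<String, Long> inputSizes) {
        long total = 0;
        for (Map.Entry<String, Long> input : inputSizes.entrySet()) {
            if (!filesOnWorker.contains(input.getKey())) {
                total += input.getValue();
            }
        }
        return total;
    }

    /** Picks the worker needing the fewest bytes; the first one wins a tie; null if there are no workers. */
    static String pickWorker(Map<String, Set<String>> filesByWorker, Map<String, Long> inputSizes) {
        String best = null;
        long bestBytes = Long.MAX_VALUE;
        for (Map.Entry<String, Set<String>> worker : filesByWorker.entrySet()) {
            long bytes = bytesToTransfer(worker.getValue(), inputSizes);
            if (bytes < bestBytes) {
                best = worker.getKey();
                bestBytes = bytes;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> filesByWorker = new LinkedHashMap<>();
        filesByWorker.put("la-blade-02.cs.fiu.edu", new HashSet<>(Arrays.asList("C.0.0")));
        filesByWorker.put("la-blade-03.cs.fiu.edu", new HashSet<>(Arrays.asList("A.0.0", "B.0.0")));

        Map<String, Long> inputSizes = new LinkedHashMap<>();
        inputSizes.put("C.0.0", 100L);
        inputSizes.put("A.0.0", 100L);
        inputSizes.put("B.0.0", 100L);

        System.out.println(pickWorker(filesByWorker, inputSizes));
    }
}
```

In this made-up case the second worker already holds two of the three inputs, so it needs one transfer instead of two.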

Agnostic Questions
4.- Could the GRID Superscalar be optimized if it was discovered that there are costs for using various resources? For example, what if it was found that the connection between two systems on the grid is slower than the connections between the other systems due to weather or network congestion?
For now, the network parameter that you can specify is the theoretical bandwidth of a machine. We do not work with any dynamic information (NWS or similar).

Agnostic Questions
5.- How would the GRID Superscalar adjust if one of the computers that were assigned a task on the GRID suddenly becomes unavailable due to weather, for example?
If there is a failure during the execution, the current version of GS stops the master (so, the whole process). Then you can re-run the program without the machine that caused the problem, and the previous computations that have been checkpointed won't be repeated. Currently we have a development version which detects failures in machines and removes failing machines from the computation at runtime, so the overall process keeps going.

Agnostic Questions
6.- Would there be a reason to use a GRID Superscalar on a GRID that has few systems, where each system has a unique resource that will likely be used by tasks given to the GRID?
It depends on the form that Grid has. Imagine that each system is from a different institution, works with a different queuing system, etc. It would be easier to gridify the application using GS than using any other parallel programming model (MPI, the Message Passing Interface, etc.). Also, the file locality policy can reduce transfers compared to MPI, for instance (where you always have to send the data you need to compute).

Agnostic Questions
7.- The conversion of applications from sequential to parallel is done without the programmer’s knowledge. How would this affect the ability of programmers to deal with exception handling?
The parallelization is basically functional parallelization. So an error inside the function can be detected the same way in the worker code. When an error is detected, you can return a value to the master meaning that things went wrong in that function.

Agnostic Questions
8.- GRIDs have a very fragmented nature where different parts of the GRID are administered by different organizations and the agreements between each organization on the usage are not necessarily the same. How could the Superscalar make sure that performance isn’t being hindered by sending tasks to a system that, by agreement, gives much less CPU utilization than another system?
When you add a machine to the configuration file you can specify the computing power of that machine. Then, in the estimation function, you can use that value to try to predict the execution time of that operation on the given machine. As you can see, it is specified statically (GS does not gather any information about the real status of the different systems).
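As a rough illustration of such an estimation, the sketch below predicts a task's time from statically configured values like the GFlops and NetKbps attributes of a worker. The formula, the unit assumptions (NetKbps taken as kilobits per second, work expressed in GFlop) and the name EstimateSketch are all assumptions made for this example; the actual estimation function interface of GS is not shown in the slides.

```java
// Hypothetical cost model; the real GS estimation function is not shown in the slides.
final class EstimateSketch {

    /**
     * Estimated seconds = compute time + time to move the input data.
     * Assumes workGflop is the task's work in GFlop and netKbps is kilobits per second.
     */
    static double estimateSeconds(double workGflop, double gflops, long inputBytes, double netKbps) {
        double computeSeconds = workGflop / gflops;
        double kilobits = inputBytes * 8.0 / 1000.0;
        double transferSeconds = kilobits / netKbps;
        return computeSeconds + transferSeconds;
    }

    public static void main(String[] args) {
        // GFlops="1.0" and NetKbps="100000" are the values from the deployment file;
        // the 10 GFlop of work, the 1,250,000 input bytes and the 2.0 GFlops worker are made up.
        double slow = estimateSeconds(10.0, 1.0, 1_250_000L, 100000.0);
        double fast = estimateSeconds(10.0, 2.0, 1_250_000L, 100000.0);
        System.out.println("1.0 GFlops worker: " + slow + " s");
        System.out.println("2.0 GFlops worker: " + fast + " s");
    }
}
```

Because every input is a fixed number from the configuration, the estimate cannot react to the real load or to a usage agreement that throttles a machine, which is the limitation the answer points out.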

Agnostic Questions
9.- Do you feel that it would be possible to use flat files as a synchronization component to allow the GRID Superscalar to allow processes to use a database to maintain the constraints of WaW, RaW, and WaR?
Grid Superscalar does not need it because it can do it by itself. File dependencies are always checked by Grid Superscalar in order to know which job can be executed and which one has to wait until another one finishes because of data dependencies.
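To show what checking file dependencies can mean in practice, here is a minimal sketch that classifies the dependency between two tasks from the files each one reads and writes. It is only an illustration of the RaW / WaR / WaW definitions, not the data structure GS uses internally; DependencySketch and its members are hypothetical names and the file names are made up.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of file-based dependency checks; not GS internals.
final class DependencySketch {

    enum Kind { RAW, WAR, WAW }

    static final class Task {
        final Set<String> reads;
        final Set<String> writes;

        Task(Collection<String> reads, Collection<String> writes) {
            this.reads = new HashSet<>(reads);
            this.writes = new HashSet<>(writes);
        }
    }

    /** Dependencies of `later` on `earlier`, where `earlier` was submitted first. */
    static EnumSet<Kind> classify(Task earlier, Task later) {
        EnumSet<Kind> kinds = EnumSet.noneOf(Kind.class);
        if (!Collections.disjoint(earlier.writes, later.reads)) kinds.add(Kind.RAW);
        if (!Collections.disjoint(earlier.reads, later.writes)) kinds.add(Kind.WAR);
        if (!Collections.disjoint(earlier.writes, later.writes)) kinds.add(Kind.WAW);
        return kinds;
    }

    /** Access pattern of multiply_accumulative(inout f3, in f1, in f2). */
    static Task multiplyAccumulative(String f3, String f1, String f2) {
        return new Task(Arrays.asList(f3, f1, f2), Collections.singletonList(f3));
    }

    public static void main(String[] args) {
        Task t1 = multiplyAccumulative("C.0.0", "A.0.0", "B.0.0");
        Task t2 = multiplyAccumulative("C.0.0", "A.0.1", "B.1.0"); // same output block as t1
        Task t3 = multiplyAccumulative("C.0.1", "A.0.0", "B.0.1"); // different output block
        System.out.println("t1 -> t2: " + classify(t1, t2));
        System.out.println("t1 -> t3: " + classify(t1, t3));
    }
}
```

Two Matmul tasks that accumulate into the same C block conflict on that file, so the second has to wait; tasks that only share input blocks have no dependency and can be submitted together.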

Agnostic Questions
10.- Does the system provide any sort of protection against renaming files? Would the Double Hashtable system be compromised if a submitted task renames files or makes duplicate files as part of its operations?
You cannot rename source files in a worker (as specified in the manual), but you can copy them and do whatever you want with that copy. Also, with temporary files (files which exist only in the "local domain" of that task) you can do virtually anything (they will be removed after the computation, because a temporary directory is created in order to execute the task).

Questions? Comments?... No comment!!! :p
