Learn about Grid Superscalar, a tool that simplifies the programming of GRID applications and makes them as easy to write as sequential ones. This article discusses its objective, behavior, and how it works using a specific example.
Programming Grid Applications with GRID Superscalar
[Journal of Grid Computing, Volume 1, Issue 2, 2003]
Presenter: Juan Carlos Martinez
Agnostic: Allen Lee
Authors: Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela and Rogeli Grima
Overview What’s Grid Superscalar and what is its behavior?
Overview
[Figure labels: grid, Globus Toolkit, Grid Superscalar]
• It promotes the ease of programming GRID applications
• Basic idea: ns → seconds/minutes/hours
http://www.bsc.es/grid/grid_superscalar/documents/FIU_seminar.pdf
Overview: Grid Superscalar Objective
• Reduce the development complexity of Grid applications to the minimum
• Make writing a computational Grid application as easy as writing a sequential one
• Target applications: those composed of tasks
• Task granularity: at the level of simulations or programs
• Data objects are files
Overview: Let’s see how it works…

    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst (referenceCFG, newBWd, newCFG);
        dimemas (newCFG, traceFile, DimemasOUT);
        post (newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n");
    present (fd);
    GS_Close(fd);

[Slide annotation: Input/output files]
http://www.bsc.es/grid/grid_superscalar/documents/FIU_seminar.pdf
How It Works
Let’s look at a specific example: a Java program named Matmul that multiplies two matrices.
Matmul is sequential Java code that creates 2 hypermatrices (4 matrices inside each one) and multiplies 4 of them against the other 4, all at runtime. With Grid Superscalar, this code was parallelized.
Getting Started!!! File Structure
C applications:
1. <myapplication>.idl
2. <myapplication>.c (main program)
3. <myapplication>-functions.c (functions to be executed on the grid)
Java applications:
1. <myapplication>.idl
2. <arbitraryname>.java (main program; its name differs from the application name)
3. <myapplication>Impl.java (functions/methods to be executed on the grid)
http://www.bsc.es/grid/grid_superscalar/documents/ssh_gridsuperscalar_quick_tutorial.pdf
How It Works
For Matmul we basically have two folders and two XML files.
matmul_java_master
    App.java
    Matmul.idl
    project.gsdeploy@
Matmul_java_worker
    Matmul.idl
    MatmulImpl.java
    Block.java
    MatmulAppException.java
Matmul_java (the XML file itself)
Project.gsdeploy@ Matmul_java
The IDL file: Matmul.idl

    interface CHOLESKY {
        void multiply_accumulative ( inout File f3, in File f1, in File f2 );
    };
<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000" masterBuildScript="" masterInstallDir="/home/lion-e/globus2/matmul_java_master" masterName="la-blade-01.cs.fiu.edu" masterSourceDir="/a/lion.cs.fiu.edu./disk/216/e/globus2/matmul_java_master" name="Matmul" workerBuildScript="" workerSourceDir="/a/lion.cs.fiu.edu./disk/216/e/globus2/matmul_java_worker">
  <disks>
    <disk name="_MasterDisk_"/>
    <disk name="_WorkingDisk_la-blade-02_cs_fiu_edu_"/>
    <disk name="_WorkingDisk_la-blade-03_cs_fiu_edu_"/>
  </disks>
  <directories>
    <directory disk="_MasterDisk_" isWorkingPath="yes" path="/home/lion-e/globus2/matmul_java_master"/>
  </directories>
  <workers>
    <worker Arch="" GFlops="1.0" LimitOfJobs="1" Mem="16" NCPUs="1" NetKbps="100000" OpSys="" Queue="none" Quota="0" deploymentStatus="deployed" installDir="/home/lion-e/globus2/matmul_java_worker" name="la-blade-02.cs.fiu.edu">
      <directories>
        <directory disk="_WorkingDisk_la-blade-02_cs_fiu_edu_" isWorkingPath="yes" path="/home/lion-e/globus2/matmul_java_worker"/>
      </directories>
    </worker>
    ….
The XML file for Matmul_java
The Deployment Center Adding Hosts
Deploying our application in the workers…
1. Build the master: go into the master folder and execute
    gsjavabuild master Matmul
2. Build the worker: go into the worker folder and execute
    gsjavabuild worker Matmul
After that, the application is ready to run (deployed).
If the generated source files are wanted:
    gsstubgen -j Matmul.idl
Files created when deploying with gsjavabuild…
matmul_java_master
• App.java
• Matmul.idl
• project.gsdeploy
• App.class
• ConstraintsWrapper.class
• Matmul.class
• MatmulConstraints.class
• MatmulConstraintsInterface.class
• MatmulOps.class
[Slide label: Original Files]
Files created when deploying with gsjavabuild…
Matmul_java_worker
• Block.java
• MatmulAppException.java
• Matmul.idl
• MatmulImpl.java
• workerGS.sh.in
• Block.class
• MatmulAppException.class
• MatmulImpl.class
• MatmulOps.class
• Worker.class
• workerGS.sh
[Slide label: Original Files]
App.java

    import java.io.FileNotFoundException;
    import java.io.IOException;

    public class App {
        private final int MSIZE = 2;   // blocks per side of each hypermatrix
        private final int BSIZE = 64;
        private String [][] _A;
        private String [][] _B;
        private String [][] _C;

        public void Run () {
            initialize_variables(); // initialize arrays holding the actual array names
            try {
                fill_matrices();
            } catch ( IOException ioe ) {
                ioe.printStackTrace();
                return;
            }
            GSMaster.On();
            for (int i = 0; i < MSIZE; i++)
                for (int j = 0; j < MSIZE; j++)
                    for (int k = 0; k < MSIZE; k++)
                        Matmul.multiply_accumulative( _C[i][j], _A[i][k], _B[k][j] );
            GSMaster.Off(0);
        }

        private void initialize_variables () { … }

        private void fill_matrices () throws FileNotFoundException, IOException { …. }

        public static void main(String args[]) {
            (new App()).Run();
        }
    }
What’s GSMaster.java?
The GSMaster class calls native C functions, which are implemented in the file GS.cc:
GSMaster.On() → GS_ON()
GSMaster.Off() → GS_OFF()
GS_ON()
• Checks for environment variables
• Activates modules from Globus, like:
    globus_l_module_activate(GLOBUS_COMMON_MODULE);
    globus_l_module_activate(GLOBUS_XIO_MODULE);
    globus_l_module_activate(GLOBUS_FTP_CLIENT_MODULE);
    ….
• Creates folders for the debugging files that will be written if the GS_DEBUG*** environment variable is set. This creation is done with:
    res = globus_gram_client_job_request(….); pre_ws_gram (GT2)***
In other words, it leaves everything prepared in the Grid so that, when the execution comes, Globus will allow it.
GS_OFF()
Basically does the opposite of GS_ON(): it frees the resources that GS_ON() created, like:
    resul = globus_module_deactivate(GLOBUS_COMMON_MODULE);
    resul = globus_module_deactivate(GLOBUS_XIO_MODULE);
    resul = globus_module_deactivate(GLOBUS_FTP_CLIENT_MODULE);
    …..
And to delete files it uses:
    res = globus_gram_client_job_request(….); pre_ws_gram (GT2)***
Again on App.java (same code as shown before). The part to look at now is the call made inside the triple loop, between GSMaster.On() and GSMaster.Off(0):
    Matmul.multiply_accumulative( _C[i][j], _A[i][k], _B[k][j] );
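To see why this loop can run in parallel at all, it helps to list the tasks it generates and group them by the file they update. The sketch below is only an illustration, not part of Grid Superscalar: the class name and the "C.i.j" style file names are made up here. Calls that update the same C block form a chain (each one reads what the previous one wrote), while chains for different C blocks do not share any written file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: enumerates the tasks of the Matmul triple loop.
final class MatmulTaskChains {

    // Groups every multiply_accumulative call by the C block it updates.
    // File names such as "C.0.1" are assumptions made for this sketch.
    static Map<String, List<String>> chains(int msize) {
        Map<String, List<String>> byOutput = new LinkedHashMap<>();
        for (int i = 0; i < msize; i++)
            for (int j = 0; j < msize; j++)
                for (int k = 0; k < msize; k++) {
                    String c = "C." + i + "." + j;
                    String task = c + " += A." + i + "." + k + " * B." + k + "." + j;
                    byOutput.computeIfAbsent(c, key -> new ArrayList<>()).add(task);
                }
        return byOutput;
    }

    public static void main(String[] args) {
        // One line per C block, listing the tasks that must run in order.
        chains(2).forEach((c, tasks) -> System.out.println(c + " <- " + tasks));
    }
}
```

With MSIZE = 2 this gives 8 tasks in 4 chains of 2 tasks each, so up to 4 tasks could be in flight at once if enough workers are available.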
Matmul.java

    /* This file has been autogenerated from 'Matmul.idl'. */
    /* CHANGES TO THIS FILE WILL BE LOST */
    public class Matmul implements MatmulOps {
        public static void multiply_accumulative(String f3, String f1, String f2) {
            /* Marshalling/Demarshalling buffers */
            /* Parameter marshalling */
            String pars[] = new String[4];
            pars[0] = f3;
            pars[1] = f1;
            pars[2] = f2;
            pars[3] = f3;
            GSMaster.Execute(multiply_accumulativeOp, 3, 0, 1, 0, pars); // slide annotation: ws_gram GT4***
        }
    }
Execution Itself… Again GS.cc
GSMaster.Execute → Execute (from GS.cc)
Execute → SubmitShortcuts → DoSubmit
“Execute function: Interface GS – GLOBUS”
DoSubmit
• Data dependencies (queue)
• Submit to list of running tasks
• Instruction used for Task:
    res = globus_wsgram_job_submit(namehost[Task->Machine], rsl, &Task->input, &Task->monitor, &engine, globus_l_notify_cb); ***GT4***
MatmulOps.java

    /* This file has been autogenerated from 'Matmul.idl'. */
    /* CHANGES TO THIS FILE WILL BE LOST */
    public interface MatmulOps {
        int multiply_accumulativeOp = 0;
    }
Worker.java

    /* This file has been autogenerated from 'Matmul.idl'. */
    /* CHANGES TO THIS FILE WILL BE LOST */
    public class Worker implements MatmulOps {
        public static void main(String args[]) {
            int opCod;
            if (args.length < 6) {
                System.out.println("ERROR: Wrong arguments list passed to the worker\n");
                System.exit(1);
            }
            opCod = Integer.parseInt(args[1]);
            GSWorker.IniWorker(args);
            switch (opCod) {
                case multiply_accumulativeOp:
                    MatmulImpl.multiply_accumulative(args[5], args[3], args[4]); // slide annotation: Local Call
                    break;
            }
            GSWorker.EndWorker(args);
        }
    }
MatmulImpl.java was in the worker folder from the start, so what we are doing now is a local call. As a reminder:
Matmul_java_worker
    Matmul.idl
    MatmulImpl.java
    Block.java
    MatmulAppException.java

    public class MatmulImpl {
        public static void multiply_accumulative( String f3, String f1, String f2 ) {
            Block a = new Block( f1 );
            Block b = new Block( f2 );
            Block c = new Block( f3 );
            c.multiplyAccum( a, b );
            try {
                c.blockToDisk( f3 );
            } catch ( MatmulAppException ce ) {
                System.err.println( ce.getMessage() );
                GSWorker.SetResult(-1); // report the failure back to the master
                return;
            }
        }
    }
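Block.java is not shown in the slides, so here is only a guess at what an operation like multiplyAccum could do: accumulate the product of two square blocks into a third one (c += a * b). The class name BlockSketch, the in-memory double[][] layout and the missing file I/O are all assumptions made for this sketch.

```java
// Hypothetical stand-in for Block: keeps the block in memory instead of a file.
final class BlockSketch {
    final double[][] data;

    BlockSketch(double[][] data) {
        this.data = data;
    }

    // Accumulates a * b into this block; all three blocks are assumed square
    // and of the same size.
    void multiplyAccum(BlockSketch a, BlockSketch b) {
        int n = data.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a.data[i][k] * b.data[k][j];
                }
                data[i][j] += sum;
            }
        }
    }

    public static void main(String[] args) {
        BlockSketch a = new BlockSketch(new double[][] {{1, 2}, {3, 4}});
        BlockSketch b = new BlockSketch(new double[][] {{1, 0}, {0, 1}});
        BlockSketch c = new BlockSketch(new double[][] {{1, 1}, {1, 1}});
        c.multiplyAccum(a, b);
        System.out.println(java.util.Arrays.deepToString(c.data));
    }
}
```

Because the result is accumulated into the block that was passed in, f3 is both read and written, which matches the inout direction given to f3 in Matmul.idl.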
So basically we have…
Grid Superscalar → Execute (inside GS.cc), the interface between GS and Globus → Globus (GRAM running locally in the worker) → Local Execution
However, we have both GT2 and GT4 in GS. Remember…
GS.GS_ON() & GS.GS_OFF(): GT2
GS.Execute: GT4
GRAM in GT2 & GT4: GRAM Implementations
Pre-WS GRAM (GT2)
• First implementation of GRAM
• GT2: Globus-specific protocol
• Gatekeeper/jobmanager services
WS GRAM (GT4)
• Web Service based implementations of GRAM
• GT3: OGSI based implementation
• GT4: WSRF based implementation
GT2
Remember the “res = globus_gram_client_job_request(….);”? pre_ws_gram (GT2)***
Used by GS_ON and GS_OFF
http://www-cse.ucsd.edu/classes/sp00/cse225/notes/shava/globus.html
GT4 • GSMaster.Execute(multiply_accumulativeOp, 3, 0, 1, 0, pars); ws_gram GT4*** http://www-unix.globus.org/toolkit/docs/development/3.9.5/execution/key/WS_GRAM_components.png
Agnostic Questions
1.- Do you believe that GRID Superscalar would interfere with or benefit the economic model of the GRID mentioned in a previous presentation (A Case for Economy Grid Architecture for Service Oriented Grid Computing)?
One of the problems that paper presented was the cost incurred by deploying a job in a Grid without knowing exactly which hosts would be best to execute each task. Grid Superscalar, in this sense, takes advantage of knowing the resources of each of its available workers. For example, it can tell whether a worker is able to receive and process two tasks at the same time (a 2-processor host, for example), since Grid Superscalar has a configuration file for this kind of information.
Agnostic Questions
2.- Would the addition of web services on a GRID utilizing GRID Superscalar cause issues with the way GRID Superscalar tries to make sequential programs parallel?
First of all, GS is currently used as a dynamic library, and that library is responsible for the parallelization process. If we add Web Services to a Grid, for example one in each host, and a program needs to call two of those web services, Grid Superscalar can make those 2 calls in parallel as long as they do not depend on each other.
Agnostic Questions
3.- Some of the applications that GRID Superscalar is geared towards require large data files. Do you believe that the overhead of sending the same large files around to support parallel processing could be more harmful or wasteful than running the process sequentially?
GS tries to exploit the data locality of the files. If a large file is sent to a machine, or a large file is generated as a result on a machine, GS takes that information into account when deciding where to run a job (to avoid transfers in future tasks and to minimize total execution time). There is also a shared disk mechanism (described in the manual) where you can specify the location of replicas of your files so that GS does not transfer them every time.
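The locality idea in this answer can be sketched in a few lines. This is not how GS is implemented; it is a minimal illustration with made-up names (LocalityChooser, worker-a, worker-b) that picks the worker for which the fewest input bytes would still have to be transferred.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a locality-aware choice of worker.
final class LocalityChooser {

    // inputSizes: input file name -> size in bytes.
    // filesOnWorker: worker name -> files already present on that worker.
    // Returns the worker with the smallest number of bytes left to transfer;
    // on a tie, the first such worker in the list wins.
    static String choose(List<String> workers,
                         Map<String, Long> inputSizes,
                         Map<String, Set<String>> filesOnWorker) {
        String best = null;
        long bestCost = Long.MAX_VALUE;
        for (String worker : workers) {
            Set<String> local = filesOnWorker.getOrDefault(worker, Set.of());
            long cost = 0;
            for (Map.Entry<String, Long> input : inputSizes.entrySet()) {
                if (!local.contains(input.getKey())) {
                    cost += input.getValue(); // file would have to be sent
                }
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = worker;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String chosen = choose(
                List.of("worker-a", "worker-b"),
                Map.of("A.0.0", 1000L, "B.0.0", 10L),
                Map.of("worker-a", Set.of("B.0.0"), "worker-b", Set.of("A.0.0")));
        System.out.println(chosen);
    }
}
```

In the example, worker-b already holds the large file, so only the small one would need to move there. A real scheduler would also weigh how busy each worker is, which this sketch ignores.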
Agnostic Questions
4.- Could GRID Superscalar be optimized if it was discovered that there are costs for using various resources? For example, what if it was found that the connection between two systems on the grid is slower than the connections between the other systems due to weather or network congestion?
For now, the only network parameter you can specify is the theoretical bandwidth of a machine. We do not work with any dynamic information (NWS or similar).
Agnostic Questions
5.- How would GRID Superscalar adjust if one of the computers that was assigned a task on the GRID suddenly becomes unavailable due to weather, for example?
If there is a failure during the execution, the current version of GS stops the master (and so the whole process). You can then re-run the program without the machine that caused the problem, and the previous computations that have been checkpointed won't be repeated. Currently we have a development version which detects failures in machines and removes failing machines from the computation at runtime, so the overall process keeps going.
Agnostic Questions
6.- Would there be a reason to use GRID Superscalar on a GRID that has few systems, where each system has a unique resource that will likely be used by tasks given to the GRID?
It depends on the form that Grid has. Imagine that each system is from a different institution, works with a different queuing system, etc. It would be easier to gridify the application using GS than using any other parallel programming model (MPI, the Message Passing Interface, etc.). Also, the file locality policy can reduce transfers compared to MPI, for instance (where you always have to send the data you need to compute).
Agnostic Questions
7.- The conversion of applications from sequential to parallel is done without the programmer’s knowledge. How would this affect the ability of programmers to deal with exception handling?
The parallelization is basically functional parallelization, so an error inside a function can be detected in the worker code the same way as before. When an error is detected, you can return a value to the master meaning that things went wrong in that function.
Agnostic Questions
8.- GRIDs have a very fragmented nature, where different parts of the GRID are administered by different organizations and the usage agreements between organizations are not necessarily the same. How could the Superscalar make sure that performance isn’t being hindered by sending tasks to a system that, by agreement, gives much less CPU utilization than another system?
When you add a machine to the configuration file you can specify the computing power of that machine. Then, in the estimation function, you can use that value to try to predict the execution time of an operation on the given machine. As you can see, it is specified statically (GS does not gather any information about the real status of the different systems).
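As an illustration of such a static estimate, the sketch below divides an assumed amount of work by the worker's configured computing power and, optionally, adds a transfer term based on the theoretical bandwidth. The GFlops and NetKbps names come from the worker entry of the XML file shown earlier; the class, the method and the idea of expressing work in GFlop are assumptions made here, not the actual GS estimation interface.

```java
// Hypothetical cost estimate built only from static configuration values.
final class CostEstimate {

    // workGflop:    assumed amount of computation of the task, in GFlop
    // workerGflops: configured computing power of the worker (GFlops attribute)
    // transferKbit: data that still has to be sent to the worker, in kilobits
    // netKbps:      theoretical bandwidth of the worker (NetKbps attribute)
    static double seconds(double workGflop, double workerGflops,
                          double transferKbit, double netKbps) {
        double compute = workGflop / workerGflops;
        double transfer = transferKbit / netKbps;
        return compute + transfer;
    }

    public static void main(String[] args) {
        // Same task on two workers that differ only in configured power.
        System.out.println(seconds(10.0, 1.0, 0.0, 100000.0));
        System.out.println(seconds(10.0, 2.0, 0.0, 100000.0));
    }
}
```

Since both inputs are fixed in the configuration file, the estimate cannot react to a machine that is slower than declared, which is exactly the limitation the answer points out.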
Agnostic Questions
9.- Do you feel that it would be possible to use flat files as a synchronization component to allow GRID Superscalar to let processes use a database to maintain the constraints of WaW, RaW, and WaR?
Grid Superscalar does not need it because it can do this by itself. File dependencies are always checked by Grid Superscalar in order to know which job can be executed and which one has to wait until another one finishes because of data dependencies.
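The three kinds of dependency named in the question can be told apart just by comparing the files each task reads and writes. The sketch below is an illustration only (the class and method names are made up, and it is not the GS implementation): given an earlier and a later task, it reports RaW, WaR and WaW dependencies between them.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical classifier for file-based data dependencies between two tasks.
final class FileDependencies {

    enum Kind { RAW, WAR, WAW }

    // "earlier" is the task that comes first in the sequential program order.
    static EnumSet<Kind> between(Set<String> earlierReads, Set<String> earlierWrites,
                                 Set<String> laterReads, Set<String> laterWrites) {
        EnumSet<Kind> kinds = EnumSet.noneOf(Kind.class);
        if (!Collections.disjoint(earlierWrites, laterReads)) {
            kinds.add(Kind.RAW); // later task reads what the earlier one wrote
        }
        if (!Collections.disjoint(earlierReads, laterWrites)) {
            kinds.add(Kind.WAR); // later task overwrites what the earlier one read
        }
        if (!Collections.disjoint(earlierWrites, laterWrites)) {
            kinds.add(Kind.WAW); // both tasks write the same file
        }
        return kinds;
    }

    public static void main(String[] args) {
        // subst followed by dimemas, from the overview example. The parameter
        // directions (which files are read or written) are assumed here.
        System.out.println(between(
                Set.of("referenceCFG", "newBWd"), Set.of("newCFG"),
                Set.of("newCFG", "traceFile"), Set.of("DimemasOUT")));
    }
}
```

Only the RaW case forces the later task to wait for real data. WaR and WaW are conflicts over a file name rather than over data.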
Agnostic Questions
10.- Does the system provide any sort of protection against renaming files? Would the Double Hashtable system be compromised if a submitted task renames files or makes duplicate files as part of its operations?
You cannot rename source files in a worker (as specified in the manual), but you can copy them and do whatever you want with the copy. With temporary files (files which exist only in the "local domain" of that task) you can do virtually anything: they will be removed after the computation, because a temporary directory is created in order to execute the task.
Questions? Comments?... No comment!!! :p