GRID superscalar: a programming paradigm for GRID applications CEPBA-IBM Research Institute Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta
Outline • Objective • The essence • User’s interface • Automatic code generation • Run-time features • Programming experiences • Ongoing work • Conclusions
Objective • Ease the programming of GRID applications • Basic idea: [Figure contrasting time scales: ns vs. seconds/minutes/hours on the Grid]
The essence • Assembly language for the GRID • Simple sequential programming, well defined operations and operands • C/C++, Perl, … • Automatic run time “parallelization” • Use architectural concepts from microprocessor design • Instruction window (DAG), dependence analysis, scheduling, locality, renaming, forwarding, prediction, speculation,…
The essence
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst(referenceCFG, newBWd, newCFG);
        dimemas(newCFG, traceFile, DimemasOUT);
        post(newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n");
    present(fd);
    GS_Close(fd);
The essence • [Figure: task dependence graph with Subst, DIMEMAS, Post, Display and GS_open nodes, shown on the CIRI Grid]
User’s interface • Three components: • Main program • Subroutines/functions • Interface Definition Language (IDL) file • Programming languages: C/C++, Perl
User’s interface • A typical sequential program • Main program:
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst(referenceCFG, newBWd, newCFG);
        dimemas(newCFG, traceFile, DimemasOUT);
        post(newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n");
    present(fd);
    GS_Close(fd);
User’s interface • A typical sequential program • Subroutines/functions:
    void dimemas(in File newCFG, in File traceFile, out File DimemasOUT)
    {
        char command[200];
        putenv("DIMEMAS_HOME=/usr/local/cepba-tools");
        sprintf(command, "/usr/local/cepba-tools/bin/Dimemas -o %s %s", DimemasOUT, newCFG);
        GS_System(command);
    }
    void display(in File toplot)
    {
        char command[500];
        sprintf(command, "./display.sh %s", toplot);
        GS_System(command);
    }
User’s interface • GRID superscalar programming requirements • Main program: open/close files with • GS_FOpen, GS_Open, GS_FClose, GS_Close • Subroutines/functions • Temporary files must be created in the local directory, or have a name that is unique per subroutine invocation • Use GS_System instead of system • All required input/output files must be passed as arguments
User’s interface • Gridifying the sequential program • CORBA-IDL-like interface: • In/Out/InOut files • Scalar values (in or out) • The subroutines/functions listed in this file will be executed on a remote server in the Grid.
    interface MC {
        void subst(in File referenceCFG, in double newBW, out File newCFG);
        void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
        void post(in File newCFG, in File DimemasOUT, inout File FinalOUT);
        void display(in File toplot);
    };
Automatic code generation: C • [Figure: gsstubgen processes app.idl; the diagram shows app-stubs.c, app.h and app.c on the client side and app-worker.c and app-functions.c on the server side]
Run-time features • Data dependence analysis • Detects RaW, WaR, WaW dependencies based on file parameters • Tasks’ Directed Acyclic Graph is built based on these dependencies • File renaming • WaW and WaR dependencies are avoidable with renaming • Shared disks management • Supports shared working directories: NFS • Allows shared input directories: mirrors of large DBs
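The dependence analysis and file renaming described on this slide can be illustrated with a minimal C++ sketch. It is not the GRID superscalar run-time code: the `Task` and `Graph` types, the `analyze` function and the `name.version` naming scheme are all hypothetical and exist only for this example. Each write creates a new version of a file, so only RaW dependencies become edges of the task graph, while WaR and WaW dependencies disappear.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task record: names are illustrative, not GRID superscalar API.
struct Task {
    std::string name;
    std::vector<std::string> inputs;   // files read ("in" parameters)
    std::vector<std::string> outputs;  // files written ("out" parameters)
};

struct Graph {
    std::set<std::pair<int, int>> edges;  // RaW edges: producer -> consumer
    std::vector<Task> renamed;            // same tasks, with versioned file names
};

// Walks the tasks in program order, as a sequential main program would issue them.
Graph analyze(const std::vector<Task>& tasks) {
    Graph g;
    std::map<std::string, int> version;   // current version of each file
    std::map<std::string, int> producer;  // task that wrote the current version
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        const int id = static_cast<int>(i);
        Task t = tasks[i];
        // A read uses the latest version; if a task produced it, that is a RaW edge.
        for (auto& f : t.inputs) {
            auto p = producer.find(f);
            if (p != producer.end()) g.edges.emplace(p->second, id);
            f = f + "." + std::to_string(version[f]);
        }
        // A write creates a fresh version, so WaR and WaW need no edge.
        for (auto& f : t.outputs) {
            const int v = ++version[f];
            producer[f] = id;
            f = f + "." + std::to_string(v);
        }
        g.renamed.push_back(t);
    }
    return g;
}
```

Applied to two iterations of the subst/dimemas/post loop, the second `subst` gets no edge from the first `dimemas`, because it writes a new version of `newCFG` instead of overwriting the one still being read. An inout file such as `FinalOUT` is modelled here by listing it in both `inputs` and `outputs`.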
Run-time features • Resource brokering and task scheduling • Scheduling policy exploits file locality • File transfer time vs. execution time tradeoff is considered • Tasks are submitted for execution as soon as their data dependencies are resolved, if resources are available • End of tasks is detected by means of asynchronous callbacks • Calls to Globus: • globus_gram_client_job_request • globus_gram_client_job_status • globus_gram_client_job_cancel • globus_gram_client_callback_allow • globus_poll_blocking
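One way to picture the tradeoff between file transfer time and execution time is the following C++ sketch. It is an assumption-laden illustration, not the actual scheduling policy: the `Resource` and `InputFile` types, the bandwidth and speed fields, and the cost formula (transfer time of missing inputs plus scaled execution time) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical descriptions of an input file and a Grid resource.
struct InputFile {
    std::string name;
    double megabytes;
};

struct Resource {
    std::string host;
    double mbPerSecond;            // assumed transfer bandwidth to this host
    double speedFactor;            // > 1 means faster than the reference machine
    std::set<std::string> cached;  // files already present on this host
    bool busy;
};

// Estimated completion time: transfer of the inputs not yet on the host,
// plus the reference execution time scaled by the host speed.
double estimate(const Resource& r, const std::vector<InputFile>& inputs,
                double refSeconds) {
    assert(r.mbPerSecond > 0.0 && r.speedFactor > 0.0);
    double transfer = 0.0;
    for (const auto& f : inputs)
        if (r.cached.count(f.name) == 0) transfer += f.megabytes / r.mbPerSecond;
    return transfer + refSeconds / r.speedFactor;
}

// Returns the index of the free resource with the lowest estimate, or -1 if all are busy.
int pickResource(const std::vector<Resource>& rs,
                 const std::vector<InputFile>& inputs, double refSeconds) {
    int best = -1;
    double bestTime = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < rs.size(); ++i) {
        if (rs[i].busy) continue;
        const double t = estimate(rs[i], inputs, refSeconds);
        if (t < bestTime) {
            bestTime = t;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With a large trace file already cached on a slower host, the slower host wins because no transfer is needed; with a small input, the faster host wins. That is the locality effect the slide refers to, under the assumed cost model.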
Run-time features • Communication between workers and master • Socket and file mechanisms provided • Checkpointing at task level • Inter-task checkpointing • Transparent to application developer • All based on Globus Toolkit C APIs (version 2.x) • Provides authentication and authorization • File transfers through gsiftp service • Task handling with gram service
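Task-level (inter-task) checkpointing can be sketched as a log of finished tasks that is consulted when the application is restarted. The C++ below is only an illustration under stated assumptions: the `TaskCheckpoint` class, its `runOnce` method, the integer task ids and the plain-text log format are hypothetical and are not taken from the GRID superscalar run-time.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <set>
#include <string>

// Hypothetical checkpoint helper: one finished task id per line in a log file.
class TaskCheckpoint {
public:
    explicit TaskCheckpoint(const std::string& path) : path_(path) {
        std::ifstream in(path_);  // a missing file simply means nothing is done yet
        int id;
        while (in >> id) done_.insert(id);
    }

    // Runs the task body unless a previous run already logged this id.
    // Returns true if the body was executed in this run.
    bool runOnce(int taskId, const std::function<void()>& body) {
        if (done_.count(taskId) != 0) return false;
        body();
        std::ofstream out(path_, std::ios::app);
        out << taskId << '\n';  // logged only after the task has finished
        done_.insert(taskId);
        return true;
    }

private:
    std::string path_;
    std::set<int> done_;
};
```

Because the skip decision happens inside the helper, the calling code looks the same in a first run and in a restarted run, which is one way to keep checkpointing transparent to the application developer. The sketch assumes that task ids are assigned deterministically, so that a restarted run issues the same ids in the same order.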
Programming experiences • Parameter studies (Dimemas, Paramedir) • Algorithm flexibility • NAS Grid Benchmarks • Improved flexibility of component programs • Fewer lines of Grid-level source code • Bioinformatics application (production) • Improved portability (Globus vs just LoadLeveler) • Fewer lines of Grid-level source code • Pblade solution for bioinformatics
Ongoing work • Automatic deployment
Ongoing work • fastDNAml • Computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species (Indiana University code) • Sequential and MPI (grid-enabled) versions available • Porting to GRID superscalar • Lower pressure on communications than MPI • Simpler code than MPI
Ongoing work • Run-time: exception handling
    try {
        for (int n = 0; n <= 10; n++) {
            if (n > 9) throw "Out of range";
            myarray[n] = 'z';
        }
    } catch (const char *str) {  // a string literal is thrown as const char *
        cout << "Exception: " << str << endl;
    }
• Interesting case: throw in workers, catch in main program
Ongoing work • OGSA-oriented resource broker, based on Globus Toolkit 3.x • And more future work: • Bindings to other basic middleware • GAT, Ninf-G2 • New language bindings (shell script) • Run-time performance enhancements guided by performance analysis
Conclusions • Presentation of the ideas of GRID superscalar • There is a viable way to ease the programming of Grid applications • GRID superscalar run-time enables • Use of the resources in the Grid • Exploiting the existing parallelism
How GAT can help us • Middleware at a higher level (skips Globus details) • Avoids code changes when Globus changes • Abstraction for using other Grid middleware • Resource broker • Intra-task checkpointing mechanism • Interesting GATObjects: • GATFile (GATFile_Copy, GATFile_Delete) • GATResourceDescription, GATResourceBroker, GATJob
More information • GRID superscalar home page: http://people.ac.upc.es/rosab/index_gs.htm • Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela, Rogeli Grima, “Programming Grid Applications with GRID Superscalar”, Journal of Grid Computing, Volume 1 (Number 2): 151-170 (2003).