
int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid

Elisa Heymann, Department of Computer Architecture and Operating Systems. Outline: Introduction, CrossBroker, Parallel Job Support, Interactive Job Support, Conclusions.

Presentation Transcript


  1. int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid
     Elisa Heymann, Department of Computer Architecture and Operating Systems

  2. Outline
     • Introduction
     • CrossBroker
     • Parallel Job Support
     • Interactive Job Support
     • Conclusions
     Condor Week 2008, May 2008

  3. Introduction
     • int.eu.grid environment:
       • gLite (EGEE Grid middleware)
       • Extensions:
         • CrossBroker
         • Migrating Desktop
     • Jobs not handled by gLite:
       • Parallel jobs (MPI): run on more than one resource
       • Interactive jobs: the user interacts with the application during its execution

  4. Batch execution on Grids
     [Diagram labels: Job; F1, F2; O1, O2; SERVICES; Middleware; Internet; REMOTE SITE]

  5. Parallel & Interactive Job Execution
     • Use of resources from different sites
     • Resource-set search
     • Co-allocation and synchronization
     • Fast start-up
     • Execution in high-occupancy situations
     [Diagram labels: Job; F1, F2; I/O forwarding; SERVICES; Middleware; Internet; REMOTE SITE; MPI]

  6. Architecture
     [Diagram labels: Migrating Desktop; CrossBroker; Scheduling Agent; Resource Searcher; Application Launcher; Condor-G; DAGMan; Information Index; Replica Manager; EGEE/Globus; CE; WN]

  7. Architecture - CrossBroker
     • Scheduling Agent
       • Receives each job and keeps it in a persistent queue.
       • Contacts the Resource Searcher and gets a list of available resources.
       • Selects resources and passes them to the Application Launcher.
     • Resource Searcher
       • Given a job description (JobAd), performs the matchmaking between the job's needs and the available resources.
       • Uses the Condor ClassAd library, originally designed to match a single job with a single resource.
       • Set matching has been developed to match a single job with a group of resources.
     • Application Launcher
       • Provides a reliable submission service for parallel applications on the Grid.
       • Handles file staging at the remote site (executable and input/output files).
       • Uses the services of Condor-G.
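The idea of matching one job against a group of resources can be pictured with a short Python sketch. It is not CrossBroker's code and does not use the Condor ClassAd library: the `Resource` type, the `match_sets` function, the host names, the group-size limit, and the choice of ranking a group by the mean rank of its members are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean


@dataclass(frozen=True)
class Resource:
    """Hypothetical view of one site as seen by a resource searcher."""
    name: str
    free_cpus: int
    rank: float  # e.g. a benchmark figure; higher is better


def match_sets(resources, cpus_needed, max_group_size=2):
    """Return groups of resources whose combined free CPUs cover the job.

    Groups are tried smallest first, and a larger group is kept only if
    none of its proper subsets already satisfies the request. Each group
    is ranked by the mean rank of its members (an illustrative choice).
    """
    matches = []
    for size in range(1, max_group_size + 1):
        for group in combinations(resources, size):
            if sum(r.free_cpus for r in group) < cpus_needed:
                continue
            names = {r.name for r in group}
            # Skip groups that merely extend an already sufficient group.
            if any(set(m[1]) < names for m in matches):
                continue
            matches.append((mean(r.rank for r in group),
                            tuple(r.name for r in group)))
    # Best rank first; the sort is stable, so ties keep discovery order.
    return sorted(matches, key=lambda m: m[0], reverse=True)


if __name__ == "__main__":
    sites = [
        Resource("site-a.example.org", free_cpus=10, rank=2000),
        Resource("site-b.example.org", free_cpus=2, rank=2000),
        Resource("site-c.example.org", free_cpus=3, rank=1000),
        Resource("site-d.example.org", free_cpus=2, rank=1000),
    ]
    for rank, names in match_sets(sites, cpus_needed=5):
        print(rank, names)
```

With a five-CPU request, the sketch returns the single site that can host the whole job, followed by the two-site groups that together offer five free CPUs.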

  8. Parallel Job Support
     • Support for parallel jobs:
       • Open MPI
       • PACX-MPI
       • MPICH-P4
       • MPICH-G2
     • Takes site capabilities into account
     • Ability to define a starter script or process to start the parallel job
     • mpi-start is configured automatically and used by default.

  9. Parallel Job Support
     • Job Description Language file:
       • JOBTYPE:
         • Normal: sequential jobs, just one CPU
         • Parallel: more than one CPU
       • SUBJOBTYPE:
         • openmpi
         • pacx-mpi
         • mpich
         • mpich-g2
         • plain
       • JOBSTARTER (if not defined, mpi-start)
       • JOBSTARTERARGUMENTS

  10. Parallel Job Support

          Type = "Job";
          VirtualOrganisation = "imain";
          JobType = "Parallel";
          SubJobType = "pacx-mpi";
          NodeNumber = 5;
          Executable = "test-app";
          Arguments = "-v";
          InputSandbox = {"test-app", "inputfile"};
          OutputSandbox = {"std.out", "std.err"};
          StdErr = "std.err";
          StdOutput = "std.out";
          Rank = other.GlueHostBenchmarkSI00;
          Requirements = other.GlueCEStateStatus == "Production";
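A job description like the one on this slide could be produced from a script. The following Python sketch is only an illustration: `Expr`, `jdl_value`, `render_jdl` and `job_starter` are hypothetical helpers, not part of CrossBroker or gLite, and the only behaviour taken from the slides is the attribute syntax and the rule that mpi-start is used when no job starter is defined.

```python
class Expr(str):
    """Marks a value as a raw expression that must not be quoted."""


def jdl_value(value):
    """Format a Python value in the style of the slide's JDL example."""
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render a dict of attributes as one 'Name = value;' line each."""
    return "\n".join(f"{name} = {jdl_value(value)};"
                     for name, value in attributes.items())


def job_starter(attributes):
    """Return the starter for a parallel job, defaulting to mpi-start."""
    return attributes.get("JobStarter", "mpi-start")


if __name__ == "__main__":
    job = {
        "Type": "Job",
        "JobType": "Parallel",
        "SubJobType": "pacx-mpi",
        "NodeNumber": 5,
        "Executable": "test-app",
        "InputSandbox": ["test-app", "inputfile"],
        "Rank": Expr("other.GlueHostBenchmarkSI00"),
    }
    print(render_jdl(job))
    print("starter:", job_starter(job))
```

Keeping expressions such as `Rank` in a separate `Expr` type avoids quoting them as strings, which would change their meaning.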

  11. MPI Across Sites
      • CrossBroker searches for and selects sets of resources for the jobs.
      • There is no guarantee that all tasks of the same job will start at the same time.
        • 1st choice: select only sites with free resources, so the job runs immediately. Unfortunately, free resources are not always available.
        • 2nd choice: allocate a resource temporarily and wait until all other tasks show up. Time-share the resource with a backfilling policy so that it does not sit idle.

  12. MPI Across Sites

      CEs in the diagram:

          CE    Host                 FreeCPUs   Disk   AverageSI
          CE1   zeus.cyf-kr.edu.pl   2          100    2000
          CE2   aocegrid.uab.es      10         100    4000
          CE3   bee001.ific.uv.es    3          100    1000
          CE4   xgrid.icm.edu.pl     6          100    1000
          CE5   lngrid02.lip.pt      2          100    1000

      Other diagram labels: RS; MPI enabled CE; Non-MPI enabled CE

      Resource sets listed:

          [Groups with 1 CEs]
            [Rank=2000] aocegrid.uab.es:2119/jobmanager-pbs-workq      freeCPUs = 10
          [Groups with 2 CEs]
            [Rank=1500] zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq   freeCPUs = 2
                        bee001.ific.uv.es:2119/jobmanager-pbs-workq    freeCPUs = 3
            [Rank=1000] bee001.ific.uv.es:2119/jobmanager-pbs-workq    freeCPUs = 3
                        lngrid02.lip.pt:2129/jobmanager-pbs-workq      freeCPUs = 2

  13. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Condor-G; Grid Resource; LRMS; MPI JOB]

  14. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Application Launcher; Condor-G; Grid Resource; LRMS; MPI JOB]

  15. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Application Launcher; Condor-G; Grid Resource; LRMS; Condor GlideIn; VM1; VM2; MPI JOB]

  16. Time Sharing
      [Diagram labels: same as slide 15]

  17. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Application Launcher; Condor-G; Grid Resource; LRMS; Condor GlideIn; VM1; VM2; MPI TASK]
      Wait for the rest of the MPI tasks.

  18. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Application Launcher; Condor-G; Grid Resource; LRMS; Condor GlideIn; VM1; VM2; JOB; MPI TASK]

  19. Time Sharing
      [Diagram labels: same as slide 18]
      Backfilling while the MPI task waits.

  20. Time Sharing
      [Diagram labels: same as slide 18]
      All tasks ready!
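The time-sharing sequence in slides 13 to 20 (an MPI task holds a resource, other work backfills it while the task waits, and the job proceeds once all tasks are ready) can be sketched as a small state check in Python. The `CoAllocation` class, the `slot_activity` function, and the site and job names are hypothetical; the sketch says nothing about how Condor GlideIn or the LRMS implement this.

```python
from dataclasses import dataclass, field


@dataclass
class CoAllocation:
    """Hypothetical tracker for one MPI job spread over several sites."""
    tasks_expected: int
    tasks_waiting: set = field(default_factory=set)

    def task_arrived(self, site):
        """Record that the task placed at `site` now holds its resource."""
        self.tasks_waiting.add(site)

    def all_ready(self):
        """True once every expected task holds a resource."""
        return len(self.tasks_waiting) == self.tasks_expected


def slot_activity(coalloc, site, backfill_queue):
    """Say what a time-shared resource at `site` should do right now."""
    if coalloc.all_ready():
        return "run MPI task"
    if site in coalloc.tasks_waiting and backfill_queue:
        # The MPI task is parked; use the CPU for other work meanwhile.
        return f"backfill with {backfill_queue[0]}"
    return "idle"


if __name__ == "__main__":
    job = CoAllocation(tasks_expected=2)
    job.task_arrived("site-a")
    print(slot_activity(job, "site-a", ["batch-1"]))  # waiting, so backfill
    job.task_arrived("site-b")
    print(slot_activity(job, "site-a", ["batch-1"]))  # all tasks ready
```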

  21. Interactive Job Support
      • Scheduling priority
        • Interactive jobs are sent to sites with available machines.
        • If no machines are available, time sharing is used.
      • Support for interactivity in all kinds of jobs
        • Sequential jobs and all the MPI flavors
      • CrossBroker injects interactive agents that enable communication between user and job
        • Transparent to the user
        • Full integration with glogin & gVid
        • Condor Bypass supported
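The scheduling priority described on this slide can be illustrated with a priority queue. This is a minimal Python sketch under stated assumptions: `JobQueue` and `placement` are hypothetical names, and the two priority levels are an illustrative simplification, not the broker's actual policy.

```python
import heapq
from itertools import count


class JobQueue:
    """Hypothetical queue in which interactive jobs overtake batch jobs."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that keeps submission order

    def submit(self, name, interactive=False):
        priority = 0 if interactive else 1  # lower value is served first
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_job(self):
        """Pop the next job; returns (name, is_interactive)."""
        priority, _, name = heapq.heappop(self._heap)
        return name, priority == 0


def placement(free_machines):
    """Pick a start-up path following the two cases on the slide."""
    return "available machine" if free_machines > 0 else "time sharing"


if __name__ == "__main__":
    queue = JobQueue()
    queue.submit("batch-1")
    queue.submit("visualization", interactive=True)
    queue.submit("batch-2")
    name, interactive = queue.next_job()
    print(name, interactive, placement(free_machines=0))
```

The counter in each heap entry keeps jobs of equal priority in submission order, so batch jobs are not reordered among themselves.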

  22. Interactive Job Support
      • Job Description Language file:
        • INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority.
        • INTERACTIVEAGENT
        • INTERACTIVEAGENTARGUMENTS
          These two attributes specify the command (and its arguments) used to communicate with the user.

  23. Interactive Job Support

          Type = "Job";
          VirtualOrganisation = "imain";
          JobType = "Parallel";
          SubJobType = "openmpi";
          NodeNumber = 11;
          Interactive = TRUE;
          InteractiveAgent = "glogin";
          InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
          Executable = "test-app";
          InputSandbox = {"test-app", "inputfile"};
          OutputSandbox = {"std.out", "std.err"};
          StdErr = "std.err";
          StdOutput = "std.out";
          Rank = other.GlueHostBenchmarkSI00;
          Requirements = other.GlueCEStateStatus == "Production";

  24. Interactive Job Support
      Particle trajectories in fusion devices
      • Increasing the temperature of a gas produces a plasma state.
      • At this temperature, light atomic nuclei can fuse through an exothermic process:
        • The mass after the fusion process is less than the mass before it.
        • The excess mass is released as energy.

  25. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Application Launcher; Condor-G; Grid Resource; LRMS; Condor GlideIn; VM1; VM2; INT. JOB; BATCH]

  26. Time Sharing
      [Diagram labels: CrossBroker; Scheduling Agent; Application Launcher; Condor-G; Grid Resource; LRMS; Agent; VM1; VM2; INT. JOB; BATCH]
      Startup-time reduction: only one layer involved.

  27. Conclusions
      • CrossBroker supports both parallel and interactive jobs
        • Automatically
        • Interoperable with EGEE
      • GlideIn
        • Fast startup of jobs
        • Co-allocation without reservation or wasting resources
      • Real applications
        • Visualization of plasma in fusion devices
        • Evolution of pollution clouds in the atmosphere
        • Ultrasound Computing Tomography: reconstruction of a 3D volume
        • FLUIDYNAMICS application

  28. Questions?
      Elisa Heymann, Department of Computer Architecture and Operating Systems
