Interactivity and fast job allocation at the level of Resource Brokers. Miquel A. Senar, Universitat Autònoma de Barcelona, Spain.
Abstract
• Most Grid middleware technologies developed in recent years have been aimed at the execution of sequential batch jobs. However, some users require a certain kind of interactive access.
• This talk describes our experience with CrossBroker in providing transparent and reliable support for interactive applications.
• Our solution is based on two main notions:
  • split execution and interposition agents used to stream I/O; we review how we have applied interposition agents transparently to sequential and MPI applications;
  • a simple multiprogramming mechanism that is used to start interactive applications as fast as possible.
Budapest EGEE’07, 1-5 October 2007
Outline
• System Architecture and Interactivity Problems
• CrossBroker: Functionality and Architecture
• I/O streaming
• Glidein and Multiprogramming
• Open Issues and Conclusions
Grid Architecture
[Figure: Grid services (resource directory, file replica management, monitoring, unified resource vision, authentication/security, start/control of jobs, file transfer, communication) built on the middleware of remote sites connected through the Internet.]
Batch execution on Grids
[Figure: a job with input files F1 and F2 is sent through the Grid services and middleware over the Internet to a remote site; output files O1 and O2 come back.]
Interactive Job Execution
[Figure: a job with input files F1 and F2 is sent through the Grid services and middleware over the Internet to a remote site.]
• Fast start-up
• Execution in high-occupancy situations
Grid Environment Constraints
• No privileges
• Minimal need for preinstalled components
• No changes to the LRMS (Local Resource Management System) or to applications
• Outdated information
• Dynamic changes
• LRMS (PBS, LSF, Condor): limited external control
• Non-cooperative LRMS
• Local user jobs
[Figure: Grid services and the Information Index reach each site's middleware and LRMS over the Internet.]
CrossBroker
[Figure: CrossBroker architecture. Components: Migrating Desktop; CrossBroker (Scheduling Agent, Resource Searcher, Application Launcher); Information Index; Replica Manager; Condor-G and DAGMan; EGEE/Globus CEs; each site's LRMS and worker nodes (WN).]
CrossBroker features
• Jobs described in a text file using JDL (Job Description Language)
• gLite interoperability:
  • accepts jobs from gLite's UI
  • able to submit jobs to gLite resources (LCG-CE and gLite CE)
• Focus on jobs not fully supported by gLite:
  • parallel jobs (MPI): run in more than one resource/site, in a coordinated fashion
  • interactive jobs: the user interacts with the application during its execution
Interactivity Requirements
• Online input/output streaming: the ability to have application input and output online.
• Fast start-up: the possibility of starting the application immediately, also taking into account scenarios in which all computing resources might be running batch jobs.
I/O Streaming Support
• The I/O stream is managed by a streaming engine. Typical architecture: one console (user interface) and one agent (application).
• Examples:
  • Condor Bypass: library interposition
  • glogin + GVid: shell-like
• CrossBroker injects interactive agents that enable communication between the user and the job:
  • transparent to the user
  • full integration with Bypass and with glogin & GVid
  • support for interactivity in all kinds of jobs, sequential and all MPI flavors
I/O streaming
[Figure: CrossBroker injects the Job Agent into the job. The Job Console, a console-like component started at the UI or at the Migrating Desktop (MD), exchanges the job's stdin, stdout and stderr with the Job Agent on the remote resource via RPC. The agent, loaded as a shared object (SO), traps input/output operations and sends them to the Job Shadow.]
• Supported agents: glogin (shell-like) and Condor Bypass (I/O library interception)
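As a rough illustration of the shell-like (glogin-style) case, the Python sketch below plays both roles over a local socket pair: a hypothetical agent starts the job, merges its stderr into stdout and streams every line back, and a console collects the stream. The names run_agent and console_read are illustrative only; the real agents use RPC and secured channels, and stdin forwarding is omitted for brevity.

```python
import socket
import subprocess
import sys
import threading


def run_agent(command, conn):
    """Hypothetical shell-like agent: run the job and stream its output to the console."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for line in proc.stdout:          # bytes, one line at a time, as the job produces them
        conn.sendall(line)
    conn.shutdown(socket.SHUT_WR)     # tell the console the stream has ended
    proc.stdout.close()
    return proc.wait()


def console_read(conn):
    """Console side: collect everything the agent streams back until end of stream."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    agent_end, console_end = socket.socketpair()
    job = [sys.executable, "-c", "import sys; print('hello'); print('oops', file=sys.stderr)"]
    agent = threading.Thread(target=run_agent, args=(job, agent_end))
    agent.start()
    print(console_read(console_end), end="")
    agent.join()
```

The agent runs in its own thread so the console can drain the socket concurrently; with library interposition (Bypass) the same forwarding would happen inside the intercepted I/O calls instead of around a child process.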
Interactive Support for video streaming
[Figure: the Migrating Desktop (interactive job submission plugin, Java visualization plugin) submits through the Roaming Access Server to CrossBroker, which provides glogin submission support and uses the Information Index and Replica Manager. Jobs run on worker nodes behind gLite CEs, where glogin and GVid send a GSS-secured video stream back over the Internet.]
JDL: Interactive jobs
• INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority.
• INTERACTIVEAGENT: the streaming mechanism (currently bypass or glogin).
• INTERACTIVEAGENTARGUMENTS: the arguments passed to the agent.
  • Together, these two attributes specify the command (and its arguments) used to communicate with the user.
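A minimal sketch of how a broker might turn these attributes into the agent command line, assuming the JDL has already been parsed into a Python dict keyed by the attribute names Interactive, InteractiveAgent and InteractiveAgentArguments. The function name and the error handling are hypothetical; shlex.split takes care of quoting inside the argument string.

```python
import shlex

# Streaming mechanisms named in the slides (a real broker may accept others).
SUPPORTED_AGENTS = {"bypass", "glogin"}


def interactive_agent_command(jdl):
    """Return the agent command as an argv list, or None if the job is not interactive."""
    if not jdl.get("Interactive", False):
        return None
    agent = jdl.get("InteractiveAgent")
    if agent not in SUPPORTED_AGENTS:
        raise ValueError(f"unsupported interactive agent: {agent!r}")
    return [agent] + shlex.split(jdl.get("InteractiveAgentArguments", ""))


if __name__ == "__main__":
    job = {
        "Interactive": True,
        "InteractiveAgent": "glogin",
        "InteractiveAgentArguments": "-r -p 195.168.105.65:23433",
    }
    print(interactive_agent_command(job))
```

For this input the sketch yields ['glogin', '-r', '-p', '195.168.105.65:23433']; a batch job (Interactive false or absent) yields None and would follow the normal submission path.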
JDL: Interactive jobs
    Type = "Job";
    VirtualOrganisation = "imain";
    JobType = "Parallel";
    SubJobType = "openmpi";
    NodeNumber = 11;
    Interactive = TRUE;
    InteractiveAgent = "glogin";
    InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
    Executable = "test-app";
    InputSandbox = {"test-app", "inputfile"};
    OutputSandbox = {"std.out", "std.err"};
    StdError = "std.err";
    StdOutput = "std.out";
    Rank = other.GlueHostBenchmarkSI00;
    Requirements = other.GlueCEStateStatus == "Production";
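The Requirements and Rank expressions follow the usual matchmaking pattern: Requirements filters out unsuitable CEs and Rank orders the remaining ones, highest first. The sketch below mimics that for the two expressions in this job description, using hypothetical CE records held as plain dicts; a real broker evaluates ClassAd expressions rather than Python functions.

```python
def requirements(ce):
    # Requirements = other.GlueCEStateStatus == "Production";
    return ce.get("GlueCEStateStatus") == "Production"


def rank(ce):
    # Rank = other.GlueHostBenchmarkSI00;  (missing values rank lowest)
    return ce.get("GlueHostBenchmarkSI00", 0)


def match_and_rank(ces):
    """Keep the CEs that satisfy Requirements, best Rank first."""
    return sorted((ce for ce in ces if requirements(ce)), key=rank, reverse=True)


if __name__ == "__main__":
    # Hypothetical information-system snapshot.
    ces = [
        {"id": "ce-a", "GlueCEStateStatus": "Production", "GlueHostBenchmarkSI00": 1000},
        {"id": "ce-b", "GlueCEStateStatus": "Draining", "GlueHostBenchmarkSI00": 2000},
        {"id": "ce-c", "GlueCEStateStatus": "Production", "GlueHostBenchmarkSI00": 1500},
    ]
    print([ce["id"] for ce in match_and_rank(ces)])
```

Here ce-b is discarded despite its higher benchmark because it is not in production, and the result is ['ce-c', 'ce-a'].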
Fast Start-Up for interactivity
• Interactive jobs are sent to sites with available machines (subject to problems with inaccurate information).
• If no machines are available, time sharing is used.
• This makes it possible to start the application immediately, even in scenarios in which all computing resources might be running batch jobs.
• Interactive jobs must come with limitations: if they ran immediately at no cost to the user, nobody would run batch jobs.
• Resource usage has to be paid for somehow, and the cost has to be higher when users run interactive jobs.
• "Payment" can be made in terms of user priority (more later).
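The placement rule can be sketched as a two-step choice. In this hypothetical Python version each site carries the free-CPU count published by the information system (which may be stale, as noted above) and the number of glide-in agents whose interactive VM is idle; preferring the site with the most free CPUs or idle VMs is an illustrative tie-breaker, not necessarily what CrossBroker does.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_cpus: int             # published by the information system; may be outdated
    idle_interactive_vms: int  # glide-in agents with a free interactive VM (hypothetical field)


def place_interactive_job(sites):
    """Return (site name, mode) for an interactive job, or None if it must wait."""
    free = [s for s in sites if s.free_cpus > 0]
    if free:
        return max(free, key=lambda s: s.free_cpus).name, "free machine"
    shared = [s for s in sites if s.idle_interactive_vms > 0]
    if shared:
        return max(shared, key=lambda s: s.idle_interactive_vms).name, "time sharing"
    return None


if __name__ == "__main__":
    sites = [Site("site-a", 0, 0), Site("site-b", 0, 2)]
    print(place_interactive_job(sites))  # ('site-b', 'time sharing')
```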
Time Sharing on Grid Resources: Glide-in mechanism
• Main idea:
  • wrap every batch job with an agent (glide-in) that gets control of the remote machine independently of its local resource manager.
• Goals:
  • Agents enable simple multiprogramming between interactive and batch jobs, so interactive jobs may run even when no free resources are available.
  • Agents can also be used as a fast start-up mechanism.
  • The agent can control the amount of CPU that an interactive job gets, according to QoS requirements expressed by the user in the JDL.
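The slides do not say how the agent enforces the CPU share, nor how the QoS value is named in the JDL. As one possible reading, the sketch below treats the share as a hypothetical integer percentage and spreads the interactive job's time slices evenly among the batch job's, using only integer arithmetic so that exactly n_slices * interactive_pct // 100 slices go to the interactive job. Actual enforcement (for example by adjusting process priorities) is outside the sketch.

```python
def timeslice_schedule(interactive_pct, n_slices):
    """Assign each of n_slices time slices to 'I' (interactive) or 'B' (batch).

    Slice i goes to the interactive job whenever the running quota
    (i + 1) * pct // 100 increases, which spreads its slices evenly.
    """
    if not 0 <= interactive_pct <= 100:
        raise ValueError("interactive_pct must be between 0 and 100")
    schedule = []
    for i in range(n_slices):
        gets_slice = (i + 1) * interactive_pct // 100 > i * interactive_pct // 100
        schedule.append("I" if gets_slice else "B")
    return schedule


if __name__ == "__main__":
    print("".join(timeslice_schedule(50, 8)))  # BIBIBIBI
```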
Time Sharing on Grid Resources: Glide-in mechanism
• For each batch job, the broker submits an agent (glide-in) to the Grid.
• The agent is created on a temporarily acquired Grid resource, which is logically divided into two "virtual machines" (VMs).
• The agent reports back to the broker and starts the batch job in one VM.
• When needed, the broker sends an interactive job directly to the agent, which runs it in the second VM with a higher priority than the batch job (simple time sharing).
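The four steps above can be modelled in a few lines of Python. Everything here is hypothetical (class and method names, the in-memory registry of agents); the point is only the control flow, in which the agent holds the resource with its batch job in one VM and the broker later hands an interactive job straight to that agent, which runs it ahead of the batch job in the second VM.

```python
class GlideinAgent:
    """Hypothetical glide-in agent: one Grid resource split into two VMs."""

    def __init__(self, resource):
        self.resource = resource
        self.batch_vm = None        # VM1: runs the wrapped batch job
        self.interactive_vm = None  # VM2: reserved for interactive jobs

    def start_batch(self, job):
        self.batch_vm = job

    def accepts_interactive(self):
        return self.interactive_vm is None

    def start_interactive(self, job):
        if not self.accepts_interactive():
            raise RuntimeError(f"{self.resource}: interactive VM busy")
        self.interactive_vm = job

    def run_order(self):
        """Jobs in priority order: the interactive job (if any) before the batch job."""
        return [j for j in (self.interactive_vm, self.batch_vm) if j is not None]


class Broker:
    """Hypothetical broker side: keeps the agents that have reported back."""

    def __init__(self):
        self.agents = []

    def submit_batch(self, job, resource):
        # The agent starts on the acquired resource, reports back, then starts the batch job.
        agent = GlideinAgent(resource)
        self.agents.append(agent)
        agent.start_batch(job)
        return agent

    def submit_interactive(self, job):
        # Sent directly to an agent: no CE or LRMS queue in the path.
        for agent in self.agents:
            if agent.accepts_interactive():
                agent.start_interactive(job)
                return agent.resource
        return None
```

Usage: after broker.submit_batch("batch-1", "wn01"), a call to broker.submit_interactive("int-1") returns "wn01" and that agent's run_order() is ["int-1", "batch-1"]; a second interactive job finds no free VM and gets None.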
Time Sharing
[Figure sequence: a batch job reaches CrossBroker's Scheduling Agent, and the Application Launcher submits it through Condor-G to the Grid resource's LRMS; the glide-in agent starts on the resource and splits it into VM1 and VM2; the batch job runs in one VM; an interactive job then arrives at CrossBroker and is sent directly to the agent, which runs it in the other VM.]
• Startup-time reduction: only one layer involved
Response Time
[Figure: response-time chart; series labelled CrossBroker and CE + WN.]
Open Issues
• A generalized priority scheme for Grid users and jobs (business model) is still missing. It is more a social than a technical issue.
• Things are easier if the infrastructure is simple (single broker, uniform middleware, same LRMS, same policies, …).
• More likely, users and institutions will share the same local infrastructure between several Grids (the interactive Grid will be one of them): multiple brokers, multiple LRMSs, different policies, …
• Some solutions (under investigation):
  • Fair share with different penalty factors:
    • batch jobs worsen the user's priority according to the resources used;
    • interactive jobs worsen the priority faster than batch ones (twice, three times, …?);
    • if batch and interactive jobs run in the multiprogramming scheme, priority worsening is adjusted according to the amount of CPU consumed by each one.
  • Interactive CE:
    • automatic injection of glide-in wrappers for all jobs submitted from sources other than CrossBroker.
  • Possibility of adding advance reservations.
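One way to read the fair-share proposal is as charged usage: each job's measured CPU time is multiplied by a penalty factor, and users with less charged usage get better priority. In the sketch below the interactive factor of 2.0 is only a placeholder, since the slide leaves the value open; charging measured CPU seconds rather than wall-clock time covers the multiprogramming case, where a batch and an interactive job share one node.

```python
from collections import defaultdict

BATCH_FACTOR = 1.0
INTERACTIVE_FACTOR = 2.0  # placeholder: the slide asks "twice, three times, ...?"


class FairShare:
    """Hypothetical fair-share accounting with per-job-type penalty factors."""

    def __init__(self):
        self.usage = defaultdict(float)  # charged usage per user

    def charge(self, user, cpu_seconds, interactive):
        # Charge the CPU actually consumed, so time-shared jobs pay only for their share.
        factor = INTERACTIVE_FACTOR if interactive else BATCH_FACTOR
        self.usage[user] += cpu_seconds * factor

    def priority_order(self, users):
        """Users sorted from best to worst priority (least charged usage first)."""
        return sorted(users, key=lambda u: self.usage[u])


if __name__ == "__main__":
    fs = FairShare()
    # A batch and an interactive job share one node; 75 s and 25 s of CPU were measured.
    fs.charge("alice", 75, interactive=False)
    fs.charge("bob", 25, interactive=True)
    print(dict(fs.usage), fs.priority_order(["alice", "bob"]))
```

With these numbers alice is charged 75.0 and bob 50.0, so bob still ranks ahead; raising INTERACTIVE_FACTOR to 3.0 or 4.0 would close or reverse that gap.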
Interactivity and Grid Business Model
[Figure: two resource centres (RC A, RC B) exposing both LCG CEs and int.eu.grid CEs, storage elements (classical SE, SRM) and worker nodes (WN), with a cluster manager; VO users and applications reach them through the LCG Broker/UI and through CrossBroker/RAS, with VOMS.]
Conclusion
We need a business model that provides:
• commitment of resources (specific to a VO, shared between VOs, …)
• payment/charging of users (VOs)
• happy users: access to resources
• happy providers: resources are used productively