Grid Resource Allocation and Management (GRAM)
• Execution management
  • Deployment, scheduling and monitoring
• Community Scheduler Framework (CSF): provides a single interface to different resource schedulers
  • PBS, Condor/Condor-G
• Workspace management
  • Dynamically create and manage workspaces on remote hosts
• Grid Telecontrol Protocol
  • WSRF-enabled service interface for controlling remote instruments
  • Example: remote goldfish surgical procedures
Jobs are computational tasks that may perform input/output operations while running.
• They affect the state of the computational resource and its associated file systems.
• They may require coordinated staging of data into the resource before execution and out of the resource after execution.
• Some users, particularly interactive ones, benefit from accessing output files while the job is still running.
Monitoring consists of querying and subscribing for status information, such as job state changes.
• Jobs run under the control of a local scheduler that implements allocation and prioritization policies.
• GRAM is not a resource scheduler but a protocol engine for communicating with different local resource schedulers.
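The two monitoring styles, querying (the client pulls the current state) and subscribing (the service pushes each change), can be contrasted with a small in-process sketch. The `JobStatus` class and its methods are hypothetical and only illustrate the pattern; they are not the WS GRAM API, which in GT4 builds on WSRF resource properties and WS-Notification.

```python
from typing import Callable, List

# Hypothetical in-process model of a job's state; not the WS GRAM API.
StateListener = Callable[[str, str], None]


class JobStatus:
    """Supports both monitoring styles: query (pull) and subscribe (push)."""

    def __init__(self) -> None:
        self._state = "Unsubmitted"
        self._listeners: List[StateListener] = []

    def query(self) -> str:
        # Query: the client asks for the current state whenever it chooses.
        return self._state

    def subscribe(self, listener: StateListener) -> None:
        # Subscribe: the client registers once and is told about each change.
        self._listeners.append(listener)

    def set_state(self, new_state: str) -> None:
        # Called by the (simulated) job manager; notifies only on real changes.
        old_state = self._state
        if new_state == old_state:
            return
        self._state = new_state
        for listener in self._listeners:
            listener(old_state, new_state)


if __name__ == "__main__":
    job = JobStatus()
    changes = []
    job.subscribe(lambda old, new: changes.append((old, new)))
    for state in ["Pending", "Active", "Active", "Done"]:
        job.set_state(state)
    print(job.query())  # Done
    print(changes)      # three transitions; the repeated "Active" is not reported
```

Polling is simple but costs a round trip per check and can miss short-lived states; a subscription avoids both, at the price of the service keeping per-subscriber state.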
Conceptual Details
• Targeted job types
  • Not “RPC”-style calls
  • Jobs for which reliable operation, stateful monitoring, credential management, and file staging are important (performance is poor, so use GRAM only when these features are necessary)
Component Architecture
• Based on a component architecture
• Job management services
  • Represent, monitor, and control the overall job life cycle. These services are the job-management-specific software provided by the GRAM solution.
• File transfer services
  • Support staging of files into and out of compute resources.
Component Architecture
• Credential management services
  • Are used to control the delegation of rights among distributed elements of the GRAM architecture, based on users' application requirements.
Security
• Secure operation
  • WS GRAM uses WSRF functionality to authenticate job management requests and to protect job requests from malicious interference.
• Local system protection domains
  • Jobs are executed in appropriate local security contexts, e.g., under specific Unix user IDs chosen from details of the job request and authorization policies.
• Credential delegation and management
  • A client may delegate some of its rights to GRAM services, e.g., rights for GRAM to access data on a remote storage element as part of the job execution.
• Audit
  • Assists with normal accounting functions and further mitigates risks from abuse or malfunction.
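One common way Globus deployments pick the local Unix account for a job is a grid-mapfile, which maps a certificate's distinguished name (DN) to one or more local user names. The grid-mapfile itself is not mentioned on the slides, so treat this as an illustration of local protection domains under that assumption. The parser is simplified (one quoted DN per line followed by comma-separated accounts, no escape handling), and the function names, DNs, and account names are all hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def parse_grid_mapfile(text: str) -> Dict[str, List[str]]:
    """Parse simplified grid-mapfile lines of the form: "<DN>" user1[,user2...]"""
    mapping: Dict[str, List[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # keeps a quoted DN with spaces as one token
        if len(parts) < 2:
            continue  # malformed entry: a DN with no local account
        accounts = ",".join(parts[1:]).split(",")
        mapping[parts[0]] = [a.strip() for a in accounts if a.strip()]
    return mapping


def local_account(mapping: Dict[str, List[str]], dn: str,
                  requested: Optional[str] = None) -> Optional[str]:
    """Return the Unix account to run a job as, or None if not authorized."""
    accounts = mapping.get(dn, [])
    if not accounts:
        return None
    if requested is None:
        return accounts[0]  # assumption: default to the first listed account
    # A job request may name an account, but only one mapped to this DN.
    return requested if requested in accounts else None


if __name__ == "__main__":
    gridmap = '''
# hypothetical entries
"/O=Grid/OU=Example/CN=Jane Doe" jdoe,jdoe_admin
'''
    m = parse_grid_mapfile(gridmap)
    print(local_account(m, "/O=Grid/OU=Example/CN=Jane Doe"))          # jdoe
    print(local_account(m, "/O=Grid/OU=Example/CN=Jane Doe", "root"))  # None
```

The second lookup shows the authorization side: a request naming an account that is not mapped to the caller's DN is refused rather than silently run elsewhere.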
Job Management
• Reliable job submission
  • “At most once” semantics
• Job cancellation
  • A mechanism for clients to cancel (abort) their jobs at any point in the job life cycle.
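“At most once” semantics means a client can safely retry a submission whose reply was lost without the job being created twice. A common way to achieve this, and the idea assumed here, is for the client to generate a unique submission ID once and resend it on every retry, so the server can recognize duplicates. The `JobFactory` class, its methods, and the state names are hypothetical; cancellation is included to show a job being aborted before it finishes.

```python
import uuid
from typing import Dict


class JobFactory:
    """Hypothetical server side that deduplicates submissions by submission ID."""

    def __init__(self) -> None:
        self._job_for_submission: Dict[str, str] = {}
        self._descriptions: Dict[str, str] = {}
        self._states: Dict[str, str] = {}

    def submit(self, submission_id: str, description: str) -> str:
        # A retry carrying a known submission ID gets the existing job back,
        # so a lost reply never causes the job to be created twice.
        existing = self._job_for_submission.get(submission_id)
        if existing is not None:
            return existing
        job_id = f"job-{len(self._states) + 1}"
        self._job_for_submission[submission_id] = job_id
        self._descriptions[job_id] = description
        self._states[job_id] = "Pending"
        return job_id

    def cancel(self, job_id: str) -> str:
        # Cancellation is accepted at any point; finished jobs are left unchanged.
        if self._states[job_id] not in ("Done", "Failed", "Cancelled"):
            self._states[job_id] = "Cancelled"
        return self._states[job_id]

    def job_count(self) -> int:
        return len(self._states)


if __name__ == "__main__":
    factory = JobFactory()
    sid = str(uuid.uuid4())  # generated once by the client, reused on retries
    first = factory.submit(sid, "/bin/hostname")
    retry = factory.submit(sid, "/bin/hostname")  # e.g. after a timeout
    print(first == retry, factory.job_count())    # True 1
    print(factory.cancel(first))                  # Cancelled
```

Note that deduplication only works if the client, not the server, chooses the ID: a server-assigned ID would be lost along with the reply it was carried in.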
Data Management
• Reliable data staging
  • Reliable, high-performance transfers of files between the compute resource and external (GridFTP) data storage elements before and after job execution.
• Output monitoring
  • A mechanism for incrementally transferring output file contents from the compute resource while the job is running.
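Incremental output monitoring can be pictured as the client remembering how many bytes of an output file it has already received and, on each poll, fetching only what was appended since. The sketch below does this against a local file; the function name `read_new_output` is hypothetical, and a remote client would more plausibly fetch the new bytes with a ranged transfer (for example a GridFTP partial file transfer), which is not shown.

```python
import os
import tempfile
from typing import Tuple


def read_new_output(path: str, offset: int) -> Tuple[bytes, int]:
    """Return the bytes appended to `path` since `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)   # skip everything the client has already seen
        chunk = f.read()
    return chunk, offset + len(chunk)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "job.stdout")
        offset = 0
        with open(out, "w") as f:      # the job writes its first line
            f.write("step 1\n")
        chunk, offset = read_new_output(out, offset)
        print(chunk.decode(), end="")  # step 1
        with open(out, "a") as f:      # the job keeps running and appends
            f.write("step 2\n")
        chunk, offset = read_new_output(out, offset)
        print(chunk.decode(), end="")  # step 2
```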
Task Coordination
• Parallel jobs
  • Task rendezvous: a mechanism that job applications may use if they do not have a more appropriate solution of their own
  • Usually handled by MPI
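A task rendezvous lets the tasks of a parallel job find one another before they start communicating: each task publishes its contact address, waits until every task has arrived, and then reads the full list. The sketch below models tasks as threads and uses Python's `threading.Barrier`; the `Rendezvous` class and the address strings are illustrative assumptions, not the mechanism GRAM itself provides.

```python
import threading
from typing import Dict, List


class Rendezvous:
    """Collect one address per task; release everyone once all have arrived."""

    def __init__(self, n_tasks: int) -> None:
        self._addresses: Dict[int, str] = {}
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(n_tasks)

    def join(self, rank: int, address: str) -> List[str]:
        with self._lock:
            self._addresses[rank] = address  # publish this task's address
        self._barrier.wait()                 # block until every task has published
        with self._lock:
            return [self._addresses[r] for r in sorted(self._addresses)]


if __name__ == "__main__":
    n = 4
    rv = Rendezvous(n)
    results: Dict[int, List[str]] = {}

    def task(rank: int) -> None:
        results[rank] = rv.join(rank, f"node{rank}:5000")

    threads = [threading.Thread(target=task, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results[0])  # every task sees the same four addresses, ordered by rank
```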
WS GRAM (Web Services version)
• Designed to support job execution with coordinated file staging.
• Uses a set of Web services in the GT4 WSRF core.
• ManagedJob: provides an interface to monitor the job's status and to terminate it. Each submitted job is a distinct resource.
• ManagedJobFactory: an interface to create ManagedJob resources of the appropriate type to run a job on a particular local scheduler.
• A ManagedJob resource is created by invoking ManagedJobFactory::createManagedJob.
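The factory/resource split can be sketched in a few lines: a factory creates a new, separately addressable job resource per submission and hands back a reference to it, and later monitoring or termination goes to that resource rather than to the factory. Python classes stand in for the WSRF services here; the snake_case method names, the returned object standing in for an endpoint reference, and the state names are hypothetical, and no SOAP or WSRF machinery is shown.

```python
import itertools


class ManagedJob:
    """One resource per submitted job (hypothetical in-process stand-in)."""

    def __init__(self, job_id: str, description: str) -> None:
        self.job_id = job_id
        self.description = description
        self.state = "Pending"

    def status(self) -> str:
        return self.state

    def terminate(self) -> None:
        # Only jobs that have not already finished can be terminated.
        if self.state not in ("Done", "Failed", "Cancelled"):
            self.state = "Cancelled"


class ManagedJobFactory:
    """Creates ManagedJob resources for one local scheduler (e.g. PBS)."""

    def __init__(self, scheduler: str) -> None:
        self.scheduler = scheduler
        self._ids = itertools.count(1)

    def create_managed_job(self, description: str) -> ManagedJob:
        # Every call creates a distinct resource; the returned object plays the
        # role of an endpoint reference that the client uses from now on.
        job_id = f"{self.scheduler}-job-{next(self._ids)}"
        return ManagedJob(job_id, description)


if __name__ == "__main__":
    factory = ManagedJobFactory("pbs")
    job1 = factory.create_managed_job("/bin/date")
    job2 = factory.create_managed_job("/bin/date")
    job1.terminate()
    print(job1.job_id, job1.status())  # pbs-job-1 Cancelled
    print(job2.job_id, job2.status())  # pbs-job-2 Pending
```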
Creation of Job
• Created by invoking ManagedJobFactory::createManagedJob.
• A meaningful WS GRAM client MUST create a job that then goes through a life cycle in which it eventually completes execution and its resource is eventually destroyed.
• Optional staging credentials
  • Must be delegated before the call to createManagedJob.
• Optional job credential
  • Stored in the user account for use by the job process.
• Optional credential refresh
  • Delegated credentials may be refreshed.
• Optional hold of cleanup
  • For users who want to access output files directly without waiting for stage-out.
• ManagedJob destruction
  • The client can explicitly destroy the job.
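The client-side ordering in the job-creation steps (delegate any staging credential before createManagedJob, optionally hold the job before cleanup so output files can be read in place, then release it and destroy the resource) can be expressed as a small state machine. The state sequence, the `hold_cleanup` flag, and the class and method names are simplified assumptions chosen for illustration; they are not taken from the WS GRAM job description schema.

```python
from typing import List, Optional

# Simplified, hypothetical state sequence for illustration only.
ORDER = ["StageIn", "Pending", "Active", "StageOut", "CleanUp", "Done"]


class JobLifecycle:
    """Client-visible life cycle with an optional hold before cleanup."""

    def __init__(self, staging_credential: Optional[str], needs_staging: bool,
                 hold_cleanup: bool = False) -> None:
        # Staging credentials must already be delegated when the job is created.
        if needs_staging and staging_credential is None:
            raise ValueError("delegate a staging credential before creating the job")
        self.hold_cleanup = hold_cleanup
        self.state = "Unsubmitted"
        self.history: List[str] = []
        self.destroyed = False

    def advance(self) -> str:
        """Move one step forward, pausing before CleanUp while a hold is set."""
        if self.state == "Done":
            return self.state
        if self.state == "Unsubmitted":
            nxt = ORDER[0]
        else:
            nxt = ORDER[ORDER.index(self.state) + 1]
        if nxt == "CleanUp" and self.hold_cleanup:
            return self.state  # held: output files stay where the user can read them
        self.state = nxt
        self.history.append(nxt)
        return self.state

    def release(self) -> None:
        self.hold_cleanup = False  # user has finished with the output files

    def destroy(self) -> None:
        self.destroyed = True  # explicit destruction of the job resource


if __name__ == "__main__":
    job = JobLifecycle("delegated-credential-ref", needs_staging=True, hold_cleanup=True)
    for _ in range(6):
        job.advance()
    print(job.state)  # StageOut (held before CleanUp)
    job.release()
    job.advance()
    job.advance()
    print(job.state)  # Done
    job.destroy()
```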
Globus Toolkit Components Used by WS GRAM
• Reliable File Transfer (RFT)
  • Performs file staging before the job runs and after it completes.
• GridFTP
  • Supports retry
  • Partial file transfer
  • Third-party file transfer
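Retry combined with partial file transfer means an interrupted copy does not have to start over: the next attempt resumes at the byte offset already written. The sketch below imitates that with local files and simulated failures; `copy_with_resume`, the tiny chunk size, and the `fail_at` failure-injection hook are assumptions for illustration and do not use GridFTP or RFT.

```python
import os
import tempfile


class TransientError(Exception):
    """Simulated network failure."""


def copy_with_resume(src: str, dst: str, chunk_size: int = 4,
                     max_retries: int = 5, fail_at: frozenset = frozenset()) -> int:
    """Copy src to dst, resuming from dst's current size after each failure.

    `fail_at` holds byte offsets at which to fail once (a test hook).
    Returns the number of attempts used.
    """
    pending_failures = set(fail_at)
    for attempt in range(1, max_retries + 1):
        # Resume point: whatever earlier attempts already wrote to dst.
        offset = os.path.getsize(dst) if os.path.exists(dst) else 0
        try:
            with open(src, "rb") as fin, open(dst, "ab") as fout:
                fin.seek(offset)
                while True:
                    if offset in pending_failures:
                        pending_failures.discard(offset)
                        raise TransientError(f"connection lost at byte {offset}")
                    chunk = fin.read(chunk_size)
                    if not chunk:
                        return attempt
                    fout.write(chunk)
                    offset += len(chunk)
        except TransientError:
            continue  # retry; the next attempt starts at the new offset
    raise RuntimeError("transfer failed after retries")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.dat")
        dst = os.path.join(tmp, "copy.dat")
        with open(src, "wb") as f:
            f.write(b"0123456789abcdef")
        attempts = copy_with_resume(src, dst, fail_at=frozenset({4, 12}))
        with open(dst, "rb") as f:
            print(attempts, f.read())  # 3 b'0123456789abcdef'
```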
[Figure: GridFTP diagram with hosts FOO1 and FOO2]
Delegation Services
• A client can delegate credentials to any service deployed in the same container as the delegation service.
• The client tells the delegation service that it wants to delegate its credentials.
• A service that wants to use those credentials must contact the delegation service to acquire them.
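The exchange on this slide (a client deposits a credential with the delegation service, then a service in the same container picks it up) can be modelled as a registry keyed by an opaque reference. The `DelegationService` class, its methods, the container strings, and the plain string standing in for a proxy credential are hypothetical; real delegation involves signing a proxy certificate, which is not shown. A `refresh` method is included because delegated credentials may be refreshed during the job's life.

```python
import secrets
from typing import Dict, Tuple


class DelegationService:
    """Hypothetical in-container registry of delegated credentials."""

    def __init__(self, container: str) -> None:
        self.container = container
        self._store: Dict[str, Tuple[str, str]] = {}  # ref -> (owner, credential)

    def delegate(self, owner: str, credential: str) -> str:
        """Client side: deposit a credential and get back an opaque reference."""
        ref = secrets.token_hex(8)
        self._store[ref] = (owner, credential)
        return ref

    def acquire(self, ref: str, service_container: str) -> str:
        """Service side: fetch the credential, only from within the same container."""
        if service_container != self.container:
            raise PermissionError("service is not in the delegation service's container")
        return self._store[ref][1]

    def refresh(self, ref: str, owner: str, new_credential: str) -> None:
        """Replace a delegated credential, e.g. before it expires."""
        if self._store[ref][0] != owner:
            raise PermissionError("only the delegating client may refresh")
        self._store[ref] = (owner, new_credential)


if __name__ == "__main__":
    ds = DelegationService(container="gram-host:8443")
    ref = ds.delegate("/O=Grid/CN=Jane Doe", "proxy-v1")
    print(ds.acquire(ref, "gram-host:8443"))  # proxy-v1
```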
External Components Used by WS GRAM
• Local job schedulers
  • PBS, LSF, Condor
• Sudo
  • Lets GRAM access user accounts without itself having root privilege.
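The sudo approach can be illustrated by how an unprivileged service account might build the command that runs a job as the mapped user. This sketch only constructs the argument list and does not execute it; `-n` (non-interactive: fail instead of prompting for a password), `-u` (target user), and `--` (end of sudo's own options) are standard sudo options, while the helper name, user name, and wrapper path are hypothetical.

```python
import shlex
from typing import List


def sudo_argv(target_user: str, command: List[str]) -> List[str]:
    """Build (but do not run) the argv that executes `command` as `target_user`."""
    if not target_user or target_user == "root":
        # Guard: never launch a job as root or with an empty user name.
        raise ValueError("refusing to run a job as root or without a user")
    # -n: fail rather than prompt for a password; -u: run as target_user;
    # --: end of sudo's own options, so the job command is passed through as-is.
    return ["sudo", "-n", "-u", target_user, "--", *command]


if __name__ == "__main__":
    argv = sudo_argv("jdoe", ["/opt/example/libexec/job-wrapper", "--job", "42"])
    print(shlex.join(argv))
    # sudo -n -u jdoe -- /opt/example/libexec/job-wrapper --job 42
```

A service would hand this list to `subprocess.run` without a shell; whether the call is allowed depends entirely on the sudoers rules the administrator has configured, not on this code.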