CoreGRID Summer School Bonn, July 25, 2006 • Resource Orchestration in Grids • Wolfgang Ziegler • Department of Bioinformatics • Fraunhofer Institute SCAI
Outline • What is Orchestration and (why) do we need it ? • Overview on existing Brokers & MetaSchedulers • General Architecture • GridWay • EGEE Workload Manager Service • Condor-G • Nimrod & Co • Grid Service Broker • Calana • MP-Synergy • SUN N1 & SGE • MOAB • Platform CSF • KOALA • MetaScheduling Service • Grid Scheduling Architecture • OGF Standardisation Activities • GSA Research Group • OGSA-BES & OGSA-RSS Working Groups • Examples (VIOLA-ISS/PHOSPHORUS/BODEGA/SAROS) • VIOLA MetaScheduling Service • What next
What is Orchestration and (why) do we need it? • It is one of the Grid/Web buzzwords • Here is a definition of the term: Orchestration or arrangement is the study and practice of arranging music for an orchestra or musical ensemble. In practical terms it consists of deciding which instruments should play which notes in a piece of music. • In the Web-Service domain the term Orchestration is used for the execution of specific business processes; WS-BPEL is a language for defining such processes • In the Grid domain the term Orchestration is used for the coordination (usually including reservation) of multiple resources for use by one single application or a workflow • No (de-facto) standards are available so far for description and management; WS-Agreement + JSDL + other term languages could become one (see the sketch below) • The basic idea is: planning through negotiation for controlled (better) QoS instead of queuing for best effort
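To illustrate what such a term-language combination amounts to, here is a minimal sketch of an agreement offer as plain Python data structures. The class and field names are invented for illustration; they mirror the WS-Agreement notions of service description terms (here a JSDL document) and guarantee terms, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GuaranteeTerm:
    objective: str        # the service level objective (SLO), e.g. "nodes >= 32 from 14:00 UTC"
    penalty: float = 0.0  # business value: penalty owed if the SLO is violated

@dataclass
class AgreementOffer:
    job_description: str  # the service description term, e.g. a JSDL document
    guarantees: List[GuaranteeTerm] = field(default_factory=list)

# A negotiated reservation instead of best-effort queuing:
offer = AgreementOffer(
    job_description="<jsdl:JobDefinition>...</jsdl:JobDefinition>",
    guarantees=[GuaranteeTerm("32 nodes from 14:00 to 16:00 UTC", penalty=100.0)],
)
```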
Scenarios where Orchestration is needed • Running applications with time constraints • Applications must deliver results at a certain time • Application may not start before a fixed time • Co-allocation • Applications requiring more than one resource, e.g. • distributed visualisation • multi-physics simulations • data intensive applications requiring more storage than locally available • dedicated QoS of network connections for distributed applications • Workflows • Individual interdependent components
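Co-allocation ultimately reduces to finding a timeslot that is free on every resource at once. The following toy sketch (all names hypothetical) intersects the free windows advertised by each local scheduler:

```python
Interval = tuple[int, int]  # (start, end) in minutes from now

def common_slot(free_windows: list[list[Interval]], duration: int) -> Interval | None:
    """Return the earliest interval of `duration` minutes that is free
    on every resource, or None if the windows never overlap long enough."""
    def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
        out = []
        for s1, e1 in a:
            for s2, e2 in b:
                s, e = max(s1, s2), min(e1, e2)
                if e > s:
                    out.append((s, e))
        return out

    common = free_windows[0]
    for w in free_windows[1:]:
        common = intersect(common, w)
    for s, e in sorted(common):
        if e - s >= duration:
            return (s, s + duration)
    return None

# Three clusters publish their free windows; we need 60 minutes on all of them.
print(common_slot([[(0, 120), (180, 300)], [(30, 200)], [(60, 240)]], 60))  # (60, 120)
```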
Example: What might happen without Orchestration (diagram: UNICORE Client, UNICORE Gateways at Site A and Site B, NJS / Primary NJS, TSI, local schedulers with job queues, clusters) • The user describes his job in the UNICORE Client • The job is passed to the UNICORE system • The Primary NJS distributes the job to all sites • The job is submitted to the local batch queues of all systems • The components of the job are started depending on the state of the local batch queues • The quality of the network connections depends on the actual load
Constraints of Orchestration • The process has to respect site autonomy and site policies • done through negotiation and use of local RMS/scheduling systems • Reservation on behalf of the requesting user • done through mapping to the local id of the user • Without an SLA there is no guarantee at all; with an SLA guarantees are in place, but may be cancelled • failure of resources, or a service provider's decision to prefer another, e.g. more profitable, job may cause an unforeseen breach of contract • if one SLA fails, what happens to the other ones? (see the sketch below) • Penalties agreed upon as part of the SLA can cut one's losses • Need to identify acting and responsible parties beforehand • e.g. broker, local scheduling system, adapter, other instance of service/resource provider, client-side process/component • Need a tool/service to manage the orchestration - might be either local or remote • Local RMS/schedulers must provide support for advance reservation
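One plausible answer to the "if one SLA fails" question is to treat the SLAs of a co-allocation as an atomic set: when one is broken, the orchestrator cancels the siblings and collects the penalty agreed for the broken one. A minimal sketch under exactly that assumption (all names invented):

```python
from dataclasses import dataclass

@dataclass
class SLA:
    resource: str
    penalty: float          # owed by the provider on violation
    active: bool = True

def on_violation(slas: list[SLA], failed: SLA) -> float:
    """Cancel all sibling SLAs of a co-allocation when one fails;
    return the penalty owed for the broken agreement."""
    for sla in slas:
        sla.active = False  # the co-allocation is void as a whole
    return failed.penalty

slas = [SLA("cluster-A", 50.0), SLA("cluster-B", 50.0), SLA("network-link", 20.0)]
owed = on_violation(slas, slas[2])     # the network provider breaks its SLA
print(owed, [s.active for s in slas])  # 20.0 [False, False, False]
```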
Crucial Properties of local Scheduling Systems • Full backfill algorithm (see the sketch below) • Estimation of worst-case start/stop for each job (preview) • Node range specification • Start time specification • Special resource requirement specification • „very low priority“ jobs (Z-jobs) • Communication-friendly node allocation strategy • Portable: available on different parallel machines • Graphical user interface • Status information available via web interface • Priority scheme (project, resources, waited time) • Reserved slots are fixed and are no longer subject to scheduling
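The first and last properties interact: an advance reservation becomes an immovable block on the schedule, and backfill may only place jobs into the remaining gaps. A toy single-timeline sketch of this idea, not any particular scheduler's algorithm:

```python
def earliest_start(reservations: list[tuple[int, int]], duration: int, horizon: int) -> int | None:
    """Find the earliest start time for a job of `duration` that avoids all
    fixed reservations (start, end) within the scheduling horizon."""
    t = 0
    for start, end in sorted(reservations):
        if start - t >= duration:  # the gap before this reservation is big enough
            return t
        t = max(t, end)            # otherwise skip past the fixed block
    return t if horizon - t >= duration else None

fixed = [(10, 20), (25, 40)]          # advance reservations, no longer movable
print(earliest_start(fixed, 5, 60))   # 0  (fits in the gap before the first reservation)
print(earliest_start(fixed, 12, 60))  # 40 (only the tail gap is large enough)
```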
Overview on Brokers & MetaSchedulers (diagram: possible components and interactions of a scheduling infrastructure for Global Grids, OGF GSA-RG)
Overview - GridWay • Environment: Globus GT2.4, GT4 • Features & Scope: - Works on top of multiple local schedulers (LSF, PBS (Open, Pro), SGE, N1) - Supports migration of jobs based on monitoring of job performance - Support for self-adaptive applications (modification of requirements and migration request) - Provides the OGF DRMAA API for local systems able to provide the DRMAA bindings (currently SGE and N1; see the example below) • License: Open Source, GPL v2 • Support: GridWay on-line support forum
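To give a feel for the DRMAA API mentioned above, here is roughly what job submission looks like with the DRMAA v1 Python binding (the drmaa package); it assumes a DRMAA-enabled local scheduler such as SGE with its libdrmaa configured:

```python
# Requires the `drmaa` Python package and a configured DRMAA library
# for the local scheduler; sketch of the DRMAA v1 binding.
import drmaa

with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "/bin/sleep"  # executable to run on the cluster
    jt.args = ["60"]
    job_id = session.runJob(jt)
    print("submitted:", job_id)
    info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print("exit status:", info.exitStatus)
    session.deleteJobTemplate(jt)
```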
Overview - EGEE Workload Manager Service • Environment: LCG, gLite (Globus GT2.4, GT4) • Features & Scope: - Two modes of job scheduling: push mode submits jobs through Condor-G; in pull mode the computational Grid takes the jobs from the queue - Eager or lazy policy for job scheduling: early binding to resources (one job/multiple resources) or matching against one resource becoming free (one resource/multiple jobs); see the sketch below - Works on top of multiple local schedulers (LSF, PBS (Open, Pro), SGE, N1) - Supports definition of workflows with specification of dependencies - Support for VOs, accounting • License: Open Source license • Support: mailing lists and bug reporting tools
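Eager and lazy binding can be pictured as two directions of the same matchmaking step. A toy sketch, not the WMS implementation: eager matching picks a resource for a freshly submitted job, lazy matching picks a queued job the moment a resource becomes free.

```python
def eager_match(job: dict, resources: list[dict]) -> dict | None:
    """Early binding: pick the best resource for the job at submit time."""
    fitting = [r for r in resources if r["free_cpus"] >= job["cpus"]]
    return max(fitting, key=lambda r: r["free_cpus"], default=None)

def lazy_match(resource: dict, queue: list[dict]) -> dict | None:
    """Late binding: when a resource frees up, pick a job that fits it."""
    fitting = [j for j in queue if j["cpus"] <= resource["free_cpus"]]
    return min(fitting, key=lambda j: j["submitted"], default=None)  # oldest first

resources = [{"name": "ce-1", "free_cpus": 8}, {"name": "ce-2", "free_cpus": 32}]
queue = [{"id": 1, "cpus": 16, "submitted": 100}, {"id": 2, "cpus": 4, "submitted": 90}]
print(eager_match({"cpus": 16}, resources)["name"])  # ce-2
print(lazy_match(resources[0], queue)["id"])         # 2
```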
Overview - Condor-G • Environment: Globus GT2.4 - GT4, UNICORE, NorduGrid • Features & Scope: - Fault-tolerant job submission system, supports Condor's ClassAd matchmaking for resource selection (see the sketch below) - Can submit jobs to local scheduling systems (Condor, PBS (Open, Pro), SGE, N1) - Supports workflow interdependency specification through DAGMan - Allows query of job status and provides callback mechanisms for termination or problems - Provides the OGF DRMAA API for local systems with DRMAA bindings • License: Open Source • Support: free (mailing list) & fee-based (telephone, email)
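ClassAd matchmaking pairs a job advertisement with machine advertisements whose Requirements expressions mutually evaluate to true. A heavily reduced sketch of the idea (real ClassAds are a full expression language; plain Python predicates stand in here):

```python
# Each "ad" is a set of attributes plus a requirements predicate over the other ad.
job_ad = {
    "attrs": {"ImageSize": 2048, "Owner": "alice"},
    "requirements": lambda m: m["Memory"] >= 2048 and m["Arch"] == "X86_64",
}
machine_ads = [
    {"attrs": {"Name": "node1", "Memory": 1024, "Arch": "X86_64"},
     "requirements": lambda j: True},
    {"attrs": {"Name": "node2", "Memory": 4096, "Arch": "X86_64"},
     "requirements": lambda j: j["Owner"] != "mallory"},
]

def matchmake(job: dict, machines: list[dict]) -> list[str]:
    """Return machines where both sides' Requirements are satisfied."""
    return [m["attrs"]["Name"] for m in machines
            if job["requirements"](m["attrs"]) and m["requirements"](job["attrs"])]

print(matchmake(job_ad, machine_ads))  # ['node2']
```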
Overview - Nimrod/G & Co • Environment: Globus, Legion, Condor • Features & Scope: - Focused on parametric experiments - Collaborates with multiple local schedulers (LSF, PBS (Open, Pro), SGE) - Follows an economic approach based on auctioning mechanisms including resource providers and resource consumers (see the sketch below) - API to write user-defined scheduling policies • License: Royalty-free license for non-commercial use. EnFuzion is a commercially licensed variant provided by Axceleon. • Support: Limited support by email
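The economic approach can be sketched as a reverse auction: the consumer announces a job with deadline and budget, providers bid a price, and the cheapest feasible bid wins. A toy model (names invented, not the Nimrod/G protocol):

```python
from dataclasses import dataclass

@dataclass
class Bid:
    provider: str
    price: float       # cost the provider asks for running the job
    finish_time: int   # promised completion time

def run_auction(bids: list[Bid], deadline: int, budget: float) -> Bid | None:
    """Award the job to the cheapest bid that respects deadline and budget."""
    feasible = [b for b in bids if b.finish_time <= deadline and b.price <= budget]
    return min(feasible, key=lambda b: b.price, default=None)

bids = [Bid("site-A", 12.0, 95), Bid("site-B", 8.0, 130), Bid("site-C", 10.0, 80)]
winner = run_auction(bids, deadline=100, budget=15.0)
print(winner.provider if winner else "no feasible bid")  # site-C
```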
Overview - Grid Service Broker • Environment: Globus GT2.4, GT4; UNICORE in preparation • Features & Scope: - MetaScheduler from the GridBus project, supporting multiple heterogeneous local schedulers (Condor, PBS (Open, Pro), SGE) - Interface to local systems either via SSH or GRAM - Supports integration of a user-defined custom scheduler - Supports scheduling based on deadline and budget constraints • License: GNU Lesser General Public License
Overview - Calana • Environment: Globus Toolkit 4, UNICORE • Features & Scope: - Agent-based MetaScheduler for research and commercial environments - Follows an economic approach based on auctioning mechanisms including resource providers (auctioneer) and resource consumers (bidders) - Collaboration with different local resources possible through implementation of appropriate agents • License: Research prototype used in the Fraunhofer Resource Grid
Overview - MP Synergy • Environment: Relies on Globus Toolkit 2.4. Enterprise Grids • Features & Scope: - Scheduling decisions based on various parameters, e.g. load of the local scheduling systems, data transfer time - Accounting mechanism - Supports other local schedulers (LSF, PBS (Open/Pro), UD GridMP, SGE, LoadLeveler, Condor) - Job submission may respect the availability of e.g. licenses • License: Proprietary commercial license • Support: Paid support by United Devices
Overview - SUN N1 & SGE • Environment: Stand-alone, can optionally be integrated with other Grid middleware • Features & Scope: - Allows exchange of the built-in scheduler with a user-provided scheduler - Supports advance reservation if the built-in scheduler is replaced by an appropriate scheduler • License: Proprietary commercial license for N1, SGE is Open Source • Support: Paid support by SUN for N1 • AR (advance reservation)
Overview - MOAB Grid Scheduler • Environment: Stand-alone, can optionally rely on Globus Toolkit middleware for security and user account management. Enterprise Grids. • Features & Scope: - The bundle of MOAB Grid Scheduler, Torque, and MOAB Workload Manager builds a complete stack for computational Grids - Supports other local schedulers (LSF, OpenPBS, SGE, N1) - Supports advance reservation and query of reservations of various resources, e.g. hosts, software licenses, network bandwidth - Local schedulers must support advance reservation • License: Proprietary commercial license; the Maui Cluster Scheduler (a limited variant) is available under a specific Open Source license • Support: Paid support by Cluster Resources Inc. • AR
Overview - Platform CSF & CSF Plus Environment:Globus GT4 Features & Scope: - Coordinates communication among multiple heterogeneous local schedulers (LSF, OpenPBS, SGE, N1) - Supports advance reservation, query of reservations of various resources, e.g. hosts, software licenses, network bandwidth - Local schedulers must support advance reservation - API to write user-defined scheduling policies License:Open Source Support:Paid support by Platform and other companies AR
Overview - KOALA • Environment: Globus Toolkit • Features & Scope: - MetaScheduler of the Dutch DAS-2 multicluster system. Supports co-allocation of compute and disk resources. - Collaboration with the local schedulers OpenPBS, SGE - Local schedulers must support advance reservation - Support for MPI jobs (MPICH-G2) • License: • AR
Overview - MetaScheduling Service • Environment: UNICORE, GT4 in preparation • Features & Scope: - MetaScheduling Web Service supporting reservation, co-allocation and SLAs between the MetaScheduler and the client - Collaboration with different local schedulers through adapters (EASY, PBS (Open, Pro)); SGE in preparation - Local schedulers must support advance reservation - Supports orchestration of arbitrary resources, e.g. compute resources and network; storage and licenses in preparation - Multiple MSS may be organised hierarchically - Support for MPI jobs (MetaMPICH) - Support for workflows under work - End-to-end SLAs between service provider and service consumer in the next version • License: First version used in the VIOLA Grid testbed, available for collaborating partners • Support: by email and bug reporting tools • AR
Grid Scheduling Architectures (1): Integrating Calana with other Schedulers (diagram: Calana broker with PhastGrid agents in front of PhastGrid resources, an adapter agent towards a workload manager, and a UNICORE agent; a source scheduler and another scheduler are connected through adapters and brokers) • Another scheduler submits jobs to Calana • Calana submits jobs to another scheduler
Grid Scheduling Architectures (2): GridWay - Federation of Grid Infrastructures with GridGateWays (Grid4Utility Project) (diagram: users reach a VO meta-scheduler via CLI & API, globus-job-run, Condor/G, Nimrod/G; the GridWay core with its scheduling module and execution/transfer/information drivers speaks standard protocols and interfaces (GT GRAM, OGSA BES, ...) to a GridGateWay in front of Globus Grid infrastructure A (VO A), which in turn federates Globus Grid infrastructure B (VO B); each resource runs GRAM, RFT and MDS in front of an SGE, PBS or LSF cluster)
Grid Scheduling Architectures (3): VIOLA MetaScheduling Service - multi-level MetaScheduling
Open Grid Forum Standardisation Activities (1) • Grid Scheduling Architecture Research Group (GSA-RG) • Addressing the definition of a scheduling architecture supporting all kinds of resources, • interaction between resource management and data management, • co-allocation and reservation of resources, including the integration of user- or provider-defined scheduling policies • Two sub-groups of the Open Grid Service Architecture Working Group: • Basic Execution Service Working Group (OGSA-BES-WG) • OGSA Resource Selection Services Working Group (OGSA-RSS-WG) • will provide protocols and interface definitions for the Selection Services portion of the Execution Management Services (components: CSG and EPS) • Grid Resource Allocation Agreement Protocol Working Group (GRAAP-WG) • Addressing a proposed recommendation for Service Level Agreements • WS-Agreement template and protocol • Allows definition of Guarantee Terms, e.g. SLOs, Business Values, KPIs, Penalties
Open Grid Forum Standardisation Activities (2) • Execution Management Services (OGSA-EMS-WG), focusing on the creation of jobs • Subset: Basic Execution Service Working Group (OGSA-BES-WG)
Resource Pre-selection • Resource pre-selection is necessary to reduce the number of resources/service providers to negotiate with • The RSS can exploit multiple criteria, e.g. • user/application-supplied selection criteria • The Orchestration Service focuses on negotiation, reservation and the resulting SLAs • Final selection of resources from the set provided by the RSS, based e.g. on (see the sketch below) • availability of resources • costs depending on possible reservation times or computing environment • costs caused by delay • monitoring data from Grid monitoring services • Ongoing or planned national/EU projects • VIOLA-ISS (pre-selection based on monitoring data of previous application runs) • PHOSPHORUS / BODEGA (pre-selection based on semantic annotation of applications) • SAROS (pre-selection based on actual Grid monitoring data)
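A common way to combine such criteria is a weighted score over the candidate set returned by the RSS. A small sketch with invented criteria names and weights:

```python
def rank_resources(resources: list[dict], weights: dict[str, float]) -> list[dict]:
    """Order candidate resources by a weighted score over their criteria.
    Criteria where smaller is better (cost, delay) enter negatively."""
    def score(r: dict) -> float:
        return (weights["availability"] * r["availability"]  # fraction 0..1
                - weights["cost"] * r["cost_per_hour"]
                - weights["delay"] * r["expected_delay"])
    return sorted(resources, key=score, reverse=True)

candidates = [
    {"name": "site-A", "availability": 0.9, "cost_per_hour": 1.0, "expected_delay": 0.2},
    {"name": "site-B", "availability": 0.7, "cost_per_hour": 0.3, "expected_delay": 0.1},
]
weights = {"availability": 1.0, "cost": 0.5, "delay": 1.0}
print([r["name"] for r in rank_resources(candidates, weights)])  # ['site-B', 'site-A']
```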
VIOLA MetaScheduling Service • Developed in a German project for the evaluation of the next generation of NREN • Focus on co-allocation and support for MPI applications • compute resources (nodes of different geographically dispersed clusters) • end-to-end network bandwidth between cluster nodes • Implements WS-Agreement for SLAs • The negotiation protocol will be incorporated into an OGF draft (WS-Negotiation, extending the WS-Agreement protocol)
MetaScheduler - Integration of local Schedulers (diagram: the UNICORE client talks WS-Agreement to the MetaScheduler; the MetaScheduler talks HTTPS/XML to adapters at site 1 ... site n and to the network RMS in front of the switch/router; each adapter drives the local scheduler of its site; UNICORE submits the job data of each partial job) • Negotiation of timeslot & nodes with the local schedulers for each job • UNICORE initiates the reservation and submits the job data • UNICORE Client / MetaScheduling Service interface using the WS-Agreement protocol • Interface MetaScheduler / adapters based on HTTPS/XML (SOAP) • Interface between MetaScheduling Service and local RMS implemented with the adapter pattern (see the sketch below) • Authentication and communication of adapter and local scheduler via ssh
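The adapter pattern mentioned above boils down to one uniform interface per local RMS behind which the site-specific commands hide. A hedged sketch (the method names and the PBS stub are invented for illustration):

```python
from abc import ABC, abstractmethod

class SchedulerAdapter(ABC):
    """Uniform interface the MetaScheduler expects from every local RMS."""

    @abstractmethod
    def query_free_slots(self, duration: int) -> list[tuple[int, int]]:
        """Preview of free (start, end) windows on this site."""

    @abstractmethod
    def reserve(self, start: int, duration: int, nodes: int) -> str:
        """Create an advance reservation, return a reservation id."""

    @abstractmethod
    def cancel(self, reservation_id: str) -> None:
        """Release a previously created reservation."""

class PBSAdapter(SchedulerAdapter):
    def query_free_slots(self, duration):
        # would shell out via ssh to the PBS server and parse its preview
        return [(0, 120)]
    def reserve(self, start, duration, nodes):
        return "pbs-resv-42"  # placeholder id from the local RMS
    def cancel(self, reservation_id):
        pass

# The MetaScheduler treats every site uniformly through the interface:
adapters: list[SchedulerAdapter] = [PBSAdapter()]
for a in adapters:
    print(a.query_free_slots(duration=60))
```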
Example: What happens with an SLA (diagram: UNICORE Client, MetaScheduler, UNICORE Gateways at Site A and Site B, NJS / Primary NJS, TSI, adapters, local schedulers with job queues, clusters, network RMS ARGON controlling link usage) • The user describes his job in the UNICORE Client • The job is passed to the UNICORE system • A MetaScheduling request (WS-Agreement template) is sent to the MetaScheduler • The MetaScheduler negotiates and reserves resources through the adapters • The MetaScheduler response (WS-Agreement) is returned • All components of the job are started at the point in time agreed upon; at the same time the network connections are switched on
MetaScheduler - Running Jobs (diagram: local site with UNICORE wrapper, MetaScheduler wrapper and local wrapper; request for MetaScheduling, reservation through the adapter, wrapper generation and execution) • UNICORE generates the UNICORE wrapper with the job data • The local adapter generates the local wrapper for the MetaScheduler and for the execution of the UNICORE job • The local adapter submits the job with the MetaScheduler wrapper • The local scheduler generates a wrapper for the execution of the MetaScheduler wrapper
Network Resource Management System (diagram: the MetaScheduler submits a reservation with QoS specification to the Network Resource Manager and receives an acknowledgement; routers connect Site A, Site B ... Site n over the network with e.g. 1 GB/s and 2 GB/s links; the local scheduler binds IP addresses at run-time) • 1.) Reservation of required resources • submission of a reservation to the Network Resource Manager • acknowledgement of the reservation • 2.) Binding of IP addresses at run-time • IP addresses are published at run-time of the job through the local adapter • the Network Resource Manager binds the IP addresses • without an explicit bind, the QoS parameters for the site-to-site interconnection are used
Network Resource Manager - Supported Features (diagram: reservations plotted as blocks of available bandwidth over time t, between the current time and the book-ahead time) • Immediate reservation / advance reservation • Reservation: within the „book ahead“ timeframe (i.e. the timeframe in which the system manages future reservations) • Class: determines the QoS level • Network user: id of the user the QoS shall be guaranteed for • Data for a reservation (see the sketch below): • job ID, start time, duration, network user • list of 3-tuples {start-/endpoint, class}
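Admitting a new reservation within the book-ahead window amounts to checking that the link capacity is never exceeded while the reservation is active. A simplified single-link sketch (minute granularity, all numbers invented):

```python
def fits(reservations: list[tuple[int, int, float]], start: int, duration: int,
         bandwidth: float, capacity: float, book_ahead: int) -> bool:
    """Admission check for a new bandwidth reservation on one link: it must
    lie inside the book-ahead window and leave the link within capacity at
    every minute it overlaps with existing (start, duration, bw) entries."""
    if start < 0 or start + duration > book_ahead:
        return False
    for t in range(start, start + duration):
        used = sum(bw for (s, d, bw) in reservations if s <= t < s + d)
        if used + bandwidth > capacity:
            return False
    return True

existing = [(0, 60, 1.0), (30, 60, 0.5)]  # (start, duration, Gbit/s)
print(fits(existing, start=40, duration=20, bandwidth=0.5, capacity=2.0, book_ahead=1440))  # True
print(fits(existing, start=40, duration=20, bandwidth=1.0, capacity=2.0, book_ahead=1440))  # False
```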
Network Resource Manager Application Interface (diagram: Network Resource Manager exposing ResourceAvailableAt, Submit, Cancel, Status, Bind) • Necessary functions: • ResourceAvailableAt (preview) • returns time slots when a resource (end-to-end connection with QoS level) will be available • Submit • start time, duration, class, start-/end-point (site), user • returns a resource identifier (RESID) • Cancel <RESID> • the Resource Manager frees the resources attached to the resource identifier (RESID) • Status <RESID> • returns the state of a connection (submitted, active, released, class, start time, end time, user, etc.) • Bind <RESID> • binding of the IP addresses of the nodes
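The five operations map naturally onto a thin client class. A sketch of what such a client could look like; only the operation names and parameters come from the slide, the transport and return types are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    site: str
    qos_class: str  # the class determining the QoS level

class NRMClient:
    """Sketch of a client for the Network Resource Manager interface;
    the bodies would issue HTTPS/XML (SOAP) calls in the real service."""

    def resource_available_at(self, ep: Endpoint, duration: int) -> list[tuple[int, int]]:
        """Preview: time slots when the end-to-end connection is available."""
        raise NotImplementedError

    def submit(self, start: int, duration: int, endpoints: list[Endpoint], user: str) -> str:
        """Reserve the connection; returns a resource identifier (RESID)."""
        raise NotImplementedError

    def cancel(self, resid: str) -> None:
        """Free the resources attached to the RESID."""
        raise NotImplementedError

    def status(self, resid: str) -> dict:
        """State of the connection: submitted, active, released, ..."""
        raise NotImplementedError

    def bind(self, resid: str, ip_addresses: list[str]) -> None:
        """Bind the node IP addresses published at job run-time."""
        raise NotImplementedError
```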
What Next? • Grid - SOA convergence • supporting resources as services • composition of small services needs agile and lightweight orchestration services • Timing problems with dynamic applications, e.g. when doing parallel I/O on demand • currently high latency • Full support for workflows - based on which description language? • Semantic support • GSO: a scheduling ontology for automatic determination of scheduler capabilities and selection of an appropriate one
Acknowledgements • Some of the work presented in this lecture is funded by the German Federal Ministry of Education and Research through the VIOLA project under grant #01AK605F. This presentation also includes work carried out jointly within the CoreGRID Network of Excellence funded by the European Commission’s IST programme under grant #004265.