Grid Interoperability Issues in Resource Management: Questions and Solutions • Attila Kertész, attila.kertesz@sztaki.hu • MTA SZTAKI • CoreGRID Institute on Resource Management and Scheduling • Budapest, Hungary, 3-7 September 2007
Overview • Introduction: • Heterogeneity in Grids -> Need for Interoperability • Solutions for Grid Interoperability: • It can be targeted at different levels of Grid systems • Regarding Resource Management, we see 3 approaches: • Extending current Resource Management Systems • Interfacing RMSs from portals • Developing a higher-level mediator to utilize RMSs • Conclusions and future directions
Current situation and trends in Grid Computing • Fast evolution of Grid systems and middleware: • Globus Toolkit (GT2->3->4), EGEE (LCG-2->gLite), UNICORE, … • Many production Grid systems are built with them: • EGEE (LCG-2, gLite), UK NGS (GT2), Open Science Grid (GT2, GT4), NorduGrid (~GT2) • Although the same set of core services is available everywhere, they are implemented in different ways: • Certificate management, Job submission, File management
How to achieve Grid Interoperability? • Levels of the Grid architecture: • Level 3: Higher-level services • Level 2: Grid middleware • Level 1: Operating systems
Which levels should we target? • Establishing interoperability at levels 1 and 2 would be the smartest solution, but also the hardest. • Level 3 is the most preferable, since it requires the fewest modifications to the overall architecture.
How can we use existing Resource Managers for Grid Interoperability? • Three possible directions at the resource management level of current grids: • I. Enable Resource Brokers to access resources of different Grids • II. Interface different brokers from Portals • III. Enable communication among Resource Brokers, or coordinate them with a higher-level tool
I. Extending Current RMSs • The most obvious way to provide interoperability among different grid systems is to extend the existing and widely used Resource Brokers with support for multiple grid middleware. • This approach has both advantages and disadvantages: • This modification would probably favor users most, since they would not need to change their habits or submission methods. • On the other hand, it requires considerable effort from the developers to interface new middleware services, so it is definitely a time-consuming solution. Moreover, the more systems the broker supports, the larger and less manageable it becomes.
Related works • The Gridbus Grid Service Broker is designed for computational and data-grid applications. Although it supports all Globus middleware versions, UNICORE and NorduGrid, and provides an interface that can be implemented to support other middleware, it is mainly used in Globus grids. • Gridway is being developed in a Globus incubation project; therefore it supports all Globus versions, and it also supports the EGEE middleware. • JSS is a decentralized resource broker that is able to utilize both GT4 and NorduGrid resources. • The UniGrids (GRIP) project aims at supporting interoperability through semantic matching of resource descriptions, enabling job submissions to Globus and UNICORE sites.
Demonstration: GTbroker • The first widespread and stable grid middleware was Globus Toolkit 2. Since it lacked a Resource Broker, we developed a tool called GTbroker. • It uses GT2 C API functions to interact with Globus resources and to perform job submissions. • To determine the available hosts in the grid, it queries the MDS. Jobs are submitted to resources through GRAM, and a GASS server is used to transfer the files needed by the job to the remote host and to retrieve the result files, if there are any. • These tools enable the broker to work on Globus grids (GT2, GT3 and pre-WS GT4) without additional software. • Since most current production grids use this kind of middleware, its simple adaptation made this broker relevant.
Extension to EGEE middleware • To extend an RMS to support other types of middleware, we need to learn how to interact with the new system. • Brokers need to gather resource information, move files, perform job submissions, track job states and retrieve output files. Most of these activities require interaction with different middleware services. • GTbroker was redesigned to support the LCG-2 (EGEE) middleware by modifying the following parts: • the information query, so that it can gather data from the BDII, • and the RSL, by adding special attributes that enable job submission in EGEE VOs. • Since file movement, job description and job state tracking can also be done through the same Globus services in LCG-2 grids, we did not modify these parts (for an entirely different middleware, however, they would have had to be modified as well).
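The redesign described above keeps the broker core fixed and swaps only the middleware-specific parts. A minimal Python sketch of that separation, assuming a hypothetical `MiddlewareAdapter` interface: the Globus adapter answers resource queries as if from the MDS, the LCG-2 adapter as if from the BDII and adds extra RSL attributes. The resource names and the `vo` attribute are placeholders, not real MDS/BDII data or the actual EGEE-specific RSL attributes.

```python
from abc import ABC, abstractmethod


class MiddlewareAdapter(ABC):
    """Middleware-specific parts of a broker (hypothetical interface)."""

    @abstractmethod
    def query_resources(self):
        """Return the names of the resources currently reachable."""

    def extra_rsl_attributes(self, vo):
        """Attributes appended to every job description; none by default."""
        return {}


class GlobusAdapter(MiddlewareAdapter):
    def query_resources(self):
        # stand-in for an MDS query
        return ["gt2.site-a.example", "gt2.site-b.example"]


class Lcg2Adapter(MiddlewareAdapter):
    def query_resources(self):
        # stand-in for a BDII query
        return ["ce01.lcg.example:2119/jobmanager-pbs"]

    def extra_rsl_attributes(self, vo):
        # placeholder for the special attributes EGEE VOs require
        return {"vo": vo}


def prepare_job(adapter, job, vo):
    """Broker core: identical for every middleware, only the adapter differs."""
    resources = adapter.query_resources()
    description = dict(job)
    description.update(adapter.extra_rsl_attributes(vo))
    return resources, description


if __name__ == "__main__":
    job = {"executable": "/bin/hostname"}
    for adapter in (GlobusAdapter(), Lcg2Adapter()):
        print(type(adapter).__name__, prepare_job(adapter, job, vo="voce"))
```

File movement, GRAM submission and state tracking are deliberately left out of the adapter here, mirroring the point above that LCG-2 grids reuse the same Globus services for them.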
First step towards Grid Interoperability • Figure: User -> Portal -> GTbroker -> SEEGRID (LCG-2), NGS (GT2), Austrian Grid (GT4), VOCE (LCG-2)
Comparison tests • To demonstrate usability, we evaluated broker usage on LCG-2 grids (VOCE, SEEGRID) • The brokers were invoked by scripts handling: • multiple invocations • state checking and log gathering • staging output back for the LCG-2 broker • We performed the tests in 4 phases, varying the job types and the number of jobs started at the same time
LCG-2 broker usage • In EGEE the Workload Management System (WMS) is responsible for brokering • Job properties come from the JDL, resource information from the BDII, and job states from the Logging and Bookkeeping service • Default matchmaking: • Only resources in 'Production' state are taken from the BDII • Resources are ranked by their response time during resource selection
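The default behaviour summarized above can be modelled in a few lines of Python. This is an illustrative sketch, not WMS code: the `ComputingElement` record and its fields are assumptions standing in for BDII data. In JDL this kind of ranking is commonly expressed as `Rank = -other.GlueCEStateEstimatedResponseTime;`.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    status: str              # state published in the information system
    est_response_time: int   # estimated response time in seconds


def default_matchmaking(ces):
    """Keep 'Production' CEs and order them by estimated response time, best first."""
    candidates = [ce for ce in ces if ce.status == "Production"]
    return sorted(candidates, key=lambda ce: ce.est_response_time)


if __name__ == "__main__":
    ces = [
        ComputingElement("ce-a", "Production", 300),
        ComputingElement("ce-b", "Draining", 0),
        ComputingElement("ce-c", "Production", 60),
    ]
    print([ce.name for ce in default_matchmaking(ces)])
```

The model also shows a weakness: a resource that still advertises 'Production' state and a low response time is preferred even if it does not actually respond.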
GTbroker features • Quality of Service features: GTbroker uses an extended RSL file that should contain the user requirements and job properties. • Regarding information systems: in Globus grids it queries the MDS, in LCG-2 grids the BDII. • During matchmaking, a ranked list is created from the resources found in the BDII. • Fault tolerance is supported by resubmissions. Should a job fail or remain pending too long on a resource (this time interval can be set in the broker), the broker cancels it and resubmits it to another high-priority resource.
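The resubmission policy can be sketched as a loop over the ranked list. A minimal Python sketch under stated assumptions: `run_job` is a hypothetical callback that submits the job, waits at most `max_pending` seconds for it to leave the pending state, and reports the outcome; GTbroker's real implementation and option names are not shown in the source.

```python
def submit_with_resubmission(ranked_resources, run_job, max_pending):
    """Try resources in rank order until the job completes.

    run_job(resource, max_pending) must return "DONE", "FAILED" or "TIMEOUT";
    "TIMEOUT" models a job that stayed pending longer than max_pending
    seconds and was therefore cancelled by the broker.
    Returns (resource, attempts) on success, or (None, attempts) if every
    resource was tried without success.
    """
    attempts = 0
    for resource in ranked_resources:
        attempts += 1
        state = run_job(resource, max_pending)
        if state == "DONE":
            return resource, attempts
        # FAILED or TIMEOUT: give up on this resource and move to the next one
    return None, attempts


if __name__ == "__main__":
    # hypothetical outcomes: the top-ranked resource hangs, the second fails
    outcomes = {"ce1": "TIMEOUT", "ce2": "FAILED", "ce3": "DONE"}
    print(submit_with_resubmission(["ce1", "ce2", "ce3"],
                                   lambda r, t: outcomes[r], max_pending=600))
```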
Test Phases • Phase 1: 20 small single and MPI jobs to VOCE • Phase 2: 20 10-minute jobs to both VOs, 20 10-minute MPI jobs to SEEGRID • Phase 3: 60 10-minute jobs to SEEGRID, 20 at a time, at 5-minute intervals • Phase 4: 60 ~15-minute jobs to SEEGRID, 10 at a time, at 4-minute intervals
Test summary • Sometimes the LCG-2 broker selected slowly responding or even non-responding resources, and its resubmission did not always work • GTbroker performed reliable resubmissions, and hidden non-responding or draining resources were skipped • For jobs with short running times GTbroker produced better results; for longer jobs both brokers performed about the same, but GTbroker was more reliable • Since GTbroker uses eager matchmaking, it usually sends most of the jobs to the same (‘best’) resource • The user can enable random selection within a range of resources, but this can reduce performance
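The trade-off between eager and randomized selection can be illustrated with a small Python sketch. The function name and the `top_k` parameter are illustrative assumptions, not GTbroker's actual option names.

```python
import random


def select_resource(ranked, top_k=1, rng=random):
    """Pick a resource from a ranked list (best first).

    top_k=1 is eager matchmaking: always the best-ranked resource.
    top_k>1 picks uniformly at random among the top_k best, spreading jobs
    at the cost of sometimes choosing a lower-ranked resource.
    """
    if not ranked:
        raise ValueError("no resources to choose from")
    pool = ranked[:max(1, top_k)]
    return rng.choice(pool)


if __name__ == "__main__":
    ranked = ["best-ce", "second-ce", "third-ce"]
    rng = random.Random(1)
    eager = [select_resource(ranked) for _ in range(10)]
    spread = [select_resource(ranked, top_k=3, rng=rng) for _ in range(10)]
    print(eager)
    print(spread)
```

With `top_k=1` all ten jobs land on `best-ce`, which is the clustering effect noted above; a larger `top_k` spreads them out but may place some on slower resources.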
I. Conclusions • We have shown how additional middleware support can be achieved by redesigning an existing Resource Broker • The results show that existing resource brokers can be extended to use other middleware systems, but this requires developers to redesign the system to support the services of each additional middleware.
II. Multi-broker Utilization • To exploit the advantages of various brokers and grids at the same time, we need to use multiple grid Resource Management Systems. • In this situation we need to learn various job specification languages and broker capabilities. • Grid portals are the currently available tools that try to hide the details of low-level middleware utilization by providing a transparent, uniform interface. • In this kind of grid utilization we do not expect grid brokers to support more middleware, but to do their best on their own.
Related works • Well-known related works are Pegasus, GridFlow, the K-Wf grid portal and the SPA portal of the HPC-Europa Project. • Though the first three provide high-level access to grid services, they usually operate on only one middleware. • The SPA is a portal component that enables brokers to be utilized through plug-in interfaces. These interface methods must be implemented by all brokers, providing the same abstract functionality; therefore integrating a broker would also require modifying it. • Only the P-GRADE Portal supports the execution of multi-grid workflows in both Globus- and EGEE-based production Grids.
Demonstration: The P-GRADE Portal • General purpose, workflow-oriented computational Grid portal • Supports the development and execution of workflow-based Grid applications • Based on GridSphere-2 • Easy to expand with new portlets (e.g. application-specific portlets) • Easy to tailor to end-user needs • Support for multi-grid workflows
What is a P-GRADE Portal workflow? • A Directed Acyclic Graph, where: • Nodes represent jobs (batch programs to be executed on a computing element) • Ports represent input/output files the jobs expect/produce • Arcs represent file transfer operations • Semantics of the workflow: • A job can be executed if all of its input files are available
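The firing rule can be simulated with a short Python sketch. Representing each job only by the names of its input and output files is a simplification (ports and arcs collapse into shared file names), and `run_order` is a hypothetical helper, not part of P-GRADE.

```python
def run_order(jobs, initial_files):
    """Group jobs into rounds according to the firing rule above.

    jobs maps a job name to (input_files, output_files). A job becomes
    runnable once all of its input files exist; jobs that become runnable
    in the same round could execute in parallel.
    """
    available = set(initial_files)
    pending = dict(jobs)
    rounds = []
    while pending:
        ready = sorted(name for name, (inputs, _) in pending.items()
                       if set(inputs) <= available)
        if not ready:
            raise ValueError(f"jobs can never run: {sorted(pending)}")
        for name in ready:
            # the outputs of a finished job become inputs for its successors
            available |= set(pending.pop(name)[1])
        rounds.append(ready)
    return rounds


if __name__ == "__main__":
    workflow = {
        "A": ([], ["a.out"]),
        "B": (["a.out"], ["b.out"]),
        "C": (["a.out"], ["c.out"]),
        "D": (["b.out", "c.out"], ["result"]),
    }
    print(run_order(workflow, initial_files=[]))
```

For this diamond-shaped workflow the rounds are `A`, then `B` and `C` together, then `D`.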
Defining broker jobs • The user can choose a broker for the job • No resource should be selected! • Further requirements can be specified in job description editors, which have similar interfaces
JDL and RSL Editor • Additional job-related requirements can be set in job description editors: • JDL Editor: • Creates a JDL file for the WMS • The user can set JDL attributes such as: Rank and Requirements, Environment variables, … • RSL Editor: • Creates an RSL file • Basic and special RSL attributes can be set such as: random resource, skip time…
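The kind of file the JDL Editor produces can be approximated in Python. This sketch assumes plain string templating without escaping; `make_jdl` and its fixed output file names are hypothetical, and the Glue schema attribute names in the example are the ones commonly used in LCG-2-era JDL but should be checked against the target VO's information system.

```python
def make_jdl(executable, arguments="", requirements=None, rank=None, environment=None):
    """Build a minimal JDL text with the attributes the editor exposes."""
    lines = [f'Executable = "{executable}";']
    if arguments:
        lines.append(f'Arguments = "{arguments}";')
    lines.append('StdOutput = "std.out";')
    lines.append('StdError = "std.err";')
    lines.append('OutputSandbox = {"std.out", "std.err"};')
    if environment:
        entries = ", ".join(f'"{k}={v}"' for k, v in environment.items())
        lines.append(f"Environment = {{{entries}}};")
    if requirements:
        lines.append(f"Requirements = {requirements};")
    if rank:
        lines.append(f"Rank = {rank};")
    # a JDL description is a ClassAd enclosed in square brackets
    return "[\n" + "\n".join("  " + line for line in lines) + "\n]"


if __name__ == "__main__":
    print(make_jdl(
        "/bin/hostname",
        requirements="other.GlueCEPolicyMaxWallClockTime >= 60",
        rank="-other.GlueCEStateEstimatedResponseTime",
        environment={"OMP_NUM_THREADS": 2},
    ))
```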
Workflow Execution • The P-GRADE Portal contains a DAGMan-based workflow manager subsystem • DAGMan decomposes workflows into elementary file transfer and job submission tasks, and schedules these tasks according to their dependencies • The submission is done by its pre/post scripts: • When a broker is used for job submission, the pre script invokes the broker, and the post script waits until the execution has finished and provides information about the actual job status to the portal
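The division of labour between DAGMan and the brokers can be illustrated by generating a DAGMan input file. The `JOB`, `SCRIPT PRE`/`SCRIPT POST` and `PARENT ... CHILD` lines are standard DAGMan syntax; the script names `broker_pre.sh` and `broker_post.sh` and the `make_dag` helper are hypothetical placeholders, not the portal's actual files.

```python
def make_dag(jobs, edges, pre="broker_pre.sh", post="broker_post.sh"):
    """Emit a DAGMan input file in which each node's PRE script would invoke
    a broker and its POST script would wait for the job and report its status.

    jobs: node name -> submit file; edges: (parent, child) pairs.
    """
    lines = []
    for name, submit_file in jobs.items():
        lines.append(f"JOB {name} {submit_file}")
        # the node name is passed as an argument so the scripts know which job to handle
        lines.append(f"SCRIPT PRE {name} {pre} {name}")
        lines.append(f"SCRIPT POST {name} {post} {name}")
    for parent, child in edges:
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_dag({"A": "a.sub", "B": "b.sub"}, [("A", "B")]), end="")
```

DAGMan only enforces the ordering; which grid a node actually runs on is decided by whatever broker its PRE script calls.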
Broker invocation • The portal can invoke different brokers to reach resources of different Grids • While DAGMan schedules the workflow nodes, the brokers do the actual job submissions
Second step towards Grid Interoperability • Figure: User -> P-GRADE Portal -> brokers (EGEE WMS, GTbroker, NorduGrid broker) -> grids: EGEE (VOCE / SEEGRID), NGS (GT2), SwissGrid; sites shown: Manchester, Poznan, Lausanne, Budapest
II. Conclusions • Portals provide uniform access to grids • Managing multiple brokers simultaneously in a transparent way seems to be a good solution for establishing Grid Interoperability • Though current portals provide transparent access to grids, users still need to set up workflows manually and choose an RMS for each job in the workflow. • By examining the available brokers, users can learn their capabilities, but they lack dynamic information such as the submission success rate, the background load of the brokers' VOs, the reliability of the brokers, and so on.
III. Meta-Brokering approaches • Users can have certificates to access multiple Grids or VOs • A new problem arises in this situation: which VO and which broker should I choose for my specific application? • Just as users needed Resource Brokers to choose proper resources within a VO, they now need a Meta-Brokering service to decide: • which broker (and VO) is the best for them, • and also to hide the differences in utilizing them.
Related works • Meta-brokering is a fairly new topic, though the need for interoperable grid networks has already been identified by several research groups. • The InterGrid vision is to operate so-called Gateways that communicate with IntraGrid RMs, which would have to be implemented in all the Grids participating in the network. This vision cannot be realized with current technologies. • Researchers of the HPC-Europa Project, as well as of the LA Grid Project, have also considered taking steps towards meta-brokering. Both are thinking of an intercommunicating peer-to-peer architecture of their current RMSs, which also takes time and requires redesigning their brokers.
Interacting with the Meta-Broker • Figure: the User contacts the Meta-Broker, which mediates between the User and the brokers of VO 1, VO 2, VO 3 and VO 4, located in Grid X and Grid Y
Languages of the Meta-Broker • Job Submission Description Language (JSDL): • for specifying job requirements • extension for special attributes • Broker Property Description Language (BPDL): • for storing the properties of the utilized brokers • updating the performance data of the brokers
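Third step towards Grid Interoperability • Figure: architecture of the Meta-Broker. A User, Portal or EGEE WMS sends a job description (JSDL) and receives the submission results: either the selected broker's name and its JDL, or the job status and output. Components: Meta-Broker Core, Parser, Translator, Invoker, Matchmaker, Information Collector, IS Agent, BPDL List, VO Load, MB Health, MB Languages. Utilized brokers: GTbroker, NorduGrid Broker, . . ., reaching an EGEE grid, a GT2 grid, SwissGrid, . . .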
Scenarios • a.) Figure: the User or Portal sends a job request (JSDL) to the Meta-Broker and receives the selected broker's name/ID, its middleware/VO, the translated job description (JobDL) and the proxy name; the broker ID and the submission results are also exchanged. Components involved: Meta-Broker Core, Parser, Translator, MatchMaker, Information Collector, IS Agent, BPDL List, VO Load, MB Languages, MB Health (steps 1-9).
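Scenarios • b.) Figure: the User sends the job description (JSDL) and input files to the Meta-Broker, which selects a Grid Broker, translates the description and submits the job through an Invoker, then returns the submission result and output files. Components involved: Meta-Broker Core, Parser, Translator, Invoker, MatchMaker, Information Collector, IS Agent, BPDL List, VO Load, MB Languages, MB Health, Grid Brokers (steps 1-11).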
Components of the architecture I. • The Meta-Broker is the core component: it communicates with the other components • The Translators are responsible for transforming the user request into the language of the currently selected broker (JSDL <-> JDL, RSL, xRSL…) • The Invokers hand the job over to the brokers and wait for the results • They also provide additional information about the submissions to the Information Collector
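A Translator can be sketched as a function from a parsed JSDL document to a broker-specific description. The Python sketch below reads the executable and arguments of a JSDL `POSIXApplication` (using the JSDL 1.0 and JSDL-POSIX namespaces) and emits either GT2-style RSL or JDL. It assumes values contain no quote characters and ignores resource requirements; the function names and the broker-to-translator table are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

SAMPLE_JSDL = """<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>grid</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""


def parse_jsdl(text):
    """Extract the executable and arguments of a JSDL POSIXApplication."""
    root = ET.fromstring(text)  # the jsdl:JobDefinition element
    app = root.find("jsdl:JobDescription/jsdl:Application/posix:POSIXApplication", NS)
    if app is None:
        raise ValueError("no POSIXApplication found")
    return {
        "executable": app.findtext("posix:Executable", namespaces=NS).strip(),
        "arguments": [arg.text.strip() for arg in app.findall("posix:Argument", NS)],
    }


def jsdl_to_rsl(job):
    args = " ".join(f'"{a}"' for a in job["arguments"])
    rsl = f'&(executable="{job["executable"]}")'
    return rsl + (f"(arguments={args})" if args else "")


def jsdl_to_jdl(job):
    lines = [f'Executable = "{job["executable"]}";']
    if job["arguments"]:
        # JDL passes all arguments in a single string
        lines.append(f'Arguments = "{" ".join(job["arguments"])}";')
    return "\n".join(lines)


# which translator to use for which broker (hypothetical table)
TRANSLATORS = {"GTbroker": jsdl_to_rsl, "EGEE WMS": jsdl_to_jdl}

if __name__ == "__main__":
    job = parse_jsdl(SAMPLE_JSDL)
    for broker, translate in TRANSLATORS.items():
        print(broker, "->", translate(job))
```

Parsing JSDL once into a neutral dictionary keeps each Translator small: adding a broker means adding one output function, not another parser.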
Components of the architecture II. • The Information Collector stores the properties of the connected brokers and historical data on previous submissions • This information shows: • whether the chosen broker is available and how reliable it is • what kinds of jobs can be submitted to which broker (some brokers provide QoS agreements, some better data handling, …) • the current load of the resources reachable through the utilized brokers; these values are regularly updated by IS Agents
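How historical data could become a reliability figure is sketched below in Python. The class, its methods and the neutral `prior` for brokers without history are assumptions for illustration; the source does not specify how the Information Collector computes reliability.

```python
from collections import defaultdict


class InformationCollector:
    """Keeps per-broker submission history reported by the Invokers (sketch)."""

    def __init__(self):
        self._done = defaultdict(int)
        self._failed = defaultdict(int)
        self.vo_load = {}  # broker -> load of its grid, refreshed by IS Agents

    def report(self, broker, succeeded):
        """Record the outcome of one submission made through `broker`."""
        if succeeded:
            self._done[broker] += 1
        else:
            self._failed[broker] += 1

    def success_rate(self, broker, prior=0.5):
        """Fraction of successful submissions, or `prior` if there is no history."""
        total = self._done[broker] + self._failed[broker]
        return self._done[broker] / total if total else prior


if __name__ == "__main__":
    ic = InformationCollector()
    for ok in (True, True, False, True):
        ic.report("GTbroker", ok)
    print(ic.success_rate("GTbroker"), ic.success_rate("NorduGrid broker"))
```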
Matchmaking • The Matchmaker compares the JSDL of the current job with the BPDL of the registered resource brokers • First, the basic attributes are matched against the basic properties: this selection determines a group of brokers that are able to submit the job • In the second phase, only those brokers are kept that are able to fulfill the special requirement attributes of the job • Finally, a priority list of the remaining brokers is created, taking into account the ranks (stored for the requested features) and the load of each broker's underlying grid
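The three matchmaking phases can be sketched in Python. The data classes are simplified stand-ins for BPDL and JSDL content, the property names in the example are hypothetical, and the final scoring rule (summed feature ranks multiplied by one minus the grid load, ties broken by lower load) is an assumption chosen for illustration; the source only states that ranks and load are both taken into account.

```python
from dataclasses import dataclass, field


@dataclass
class BrokerProps:
    """Simplified stand-in for a broker's BPDL entry."""
    name: str
    basic: set       # basic properties the broker supports
    special: dict    # special feature -> rank stored for that feature
    load: float      # current load of the broker's grid, 0.0 (idle) to 1.0


@dataclass
class JobRequest:
    """Simplified stand-in for the attributes extracted from a JSDL document."""
    basic: set = field(default_factory=set)
    special: set = field(default_factory=set)


def matchmake(job, brokers):
    # Phase 1: brokers whose basic properties cover the job's basic attributes
    able = [b for b in brokers if job.basic <= b.basic]
    # Phase 2: keep those that also support every requested special attribute
    able = [b for b in able if job.special <= set(b.special)]

    # Phase 3: priority list from feature ranks discounted by grid load
    def score(b):
        return sum(b.special[f] for f in job.special) * (1.0 - b.load)

    ordered = sorted(able, key=lambda b: (score(b), -b.load), reverse=True)
    return [b.name for b in ordered]


if __name__ == "__main__":
    brokers = [
        BrokerProps("GTbroker", {"single", "MPI"}, {"resubmission": 3}, 0.6),
        BrokerProps("EGEE WMS", {"single", "MPI"}, {"resubmission": 2, "data_handling": 3}, 0.2),
        BrokerProps("NorduGrid broker", {"single"}, {"resubmission": 2}, 0.1),
    ]
    print(matchmake(JobRequest(basic={"MPI"}, special={"resubmission"}), brokers))
```

In the example GTbroker has the better stored rank, but the heavier load on its grid pushes the EGEE WMS to the top of the priority list.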
Meta-Broker Utilization by Portals • Figure: User -> P-GRADE Portal -> Meta-Broker -> brokers (GTbroker, NorduGrid broker, EGEE WMS) -> grids: EGEE (VOCE / SEEGRID), NGS (GT2), SwissGrid; sites shown: Manchester, Poznan, Lausanne, Budapest
III. Conclusions • The introduced meta-brokering approach opens a new way to support Interoperability • The design and architecture of the Grid Meta-Broker enable higher-level resource management by utilizing resource brokers of different grid middleware systems • This service can act as a bridge among the separated islands of current production Grids; therefore it solves Grid Interoperability at the level of resource management • We expect that by integrating the Grid Meta-Broker into the portal, we will be able to enable better application execution through a simpler and more interoperable service in the future.
Grid Interoperability levels • Figure: interoperability levels 0 to 5; the surviving labels associate several levels with the Portal and one with the Meta-Broker approach