GridLab Resource Management System (GRMS) Jarek Nabrzyski GridLab Project Coordinator naber@man.poznan.pl office@gridlab.org Poznań Supercomputing and Networking Center
GridLab • EU-funded project, involving 11 European and 3 American partners (Globus and Condor teams), • January 2002 – December 2004 • Main goal: to develop a Grid Application Toolkit (GAT) and a set of grid services and tools... • resource management (GRMS), • data management, • monitoring, • adaptive components, • mobile user support, • security services, • portals, ... and test them on a real testbed with real applications GGF7, Tokyo, March 4-7, 2003
GridLab Members • PSNC (Poznań) - coordination • AEI (Potsdam) • ZIB (Berlin) • Univ. of Lecce • Cardiff University • Vrije Univ. (Amsterdam) • SZTAKI (Budapest) • Masaryk Univ. (Brno) • NTUA (Athens) • Sun Microsystems • Compaq (HP) • ANL (Chicago, I. Foster) • ISI (LA, C. Kesselman) • Univ. of Wisconsin (M. Livny) • collaborating with: • Users! • EU Astrophysics Network, • DFN TiKSL/GriKSL • NSF ASC Project • other Grid projects • Globus, Condor, • GrADS, • PROGRESS, • GriPhyN/iVDGL, • CrossGrid and all the other European Grid Projects (GRIDSTART) • other...
GridLab Applications • Cactus (www.cactuscode.org) • Triana (www.triana.co.uk)
What our users want... • Two primary applications: Cactus and Triana • other application communities are also being engaged, • Application-oriented environment • Resources (grid) on demand • Adaptive applications – adaptive grid environment • job checkpoint, migration, spawn off a new job when needed, • Open, pervasive, not even restricted to a single Virtual Organization • The ability to work in a disconnected environment • start my job on a disconnected laptop; migrate it to the grid when it becomes available • from laptops to fully deployed Virtual Organizations • Mobile working • Security
What our users want... (cont.) • The infrastructure must provide capabilities to customise the choice of service implementation (e.g. using efficiency, reliability, first succeeding, all) • Advance reservation of resources, • To be able to express their preferences regarding their jobs on the one hand and to understand the resource policies on the other, • Policy information and negotiation mechanisms • What is the usage policy of the remote resources? • Prediction-based information • How long will my job run on a particular resource? • What resources do I need to complete the job before the deadline?
The Grid is complex … [Diagram: an application asking “Is there a better resource I could be using?” Labels: SOAP, WSDL, CORBA, OGSA, Other; Monitoring, Profiling, Information, Logging, Security, Notification, Resource Management, Application Manager, Migration, Data Management; GLOBUS, Other Grid Infrastructure]
… need to make it easier to use [Diagram: the application asking “Is there a better resource I could be using?” calls GAT_FindResource( ); the GAT sits between the application and the Grid]
The Same Application … [Diagram: Application on top of GAT in three settings: Laptop, Super Computer, The Grid. Annotations: “Firewall issues!”, “No network!”]
GAT: What is It? GAT: Grid Application Toolkit • Implements the GAT-API • Used by applications (different languages) • GAT Adaptors • Connect to capabilities/services • GAT Engine • Provides the function bindings for the GAT-API
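The engine/adaptor split can be illustrated with a minimal sketch. It assumes the simplest "first succeeding" selection rule mentioned under user requirements; every name in it (`GatEngine`, `register`, `call`, the two adaptors) is hypothetical and is not the real GAT-API.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout: this is not the real GAT-API.
Adaptor = Callable[[dict], List[str]]


class GatEngine:
    """Maps an API-level capability name to the adaptors that can serve it."""

    def __init__(self) -> None:
        self._adaptors: Dict[str, List[Adaptor]] = {}

    def register(self, capability: str, adaptor: Adaptor) -> None:
        self._adaptors.setdefault(capability, []).append(adaptor)

    def call(self, capability: str, request: dict) -> List[str]:
        """Try adaptors in registration order and return the first success."""
        errors = []
        for adaptor in self._adaptors.get(capability, []):
            try:
                return adaptor(request)
            except Exception as exc:  # a failing adaptor is expected, e.g. no network
                errors.append(exc)
        raise RuntimeError(f"no adaptor succeeded for {capability!r}: {errors}")


def grid_broker_adaptor(request: dict) -> List[str]:
    # Stands in for a remote resource broker that cannot be reached.
    raise ConnectionError("no network")


def local_adaptor(request: dict) -> List[str]:
    # Fallback that always offers the local machine.
    return ["localhost"]


engine = GatEngine()
engine.register("find_resource", grid_broker_adaptor)
engine.register("find_resource", local_adaptor)
hosts = engine.call("find_resource", {"min_cpus": 1})
print(hosts)
```

The application code stays the same whether the broker is reachable or not; only the set of registered adaptors changes, which is the property the laptop/supercomputer/Grid slide relies on.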
Grid Application Toolkit • The GAT provides functionality through a carefully constructed set of generic high-level APIs, through which an application will be able to call the underlying grid services, • Set of application developer APIs for Grid tools, services and software libraries (and example implementations) that support the development of grid-enabled applications (open source!) • Usable from any high-level “application” (any generic code, Cactus, Triana, Portals, Scripts, …)
GAT • More or less … • Set of calls GAT_ToolOrService(arguments) • Your chosen tools/services: resource brokers, information servers, application managers, grid monitoring, data managers, notification, etc. • Set of APIs for dealing with the GAT (registration, information, errors, fault tolerance)
GridLab Architecture
GridLab RMS approach • Grid resources are not only the machines, but also databases, files, users, administrators, instruments, mobile devices, jobs/applications ... • Many metrics for scheduling: throughput, cost, latency, deadline, other time and cost metrics... • Grid resource management (GRM) consists of job/resource scheduling, security (authorization services, ...), local policies, negotiations, accounting, ... • GRM is a negotiation process driven by both users and resource owners, and thus a multicriteria decision-making process
GRMS
Policy/Configuration Services • System Configuration Mgmt • System Policy Mgmt
Core GRMS Services • Job Receiver Service • Resource Discovery • Resource Evaluation • Brokering • Prediction Service • QoS/SLA Service • Advance Reservation • Resource Estimation
Job Execution Service • Job/Application Mgmt • Scheduler • Distributed Workflow
Infrastructure Services • Job logging and tracking service • Security service (WP6) • System Monitoring (WP11) • Grid Information System (WP10) • Adaptive Services (WP7) • Data mgmt services (WP8)
GRMS is a bag of services
GridLab RMS [Diagram labels: Information Services, Data Management, Authorization System, Adaptive, Resource Discovery, File Transfer Unit, Jobs Queue, BROKER, Job Receiver, Execution Unit, Monitoring, SLA Negotiation, Scheduler, Workflow Manager, Application Manager, Resource Reservation, Prediction Unit; GRMS; GLOBUS, other; Local Resources (Managers)]
GRMS and SLA
GRMS and SLA (cont.)
Research focus of GRMS • Focus on the infrastructure is not enough for efficient GRM • Focus on policies • Focus on multicriteria aspects of the GRM • users, their preferences and applications • resource owners’ preferences • these preferences are contradictory in nature • preference models, multicriteria decision making, knowledge will be crucial for efficient resource management • Focus on AI techniques for GRM • Focus on business models, economy grids • Cost negotiation mechanisms could be part of the SLA negotiation process
Multicriteria RM in GridLab • Gathering of information • application requirements (resource requirements, environment, etc.) • user preferences (which criteria and how important) • user support, preference modeling tools, • Selection phase • choose the best resources (schedule) based on the information provided and on the resource availability (estimates, predictions) • from simple matchmaking to multiple optimisation techniques • Execution phase • file staging, execution control, job monitoring, migration, usually re-selection of resources, application adaptation (application managers, adaptive services from GridLab)
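The selection phase can be sketched with one of the simplest multicriteria techniques, a weighted sum over min-max normalised criteria. The slides do not say which method GRMS actually uses, so this is only an illustration; the function name, the criteria, the resource names and all numbers are made up.

```python
from typing import Dict, List, Set


def rank_resources(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    minimise: Set[str],
) -> List[str]:
    """Return resource names ordered from best to worst by weighted score."""
    scores = {name: 0.0 for name in candidates}
    for criterion, weight in weights.items():
        values = [metrics[criterion] for metrics in candidates.values()]
        low, high = min(values), max(values)
        span = high - low
        for name, metrics in candidates.items():
            if span == 0:
                norm = 1.0  # no spread: the criterion does not discriminate
            else:
                norm = (metrics[criterion] - low) / span  # min-max to [0, 1]
                if criterion in minimise:
                    norm = 1.0 - norm  # smaller raw value means better score
            scores[name] += weight * norm
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Made-up resources and numbers, for illustration only.
candidates = {
    "cluster_a": {"predicted_runtime_h": 2.0, "cost": 10.0},
    "cluster_b": {"predicted_runtime_h": 4.0, "cost": 4.0},
    "cluster_c": {"predicted_runtime_h": 3.0, "cost": 9.0},
}
both = {"predicted_runtime_h", "cost"}

# A user who mostly cares about finishing quickly.
fast_first = rank_resources(candidates, {"predicted_runtime_h": 0.7, "cost": 0.3}, both)
# A user who mostly cares about price.
cheap_first = rank_resources(candidates, {"predicted_runtime_h": 0.3, "cost": 0.7}, both)
print(fast_first, cheap_first)
```

Changing only the weights changes the winner, which is why the user preferences ("which criteria and how important") are gathered before selection.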
Policy representation • Local agents responsible for gathering the policy information • Interface for the GRMS (VO) policy configuration • Local queue configurations and global VO policies are represented in the form of rules which can be read by the GRMS
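As a rough illustration of policies expressed as rules, the sketch below models each rule as a description plus a predicate on a job request and checks a job against local queue rules and VO rules together. The slides do not describe the actual rule format read by the GRMS; the `Job` and `Rule` types, the limits and the user names are all invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Job:
    user: str
    cpus: int
    walltime_h: float


@dataclass(frozen=True)
class Rule:
    description: str
    allows: Callable[[Job], bool]


def violated(rules: List[Rule], job: Job) -> List[str]:
    """Return the descriptions of every rule the job breaks."""
    return [rule.description for rule in rules if not rule.allows(job)]


# Hypothetical limits, invented for this example.
local_queue_rules = [
    Rule("queue 'short' accepts at most 32 CPUs", lambda j: j.cpus <= 32),
    Rule("queue 'short' accepts at most 12 h walltime", lambda j: j.walltime_h <= 12),
]
vo_rules = [
    Rule("VO members only", lambda j: j.user in {"alice", "bob"}),
]

job = Job(user="alice", cpus=64, walltime_h=6)
problems = violated(local_queue_rules + vo_rules, job)
print(problems)
```

Returning the descriptions of the broken rules, rather than a bare yes/no, gives the user the policy information that the requirements slides ask for.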
Current implementation • Runs at rage1.man.poznan.pl, which is the front-end to the Linux cluster, and uses Tomcat and Axis as a hosting environment • httpg://rage1.man.poznan.pl:8443/axis/services/gsiScenarioBroker • The WSDL document and the client code are available on the page: http://www.gridlab.org/WorkPackages/wp-9/ in the section: Resources/Our Software
Current implementation
• submitJob - submits new job,
• migrateJob - migrates existing job,
• getMyJobsList - returns list of jobs belonging to the user,
• registerApplicationAccess - registers application access,
• getJobStatus - returns GRMS status of the job,
• getHostName - returns host name, on which the job is/was running
• getJobInfo - returns a structure describing the job,
• findResources - returns resources matching user's requirements,
• cancelJob - cancels the job,
• getServiceDescription - returns description of a service.
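To show how a client might sequence a few of these operations, here is an in-memory stand-in. Only the operation names come from the list above; the argument lists, return types, status strings and host names are assumptions, and the class does not talk to the real service (the actual signatures are defined by the WSDL document).

```python
import itertools
from typing import Dict, List


class FakeGrms:
    """In-memory stand-in for the broker; not a client for the real service."""

    def __init__(self, hosts: List[str]) -> None:
        self._hosts = hosts
        self._ids = itertools.count(1)
        self._jobs: Dict[str, Dict[str, str]] = {}

    def submitJob(self, user: str, executable: str) -> str:
        # Assumed behaviour: place the job on the first host and return an id.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {
            "user": user,
            "executable": executable,
            "host": self._hosts[0],
            "status": "RUNNING",  # hypothetical status value
        }
        return job_id

    def getJobStatus(self, job_id: str) -> str:
        return self._jobs[job_id]["status"]

    def getHostName(self, job_id: str) -> str:
        return self._jobs[job_id]["host"]

    def migrateJob(self, job_id: str) -> None:
        # Move the job to the next host in the list, wrapping around.
        job = self._jobs[job_id]
        index = self._hosts.index(job["host"])
        job["host"] = self._hosts[(index + 1) % len(self._hosts)]

    def cancelJob(self, job_id: str) -> None:
        self._jobs[job_id]["status"] = "CANCELLED"  # hypothetical status value

    def getMyJobsList(self, user: str) -> List[str]:
        return [j for j, info in self._jobs.items() if info["user"] == user]


grms = FakeGrms(hosts=["host-a", "host-b"])
job_id = grms.submitJob("alice", "/bin/hostname")
first_host = grms.getHostName(job_id)
grms.migrateJob(job_id)
second_host = grms.getHostName(job_id)
grms.cancelJob(job_id)
print(job_id, first_host, second_host, grms.getJobStatus(job_id))
```

A stand-in like this can be handy for testing application logic (submit, check, migrate, cancel) without access to the testbed.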