Information Modeling and Distribution in Grid Systems 23 Nov 2004, Ferrara Sergio Andreozzi INFN-CNAF Bologna (Italy) sergio.andreozzi@cnaf.infn.it Ferrara, 22 Nov 2005
OUTLINE
• Problem Statement
• Information Modeling of Grid resources
  • GLUE Schema
    • Computing Resources
    • Storage Resources
    • Network Resources
  • Common Information Model (CIM)
• Information Distribution
  • Grid Information Service
PART I: Problem Statement
Grid: basic principles
• Grid systems allow resources (e.g., computing power, storage space, databases) to be shared across administrative domains
• Shared resources:
  • are geographically dispersed
  • are heterogeneous
  • belong to different administrative domains
  • change in composition over time
  • can be accessed remotely by users
Grid: basic principles
• Virtualization of users and resources
  • mapping from virtual resources to physical resources
  • mapping from virtual users to physical users
[Figure: a Grid system spanning Site A and Site B] [1]
Problem Statement: Information Modeling
• Resources available in Grid systems must be described in a precise and systematic manner if they are to be discovered for subsequent management or use
• A shared description allows multiple experts to contribute to the problem and serves as a means of communication between different knowledge domains
Information Modeling of Grid resources
Problem Statement: Information Distribution
• A Grid system requires the ability to efficiently access and manipulate information about applications, resources and services
• The service must deal with distributed sources of information and enable distributed access to them
Grid Information Service
PART II: Information Modeling and the GLUE Schema
Information Model: definition
• Abstraction of the real world into constructs that can be represented in computer systems (e.g., objects, properties, behavior, and relationships)
• Not tied to any particular implementation
• Used to exchange information among different domains
Problem Statement: Information Modeling
• Main use cases:
  • Discovery for brokering and access:
    • “Which Computing Elements are available to the VO CMS and offer a certain operating system with a particular software package installed?”
    • “Which Storage Elements offer 20 gigabytes of disk space to the VO ATLAS?”
  • Discovery for monitoring:
    • “How many CPUs is site XYZ offering to the EGEE Grid?”
    • “What is the success rate of submitted jobs per site?”
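The brokering use case above amounts to filtering resource descriptions by their attributes. A minimal Python sketch, assuming a simplified in-memory record: the class `ComputingElement`, its fields, the function `discover`, and the sample data are all hypothetical and are not taken from the GLUE Schema.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    # Hypothetical, simplified record; a real schema carries many more attributes.
    unique_id: str
    operating_system: str
    supported_vos: set = field(default_factory=set)
    software: set = field(default_factory=set)


def discover(ces, vo, operating_system, package):
    """Return IDs of CEs open to `vo` that run `operating_system` with `package` installed."""
    return [
        ce.unique_id
        for ce in ces
        if vo in ce.supported_vos
        and ce.operating_system == operating_system
        and package in ce.software
    ]


# Made-up sample data, for illustration only.
ces = [
    ComputingElement("ce01.example.org", "Linux", {"cms", "atlas"}, {"pkg-a"}),
    ComputingElement("ce02.example.org", "Linux", {"atlas"}, {"pkg-a"}),
    ComputingElement("ce03.example.org", "Linux", {"cms"}, {"pkg-b"}),
]

matches = discover(ces, vo="cms", operating_system="Linux", package="pkg-a")
print(matches)
```

The query only works if every site describes its resources with the same attribute names and meanings, which is the role of a shared information model.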
Information Model: how it can be represented
• Typically, graphical languages are preferred
• Several solutions are available
• We have selected the Unified Modeling Language (UML)
  • It is a widely accepted international standard (Object Management Group, OMG)
  • It is often used for information and conceptual modeling
  • It has become well established in many communities, with extensive tool support from both commercial and open source vendors
Unified Modeling Language (UML)
• The Unified Modeling Language (UML) is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system.
• The UML offers a standard way to write a system's blueprints, including conceptual things such as business processes and system functions as well as concrete things such as programming language statements, database schemas, and reusable software components. (Object Management Group)
Unified Modeling Language
• First specification in 1997
• Current specification: version 2.0 (13 different diagrams)
• Diagram groups:
  • six structure diagrams: show the static structure of the objects in a system
  • three behavior diagrams: show the dynamic behavior of the objects in a system, including their methods, collaborations, activities, and state histories
  • four interaction diagrams
• Each diagram type has:
  • Semantics: what does the diagram type do?
  • Notation: what graphical symbols can the diagram type contain?
• We use class diagrams: they show the static structure of the model, in particular the things that exist (such as classes and types), their internal structure, and their relationships to other things
UML 2.0 Diagrams
UML Class Diagram elements
• Class: represents a concept within the system being modeled. It has data structure, behavior and relationships to other elements
• Generalization: taxonomic relationship between a more general element (the parent) and a more specific element (the child) that is fully consistent with the first element and that adds additional information. It is used for classes, packages, use cases, and other elements
UML Class Diagram elements
• Binary association: an association between exactly two classes (possibly from a class to itself)
• Aggregation: denotes weak ownership (i.e., a part may be included in several aggregates, and its owner may change over time). Deleting the aggregate does not imply deleting its parts
• Composition: strong form of aggregation; a part instance may be included in at most one composite at a time; the composite object has sole responsibility for the disposition of its parts
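The class-diagram relationships can be illustrated in code. This is only a sketch: Python has no dedicated notation for aggregation or composition, so the distinction is expressed by convention (who creates and owns the part). All class names here are hypothetical and are not classes of the GLUE Schema.

```python
class Service:
    """General element (the parent in a generalization)."""

    def __init__(self, name):
        self.name = name


class ComputingService(Service):
    """Specific element (the child): consistent with Service, adds information."""

    def __init__(self, name, total_cpus):
        super().__init__(name)
        self.total_cpus = total_cpus


class Queue:
    def __init__(self, name):
        self.name = name


class Cluster:
    """Composition: the cluster creates its queues and is their only owner."""

    def __init__(self, name, queue_names):
        self.name = name
        self._queues = [Queue(q) for q in queue_names]

    def queue_names(self):
        return [q.name for q in self._queues]


class VirtualOrganization:
    """Aggregation: a VO references services that it does not own."""

    def __init__(self, name):
        self.name = name
        self.services = []

    def add(self, service):
        self.services.append(service)


# The same service is part of two aggregates (weak ownership).
shared = ComputingService("ce01", total_cpus=64)
vo_a, vo_b = VirtualOrganization("vo-a"), VirtualOrganization("vo-b")
vo_a.add(shared)
vo_b.add(shared)

# Deleting one aggregate does not delete the part.
del vo_a
still_there = vo_b.services[0] is shared

cluster = Cluster("cluster01", ["short", "long"])
```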
GLUE Schema [2]
• An approach to the information modeling of Grid resources, started in April 2002 among Grid-related High-Energy Physics projects
• Now at version 1.2
[Figure labels: GLUE Schema (UML); GLUE Schema (Relational), R-GMA; GLUE Schema (XML), GT MDS 4; GLUE Schema (LDAP), GT MDS 2; DataGrid Schema (LDAP); Globus Schema (LDAP)]
GLUE Schema: modeling guidelines
• Focus on the virtual abstraction given by the Grid paradigm
  • Virtual pool of resources
• Generalization
  • capture common aspects of different entities providing the same functionality (e.g., uniform view over different batch services)
• Deal with both monitoring needs and discovery needs
  • Monitoring: concerns those attributes that are meaningful for describing the status of resources (e.g., useful for detecting fault situations)
  • Discovery: concerns those attributes that are meaningful for locating resources on the basis of a set of preferences/constraints (e.g., useful during the matchmaking process)
Core Schema
GLUE Computing resources: warm up
• What is the core functionality offered?
  • Computing power
• What do I need to know in order to use it?
  • Offered execution environment (e.g., OS type, available software libraries)
  • Offered Quality of Service (e.g., estimated response time)
  • Status (e.g., number of running jobs)
  • Policy (e.g., max execution time, assigned CPUs)
  • Access rights (e.g., can I use it?)
  • Location (e.g., Uniform Resource Locator or URL)
GLUE Computing resources: some more thoughts about the service
• The computing power is typically offered by cluster systems
• Requests are typically staged into queues for efficient system usage
• Queue policies enable service differentiation (e.g., dedicated vs. shared CPU assignment, differentiated max CPU time, differentiated queue service strategy)
• A service has quality aspects
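The categories listed for computing resources (execution environment, quality of service, status, policy, access rights, location) can be read as fields of a per-queue description. A minimal Python sketch under that assumption: the record `QueueView`, its field names and units, and the selection rule in `best_queue` are hypothetical and are not the GLUE attribute names.

```python
from dataclasses import dataclass


@dataclass
class QueueView:
    # Hypothetical field names, one per category of information.
    endpoint: str                  # Location
    os_type: str                   # Execution environment
    estimated_response_time: int   # Quality of Service (seconds, assumed unit)
    running_jobs: int              # Status
    max_cpu_time: int              # Policy (minutes, assumed unit)
    allowed_vos: frozenset         # Access rights


def usable(queue, vo, needed_cpu_time):
    """A queue is usable if the VO is authorized and the job fits the CPU-time limit."""
    return vo in queue.allowed_vos and needed_cpu_time <= queue.max_cpu_time


def best_queue(queues, vo, needed_cpu_time):
    """Among usable queues, pick the one with the lowest estimated response time."""
    candidates = [q for q in queues if usable(q, vo, needed_cpu_time)]
    return min(candidates, key=lambda q: q.estimated_response_time, default=None)


# Two made-up queues on the same cluster, differentiated by policy.
queues = [
    QueueView("ce01.example.org/short", "Linux", 30, 4, 60, frozenset({"cms"})),
    QueueView("ce01.example.org/long", "Linux", 600, 10, 2880, frozenset({"cms"})),
]

long_job = best_queue(queues, vo="cms", needed_cpu_time=120)
short_job = best_queue(queues, vo="cms", needed_cpu_time=30)
```

Here the policy attribute filters out queues that cannot run the job, while the quality-of-service attribute ranks the remaining ones.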
GLUE Storage resources: warm up
• What is the core functionality offered?
  • Storage space usage
• What do I need to know in order to use it?
  • Storage Service manager type (e.g., srmv2)
  • Available data access protocols (e.g., gridftp, rfio)
  • Offered Quality of Service (e.g., availability, reliability)
  • State (e.g., available space)
  • Policy (e.g., file lifetime, MaxFileSize)
  • Access rights (e.g., can I use it?)
  • Location (e.g., Uniform Resource Locator or URL)
GLUE Storage Element
• Storage resources contributed to a Grid system can vary from simple disk servers to complex mass storage systems
• Storage Element:
  • Abstraction for a storage resource: a group of services, protocols and data sources
  • Protocols can be for data access/transfer or for management
GLUE Storage Space
• Storage Space: portion of a logical storage extent that:
  • is assigned to Grid users (e.g., a VO, a group of a VO)
  • is associated with a directory of the underlying file system (e.g., /permanent/CMS)
  • has a set of policies (MaxFileSize, MinFileSize, MaxData, MaxNumFiles, MaxPinDuration, Quota, ACL)
  • has a state (available space, used space)
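How the policies and the state of a Storage Space interact can be sketched as an admission check before writing a file. This is an illustrative Python sketch, assuming sizes in bytes and a VO-based access list; the record `StorageSpace` and the functions `can_store` and `store` are hypothetical, and only a few of the listed policies are modeled.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class StorageSpace:
    # Hypothetical record; sizes are in bytes (an assumption of this sketch).
    path: str
    allowed_vos: frozenset
    min_file_size: int
    max_file_size: int
    available_space: int
    used_space: int = 0


def can_store(space, vo, file_size):
    """Check access rights, size policies, and current state."""
    return (
        vo in space.allowed_vos
        and space.min_file_size <= file_size <= space.max_file_size
        and file_size <= space.available_space
    )


def store(space, vo, file_size):
    """Update the state if the file is admitted; return whether it was."""
    if not can_store(space, vo, file_size):
        return False
    space.available_space -= file_size
    space.used_space += file_size
    return True


# Made-up values for illustration.
space = StorageSpace(
    path="/permanent/CMS",
    allowed_vos=frozenset({"cms"}),
    min_file_size=1,
    max_file_size=2 * GB,
    available_space=20 * GB,
)

accepted = store(space, "cms", 1 * GB)
too_large = store(space, "cms", 3 * GB)
wrong_vo = store(space, "atlas", 1 * GB)
```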
Expressing relationships among Computing and Storage Services
• A typical job execution request involves certain properties for the computing element and for a permanent storage area
• Site administrators may want to specify preferences on which Storage Areas should be used by jobs executed by certain computing elements
• Optional mount point information and a weight for choosing among the alternatives can be provided
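The binding between computing and storage services can be pictured as a small table of (computing element, storage element, mount point, weight) entries. The Python sketch below is illustrative only: the record `Binding` and its fields are hypothetical, and it assumes that a higher weight means a stronger preference, which the slide does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    # Hypothetical record describing one CE-to-SE preference.
    ce_id: str
    se_id: str
    mount_point: Optional[str]  # None when the storage is not mounted on the CE
    weight: int                 # Assumption: higher weight = more preferred


def preferred_binding(bindings, ce_id):
    """Return the binding with the highest weight for the given CE, if any."""
    candidates = [b for b in bindings if b.ce_id == ce_id]
    return max(candidates, key=lambda b: b.weight, default=None)


# Made-up bindings for illustration.
bindings = [
    Binding("ce01.example.org", "se01.example.org", "/storage/se01", 10),
    Binding("ce01.example.org", "se02.example.org", None, 5),
    Binding("ce02.example.org", "se02.example.org", None, 7),
]

choice = preferred_binding(bindings, "ce01.example.org")
```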
Common Information Model (CIM) [9]
• Conceptual view of the managed environment for IT resources that attempts to unify and extend the existing instrumentation and management standards
• Targeted at the management of resources, where management is defined as the active process of monitoring, modifying, and making decisions about a resource
• Maintained by the Distributed Management Task Force (DMTF), a worldwide industry organization
• It uses UML class diagrams as its modeling language
CIM and the Grid community
• There have been several activities to extend CIM to meet Grid requirements
• CIM and GLUE overlap in part, but also differ, in how they address the Grid use case
• Recent work is trying to integrate the GLUE Schema concepts into an experimental extension of CIM
PART III: Grid Information Service
Grid Information Service
• A Grid system requires the ability to efficiently access and manipulate information about applications, resources and services
• The service must deal with distributed sources of information and enable distributed access to them
Characterization
• Depends greatly on factors such as:
  • Relation to time: static vs. dynamic information
  • Purpose: discovery, logging, monitoring
• Common patterns:
  • Producers
  • Consumers
  • Intermediaries
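The producer / consumer / intermediary pattern can be sketched with a caching intermediary that pulls from producers on demand. This is a minimal Python sketch, not the design of any particular Grid information service: the classes `Producer` and `Intermediary`, the time-to-live parameter `ttl`, and the injected `clock` are all hypothetical.

```python
class Producer:
    """Source of information about one resource."""

    def __init__(self, name, read_value):
        self.name = name
        self._read_value = read_value
        self.reads = 0  # how many times the producer was actually contacted

    def publish(self):
        self.reads += 1
        return self._read_value()


class Intermediary:
    """Caches producer data so that consumers do not contact producers on every query."""

    def __init__(self, producers, ttl, clock):
        self._producers = {p.name: p for p in producers}
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # name -> (fetch_time, value)

    def query(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._producers[name].publish()
        self._cache[name] = (now, value)
        return value


# A fake clock keeps the example deterministic.
now = [0.0]
running_jobs = [5]
producer = Producer("ce01", lambda: running_jobs[0])
cache = Intermediary([producer], ttl=30, clock=lambda: now[0])

first = cache.query("ce01")   # fetched from the producer
running_jobs[0] = 7
second = cache.query("ce01")  # served from the cache, so still the old value
now[0] = 31.0
third = cache.query("ce01")   # cache entry expired, fetched again
```

The time-to-live reflects the static vs. dynamic distinction: static information tolerates a long cache lifetime, while dynamic information such as the number of running jobs goes stale quickly.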
Delivery Options
Grid Information Services: overview [3] [4] [5]
MDS2-based Information Service example
PART IV: Exercises
Exercises using the EGEE Grid
Information Service
• Based on MDS 2.x
• Uses LDAP
• For the demo I set up:
  • http://www.cnaf.infn.it/~andreozzi/ldap/
• Inputs for an LDAP query (besides the URL and authentication):
  • Base DN, Scope, Filter, Attributes
• Other LDAP browsers:
  • (Java) http://www-unix.mcs.anl.gov/~gawor/ldap/
  • (Windows) http://www.softerra.com/download/download.php
Exercises
• You can run these exercises from a Linux shell
• You need the ldapsearch command to be available
• In order to query an LDAP server, you need:
  • Hostname of the machine running the server
  • Port
  • Authentication credentials (anonymous for our use case)
  • Base DN: the root of the tree from which to start the query
• When you see LQ in the queries, substitute it with:
  • ldapsearch -h gridit-bdii-01.cnaf.infn.it -p 2170 -x -b "mds-vo-name=local,o=grid"
• The given host is the top-level root of the INFN Grid
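The inputs of an LDAP query (host, port, base DN, scope, filter, attributes) map directly onto ldapsearch arguments. The Python sketch below only assembles the command line and does not contact any server; the helper `ldapsearch_argv` is hypothetical, and the default scope `sub` is an assumption of this sketch.

```python
import shlex


def ldapsearch_argv(host, port, base_dn, ldap_filter, attributes=(), scope="sub"):
    """Build the argument list for an anonymous (-x) ldapsearch query."""
    argv = [
        "ldapsearch",
        "-h", host,
        "-p", str(port),
        "-x",
        "-b", base_dn,
        "-s", scope,
        ldap_filter,
    ]
    argv.extend(attributes)  # optional list of attributes to return
    return argv


argv = ldapsearch_argv(
    host="gridit-bdii-01.cnaf.infn.it",
    port=2170,
    base_dn="mds-vo-name=local,o=grid",
    ldap_filter="(objectclass=GlueSite)",
    attributes=["GlueSiteUniqueID"],
)

# Shell-quoted form, convenient for pasting into a terminal.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)
```

To actually run the query, the list could be passed to `subprocess.run(argv, capture_output=True, text=True)`, provided ldapsearch is installed and the server is reachable.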
Exercises
• Ex. 1: List all Grid sites
  • LQ 'objectclass=GlueSite'
• Ex. 2: Count the number of sites
  • LQ 'objectclass=GlueSite' GlueSiteUniqueID | grep -c '^GlueSiteUniqueID:'
• Ex. 3: List all CEs
  • LQ 'objectclass=GlueCE'
• Ex. 4: List all CEs with running jobs
  • LQ '(&(objectclass=GlueCE)(!(GlueCEStateRunningJobs=0)))'
• Ex. 5: List all SEs
  • LQ 'objectclass=GlueSE'
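The results of such queries come back as LDIF text, which can be post-processed in a script instead of a shell pipeline. The Python sketch below parses a small, made-up LDIF sample and repeats the counting and "running jobs" exercises on it. The sample entries and the helper `parse_ldif` are hypothetical, and the parser is deliberately minimal (no support for folded lines or base64-encoded values).

```python
# Made-up LDIF text standing in for ldapsearch output.
SAMPLE_LDIF = """\
# sample entries, not real data
dn: GlueSiteUniqueID=SITE-A,mds-vo-name=local,o=grid
objectClass: GlueSite
GlueSiteUniqueID: SITE-A

dn: GlueCEUniqueID=ce01.example.org:2119/jobmanager-pbs-short,mds-vo-name=local,o=grid
objectClass: GlueCE
GlueCEUniqueID: ce01.example.org:2119/jobmanager-pbs-short
GlueCEStateRunningJobs: 12

dn: GlueCEUniqueID=ce02.example.org:2119/jobmanager-pbs-long,mds-vo-name=local,o=grid
objectClass: GlueCE
GlueCEUniqueID: ce02.example.org:2119/jobmanager-pbs-long
GlueCEStateRunningJobs: 0
"""


def parse_ldif(text):
    """Split LDIF text into a list of {lowercased attribute: [values]} dicts."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # comment line
        name, _, value = line.partition(":")
        current.setdefault(name.strip().lower(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


entries = parse_ldif(SAMPLE_LDIF)

# Count the sites.
sites = [e for e in entries if "GlueSite" in e.get("objectclass", [])]
site_count = len(sites)

# List the CEs with running jobs.
ces = [e for e in entries if "GlueCE" in e.get("objectclass", [])]
busy = [
    e["glueceuniqueid"][0]
    for e in ces
    if int(e["gluecestaterunningjobs"][0]) != 0
]
```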
Conclusion
• Information Modeling of Grid resources
  • The characteristics of Grid systems require a shared information model of resources, to be used as a basis for the Information Service
  • An important approach to the information modeling of Grid resources has been presented
• Grid Information Service
  • A vital service for Grid systems
  • Many approaches exist, general or tailored to particular solutions
REFERENCES
[1] Z. Németh, V. Sunderam. Characterizing Grids: Attributes, Definitions, and Formalisms. Journal of Grid Computing, 2003, volume 1, number 1, pages 9-23. http://ipsapp009.kluweronline.com/IPS/frames/toc.aspx?J=6160&I=1#
[2] GLUE Schema official documents. http://infnforge.cnaf.infn.it/glueinfomodel
[3] Globus Toolkit: Monitoring and Discovery Service 2. http://www.globus.org/mds/mds2/
[4] Globus Toolkit: Monitoring and Discovery Service 4. http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/info/WSMDSFacts.html
[5] R-GMA: Relational Grid Monitoring Architecture. http://www.r-gma.org
[6] S. Andreozzi. GLUE Schema implementation for the LDAP model. INFN Technical report. http://www.cnaf.infn.it/~sergio/publications/Glue4LDAP.pdf
[7] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman. Grid Information Services for Distributed Resource Sharing. In Proceedings of the 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10). http://www.globus.org/research/papers.html#MDS-HPDC
[8] M. Franklin, S. Zdonik. “Data In Your Face”: Push Technology in Perspective. ACM SIGMOD ’98, Seattle, WA, USA
[9] DMTF Common Information Model. http://www.dmtf.org/standards/cim/