Information Modeling and Distribution in Grid Systems

  1. Information Modeling and Distribution in Grid Systems
     23 Nov 2004, Ferrara
     Sergio Andreozzi
     INFN-CNAF Bologna (Italy)
     sergio.andreozzi@cnaf.infn.it
     Ferrara, 22 Nov 2005

  2. OUTLINE
     • Problem Statement
     • Information Modeling of Grid resources
       • GLUE Schema
         • Computing Resources
         • Storage Resources
         • Network Resources
       • Common Information Model (CIM)
     • Information Distribution
       • Grid Information Service
     Ferrara, 23 Nov 2004

  3. PART I: Problem Statement

  4. Grid: basic principles
     Grid systems allow users to:
     • share resources across administrative domains (e.g., computing power, storage space, databases)
     Shared resources:
     • are geographically dispersed
     • are heterogeneous
     • belong to different administrative domains
     • can be dynamically composed
     • can be remotely accessed by users

  5. Grid: basic principles
     • Virtualization of users and resources:
       • mapping from virtual resources to physical resources
       • mapping from virtual users to physical users
     (Figure: a Grid system [1] spanning Site A and Site B)

  6. Problem Statement: Information Modeling
     • Resources available in Grid systems must be described in a precise and systematic manner if they are to be discovered for subsequent management or use
     • A shared description allows multiple experts to contribute to the problem and serves as a means of communication between different knowledge domains
     → Information Modeling of Grid resources

  7. Problem Statement: Information Distribution
     • A Grid system requires the ability to efficiently access and manipulate information about applications, resources and services
     • The service must deal with distributed sources of information and enable distributed access to them
     → Grid Information Service

  8. PART II: Information Modeling and the GLUE Schema

  9. Information Model: definition
     • An abstraction of the real world into constructs that can be represented in computer systems (e.g., objects, properties, behavior, and relationships)
     • Not tied to any particular implementation
     • Used to exchange information among different domains

  10. Problem Statement: Information Modeling
      Main use cases:
      • Discovery for brokering and access:
        • "What are the Computing Elements available to the VO CMS that offer a certain operating system with a particular software package installed?"
        • "What are the Storage Elements that offer 20 gigabytes of disk space for the VO ATLAS?"
      • Discovery for monitoring:
        • "How many CPUs is site XYZ offering to the EGEE Grid?"
        • "What is the success rate of submitted jobs per site?"
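The brokering use cases translate fairly directly into LDAP search filters, the query language of the LDAP-based information service used in Part IV. The sketch below is a minimal illustration, not part of the GLUE specification: the attribute names reflect our reading of the GLUE 1.x LDAP rendering, and the "VO:<name>" access-rule convention and the kilobyte unit for free space are assumptions to check against the schema documents [2] [6].

```python
# Sketch: the discovery use cases expressed as LDAP search filters
# (RFC 4515 syntax). ASSUMPTIONS, not confirmed by the slides: attribute
# names follow the GLUE 1.x LDAP rendering, VO access is published as
# "VO:<name>", and GlueSAStateAvailableSpace is expressed in kilobytes.

# Characters that RFC 4515 requires to be escaped inside assertion values.
RESERVED = {"*": r"\2a", "(": r"\28", ")": r"\29", "\\": r"\5c", "\0": r"\00"}


def esc(value: str) -> str:
    """Escape a user-supplied value before placing it in a filter."""
    return "".join(RESERVED.get(ch, ch) for ch in value)


def all_of(*clauses: str) -> str:
    """AND-combine filter clauses that are already parenthesised."""
    return "(&" + "".join(clauses) + ")"


def ce_filter(vo: str) -> str:
    # Computing Elements whose access control rules admit the VO.
    return all_of("(objectClass=GlueCE)",
                  f"(GlueCEAccessControlBaseRule=VO:{esc(vo)})")


def subcluster_filter(os_name: str, software_tag: str) -> str:
    # As we understand GLUE 1.x, OS and installed software are published on
    # SubCluster entries, not on CE entries; LDAP filters cannot join
    # entries, so this result must be matched with the CE query client-side.
    return all_of("(objectClass=GlueSubCluster)",
                  f"(GlueHostOperatingSystemName={esc(os_name)})",
                  f"(GlueHostApplicationSoftwareRunTimeEnvironment={esc(software_tag)})")


def storage_filter(vo: str, min_gb: int) -> str:
    # Storage areas open to the VO with at least min_gb of free space.
    min_kb = min_gb * 1024 * 1024  # assumed unit: kB, with 1 GB = 1024 * 1024 kB
    return all_of("(objectClass=GlueSA)",
                  f"(GlueSAAccessControlBaseRule=VO:{esc(vo)})",
                  f"(GlueSAStateAvailableSpace>={min_kb})")
```

For example, ce_filter("cms") yields (&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:cms)). Because the OS and software conditions sit on a different entry type, the first use case needs two queries whose results are combined by the client.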

  11. Information Model: how can it be represented?
      • Typically, graphical languages are preferred
      • Several solutions are available
      • We have selected the Unified Modeling Language (UML):
        • it is a widely accepted international standard (Object Management Group, OMG)
        • it is often used for information and conceptual modeling
        • it has become well established in many communities, with extensive tool support from both commercial and open source vendors

  12. Unified Modeling Language (UML)
      • "The Unified Modeling Language (UML) is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system."
      • "The UML offers a standard way to write a system's blueprints, including conceptual things such as business processes and system functions as well as concrete things such as programming language statements, database schemas, and reusable software components." (Object Management Group)

  13. Unified Modeling Language
      • First specification in 1997
      • Current specification: version 2.0 (13 different diagram types), in three groups:
        • six structure diagrams: show the static structure of the objects in a system
        • three behavior diagrams: show the dynamic behavior of the objects in a system, including their methods, collaborations, activities, and state histories
        • four interaction diagrams
      • Each diagram type has:
        • Semantics: what does the diagram type do?
        • Notation: what graphical symbols can the diagram type contain?
      • We use class diagrams: they show the static structure of the model, in particular the things that exist (such as classes and types), their internal structure, and their relationships to other things

  14. UML 2.0 Diagrams

  15. UML Class Diagram elements
      • Class: represents a concept within the system being modeled. It has data structure, behavior, and relationships to other elements
      • Generalization: a taxonomic relationship between a more general element (the parent) and a more specific element (the child) that is fully consistent with the first element and adds information. It is used for classes, packages, use cases, and other elements
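In code, a UML generalization usually maps onto class inheritance. A minimal Python sketch, with hypothetical class names that are not GLUE definitions: each child class keeps every attribute of its parent and adds its own, so functions written against the parent accept any child.

```python
# Sketch: UML generalization rendered as Python inheritance. The class and
# attribute names are hypothetical, not taken from the GLUE Schema.
from dataclasses import dataclass


@dataclass
class GridService:
    """Parent (more general) element: what every service has."""
    unique_id: str
    endpoint: str  # location, e.g. a URL


@dataclass
class ComputingService(GridService):
    """Child element: consistent with GridService, adds information."""
    total_cpus: int = 0
    running_jobs: int = 0


@dataclass
class StorageService(GridService):
    """Another child of the same parent."""
    available_space_gb: int = 0


def describe(service: GridService) -> str:
    # Code written against the parent accepts every child unchanged.
    return f"{service.unique_id} at {service.endpoint}"
```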

  16. UML Class Diagram elements
      • Binary association: an association between exactly two classes (possibly from a class symbol to itself)
      • Aggregation: denotes weak ownership (i.e., the part may be included in several aggregates, and its owner may change over time). Deleting the aggregate does not imply deleting its parts
      • Composition: a strong form of aggregation; a part instance may be included in at most one composite at a time, and the composite object has sole responsibility for the disposition of its parts
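The difference between aggregation and composition is about ownership of the parts. In this hypothetical Python sketch, a Cluster creates its Queue objects internally and never hands them out (composition), while Grid objects merely reference Cluster objects that can be shared (aggregation). Python has no built-in notion of either; the ownership rules are conventions the classes follow.

```python
# Sketch: aggregation vs. composition in Python object terms. Class names
# are hypothetical. A binary association would simply be a reference from
# one class to another; the difference shown here is ownership.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Queue:
    name: str


@dataclass
class Cluster:
    """Composite: creates its queues and is their only owner."""
    name: str
    queues: List[Queue] = field(default_factory=list)

    def add_queue(self, name: str) -> None:
        # The part is created by its composite and not returned to callers.
        self.queues.append(Queue(name))


@dataclass
class Grid:
    """Aggregate: references clusters that exist independently of it."""
    name: str
    clusters: List[Cluster] = field(default_factory=list)


if __name__ == "__main__":
    cluster = Cluster("cluster-a")
    cluster.add_queue("short")
    grid_a = Grid("grid-a", [cluster])
    grid_b = Grid("grid-b", [cluster])  # one part shared by two aggregates
    del grid_a                          # removing an aggregate...
    print(grid_b.clusters[0].queues)    # ...leaves the part intact: [Queue(name='short')]
```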

  17. GLUE Schema [2]
      • An approach to the information modeling of Grid resources, started in April 2002 among Grid-related High-Energy Physics projects
      • Now at version 1.2
      (Figure: the GLUE Schema (UML) and its renderings: GLUE Schema (Relational) for R-GMA, GLUE Schema (XML) for GT MDS 4, GLUE Schema (LDAP) for GT MDS 2; also shown: DataGrid Schema (LDAP), Globus Schema (LDAP))

  18. GLUE Schema: modeling guidelines
      • Focus on the virtual abstraction given by the Grid paradigm
        • virtual pool of resources
      • Generalization
        • capture common aspects of different entities providing the same functionality (e.g., a uniform view over different batch services)
      • Deal with both monitoring needs and discovery needs
        • Monitoring: concerns the attributes that are meaningful for describing the status of resources (e.g., useful for detecting fault situations)
        • Discovery: concerns the attributes that are meaningful for locating resources on the basis of a set of preferences/constraints (e.g., useful during the matchmaking process)

  19. Core Schema

  20. GLUE Computing resources: warm-up
      • What is the core offered functionality?
        • Computing power
      • What do I need to know in order to use it?
        • Offered execution environment (e.g., OS type, available software libraries)
        • Offered Quality of Service (e.g., estimated response time)
        • Status (e.g., number of running jobs)
        • Policy (e.g., max execution time, assigned CPUs)
        • Access rights (e.g., can I use it?)
        • Location (e.g., a Uniform Resource Locator, or URL)

  21. GLUE Computing resources: some more thoughts about the service
      • Computing power is typically offered by cluster systems
      • Requests are typically staged into queues for efficient system usage
      • Queue policies enable service differentiation (e.g., dedicated vs. shared CPU assignment, different max CPU times, different queue service strategies)
      • A service has quality aspects
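The information categories on these two slides are what a broker combines during matchmaking. The sketch below shows one possible selection rule, assumed for illustration: filter queues by access rights, execution environment, and policy, then rank the survivors by a quality-of-service estimate. The field names are hypothetical, not GLUE attributes.

```python
# Sketch of discovery-time matchmaking over computing queues. The field
# names are hypothetical stand-ins for the information categories listed
# on the slides; they are not GLUE attribute names.
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class QueueInfo:
    endpoint: str                                       # location
    os_name: str                                        # execution environment
    software: Set[str] = field(default_factory=set)     # execution environment
    est_response_time_s: int = 0                        # quality of service
    running_jobs: int = 0                               # status
    max_cpu_time_min: int = 0                           # policy
    allowed_vos: Set[str] = field(default_factory=set)  # access rights


def select_queue(queues: List[QueueInfo], vo: str, os_name: str,
                 package: str, cpu_time_min: int) -> Optional[QueueInfo]:
    """Return the eligible queue with the lowest estimated response time."""
    eligible = [
        q for q in queues
        if vo in q.allowed_vos                   # access rights
        and q.os_name == os_name                 # environment
        and package in q.software
        and q.max_cpu_time_min >= cpu_time_min   # policy
    ]
    # QoS decides among the eligible queues; None if nothing matches.
    return min(eligible, key=lambda q: q.est_response_time_s, default=None)
```

Filtering on hard constraints first and ranking only afterwards keeps the two kinds of information apart: policy and access rights decide whether a queue can be used at all, while QoS only orders the acceptable choices.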

  22.

  23. GLUE Storage resources: warm-up
      • What is the core offered functionality?
        • Storage space usage
      • What do I need to know in order to use it?
        • Storage service manager type (e.g., srmv2)
        • Available data access protocols (e.g., gridftp, rfio)
        • Offered Quality of Service (e.g., availability, reliability)
        • State (e.g., available space)
        • Policy (e.g., file lifetime, MaxFileSize)
        • Access rights (e.g., can I use it?)
        • Location (e.g., a Uniform Resource Locator, or URL)

  24. GLUE Storage Element
      • Storage resources contributed to a Grid system can vary from simple disk servers to complex mass storage systems
      • Storage Element:
        • an abstraction for a storage resource: a group of services, protocols, and data sources
        • protocols can be for data access/transfer or for management

  25. GLUE Storage Space
      • Storage Space: a portion of a logical storage extent that:
        • is assigned to Grid users (e.g., a VO, a group of a VO)
        • is associated with a directory of the underlying file system (e.g., /permanent/CMS)
        • has a set of policies (MaxFileSize, MinFileSize, MaxData, MaxNumFiles, MaxPinDuration, Quota, ACL)
        • has a state (available space, used space)
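A minimal sketch of how the Storage Element and Storage Space concepts fit together: the element groups access protocols and spaces, and each space checks its assignment, its policies (only MinFileSize and MaxFileSize are modeled here), and its state before accepting a file. Names, units (bytes), and the URL construction are assumptions for illustration.

```python
# Sketch: a Storage Element grouping access protocols and Storage Spaces,
# with policy and state checks. All names are hypothetical; sizes are in
# bytes; None means "no limit published".
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class StorageSpace:
    path: str                            # e.g. /permanent/CMS
    vos: Set[str]                        # Grid users the space is assigned to
    available: int                       # state
    used: int = 0                        # state
    max_file_size: Optional[int] = None  # policy
    min_file_size: int = 0               # policy

    def accepts(self, vo: str, size: int) -> bool:
        if vo not in self.vos:
            return False
        if size < self.min_file_size:
            return False
        if self.max_file_size is not None and size > self.max_file_size:
            return False
        return size <= self.available


@dataclass
class StorageElement:
    protocols: Dict[str, str]            # protocol name -> endpoint prefix
    spaces: List[StorageSpace] = field(default_factory=list)

    def place(self, vo: str, size: int, protocol: str) -> Optional[str]:
        """Return an access URL for a space that can take the file, if any."""
        if protocol not in self.protocols:
            return None
        for space in self.spaces:
            if space.accepts(vo, size):
                return self.protocols[protocol] + space.path
        return None
```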

  26.

  27. Expressing relationships among Computing and Storage Services
      • A typical job execution request involves certain properties of the computing element and of a permanent storage area
      • Site administrators may want to specify which Storage Areas should preferably be used by jobs executed on certain computing elements
      • Mount point information and a weight for choosing among the different options can be provided
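The CE-to-SE preference described above can be read as a weighted binding table. In this hypothetical sketch, a broker picks, for the CE a job will run on, the bound SE with the highest weight and learns where it is mounted; the record name and the "higher weight wins" rule are assumptions, since the slide does not fix them.

```python
# Sketch: site-published preferences linking a CE to close Storage Elements.
# Names are hypothetical, and we ASSUME a larger weight means a stronger
# preference; the slide does not say how weights are compared.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CESEBinding:
    ce_id: str
    se_id: str
    mount_point: str  # where the storage area is visible on the CE's nodes
    weight: int


def preferred_storage(bindings: List[CESEBinding],
                      ce_id: str) -> Optional[CESEBinding]:
    """Pick the highest-weight SE bound to the CE a job will run on."""
    candidates = [b for b in bindings if b.ce_id == ce_id]
    return max(candidates, key=lambda b: b.weight, default=None)
```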

  28. Common Information Model [9]
      • CIM: Common Information Model
      • A conceptual view of the managed environment for IT resources that attempts to unify and extend the existing instrumentation and management standards
      • Targeted at the management of resources, where management is defined as the active process of monitoring, modifying, and making decisions about a resource
      • Maintained by the Distributed Management Task Force (DMTF), a worldwide industry organization
      • Uses UML class diagrams as its modeling language

  29. CIM and the Grid community
      • There have been several activities to extend CIM to meet Grid requirements
      • CIM and GLUE partly overlap but also differ with respect to the Grid use case
      • Recent work attempts to integrate the GLUE Schema concepts into an experimental extension of CIM

  30. PART III: Grid Information Service

  31. Grid Information Service
      • A Grid system requires the ability to efficiently access and manipulate information about applications, resources and services
      • The service must deal with distributed sources of information and enable distributed access to them

  32. Characterization
      • Depends greatly on factors such as:
        • relation to time: static vs. dynamic information
        • purpose: discovery, logging, monitoring
      • Common patterns:
        • producers
        • consumers
        • intermediaries
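The three roles can be illustrated with a small in-memory sketch. Assumptions: producers expose a read function and a time-to-live, the intermediary caches pulled answers until the TTL expires (so static information is re-read rarely and dynamic information often), and it can also push fresh data to subscribed consumers. None of these names come from a specific Grid information service.

```python
# Sketch of the producer / intermediary / consumer pattern with both pull
# (query) and push (subscribe) delivery. Class names and the per-producer
# time-to-live are hypothetical: a long TTL suits static information, a
# short TTL suits dynamic information.
import time
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[str, dict], None]


class Producer:
    """Source of information about one resource."""

    def __init__(self, name: str, read: Callable[[], dict], ttl_s: float):
        self.name = name
        self.read = read    # returns the current values
        self.ttl_s = ttl_s  # how long a cached copy stays valid


class Intermediary:
    """Sits between producers and consumers: caches pulls, forwards pushes."""

    def __init__(self) -> None:
        self.producers: Dict[str, Producer] = {}
        self.cache: Dict[str, Tuple[float, dict]] = {}
        self.subscribers: List[Callback] = []

    def register(self, producer: Producer) -> None:
        self.producers[producer.name] = producer

    def query(self, name: str, now: Optional[float] = None) -> dict:
        """Pull: answer from cache unless it is older than the producer's TTL."""
        now = time.monotonic() if now is None else now
        producer = self.producers[name]
        cached = self.cache.get(name)
        if cached is None or now - cached[0] > producer.ttl_s:
            cached = (now, producer.read())
            self.cache[name] = cached
        return cached[1]

    def subscribe(self, callback: Callback) -> None:
        self.subscribers.append(callback)

    def notify(self, name: str, now: Optional[float] = None) -> None:
        """Push: refresh one producer's data and send it to all subscribers."""
        now = time.monotonic() if now is None else now
        data = self.producers[name].read()
        self.cache[name] = (now, data)
        for callback in self.subscribers:
            callback(name, data)
```

Passing now explicitly keeps the cache behavior deterministic for testing; otherwise the monotonic clock is used.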

  33. Delivery Options

  34. Grid Information Services: overview [3] [4] [5]

  35. MDS2-based Information Service example

  36. PART IV: Exercises

  37. Exercises using the EGEE Grid

  38. Information Service
      • Based on MDS 2.x
      • Uses LDAP
      • For the demo I set up:
        • http://www.cnaf.infn.it/~andreozzi/ldap/
      • Inputs to an LDAP query (besides the URL and authentication):
        • base DN, scope, filter, attributes
      • Other LDAP browsers:
        • (Java) http://www-unix.mcs.anl.gov/~gawor/ldap/
        • (Windows) http://www.softerra.com/download/download.php
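The four query inputs listed above, plus host and port, are exactly the fields of an LDAP URL as defined in RFC 4516: ldap://host:port/dn?attributes?scope?filter. A minimal Python sketch that assembles one; the percent-encoding choices are a conservative assumption rather than a complete implementation of the RFC's escaping rules.

```python
# Sketch: base DN, scope, filter and attributes, plus host and port, packed
# into an LDAP URL (RFC 4516): ldap://host:port/dn?attributes?scope?filter
# The percent-encoding choices below are a conservative assumption.
from typing import Sequence
from urllib.parse import quote

SCOPES = ("base", "one", "sub")


def ldap_url(host: str, port: int, base_dn: str, scope: str,
             search_filter: str, attributes: Sequence[str] = ()) -> str:
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {SCOPES}, got {scope!r}")
    dn = quote(base_dn, safe="=,")
    attrs = ",".join(quote(a, safe="") for a in attributes)
    # '?' separates URL fields, so it must be encoded inside each field;
    # quote() does that because '?' is not listed as safe.
    filt = quote(search_filter, safe="=()&!*,")
    return f"ldap://{host}:{port}/{dn}?{attrs}?{scope}?{filt}"
```

With the parameters from the exercises, ldap_url("gridit-bdii-01.cnaf.infn.it", 2170, "mds-vo-name=local,o=grid", "sub", "(objectClass=GlueSite)", ["GlueSiteUniqueID"]) returns ldap://gridit-bdii-01.cnaf.infn.it:2170/mds-vo-name=local,o=grid?GlueSiteUniqueID?sub?(objectClass=GlueSite).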

  39. Exercises
      • You can run these exercises from a Linux shell
      • You need the ldapsearch command to be available
      • In order to query an LDAP server, you need:
        • the hostname of the machine running the server
        • the port
        • authentication credentials (anonymous for our use case)
        • the base DN: the root of the subtree from which the query starts
      • Wherever you see LQ in the queries, substitute it with:
        ldapsearch -h gridit-bdii-01.cnaf.infn.it -p 2170 -x -b "mds-vo-name=local,o=grid"
      • The given hostname is the top-level root of the INFN Grid
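To script the exercises, the LQ prefix can be kept as an argument list instead of a shell string, which avoids quoting problems with filters. This sketch assumes OpenLDAP's ldapsearch and passes the server as an LDAP URI with -H, which the OpenLDAP documentation prefers over the separate -h and -p options; the host is the one from the slide and may no longer be reachable.

```python
# Sketch: the LQ prefix as a Python argument list, so queries can be
# scripted without shell quoting issues. Assumes OpenLDAP's ldapsearch;
# the endpoint comes from the slide and may no longer be online.
import shlex
import shutil
import subprocess
from typing import List, Sequence

BDII_URI = "ldap://gridit-bdii-01.cnaf.infn.it:2170"
BASE_DN = "mds-vo-name=local,o=grid"


def lq(search_filter: str, attributes: Sequence[str] = ()) -> List[str]:
    """Build: ldapsearch -x -H <uri> -b <base DN> <filter> [attributes...]"""
    return ["ldapsearch", "-x", "-H", BDII_URI, "-b", BASE_DN,
            search_filter, *attributes]


def run(argv: List[str], timeout_s: float = 60.0) -> str:
    """Run the query and return its LDIF output (raises on failure)."""
    if shutil.which(argv[0]) is None:
        raise RuntimeError("ldapsearch not found; install the OpenLDAP client tools")
    result = subprocess.run(argv, capture_output=True, text=True,
                            check=True, timeout=timeout_s)
    return result.stdout


if __name__ == "__main__":
    argv = lq("objectclass=GlueSite", ["GlueSiteUniqueID"])
    print(shlex.join(argv))  # the equivalent command line, shell-quoted
```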

  40. Exercises
      • Ex. 1: List all Grid sites
        LQ 'objectclass=GlueSite'
      • Ex. 2: Count the number of sites (count only the GlueSiteUniqueID lines, since the output also contains dn lines, blank lines and comments)
        LQ 'objectclass=GlueSite' GlueSiteUniqueID | grep -ci '^GlueSiteUniqueID:'
      • Ex. 3: List all CEs
        LQ 'objectclass=GlueCE'
      • Ex. 4: List all CEs with running jobs (LDAP filters have no != operator, so negate the equality test)
        LQ '(&(objectclass=GlueCE)(!(GlueCEStateRunningJobs=0)))'
      • Ex. 5: List all SEs
        LQ 'objectclass=GlueSE'
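Counting sites reliably means counting only the lines that carry the site identifier: piping raw ldapsearch output into wc -l also counts dn: lines, blank lines, and # comments. A minimal Python sketch of that counting on saved output; the sample text is made up for illustration and only imitates the shape of LDIF.

```python
# Sketch: count sites in saved ldapsearch (LDIF) output by counting lines
# that start with the attribute name. The SAMPLE text is made up.

def count_attribute_lines(ldif: str, attribute: str) -> int:
    """Count 'attribute:' lines (and base64 'attribute::' lines) in LDIF."""
    prefix = attribute.lower() + ":"  # LDIF attribute names are case-insensitive
    return sum(1 for line in ldif.splitlines()
               if line.lower().startswith(prefix))


SAMPLE = """\
# extended LDIF
#
# search result
dn: GlueSiteUniqueID=SITE-A,mds-vo-name=local,o=grid
GlueSiteUniqueID: SITE-A

dn: GlueSiteUniqueID=SITE-B,mds-vo-name=local,o=grid
GlueSiteUniqueID: SITE-B

# numEntries: 2
"""

if __name__ == "__main__":
    print(len(SAMPLE.splitlines()))                           # what wc -l sees: 10
    print(count_attribute_lines(SAMPLE, "GlueSiteUniqueID"))  # 2
```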

  41. Conclusion
      • Information Modeling of Grid resources
        • The characteristics of Grid systems require a shared information model of resources to serve as the basis for the Information Service
        • An important approach to the information modeling of Grid resources (the GLUE Schema) has been presented
      • Grid Information Service
        • A vital service for Grid systems
        • Many approaches exist, either general or tailored to particular solutions

  42. REFERENCES
      [1] Z. Németh, V. Sunderam, "Characterizing Grids: Attributes, Definitions, and Formalisms", Journal of Grid Computing, vol. 1, no. 1, pp. 9-23, 2003. http://ipsapp009.kluweronline.com/IPS/frames/toc.aspx?J=6160&I=1#
      [2] GLUE Schema official documents. http://infnforge.cnaf.infn.it/glueinfomodel
      [3] Globus Toolkit: Monitoring and Discovery Service 2. http://www.globus.org/mds/mds2/
      [4] Globus Toolkit: Monitoring and Discovery Service 4. http://www-unix.globus.org/toolkit/docs/development/4.0-drafts/info/WSMDSFacts.html
      [5] R-GMA: Relational Grid Monitoring Architecture. http://www.r-gma.org

  43. REFERENCES
      [6] S. Andreozzi, "GLUE Schema implementation for the LDAP model", INFN Technical Report. http://www.cnaf.infn.it/~sergio/publications/Glue4LDAP.pdf
      [7] K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, "Grid Information Services for Distributed Resource Sharing", in Proceedings of the 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10). http://www.globus.org/research/papers.html#MDS-HPDC
      [8] M. Franklin, S. Zdonik, "'Data In Your Face': Push Technology in Perspective", ACM SIGMOD '98, Seattle, WA, USA.
      [9] DMTF Common Information Model. http://www.dmtf.org/standards/cim/
