Searching within Large Grid Infrastructures • Marios D. Dikaiakos • University of Cyprus & CoreGRID
Acknowledgements • Wei Xing, University of Cyprus • Rizos Sakellariou, U. Manchester, UK • Yannis Ioannidis, U. Athens, GR • Salvatore Orlando, ISTI-CNR, IT • Domenico Laforenza, ISTI-CNR, IT
Outline • Context and Motivation • Limitations of Grid Information Services • Semantic Grid and Ontologies • A Core Grid Ontology • Conclusions and Future Work
The Grid • A wide-scale, distributed computing infrastructure to support resource sharing and coordinated problem solving in dynamic, multi-institutional Virtual Organizations. • Computational Grid: Provides the raw computing power, high-speed, high-bandwidth interconnection, and associated data storage. • Data & Information Grid: Allows easily accessible connections to major sources of information and tools for its analysis and visualisation. • Knowledge & Semantic Grid: Gives added value to the information; provides intelligent guidance for decision-makers; facilitates the generation, diffusion and support of knowledge.
Near-future Scenarios for the Grid • The Grid as a Wide-Scale Distributed System: • Millions of resources of different kinds. • Services and Policies in place. • Relationships (permanent and transient) between organizations, software, data, services, applications… • Different middleware platforms. • Common (?) protocols, standards and APIs. • The hope is that the Grid will grow larger and gain acceptance as wide as that of the Web.
Problem Statement: Searching the Grid • How are individuals and organizations going to harness the capabilities of a fully deployed Grid, with a massive and ever-expanding base of computing and storage nodes, network resources, and a huge corpus of available programs, services, and data? • To this end, users need to identify “resources” that are: • Interesting (discovery) • Relevant (classification) • Accessible and available under known policies of use, cost (inquiry) • Emphasis on “summary” information, in terms of granularity and timing.
Searching the Grid • Software and Data-sets • Policies • Relationships • Best-practices • Computing, Storage, Network Resources
Examples of search queries • Hardware resources on the Grid, their attributes, and applicable policies of their use: • Find a VO providing exclusive access to a shared-memory multiprocessor system with at least 16 processors, 8 GB of main memory, and a usage charge of no more than 100 euros per unit of CPU time. • Application services, software, and data-sets: • Find services running Quantum Chromo-Dynamics (QCD) calculations using F90 and MPI. • Hardware-software combinations, Grid usage and best-practices: • Find the pricing and prior clientele of Grid services that provide access to the XYZ workflow for high-performance oil refinery simulations.
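The first example query can be read as a conjunction of attribute and policy constraints. The following is a minimal sketch of that reading, assuming a flat in-memory catalogue; the `Resource` fields, the catalogue entries, and the VO names are hypothetical and do not come from any Grid information service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical summary record for one hardware resource."""
    name: str
    vo: str
    architecture: str
    processors: int
    memory_gb: int
    exclusive_access: bool
    charge_eur: float  # usage charge per unit of CPU time


# Illustrative catalogue; every entry is made up for this sketch.
CATALOGUE = [
    Resource("smp-a", "vo-physics", "shared-memory", 32, 16, True, 80.0),
    Resource("cluster-b", "vo-physics", "cluster", 128, 256, True, 40.0),
    Resource("smp-c", "vo-bio", "shared-memory", 16, 8, False, 60.0),
    Resource("smp-d", "vo-chem", "shared-memory", 16, 8, True, 120.0),
]


def find_vos(catalogue, min_processors=16, min_memory_gb=8, max_charge_eur=100.0):
    """Return the VOs offering at least one resource that meets every constraint."""
    matches = {
        r.vo
        for r in catalogue
        if r.architecture == "shared-memory"
        and r.exclusive_access
        and r.processors >= min_processors
        and r.memory_gb >= min_memory_gb
        and r.charge_eur <= max_charge_eur
    }
    return sorted(matches)


print(find_vos(CATALOGUE))
```

Only `smp-a` meets all five conditions here, so the query returns its VO. A real deployment would need the policy and pricing attributes to be published in the first place, which is part of the problem the talk raises.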
Grid Information Services • Established to help users answer questions on the status of individual resources and the Grid. • Support the discovery and ongoing monitoring of the existence and characteristics of resources, services, computations and other entities of value to the Grid. • Examples: • GLOBUS, EDG: Metacomputing Directory Service (MDS) • UNICORE Gateway and Network Job Supervisor (NJS) • EGEE: Relational Grid Monitoring Architecture (R-GMA), GridICE • Condor Matchmaker
MDS: Grid Info Services in Globus • [Figure: hierarchy with Users at the top issuing Discovery/Inquiry/Retrieval requests via GRIP to a tree of GIIS nodes; GRIS nodes register upward via GRRP; each GRIS performs Info. Retrieval from “Info. Providers” (LDIF) attached to the Resources at the bottom.]
Relational Grid Monitoring Architecture • [Figure: components shown are Sensor Code, Producer API, Producer Servlet, Registry API, Registry Service, Consumer API, Consumer Servlet, and Application.]
What information is out there? • Virtual Organizations: • Resources • Policies • People • Resource Specifications: • Descriptions & Types • Names • Capacity • Configuration • Resource status • Resource use. • Availability. • Monitoring data. • Summary & Statistics • Logs. • Associations. • Statistics of use. • Applications: • Descriptions. • I/O requirements. • Meta-Data • Workflows • Software: • Codes • Specs • Location • Data-sets: • Data • Metadata • Replicas • Services: • Interface • Metadata
Limitations of Current Approaches • Remarks extracted from the description of a Grid-application development effort: • “Jobs typically need to access hundreds of files, and each site has a different subset of the files.” • “Our data system knows what portion of a user's data may be at each site, but does not know how to submit grid jobs.” • “Our job submission system required users to choose grid sites and gave them no assistance in choosing.” • “…jobs requesting thousands of files and sites having hundreds of thousands of files are not uncommon in production.” • “…it would not be scalable to explicitly publish all the properties of jobs and resources in ...”
Limitations and Challenges • Scalability in the context of Millions of Resources: • Infrastructure intrusiveness. • Resource Discovery, Retrieval and Classification. • Expressiveness of Data Models in terms of: • Types of captured information. • Expressing semantic relationships between represented entities. • Amenability to Indexing, Query Optimization. • Complexity: • Different protocols for discovery & inquiry, registration, invocation. • Lack of interoperability between different platforms. • Information Standardization. • Missing Functionalities: • Transient and Historical information. • Policies. • Complex Queries.
Revisiting the problem • Very large number of sources. • Independent. • No common schema. • Various, partly unknown semantics. • Subject to change, birth, or silence.
Revisiting the problem • A federated warehouse approach: • “Wrap” the various sources to extract their information. • Store data in a warehouse. • Monitor sources and propagate updates to the warehouse. • Ask queries to the warehouse.
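The four steps of the federated warehouse approach can be sketched in a few lines. This is an illustrative in-memory toy, assuming each source is wrapped by a function that returns records carrying an `id` field; the `Warehouse` class, its method names, and the sample records are hypothetical, not part of any existing Grid middleware.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Wrapper = Callable[[], Iterable[Record]]


class Warehouse:
    """Toy warehouse: one wrapper per source, one record table per source."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}
        self._store: Dict[str, Dict[str, Record]] = {}

    def register(self, source: str, wrapper: Wrapper) -> None:
        """Step 1: 'wrap' a source so its information can be extracted."""
        self._wrappers[source] = wrapper
        self._store[source] = {}

    def refresh(self, source: str) -> int:
        """Steps 2-3: store the source's records and propagate updates.

        Returns how many records were added, removed, or modified.
        """
        fresh = {str(rec["id"]): rec for rec in self._wrappers[source]()}
        old = self._store[source]
        changed = sum(
            1 for key in fresh.keys() | old.keys() if fresh.get(key) != old.get(key)
        )
        self._store[source] = fresh
        return changed

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Step 4: answer queries from the warehouse, not from the sources."""
        return [
            rec
            for records in self._store.values()
            for rec in records.values()
            if predicate(rec)
        ]


# Two made-up sources standing in for heterogeneous information services.
source_a = [{"id": "ce1", "cpus": 16}, {"id": "ce2", "cpus": 4}]
source_b = [{"id": "se1", "disk_tb": 10}]

warehouse = Warehouse()
warehouse.register("a", lambda: list(source_a))
warehouse.register("b", lambda: list(source_b))
warehouse.refresh("a")
warehouse.refresh("b")

source_a.append({"id": "ce3", "cpus": 64})  # the source changes...
updates = warehouse.refresh("a")            # ...and the update is propagated
big = warehouse.query(lambda rec: rec.get("cpus", 0) >= 8)
print(updates, [rec["id"] for rec in big])
```

Polling through `refresh` is the simplest way to monitor a source; with millions of resources a real system would need incremental or push-based updates, which is where the scalability concerns listed earlier come back in.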
Requirements for Searching the Grid • Global/Common naming scheme for Grid entities. • Resolution mechanism for discovery and retrieval of entity-related information/meta-data. • Type and representation of retrieved entity-related information. • Mining and representation of relationships and summary data. • Complexity of queries and query interpretation.
Research Issues • Metadata Consolidation: • Definition & local creation of metadata about Grid entities. • Information Source Discovery: • Algorithms for Search and Discovery, Management of Updates. • Metadata Retrieval and Integration: • Protocols for retrieval; Data structures and algorithms for integration. • Management of meta-data: • Analysis to build proper indexes; Extrapolation of semantic relationships. • Query mechanisms and interface: • Query language definition; Intelligent-agent interface to help users formulate queries.
Looking for answers: Semantic Grid • An extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, so that it can be shared and used by humans and machines, better enabling them to work in cooperation. Source: Goble, Bechhofer, DeRoure, Semantic Grid 101 GGF16, Athens, 2/2005
Ontologies and the Semantic Grid • Ontologies are among the key building blocks of the Semantic Grid. • They capture the concepts/terms for Grid entities, resources, and capabilities, and the relationships between them. • We develop Grid ontologies to: • Merge the information from different sources; • Build a knowledge base for Grid infrastructures; • Construct a Grid information system; • Support co-operation with semantics-aware Grid services, such as a Resource Broker or an Information Service.
Ontologies in Computer Science • An ontology is an engineering artifact: • It is constituted by a specific vocabulary used to describe a certain reality, plus • a set of explicit assumptions regarding the intended meaning of the vocabulary. • Almost always including how concepts should be classified • Thus, an ontology describes a formal specification of a certain domain: • Shared understanding of a domain of interest • Formal and machine manipulable model of a domain of interest Source: Goble, Bechhofer, DeRoure, Semantic Grid 101 GGF16, Athens, 2/2005
Languages • [Figure: language stack of XML, RDF, RDF(S), and OWL, annotated with the roles Annotation, Integration, and Inference.] • Work on the Semantic Web has concentrated on the definition of a collection or “stack” of languages. • These languages are then used to support the representation and use of metadata. • The languages provide basic machinery that can be used to represent the extra semantic information needed for the Semantic Web • XML • RDF • RDF(S) • OWL • … Source: Goble, Bechhofer, DeRoure, Semantic Grid 101, GGF16, Athens, 2/2005
“W3C” Stack • XML provides a surface syntax for structured documents. • XML Schema is a language for restricting the structure of XML documents. • RDF is a data model for objects ("resources") and relations between them, and provides simple semantics for this data model. • RDF Schema is a vocabulary for describing properties and classes of RDF resources, with semantics for generalization and hierarchies of such properties and classes. • OWL adds more vocabulary for describing properties and classes.
Towards a general Ontology for Grids • Currently, there are several Grid architectures and Grid implementations. • Different views of Grid entities and their properties. • It is practically impossible that one ontology can include all aspects of Grids or of many types of Grid entities. • A Core Grid Ontology (CGO): • A core “framework” for representing a Grid. • Open and extensible for all kinds of Grid architectures and Grid implementations.
Building a Core Ontology (GGF 16, 2/2006) • The most difficult task in developing an ontology: • Capturing a “right” model for the Grid. • Our view of a Grid: • Users & Applications + {Middleware/Services} + Resources within VOs. • A layer-structured model consisting of three layers: • Users/Applications • Middleware/Services • Resources.
Defining properties • Based on the constraints of the CGO classes.
Representing a Grid Entity using OWL
<owl:Class rdf:ID="ComputingElement">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:someValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Jobmanager"/>
            <owl:Class rdf:about="#JobScheduler"/>
          </owl:unionOf>
        </owl:Class>
      </owl:someValuesFrom>
      <owl:onProperty rdf:resource="#runningSevice"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  ……
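Read informally, the restriction above says that every ComputingElement has at least one value of the `runningSevice` property (spelled as in the slide) that is a Jobmanager or a JobScheduler. The sketch below checks that condition over hand-written data. It is not an OWL reasoner: it assumes a closed world in which all services are listed, whereas OWL itself would not treat a missing value as a violation. The dictionaries, service names, and the `FileTransfer` class are hypothetical.

```python
# Classes named in the unionOf of the OWL fragment.
ALLOWED_SERVICE_CLASSES = {"Jobmanager", "JobScheduler"}


def has_required_service(element, service_types):
    """Closed-world check of the someValuesFrom restriction.

    element: dict with an optional "runningSevice" list of service names.
    service_types: mapping from service name to its class name.
    """
    services = element.get("runningSevice", [])
    return any(service_types.get(name) in ALLOWED_SERVICE_CLASSES for name in services)


# Hypothetical instance data for illustration only.
SERVICE_TYPES = {"pbs01": "JobScheduler", "gridftp01": "FileTransfer"}
ce_ok = {"id": "ce01", "runningSevice": ["gridftp01", "pbs01"]}
ce_bad = {"id": "ce02", "runningSevice": ["gridftp01"]}

print(has_required_service(ce_ok, SERVICE_TYPES))
print(has_required_service(ce_bad, SERVICE_TYPES))
```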
Conclusions • The CGO can be used as a common, extensible language for: • Expressing the basic concepts of a Grid infrastructure and the relationships thereof. • Encoding and storing Grid metadata. • Integrating grid-related information extracted from different sources. • Expressing queries.
Next steps • Automate the knowledge-base construction and maintenance process: • Information-source discovery • Metadata wrapping • Metadata integration • Consistency updates • Investigate mechanisms for efficient knowledge-base query implementation.
Thank you for your attention! • Questions? • Comments?
References • "A Core Grid Ontology for the Semantic Grid." Wei Xing, M. D. Dikaiakos, and R. Sakellariou. 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), Singapore, May 2006 (to appear). • "Information Services for Large-scale Grids: A Case for a Grid Search Engine." M. D. Dikaiakos, R. Sakellariou, and Y. Ioannidis. In Engineering the Grid: Status and Perspectives, Jack Dongarra, Hans Zima, Adolfy Hoisie, Laurence Yang, Beniamino DiMartino (Editors), American Scientific Publishers, January 2006, ISBN: 1-58883-038-1. • "Building a Distributed Digital Library for Natural Disasters Metadata with Grid Services and RDF." W. Xing, M. D. Dikaiakos, Hua Yang, A. Sphyris, G. Eftychidis. Library Management Journal (Special Issue on Digital Libraries in the Knowledge Era: Knowledge Management and Semantic Web Technology), Vol. 26, No. 4-5, May 2005. • "Search Engines for the Grid: A Research Agenda." M. D. Dikaiakos, Y. Ioannidis, R. Sakellariou. In Grid Computing: First European AcrossGrids Conference, Santiago de Compostela, Spain, February 2003, Revised Papers, Lecture Notes in Computer Science, vol. 2970, pages 49-58, Springer, 2004.
The RDF Data Model • Statements are <subject, predicate, object> triples: • <Sean, hasColleague, Ian> • Can be represented as a graph: • [Figure: an edge labeled hasColleague from node Sean to node Ian.] • Statements describe properties of resources • A resource is any object that can be pointed to by a URI: • The generic set of all names/addresses that are short strings that refer to resources • a document, a picture, a paragraph on the Web, http://www.cs.man.ac.uk/index.html, a book in the library, a real person (?), isbn://0141184280 • Properties themselves are also resources (URIs) Source: Goble, Bechhofer, DeRoure, Semantic Grid 101 GGF16, Athens, 2/2005
Linking Statements • The subject of one statement can be the object of another • Such collections of statements form a directed, labeled graph • The object of a triple can also be a “literal” (a string) • [Figure: graph in which Sean hasName “Sean K. Bechhofer”, Sean hasColleague Ian, Carole hasColleague Ian, and Ian hasHomePage http://www.cs.man.ac.uk/~horrocks.]
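The statements on these two slides can be held as plain tuples and indexed by subject to obtain the directed, labeled graph. The following is a minimal sketch using short names instead of full URIs; the helper functions are hypothetical names chosen for this example.

```python
from collections import defaultdict

# The four statements from the slides, as (subject, predicate, object) triples.
TRIPLES = [
    ("Sean", "hasColleague", "Ian"),
    ("Sean", "hasName", "Sean K. Bechhofer"),  # object is a literal
    ("Carole", "hasColleague", "Ian"),
    ("Ian", "hasHomePage", "http://www.cs.man.ac.uk/~horrocks"),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """All objects reachable from subject along edges labeled predicate."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]


def colleague_home_pages(graph, subject):
    """Follow two linked statements: subject -> colleague -> home page."""
    return [
        page
        for colleague in objects(graph, subject, "hasColleague")
        for page in objects(graph, colleague, "hasHomePage")
    ]


graph = build_graph(TRIPLES)
print(colleague_home_pages(graph, "Carole"))
```

The two-hop lookup works because Ian is the object of one statement and the subject of another, which is exactly the linking the slide describes.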
RDF Syntax • RDF has an XML syntax that has a specific meaning: • Every Description element describes a resource • Every attribute or nested element inside a Description is a property of that resource • We can refer to resources by URIs
<rdf:Description rdf:about="some.uri/person/sean_bechhofer">
  <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
  <o:hasName rdf:datatype="&xsd;string">Sean K. Bechhofer</o:hasName>
</rdf:Description>
<rdf:Description rdf:about="some.uri/person/ian_horrocks">
  <o:hasHomePage>http://www.cs.man.ac.uk/~horrocks</o:hasHomePage>
</rdf:Description>
<rdf:Description rdf:about="some.uri/person/carole_goble">
  <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
</rdf:Description>
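To make the mapping from Description elements to triples concrete, the sketch below reads a small RDF/XML document with Python's standard `xml.etree.ElementTree`. It handles only the subset of RDF/XML used on the slide (Description elements with `rdf:about`, and properties given as `rdf:resource` or as text) and is not a general RDF parser. The enclosing `rdf:RDF` element and the `http://example.org/ontology#` namespace bound to `o` are assumptions added so that the fragment is well-formed XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Slide fragment wrapped in rdf:RDF; the "o" namespace URI is hypothetical.
DOCUMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:o="http://example.org/ontology#">
  <rdf:Description rdf:about="some.uri/person/sean_bechhofer">
    <o:hasColleague rdf:resource="some.uri/person/ian_horrocks"/>
    <o:hasName>Sean K. Bechhofer</o:hasName>
  </rdf:Description>
  <rdf:Description rdf:about="some.uri/person/ian_horrocks">
    <o:hasHomePage>http://www.cs.man.ac.uk/~horrocks</o:hasHomePage>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Turn each property of each Description into a (subject, predicate, object) triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree reports qualified names as "{namespace}local".
            predicate = prop.tag
            target = prop.get(f"{{{RDF_NS}}}resource")
            # A resource reference if present, otherwise the literal text.
            obj = target if target is not None else (prop.text or "")
            triples.append((subject, predicate, obj))
    return triples


for triple in extract_triples(DOCUMENT):
    print(triple)
```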
What does RDF give us? • A mechanism for annotating data and resources. • Single (simple) data model. • Syntactic consistency between names (URIs). • Low level integration of data. Source: Goble, Bechhofer, DeRoure, Semantic Grid 101 GGF16, Athens, 2/2005
RDF(S): RDF Schema • RDF gives a formalism for meta data annotation, and a way to write it down in XML, but it does not give any special meaning to vocabulary such as subClassOf or type (supporting OO-style modelling) • Interpretation is an arbitrary binary relation • RDF Schema extends RDF with a schema vocabulary that allows you to define basic vocabulary terms and the relations between those terms • Class, type, subClassOf, • Property, subPropertyOf, range, domain • it gives “extra meaning” to particular RDF predicates and resources • this “extra meaning”, or semantics, specifies how a term should be interpreted Source: Goble, Bechhofer, DeRoure, Semantic Grid 101 GGF16, Athens, 2/2005
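The “extra meaning” that RDF Schema gives to `subClassOf` and `type` can be illustrated with two of its standard entailment rules: subclass transitivity and type inheritance. The sketch below applies them until no new triples appear. It is a naive fixed-point loop over string triples, not a full RDFS reasoner, and the class names `GridResource` and `GridEntity` and the instance `ce01` are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply subclass transitivity and type inheritance to a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS_OF:
                continue
            for s2, p2, o2 in inferred:
                # Transitivity: s subClassOf o, o subClassOf o2 => s subClassOf o2.
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # Inheritance: s2 type s, s subClassOf o => s2 type o.
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= inferred:
            return inferred
        inferred |= new


# Illustrative schema and one instance; only ComputingElement is from the slides.
FACTS = [
    ("ComputingElement", SUBCLASS_OF, "GridResource"),
    ("GridResource", SUBCLASS_OF, "GridEntity"),
    ("ce01", RDF_TYPE, "ComputingElement"),
]

closure = rdfs_closure(FACTS)
print(("ce01", RDF_TYPE, "GridEntity") in closure)
```

Without the schema semantics, `subClassOf` would be just another arbitrary binary relation, and nothing would follow about `ce01` beyond the one statement that was asserted.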