Semantic Grid. Introduction
www.ontogrid.eu
Oscar Corcho, University of Manchester
Primer Taller en Grid Computing (First Workshop on Grid Computing), Universidad del Valle, Cali, Colombia, January 2007
Outline
• Background
• What is the Grid?
• Next-Generation Grids: SOKU
• But, what is...?
• Data and metadata
• The Semantic Grid
• Agenda
Oscar Corcho. Primer taller de Grid Computing. Cali, Colombia, January 2007
A Grid Computing Timeline
Funding and research programmes:
• Japan government funds the Business Grid project and the NAREGI project
• UK e-Science program starts
• DARPA funds Globus Toolkit & Legion
• EU funds UNICORE project
• US DoE pioneers grids for scientific research
• NSF funds National Technology Grid
• NASA starts Information Power Grid
Community and standards milestones:
• I-Way: SuperComputing '95
• US Grid Forum forms at SC '98
• European & AP Grid Forums
• Grid Forums merge, form GGF
• "Anatomy" paper
• "Physiology" paper
• OGSA-WG formed
• OGSA v1.0 …
Today:
• Grid solutions are common for HPC
• Grid-based business solutions are becoming common
• Required technologies & standards are evolving
Source: Hiro Kishimoto (GGF17 opening keynote)
What is a Grid?
A grid is a system consisting of:
• distributed but connected resources, and
• software and/or hardware that provides and manages logically seamless access to those resources to meet desired objectives.
A grid is also characterised as infrastructure that enables "coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations".
[Figure: example grid resources, including handheld devices, workstations, servers, clusters, supercomputers, data centers, databases, web servers, printers and software licenses]
Adapted from Hiro Kishimoto (GGF17 opening keynote)
Virtual Organizations
• Dynamic confederations organized around common goals
• Diverse membership & capabilities
  • People, compute resources, data resources, etc.
• Diverse geographic distribution
• Sharing is well-controlled
• Minimum knowledge about the physical characteristics of resources
• Construction of higher-level capabilities via composition of existing ones, similar to SOA
Source: http://www.globus.org
Grid & Related Paradigms
• Utility Computing: computing "services"; no knowledge of provider; enabled by grid technology
• Cluster: tightly coupled; homogeneous; cooperative working
• Distributed Computing: loosely coupled; heterogeneous; single administration
• Grid Computing: large scale; cross-organizational; geographical distribution; distributed management
Source: Hiro Kishimoto (GGF17 opening keynote)
Health and Safety Notice: ACRONYM SPILL!!
Disclaimer: talking about the Grid does not necessarily mean high-performance computing and parallelisation; it is mainly about the management of distributed systems.
Open Grid Service Architecture - OGSA
[Figure: a service in front of a stateful resource (e.g. storage with total capacity, used space and available space), offering operations such as queryProperties, get/set properties, create, destroy, rewind and stop; related specifications: WS-RF, WS-Management, WS-I+, WS-GAF]
• Cross-cutting requirements: interoperable, VO level, optimized, reliable, certain QoS guarantees, scalable, available, extensible
• Characteristics: service orientation, management operations, resource representation/state, lifetime
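The management operations in the figure (query/get/set properties, create, destroy) and the "lifetime" characteristic can be made concrete with a toy in-memory model of a stateful storage resource. This is a minimal sketch under our own assumptions: the class, method and property names are hypothetical, and it models only the idea of resource properties plus scheduled destruction, not the WS-RF message formats.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StorageResource:
    """Hypothetical stateful resource with WS-RF-style properties and lifetime."""
    total_capacity: int                       # e.g. in GB
    used_space: int = 0
    termination_time: Optional[float] = None  # scheduled destruction time
    destroyed: bool = False

    def query_properties(self) -> Dict[str, int]:
        """Return the resource properties, including a derived one."""
        self._check_alive()
        return {
            "TotalCapacity": self.total_capacity,
            "UsedSpace": self.used_space,
            "AvailableSpace": self.total_capacity - self.used_space,
        }

    def set_used_space(self, value: int) -> None:
        """'Set property' operation, with a simple consistency check."""
        self._check_alive()
        if not 0 <= value <= self.total_capacity:
            raise ValueError("used space out of range")
        self.used_space = value

    def set_termination_time(self, when: float) -> None:
        """Schedule destruction (soft-state lifetime management)."""
        self._check_alive()
        self.termination_time = when

    def tick(self, now: float) -> None:
        """Destroy the resource once its termination time has passed."""
        if self.termination_time is not None and now >= self.termination_time:
            self.destroy()

    def destroy(self) -> None:
        """Explicit 'destroy' operation."""
        self.destroyed = True

    def _check_alive(self) -> None:
        if self.destroyed:
            raise RuntimeError("resource has been destroyed")
```

The point of the lifetime part is that a client does not have to remember to clean up: if it stops renewing the termination time, the resource disappears on its own.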
The Open Grid Services Architecture
• An open, service-oriented architecture (SOA)
  • Resources as first-class entities
  • Dynamic service/resource creation and destruction
• Built on a Web services infrastructure
• Resource virtualization at the core
• Build grids from a small number of standards-based components
  • Replaceable, coarse-grained (e.g. brokers)
• Customizable
  • Support for dynamic, domain-specific content…
  • …within the same standardized framework
Source: Hiro Kishimoto (GGF17 keynote)
Why Use a SOA?
• Logical view of capabilities
• Relatively coarse-grained functions
• Reusable and composable behaviors
• Encapsulation of complex operations
• Naturally extendable framework
• Platform-neutral (machine and OS)
Source: Hiro Kishimoto (GGF17 keynote)
SOA & Web Services: Key Benefits
SOA
• Flexible: locate services on any server; relocate as necessary; prospective clients find services using registries
• Scalable: add & remove services as demand varies
• Replaceable: update implementations without disruption to users
• Fault-tolerant: on failure, clients query the registry for alternate services
Web Services
• Interoperable: growing number of industry standards
• Strong industry support
• Reduce time-to-value: harness robust development tools for Web services; decrease learning & implementation time
• Embrace and extend: leverage the effort already spent developing standards and driving consensus on them; focus limited resources on augmenting & adding standards as needed
Source: Hiro Kishimoto (GGF17 keynote)
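The "fault-tolerant" benefit relies on clients going back to a registry when a provider fails. A minimal sketch, assuming a hypothetical in-process registry in which endpoints are plain Python callables and a `ConnectionError` signals an unreachable provider (none of these names come from a real grid middleware API):

```python
from typing import Callable, Dict, List

Endpoint = Callable[[str], str]


class Registry:
    """Hypothetical registry mapping a capability name to provider endpoints."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Endpoint]] = {}

    def register(self, capability: str, endpoint: Endpoint) -> None:
        self._providers.setdefault(capability, []).append(endpoint)

    def lookup(self, capability: str) -> List[Endpoint]:
        return list(self._providers.get(capability, []))


def invoke_with_failover(registry: Registry, capability: str, request: str) -> str:
    """Call the first working provider; on failure, fall back to the next one."""
    errors: List[Exception] = []
    for endpoint in registry.lookup(capability):
        try:
            return endpoint(request)
        except ConnectionError as exc:  # only connection problems trigger failover
            errors.append(exc)
    raise RuntimeError(f"no provider for {capability!r} succeeded: {errors}")


if __name__ == "__main__":
    def unreachable(req: str) -> str:
        raise ConnectionError("host down")

    def healthy(req: str) -> str:
        return f"converted:{req}"

    registry = Registry()
    registry.register("format-conversion", unreachable)
    registry.register("format-conversion", healthy)
    print(invoke_with_failover(registry, "format-conversion", "data.csv"))
```

Because the client asks for a capability rather than a fixed host, replacing or relocating a provider only requires updating the registry.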
Open Grid Service Architecture - OGSA
Execution Management
• The basic problem:
  • Execute and manage jobs/services in the grid
  • Select from or provision the required resources
• Steps:
  1. Describe the job (JSDL)
  2. Submit the job
  3. Select from or deploy the required resources (CDL)
  4. Manage the job
Source: Hiro Kishimoto (GGF17 keynote)
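Step 1 (describing the job) is where JSDL comes in. The sketch below builds a minimal JSDL 1.0 document with Python's standard XML library; the element names follow our reading of the JSDL 1.0 specification and its POSIX application extension, while the job name, executable and file names are hypothetical. Submission, resource selection and job management (steps 2 to 4) are middleware-specific and are not shown.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespaces of JSDL 1.0 and its POSIX application extension
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)


def make_jsdl(job_name: str, executable: str, args: List[str], output: str) -> str:
    """Build a minimal JSDL job description and return it as an XML string."""
    root = ET.Element(f"{{{JSDL}}}JobDefinition")
    description = ET.SubElement(root, f"{{{JSDL}}}JobDescription")

    ident = ET.SubElement(description, f"{{{JSDL}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL}}}JobName").text = job_name

    app = ET.SubElement(description, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in args:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    ET.SubElement(posix, f"{{{POSIX}}}Output").text = output  # file for stdout
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_jsdl("hello-grid", "/bin/echo", ["hello", "grid"], "stdout.txt"))
```

The description says only what to run, not where; choosing or deploying resources is left to the execution management services.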
Data Services
• The basic problem: manage, transfer and access distributed data services and resources
• Use cases: access data, manage data, find data, describe data (metadata), move/copy/replicate data
• Issues: common access to data across different protocols and formats
• Data resources: relational databases, text files, data streams, sensors, catalogs, derived data
Source: Hiro Kishimoto (GGF17 keynote)
Resource Management
• Provides a framework to integrate resource management functions
  • interfaces, services, information models, etc.
• Enables integrated discovery, monitoring, control, etc.
[Figure: layered view, from top to bottom: domain-specific (application-specific) capabilities; OGSA high-level management services (GGF): execution management, data and security services; access to manageability (OASIS, DMTF): WSDM, WS-Management; WSRF/WSN, WS-Transfer/Eventing; resources and their information models (DMTF, SNIA, etc.)]
Source: Hiro Kishimoto (GGF17 keynote)
Information Services
• Provide management and access facilities for information about applications and resources in the grid environment
• Used for: service discovery, load balancing, problem determination, execution management, accounting, resource reservation, application monitoring
• Components: producers, consumers, a registry and a logger; information is delivered by retrieval or by asynchronous notification
• Must be reliable, secure and efficient
Source: Hiro Kishimoto (GGF17 keynote)
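The two delivery styles on this slide, retrieval and asynchronous notification, can be contrasted in a toy single-process model. This is a sketch under our own assumptions: all names are hypothetical, a `queue.Queue` per subscriber stands in for asynchronous delivery (the producer never waits for the consumer), and reliability and security are not addressed.

```python
import queue
from typing import Callable, Dict, List


class InformationService:
    """Toy information service: producers publish records about resources;
    consumers either query them (retrieval) or subscribe to a topic and
    drain notifications from a queue whenever convenient."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}
        self._subscribers: Dict[str, List[queue.Queue]] = {}

    def publish(self, topic: str, resource_id: str, record: dict) -> None:
        """Producer side: store the latest record and notify subscribers."""
        self._records[resource_id] = dict(record, topic=topic)
        for inbox in self._subscribers.get(topic, []):
            inbox.put((resource_id, record))  # no waiting on the consumer

    def subscribe(self, topic: str) -> "queue.Queue":
        """Consumer side (notification): get a queue of future updates."""
        inbox: queue.Queue = queue.Queue()
        self._subscribers.setdefault(topic, []).append(inbox)
        return inbox

    def query(self, predicate: Callable[[dict], bool]) -> List[str]:
        """Consumer side (retrieval): IDs of resources whose record matches."""
        return sorted(rid for rid, rec in self._records.items() if predicate(rec))
```

For example, a load balancer might query for lightly loaded nodes, while a monitoring tool subscribes to the same topic and reacts to every update.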
Grid Reality
• Requires experts to install, configure and maintain
• Still far from the ambitious OGSA landscape in terms of cross-cutting requirements
• Heavy use of XML
[Figure: the OGSA vision of "virtual homogeneity" (OGF) contrasted with its actual realizations]
Grid Middleware. Globus Toolkit™
• A software toolkit addressing key technical problems in the development of Grid-enabled tools, services, and applications
• Offers a modular "bag of technologies"
• Enables incremental development of grid-enabled tools and applications
• Implements standard Grid protocols and APIs
• Available under an open source license
• Used as a gateway to other resources
• http://www.globus.org/
Grid Middleware. Globus Toolkit™. Four Key Protocols
• The Globus Toolkit™ centers around four key protocols:
  • Connectivity layer:
    • Security: control access but allow collaboration
  • Resource layer:
    • Resource Management: Grid Resource Allocation and Management (WS-GRAM)
    • Information: Information Index
    • Data Transfer: Grid File Transfer Protocol (GridFTP)
Grid Middleware. Condor
• Designed as cycle-stealing middleware
  • Uses idle resource time to perform tasks
  • Converts collections of computers into clusters
  • If the owner takes back control of a resource, the Condor job either migrates or terminates
• Provides reliable job completion
  • Re-runs jobs that did not complete
• Selects the best resource for a job based on its requirements
  • Uses ClassAd matchmaking so that the requirements of both the job and the resource owner are satisfied
• http://www.cs.wisc.edu/condor/
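ClassAd matchmaking pairs jobs and machines that each advertise attributes together with Requirements and Rank. As a simplified illustration (not the ClassAd language itself, which has its own expression syntax and evaluation rules), the sketch below models ads as Python dicts whose Requirements and Rank are Python functions; attribute names such as OpSys and Memory mirror common ClassAd attributes, but all values are hypothetical.

```python
from typing import Any, Dict, List, Optional

Ad = Dict[str, Any]  # a "classified advertisement": attributes plus expressions


def matches(job: Ad, machine: Ad) -> bool:
    """Two-way match: each ad's Requirements must accept the other ad."""
    return bool(job["Requirements"](machine)) and bool(machine["Requirements"](job))


def matchmake(job: Ad, machines: List[Ad]) -> Optional[Ad]:
    """Return the compatible machine that the job ranks highest, or None."""
    compatible = [m for m in machines if matches(job, m)]
    if not compatible:
        return None
    return max(compatible, key=job["Rank"])


if __name__ == "__main__":
    job = {
        "Owner": "alice",
        "ImageSize": 512,
        "Requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 512,
        "Rank": lambda m: m["Memory"],  # prefer machines with more memory
    }
    machines = [
        {"Name": "m1", "OpSys": "LINUX", "Memory": 1024,
         "Requirements": lambda j: j["ImageSize"] <= 1024},
        {"Name": "m2", "OpSys": "LINUX", "Memory": 2048,
         "Requirements": lambda j: j["Owner"] != "alice"},  # owner's policy
        {"Name": "m3", "OpSys": "WINDOWS", "Memory": 4096,
         "Requirements": lambda j: True},
        {"Name": "m4", "OpSys": "LINUX", "Memory": 768,
         "Requirements": lambda j: True},
    ]
    print(matchmake(job, machines)["Name"])  # m1
```

The symmetry is what keeps "everyone happy": m2 has the most memory the job can use, but its owner's policy rejects the job, so the best mutually acceptable machine wins.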
Grid Middleware. gLite
• Combines much of the two previous architectures (Globus, Condor)
• Along with other functionality:
  • Brokering service (WMS)
  • Data storage (SE)
• Deployed over a vast range of sites
  • Based in Europe
  • But spreading fast
• http://www.eu-egee.org/
Outline (next section: Next-Generation Grids: SOKU)
“To realise the Next Generation Grid requires semantically rich information representation, the exploitation of knowledge, and co-ordination and orchestration that is aware of context and task”
David Snelling, NextGRID, Fujitsu, OGF
Motivation (II)
• Organisations that manage large datasets have to find agreements on what terms mean
• Data versus metadata: we need bindings between the data and the data structure
• Well-typed workflows can be annotated with semantic types
• Kepler can use keyword-based or ontology-based search
• Data, metadata and ontology (NSF report)
• Provenance in Taverna is stored in RDF and OWL
• Workflow reuse
• "Making this change in the code would change the [implicit] semantics of this Globus service"
Voices: Malcolm Atkinson (UK e-Science envoy), Amarnath Gupta (San Diego Supercomputer Center), Stuart Owen (myGrid), Lisa Childers (Globus Toolkit)
Don’t we have semantics in the Grid already?
It’s called metadata. Or vocabularies. Or glossaries. It’s the state properties of a resource. It’s in information services. And registries and catalogues. And configuration files. And policy definitions. And service level agreements. And file names. And file headers. And directory naming conventions. And code libraries. And type systems. And schemas. And applications. And data formats. And best practice. And documentation. And workflows. And notification events. And monitoring logs. And embedded in XML tags… And protocols. And decision procedures. And even ontologies!
Embedded, implicit meaning is the enemy of shareability and reuse in an open, decoupled and collaborative environment. Machine-processable descriptions are machine-actionable descriptions.
Next Generation Grids Reports
• NGG1 (2003): European Grid Research 2005-2010
• NGG2 (2004): Requirements and Options for European Grids Research 2005-2010 and Beyond
• NGG3 (2005): Future for European Grids: GRIDs and Service Oriented Knowledge Utilities. Vision and Research Directions 2010 and Beyond
• Main source of inspiration for FP6 Grid research and beyond
http://www.cordis.lu/ist/grids
Source: David de Roure
Service-Oriented Knowledge Utility (SOKU)
Next Generation Grids Report 2005 (NGG3): Future for European Grids: GRIDs and Service Oriented Knowledge Utilities. Vision and Research Directions 2010 and Beyond, December 2006
A flexible, powerful and cost-efficient way of building, operating and evolving IT-intensive solutions for business, science and society:
• Building on existing industry practices and emerging technologies
• Supporting ecosystems that promote collaboration and self-organisation
• Towards increased agility, lower cost and broader availability of services
• Empowering service providers, integrators and consumers of ICT
• (R)evolution of concepts from Web, Grid & Knowledge technologies
• As safe, easy and ubiquitous as existing utilities like electricity or water
Source: David de Roure
Service-Oriented Knowledge Utility (NGG3)
• The architecture comprises services which may be instantiated and assembled dynamically, hence the structure, behaviour and location of software are changing at run-time
• A utility is a directly and immediately usable service with established functionality, performance and dependability, illustrating the emphasis on user needs and issues such as trust
• Services are knowledge-assisted (‘semantic’) to facilitate automation and advanced functionality, the knowledge aspect being reinforced by the emphasis on delivering high-level services to the user
Source: David de Roure
Research Challenges (NGG3)
NGG3 Future for European Grids: GRIDs and Service Oriented Knowledge Utilities. Vision and Research Directions 2010 and Beyond, December 2006
[Figure: driving scenarios (end-user, business/enterprise, manufacturing/industrial) and seven research topics (lifecycle management; trust and security in VOs; adaptability, scalability, dependability; raising the level of abstraction; pervasiveness and context awareness; semantic technologies; human factors and societal issues) arranged around the Service-Oriented Knowledge Utility, building on the NGG1 & NGG2 vision and research challenges]
Research Topic 1: Lifecycle Management (NGG3)
• On-the-fly service creation and deployment
• Robust, efficient and semantically aware discovery of services
• Composition of services
• Management of functional and non-functional properties and requirements
• Support for multiple "economy models" for the grid
Source: David de Roure
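What "semantically aware discovery of services" adds can be shown with a small contrast: an exact-type (keyword-like) match misses a service whose output is a more specific type than the one requested, whereas a match that consults a type hierarchy finds it. The taxonomy and service names below are hypothetical, and subsumption here is a simple walk up a single-inheritance tree rather than OWL reasoning.

```python
from typing import Dict, List

# Hypothetical mini-taxonomy of data types (child -> parent)
SUBCLASS_OF: Dict[str, str] = {
    "ProteinSequence": "BiologicalSequence",
    "DNASequence": "NucleotideSequence",
    "NucleotideSequence": "BiologicalSequence",
    "BiologicalSequence": "Data",
}


def ancestors(concept: str) -> List[str]:
    """All superclasses of a concept, nearest first (assumes no cycles)."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def subsumed_by(concept: str, query: str) -> bool:
    return concept == query or query in ancestors(concept)


def discover(services: Dict[str, str], wanted_output: str) -> List[str]:
    """Services whose advertised output type is the wanted type or a subtype."""
    return sorted(name for name, out in services.items()
                  if subsumed_by(out, wanted_output))


if __name__ == "__main__":
    services = {"fetchProtein": "ProteinSequence",
                "fetchDNA": "DNASequence",
                "renderPathway": "Image"}
    exact = sorted(n for n, out in services.items() if out == "BiologicalSequence")
    print(exact)                                     # []
    print(discover(services, "BiologicalSequence"))  # ['fetchDNA', 'fetchProtein']
```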
Research Topic 2: Trust and Security in VOs (NGG3)
• Ad hoc and managed virtual organisations of digital and physical entities
• Policy and business practice
• Service-level agreements
• Authentication and authorisation in a multi-domain environment in which entities have multiple identities and multiple roles
Source: David de Roure
Research Topic 3: Adaptability, Scalability, Dependability (NGG3)
• Self-* systems
  • self-managing, self-optimising, self-configuring, self-healing, self-protecting, self-organising
• Autonomic systems
• Peer-to-peer
• Scalability
Source: David de Roure
Research Topic 4: Raising the Level of Abstraction (NGG3)
• Higher-level programming models and tools
• New or improved management abstractions
• Better operating systems capable of managing more complex resources and requirements from application, service and system contexts
• Abstract/virtual service containers
• Compact data formats
Source: David de Roure
Research Topic 5: Pervasiveness and Context Awareness (NGG3)
• High-level interoperability, smooth composition and automatic self-organisation of software with structure and behaviour changing at run-time
• Non-functional requirements related to interoperability, heterogeneity, mobility, and adaptability
Source: David de Roure
Research Topic 6: Semantic Technologies (NGG3)
• Mechanisation of composition
• Scalable reasoning and formalisation
• Heterogeneous and dynamic semantic descriptions
• Lifecycle of knowledge
• Collaboration and sharing
Source: David de Roure
Research Topic 7: Human Factors and Societal Issues (NGG3)
• User requirements and evaluation
• Intersection between the physical world and the digital
• Personalisation techniques
• Issues of collaboration and community
• Socio-economic aspects
Source: David de Roure
Outline (next section: But, what is...? Data and metadata)
What is metadata?
• "Data about data"
• Is that enough to understand what it is?
• Let’s analyse its role in different contexts in the area of the Semantic Web:
  • Semantic (Annotation) Web
  • Semantic Data (Integration) Web
  • Semantic Knowledge (Reasoning) Web
• I will use the terms metadata and annotations interchangeably
Annotation
• Assert facts using terms (metadata in RDF)
• Represent terms and their relationships (ontology in RDFS/OWL)
[Figure: annotated resources such as news items, videocasts, grant applications, research events, organisations and gene databases]
Types of vocabularies: formality
[Figure: vocabularies arranged by increasing formality; GALEN is among the examples shown]
Source: Lassila O, McGuinness D. The Role of Frame-Based Representation on the Semantic Web. Technical Report KSL-01-02, Knowledge Systems Laboratory, Stanford University, 2001.
Metadata annotation
• Different types of annotation arise depending on the type of vocabulary used.
Based on Dublin Core:
• The contributor and creator is the flight booking service "www.flightbookings.com".
• The date would be January 1st, 2003, if the HTML page was generated on that specific date.
• The description would be something like "flight details for a trip between Madrid and Seattle via Chicago on February 8th, 2004".
• The document format is "HTML".
• The document language is "en", which stands for English.
Based on thesauri:
• Madrid is a reference to the term with ID 7010413 in the thesaurus, which refers to the city of Madrid in Spain.
• Spain is a reference to the term with ID 1000095, which refers to the kingdom of Spain in Europe.
• Chicago is a reference to the term with ID 7013596, which refers to the city of Chicago in Illinois, US.
• United States of America is a reference to the term "United States" with ID 7012149, which refers to the US nation.
• Seattle is a reference to the term with ID 7014494, which refers to the city of Seattle in Washington, US.
Based on ontologies:
• Concept instances relate a part of the document to one or several concepts in an ontology. For example, "Flight details" may represent an instance of the concept Flight, and can be named AA7615_Feb08_2003, although concept instances do not necessarily have a name.
• Attribute values relate a concept instance with the part of the document that is the value of one of its attributes. For example, "American Airlines" can be the value of the attribute companyName.
• Relation instances relate two concept instances by some domain-specific relation. For example, the flight AA7615_Feb08_2003 and the location Madrid can be connected by the relation departurePlace.
Source: Corcho, O. Ontology-based document annotation: trends and open research problems. International Journal of Metadata, Semantics and Ontologies 1(1):47-57, 2006.
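The three kinds of annotation (Dublin Core, thesaurus, ontology) can all be written down as RDF triples. The sketch below serialises a handful of them as N-Triples using only the standard library. The Dublin Core and RDF namespaces are the real ones; the page URL, the flight ontology namespace and the thesaurus base URI are hypothetical placeholders, and using dc:subject to point at the thesaurus term for Madrid is our own modelling choice.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DC = "http://purl.org/dc/elements/1.1/"              # Dublin Core elements
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TRAVEL = "http://example.org/travel#"                # hypothetical flight ontology
THESAURUS = "http://example.org/thesaurus/"          # hypothetical thesaurus base URI
PAGE = "http://www.flightbookings.com/flight.html"   # hypothetical page URL


def uri(u: str) -> str:
    return f"<{u}>"


def literal(text: str) -> str:
    """N-Triples string literal with backslashes and quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def annotate_flight_page() -> List[Triple]:
    page = uri(PAGE)
    flight = uri(TRAVEL + "AA7615_Feb08_2003")
    return [
        # Dublin Core: metadata about the document as a whole
        (page, uri(DC + "creator"), literal("www.flightbookings.com")),
        (page, uri(DC + "contributor"), literal("www.flightbookings.com")),
        (page, uri(DC + "date"), literal("2003-01-01")),
        (page, uri(DC + "format"), literal("HTML")),
        (page, uri(DC + "language"), literal("en")),
        # Thesaurus: point at the term for Madrid (ID 7010413)
        (page, uri(DC + "subject"), uri(THESAURUS + "7010413")),
        # Ontology: concept instance, attribute value, relation instance
        (flight, uri(RDF_TYPE), uri(TRAVEL + "Flight")),
        (flight, uri(TRAVEL + "companyName"), literal("American Airlines")),
        (flight, uri(TRAVEL + "departurePlace"), uri(TRAVEL + "Madrid")),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(annotate_flight_page()))
```

Note how only the ontology-based triples say anything about the flight itself; the Dublin Core ones describe the page.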
RDF for Proteomic Standards
http://www.naturebiotechnology.org
Don’t Prescribe, Describe!!
• The tyranny of the table
• The tyranny of the tree
"Not everything fits in one taxonomy" (Maryann Martone, US BIRN)
Source: Carole Goble
Integration
• Use a uniform common model in RDF
• Connect through shared terms and shared instances
• Preserve context and provenance
• Applications: agents, smart portals, data mining, social networking, smart search, knowledge discovery, information integration and aggregation
Source: Carole Goble
Information Integration. Approaches and technologies
• Architectures and systems
  • TSIMMIS, OBSERVER, PICSEL, Prometheus, etc.
• Wrapper generation
  • D2R, R2O, etc.
• Mediators
  • BIRN Mediator, etc.
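Wrapper-generation tools such as D2R and R2O expose relational data as RDF through a declarative mapping. The sketch below does not use either tool's mapping language. It is a hypothetical, hard-coded mapping over an in-memory SQLite table (the table, columns, base URI, the `ex:` property names and the gene XYZ1 are all made up) that shows the basic idea: each row becomes a resource, and each mapped column becomes a property.

```python
import sqlite3
from typing import Dict, List, Tuple

BASE = "http://example.org/"  # hypothetical base URI for generated resources

Triple = Tuple[str, str, object]


def rows_to_triples(conn: sqlite3.Connection, table: str, key: str,
                    class_name: str, column_map: Dict[str, str]) -> List[Triple]:
    """Expose each row as an RDF resource: one rdf:type triple plus one triple
    per non-NULL mapped column. Table and column names come from the (trusted)
    mapping, not from user input, so building the SQL string is acceptable here."""
    columns = [key] + list(column_map)
    sql = f"SELECT {', '.join(columns)} FROM {table} ORDER BY {key}"
    triples: List[Triple] = []
    for row in conn.execute(sql):
        subject = f"{BASE}{table}/{row[0]}"
        triples.append((subject, "rdf:type", f"{BASE}ontology#{class_name}"))
        for column, value in zip(column_map, row[1:]):
            if value is not None:  # NULLs produce no triple
                triples.append((subject, column_map[column], value))
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (id TEXT PRIMARY KEY, symbol TEXT, go_term TEXT)")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                     [("g1", "ASA1", "tryptophan biosynthesis"), ("g2", "XYZ1", None)])
    mapping = {"symbol": "ex:symbol", "go_term": "ex:goTerm"}
    for triple in rows_to_triples(conn, "gene", "id", "Gene", mapping):
        print(triple)
```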
A common vocabulary for data pooling
[Figure: two gene records from different sources, one identified by Gene Symbol (ASA1) and the other by Locus Name (F15D2.31), both annotated with the Function "tryptophan biosynthesis" from the Gene Ontology (www.godatabase.org)]
Courtesy Chris Wroe
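The pooling idea can be sketched in a few lines: the two records come from sources whose schemas disagree (one says "Gene Symbol", the other "Locus Name"), but both use the same vocabulary term, which then serves as the join key. The source labels "A" and "B" and the dict layout are hypothetical. A real system would key on the Gene Ontology term identifier rather than on its label.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Per-source name of the column that identifies a gene (the schemas differ)
ID_COLUMN: Dict[str, str] = {"A": "Gene Symbol", "B": "Locus Name"}


def pool_by_function(sources: Dict[str, List[dict]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group records from differently structured sources under the shared
    vocabulary term found in their 'Function' column."""
    pooled: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, rows in sources.items():
        for row in rows:
            pooled[row["Function"]].append((source, row[ID_COLUMN[source]]))
    return dict(pooled)


if __name__ == "__main__":
    source_a = [{"Gene Symbol": "ASA1", "Function": "tryptophan biosynthesis"}]
    source_b = [{"Locus Name": "F15D2.31", "Function": "tryptophan biosynthesis"}]
    print(pool_by_function({"A": source_a, "B": source_b}))
```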
Seamark Demo: identify new drug candidates for BRKCB-1
[Figure: RDF sources linked through shared entities. Sources: GO.rdf, GO2Keyword.rdf, Keywords.rdf, GO2OMIM.rdf, OMIM.rdf, GO2UniProt.rdf, UniProt.rdf, GO2Enzyme.rdf, Enzymes.rdf, ProbeSet.rdf, IntAct.rdf, Taxonomy.rdf, KEGG.rdf, PubMed.xml. Entities: Keyword, Protein, Gene, Probe, MIM Id, Enzyme, Organism, Citation, Compound, Pathway]
Courtesy Joanne Luciano
http://139.91.183.30:9090/RDF/VRP/Examples/schema_go.rdf
http://139.91.183.30:9090/RDF/VRP/Examples/go.rdf
Inference
• Logic-based classification and validity checking using OWL
• Rules using SWRL (Semantic Web Rule Language)
• RDF queries
• Just making connections, because so much stuff is connected!
[Figure: an example chain of connections between the paper "Rearrangement of a DNA sequence homologous to a cell-virus junction fragment in several Moloney murine leukemia virus-induced rat thymomas", the gene PVT1 and the chromosome location 8q24]
Source: James Hendler. Science and the Semantic Web. Science 299: 520-521, 2003.
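The simplest kind of inference over RDF is forward chaining of entailment rules until nothing new can be derived. The sketch below applies just two RDFS rules (rdfs11, transitivity of subClassOf; rdfs9, propagation of rdf:type up the class hierarchy) to a hypothetical toy ontology. OWL classification and SWRL rules need a proper reasoner and go far beyond this.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain two RDFS entailment rules until a fixpoint is reached:
    rdfs11: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    rdfs9:  (x type B),       (B subClassOf C)  =>  (x type C)"""
    inferred = set(triples)
    while True:
        new = {
            (a, p, d)
            for (a, p, b) in inferred if p in (SUBCLASS, TYPE)
            for (c, q, d) in inferred if q == SUBCLASS and c == b
        }
        if new <= inferred:  # nothing new: fixpoint reached
            return inferred
        inferred |= new


if __name__ == "__main__":
    # Hypothetical toy ontology and one individual
    facts = {
        ("Isomerase", SUBCLASS, "Enzyme"),
        ("Enzyme", SUBCLASS, "Catalyst"),
        ("phosphoglucose_isomerase", TYPE, "Isomerase"),
    }
    for triple in sorted(rdfs_closure(facts) - facts):
        print(triple)
```

The derived facts (for example, that the individual is a Catalyst) were never stated explicitly; they are connections the machine makes by itself.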
BioPAX: Biochemical Reaction
[Figure: the BioPAX OWL schema and its instances (individuals, i.e. the data), e.g. phosphoglucose isomerase, EC 5.3.1.9]
Courtesy Joanne Luciano
K. Wolstencroft, A. Brass, I. Horrocks, P. Lord, U. Sattler, R. Stevens, D. Turi. A little semantics goes a long way in Biology. Proc. 4th ISWC, 2005.