Part 4: Semantics and Metadata
• Semantic publication and discovery
• Provenance metadata
• Semantic Web and the Grid
Professor Carole Goble, University of Manchester
http://www.mygrid.org.uk
GGF Summer School, 24th July 2004, Italy
Virtual organisations and (Re)use
[Diagram labels: Registries, Workflow, Information, mIRs, Resources, Service; Service & Platform Administrators, Bioinformaticians, Service Providers, Annotation providers, Biologists, Tool & middleware developers.]
Finding and selecting services
Activation energy gradient
Unregistered services
• Scavenging
• URLs and Soaplab endpoints
• Introspection
Registered services
• Word-based searching
• Semantic annotation for later discovery and (re)use by friends and strangers in your VO (Part 3)
Drag and drop services onto the Taverna workbench
Registry View Service
• Registry
• Third party registries
• Third party services
• Third party annotation (RDF)
• Views over federated registries
• UDDI interfaces extended with RDF
• Federated views
• Updated via Notification Service
• Personalised based on annotation
• Authorisation and IPR
Semantic discovery
• User chooses services
• A common ontology is used to annotate and query any myGrid object, including services.
• Discover workflows and services described in the registry via Taverna.
• Example query: look for all workflows that accept an input of semantic type nucleotide sequence.
• The aim is semantic discovery over a public view on the Web.
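The nucleotide-sequence query can be sketched in a few lines of Python. This is only an illustration, not the myGrid registry API: the ontology fragment, the registry entries and the function names are all hypothetical, and the sketch assumes a match means "annotated with the requested concept or a more specific one".

```python
# Hypothetical ontology fragment: child concept -> parent concept.
SUBCLASS_OF = {
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "dna_sequence": "nucleotide_sequence",
}

# Hypothetical registry entries annotated with semantic input types.
REGISTRY = [
    {"name": "blast_workflow", "kind": "workflow", "inputs": ["dna_sequence"]},
    {"name": "protein_fold", "kind": "workflow", "inputs": ["protein_sequence"]},
    {"name": "repeat_masker", "kind": "service", "inputs": ["nucleotide_sequence"]},
]


def is_a(concept, ancestor):
    """True if concept equals ancestor or is a (transitive) subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def discover(kind, input_type):
    """Names of registry entries of the given kind with a matching input type."""
    return [
        entry["name"]
        for entry in REGISTRY
        if entry["kind"] == kind
        and any(is_a(t, input_type) for t in entry["inputs"])
    ]


print(discover("workflow", "nucleotide_sequence"))
```

A word-based search for "nucleotide" would miss the workflow annotated with the more specific DNA concept; the subclass walk is what makes the discovery semantic.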
Workflow and service annotation
• Adding structured metadata to a workflow registration enables others to discover and reuse it more effectively, for example by stating the semantic type of input it accepts.
Can you guess what it is yet?
Service Registration
http://pedro.man.ac.uk
Semantic Discovery
• Drag a workflow entry into the explorer pane and the workflow loads.
• Drag a service or workflow to the scavenger window to include it in the workflow.
Annotation
[Architecture diagram labels: Service Providers, Ontologists, Others; Ontology Store; Vocabulary; Description extraction; WSDL; Interface Description; Soaplab; Pedro Annotation tool; Annotation providers; Annotation/description; Taverna Workbench; Registry plug-in; Registry; Registry (Personalised View).]
Annotation
[Architecture diagram labels: Ontologists; Ontology Store; Vocabulary; Haystack Provenance Browser; Pedro Annotation tool; Annotation providers; Annotation/description; Scientists; Taverna Workbench; mIR Store plug-in.]
[Architecture diagram labels: Feta plug-in; Registry plug-in; Service Providers, Ontologists, Others, Bioinformaticians; Ontology Store; Vocabulary; WSDL; Soaplab; Feta Semantic Discovery; Registry; Registry (Personalised View); Taverna Workbench; Workflow Execution; FreeFluo WfEE; invoking; mIR; Store data & metadata.]
Layered Semantics
• Domain semantics layered on top of a domain-neutral but scientific data model
• Reducing the activation energy, lowering barriers to entry.
[Diagram labels: Domain Semantics; Ontologies; Data; Metadata; IMv2; Workflow metadata; Experiment Semantics; Format; XSD types; MIME types; Service Metadata; Provenance metadata; Syntax; Workflow; OGSA-DQP.]
Model of services
• Service: name, description, author, organisation
• Operation: name, description, task, method, resource, application
• Parameter: name, description, semantic type, format, transport type, collection type, collection format
• Links: hasInput, hasOutput, subclass
• Subclasses: WSDL based operation, WSDL based Web service, workflow, bioMoby service, Soaplab service, local Java code
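One way to read this model is as three record types. The Python sketch below is an assumption-laden illustration: the attribute names come from the slide, but the link from a service to its operations, the single example subclass and the example values are guesses made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str = ""
    semantic_type: str = ""
    format: str = ""
    transport_type: str = ""
    collection_type: str = ""
    collection_format: str = ""


@dataclass
class Operation:
    name: str
    description: str = ""
    task: str = ""
    method: str = ""
    resource: str = ""
    application: str = ""
    has_input: List[Parameter] = field(default_factory=list)
    has_output: List[Parameter] = field(default_factory=list)


@dataclass
class Service:
    name: str
    description: str = ""
    author: str = ""
    organisation: str = ""
    # Assumption: a service groups operations (not stated on the slide).
    operations: List[Operation] = field(default_factory=list)


class SoaplabService(Service):
    """One of the service subclasses named on the slide."""


# Hypothetical example values.
blast = SoaplabService(
    name="BLAST",
    operations=[
        Operation(
            name="run",
            task="sequence_similarity_search",
            has_input=[Parameter("query", semantic_type="nucleotide_sequence")],
            has_output=[Parameter("report", semantic_type="BLAST_report")],
        )
    ],
)
print(blast.operations[0].has_input[0].semantic_type)
```

Keeping the semantic type separate from format and transport type is what lets discovery reason about meaning while invocation deals with syntax.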
Tiered specifications
• Classes of services: Domain “semantic”, “Unexecutable”, “Potentials”
• Instances of services: Business “operational”, “Executable”, “Actuals”
[Diagram labels: Task: Sequence similarity search. Service class: BLAST service, with run() or blastQuery(). Specific services: IBM Life Sciences BLAST service; SOAPLAB BLAST service. Operations shown: createJob(), setProgram(), setDatabase(), setE_value(), getResults().]
Matrix of metadata in workflow lifecycle
Stratified metadata
• Service Type and Class (OWL)
• Service Instance (RDF)
Service and Workflow registration
• Description scheme
• RDFS & DAML+OIL / OWL ontologies of services & biology
• Based on DAML-S
• Reasoning over OWL descriptions
• Query over RDF
• The aim is semantic discovery over a public view on the Web.
Workflow registration allows peer review and publication of e-Science methods.
[Diagram labels: Scufl; URI; Workflow registry entry; RDF Store; Workflow Executive Summary Descriptions (inputs, outputs, tasks, component resources); Invokable Interface descriptions (stored WSDL); Syntactic descriptions (e.g. XML data types, MIME types); Conceptual descriptions (OWL encoded); Operational Descriptions (cost, QoS, access rights…); Provenance Descriptions (authors, creation date, institution…); RDF; OWL/RDF.]
Service Ontology Suite
• Upper level ontology, inspired by DAML-S
• Properties: parameters (input, output, precondition, effect), performs_task, uses-resource, is_function_of
• Ontologies: Informatics, Molecular biology, Publishing, Organisation, Task, Bioinformatics, Web service
Current work: joint development of an Open Biological Ontologies BioService Ontology. http://obo.sourceforge.net/
Reflections
• Adverts for services and workflows turn out to be tricky
• Describing different executable objects
• Workflows and Services
• Stratification of metadata
• Classes and Instances of services and workflows
• Service execution
• Complex state-based invocation models
• Parametric polymorphism of services
• Executable process models vs discovery process models
• Multiple dimensions of service composition.
Reflections
• Multiple descriptions, multiple interfaces
• Users' needs vs machine needs
• The dimensions of Service Class substitution
• Biologists choose experimentally meaningful services and do not want “semantically similar” substitutions; they only substitute one instance for another
• Experimentally neutral “glue” services that can be substituted are comparatively few
• If users are choosing services, you don’t need many kinds of metadata to eliminate 90% of options.
Reuse and Repurposing
• Describing for reuse is challenging
• Reuse depends on semantic descriptions and these are costly to produce
• Describing for someone else’s benefit
• Reuse by multiple stakeholders
• Licensing workflows for reuse.
• Authorisation models
• But reuse does happen!
• Metadata pays off but it needs a network effect and there is a cost.
So far, Using Concepts
• Controlled vocabulary for advertisements for workflows and services
• Indexes into registries and mIR
• Semantic discovery of services and workflows
• Semantic discovery of repository entries
• Type management for composition
• Semantic workflow construction: guidance and validation
• Navigation paths between data and knowledge holdings
• Semantic “glue” between repository entries
• Semantic annotation and linking of workflow provenance logs
Part 4: Semantics and Metadata
• Semantic publication and discovery
• Provenance metadata
• Semantic Web and the Grid
Provenance
In silico experiments are performed repeatedly, at different sites, at different times, by different users or groups. The result is a large repository of records about experiments, which scientists can use for:
• verification of data
• “recipes” for experiment designs
• explanation of the impact of changes
• ownership
• performance of services
• data quality
Provenance Web
[Diagram labels: Genomic Project; WSDL; serviceInvocation1, serviceInvocation2; data1, data2, data3, data4, dataAnother; Process provenance; Data provenance; Organisation provenance; Knowledge provenance.]
Representing links
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
• Identify each resource
• Life Science Identifier (LSID): a URI with associated data and metadata retrieval protocols.
• Understanding that the underlying data will not change
Representing links II
http://www.mygrid.org.uk/ontology#derived_from
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
• Identify link type
• Again use URI
• Allows us to use RDF infrastructure
• Repositories
• Ontologies
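Once both the resources and the link type are URIs, a link is simply an RDF triple. The minimal Python sketch below uses plain tuples instead of an RDF library. The direction of the derived_from link between the two example LSIDs is an assumption, since the slide does not say which item was derived from which, and the helper names are made up for the example.

```python
# The link type and the two LSIDs are taken from the slide.
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

# Assumption: 45fg6 was derived from 23ty3 (direction not given on the slide).
triples = [
    (
        "urn:lsid:taverna.sf.net:datathing:45fg6",
        DERIVED_FROM,
        "urn:lsid:taverna.sf.net:datathing:23ty3",
    ),
]


def objects(subject, predicate):
    """All objects linked from subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def to_ntriples(ts):
    """Serialise URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in ts)


print(objects("urn:lsid:taverna.sf.net:datathing:45fg6", DERIVED_FROM))
print(to_ntriples(triples))
```

Because the output is ordinary RDF, any RDF repository or ontology tool can store and query it, which is the point of reusing URIs for link types.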
Provenance Pyramid
[Diagram labels: Knowledge Level; Data Level; Organisation Level; Process Level.]
Organisation level provenance, process level provenance, data/knowledge level provenance
[Diagram labels: Service, e.g. BLAST @ NCBI; Project; runBy; Experiment design; Process; Workflow design; componentProcess, e.g. web service invocation of BLAST @ NCBI; partOf; Event; instanceOf; componentEvent, e.g. completion of a web service invocation at 12.04pm; Workflow run; knowledge statements, e.g. similar protein sequence to; run for; data derivation, e.g. output data derived from input data; Data item; Person; Organisation.]
The user can add templates to each workflow process to determine links between data items.
Provenance tracking
• Automated generation of this web of links
• Workflow enactor generates
• LSIDs
• Data derivation links
• Knowledge links
• Process links
• Organisation links
[Figure: web of provenance links around the data items urn:lsid:taverna:datathing:13 and urn:lsid:taverna:datathing:15, showing a nucleotide_sequence (FASTA record >gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7, complete sequence) and a BLAST_Report listing similar sequences. Node types: project, organisation, group, person, experiment definition, workflow definition, workflow invocation, service description, service invocation. Link types: rdf:type, part_of, author, works_for, invocation_of, run_for, run_during, masked_sequence_of, similar_sequences_to, described_by, created_by, filtered_version_of. Callouts A and B: relationship the BLAST report has with other items in the repository; other classes of information related to the BLAST report.]
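How an enactor might generate such links automatically can be sketched as follows. This is a toy illustration, not the Taverna/FreeFluo implementation: the Enactor class, the example.org LSID authority and the counter-based identifiers are hypothetical, while the link names reuse labels from the figure (created_by, invocation_of, run_for) plus derived_from. Knowledge links are left out for brevity.

```python
import itertools


class Enactor:
    """Toy workflow enactor that records provenance links as it runs."""

    def __init__(self, authority="example.org"):  # hypothetical LSID authority
        self.authority = authority
        self._counter = itertools.count(1)
        self.data = {}   # LSID -> value
        self.links = []  # (subject, link type, object)

    def _mint(self, namespace):
        """Mint a fresh LSID-style identifier."""
        return f"urn:lsid:{self.authority}:{namespace}:{next(self._counter)}"

    def register(self, value):
        """Store a data item and return its identifier."""
        lsid = self._mint("datathing")
        self.data[lsid] = value
        return lsid

    def invoke(self, service, func, input_id, user):
        """Run func on the input item and record the resulting links."""
        run_id = self._mint("invocation")
        output_id = self.register(func(self.data[input_id]))
        self.links += [
            (output_id, "derived_from", input_id),  # data derivation link
            (output_id, "created_by", run_id),      # process link
            (run_id, "invocation_of", service),     # process link
            (run_id, "run_for", user),              # organisation link
        ]
        return output_id


enactor = Enactor()
seq = enactor.register("AAGCTTTTCTGG")
masked = enactor.invoke("masker", str.lower, seq, "alice")
for link in enactor.links:
    print(link)
```

The scientist never writes a provenance record by hand; every invocation leaves its links behind as a side effect of running the workflow.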
Haystack (IBM/MIT)
[Screenshot labels: GenBank record; portion of the Web of provenance; managing a collection of sequences for review.]
Reflections
• Visualisation of results is usually domain specific
• Provenance browsing and querying needs to fit with that visualisation
• Generic graphical presentation is limited to small, low-complexity result sets
• Layered provenance for different purposes and different stakeholders
• Detailed process for debugging and usage statistics for QoS
• Data and Knowledge for the Scientist
• Migration with data objects
• Versioning
• Using provenance to its maximum potential
Map of Context
[Diagram labels: OWL ontologies mapping between objects; identifiers LSID, URI; formats HTML, XML, RDF, PDF.]
• Literature relevant to the provenance study or data in this workflow
• Provenance record of a workflow run
• Interlinking graph of the workflow that generates the provenance logs
• Web pages of people who have interests related to those of the workflow owner
• Experiment notes
Provenance metadata
• Outside objects
• RDF store
• Within objects
• LSID metadata.
[Diagram labels: URI; LSID; LSID metadata.]
Linked Provenance Resources
[Screenshot callouts: the subsumed concepts; link to the log annotated with a more general concept; the subsuming concepts; link to the log annotated with a more specific concept.]
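Linking logs through subsuming and subsumed concepts can be illustrated with a small sketch. Everything in it is hypothetical (the ontology fragment, the log names and the function names); it only shows the idea of following the concept hierarchy from one annotated log to others.

```python
# Hypothetical ontology fragment: child concept -> parent concept.
SUBCLASS_OF = {
    "dna_sequence": "nucleotide_sequence",
    "nucleotide_sequence": "biological_sequence",
}

# Hypothetical provenance logs and the concept each is annotated with.
LOGS = {
    "log-1": "biological_sequence",
    "log-2": "nucleotide_sequence",
    "log-3": "dna_sequence",
}


def ancestors(concept):
    """All concepts that subsume the given concept, nearest first."""
    found = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        found.append(concept)
    return found


def related_logs(log_id):
    """Logs annotated with more general or more specific concepts."""
    concept = LOGS[log_id]
    return {
        "more_general": [l for l, c in LOGS.items() if c in ancestors(concept)],
        "more_specific": [l for l, c in LOGS.items() if concept in ancestors(c)],
    }


print(related_logs("log-2"))
```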
Generating Links
[Screenshot callouts: the concept; the generated link to a related provenance document; the name of the data.]
Semantics
• Ontology-aided workflow construction
• RDF-based service and data registries
• RDF-based metadata for ALL experimental components
• RDF-based provenance graphs
• OWL-based controlled vocabularies for database content
• OWL-based integration of experiment entities
• RDF-based semantic mark-up of results, logs, notes, data entries
Standards
• By tapping into (de facto) standards (LSID, RDF, WS-I) and communities we can leverage others' results and tools
• Haystack, Pedro, Jena, CHEF/Sakai.
• The Grid standards are confusing and volatile
• The choice of vanilla Web Services was good.
• We didn’t jump to OGSI. We won’t jump to WSRF until it’s necessary.
• And workflow standards have been untimely.
Role of Ontologies
• Controlling contents of metadata and data
• Describing & linking provenance records
• Resource annotations
• Change & event notification topics
• Service matching and provisioning
• Composing and validating workflows and service compositions & negotiations
• Service & resource registration & discovery
• Help
• Knowledge-based guidance and recommendation
• Schema mediation
Part 4: Semantics and Metadata
• Semantic publication and discovery
• Provenance metadata
• Semantic Web and the Grid
Semantics in and on the Grid
A pioneer of the…
The Semantic Grid is an extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, better enabling computers and people to work in cooperation
The semantics of knowledge
• Semantic Grids
• Grids and Grid middleware that make use of semantics for their installation, deployment, running etc.
• I.e. Semantics IN the Grid FOR the Grid.
• Knowledge Grids
• A virtual knowledge base derived by using the Grid resources, in the same spirit as a data grid is a virtual data resource and a compute grid a virtual computer. Knowledge Grids include services for knowledge mining.
• I.e. Semantics ON the Grid arising from the USE of the Grid.
Sources of Knowledge
[Diagram labels: Knowledge Stakeholders: Scientists, Computer Scientists, Service Providers; Scientific Applications; Grid Middleware; Grid platform and resources; Security, policies, standards; Knowledge for the Grid Application; Semantics for the Grid.]
“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.”
Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002, http://www.w3.org/2002/07/swint.
Big Vision
The Web today is:
• A hypermedia digital library
• Collection of linked web pages
• Ubiquitous interface to applications
• Amazon.com
• A platform for multimedia
• BBC Radio 4 in my room!
• A naming scheme
• Unique identity for resources
A place where people do the work: filtering, linking and interpreting. Computers do the presentation.
Why not make the computers do the work?
From machine-readable resources for humans to computable resources for machines
Expose the meaning of resources by assertions in a common data model…
• Publish and share consensually agreed ontologies so we can share the metadata and add in background knowledge
• Then we can query, filter, integrate and aggregate the metadata …
• and reason over it to infer more metadata using rules …
• and attribute trust to the metadata.
[Diagram labels: resources http://www.marriott.com/epp/..., http://www.amia.org/meetings/..., http://www.amia.org/about/, Washington, USA; classes event, conference, hotel, period, city, country; properties hasvenue, haslocation, organisedby, dates, locatedin.]
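The steps of asserting, querying and inferring can be made concrete with a tiny triple store in Python. The property names come from the diagram; the ex: identifiers are stand-ins for the truncated URLs, and the single rule is an assumption chosen for illustration, not something stated on the slide.

```python
# Assertions in a common data model: (subject, property, object) triples.
# The ex: names are stand-ins for the resources in the diagram.
facts = {
    ("ex:conference", "hasvenue", "ex:hotel"),
    ("ex:hotel", "haslocation", "ex:Washington"),
    ("ex:Washington", "locatedin", "ex:USA"),
}


def infer(known):
    """Apply one illustrative rule until nothing new is produced:
    (x haslocation y) and (y locatedin z)  =>  (x locatedin z)."""
    inferred = set(known)
    while True:
        new = {
            (x, "locatedin", z)
            for (x, p1, y) in inferred if p1 == "haslocation"
            for (y2, p2, z) in inferred if p2 == "locatedin" and y2 == y
        } - inferred
        if not new:
            return inferred
        inferred |= new


def query(triples, prop, obj):
    """Subjects that have the given property with the given object."""
    return sorted(s for s, p, o in triples if p == prop and o == obj)


print(query(infer(facts), "locatedin", "ex:USA"))
```

The hotel is never asserted to be in the USA; that metadata is inferred from the asserted facts plus the rule.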
Infrastructure enablers for e-Research
Grid Computing
• On demand, transparently constructed multi-organisational federations of distributed services
• Distributed computing middleware
• Computational integration
• Sharing resources
Semantic Web
• An automatically processable, machine understandable web
• Distributed knowledge and information management
• Information integration
• Sharing information
Semantic Web layers
[Diagram labels: Trust; Rules (p -> a; p=a); Agents; Ontologies; Metadata; Annotation; Search engines and filters; Web Applications; Deep web.]