e-science is… BioAID
Legos “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.” – Henri Poincaré, Science and Hypothesis, 1905 http://adaptivedisclosure.org
Who will annotate the annotators themselves? facilitating resource management with (semantic) web services M. Scott Marshall
Examples on the web • A less accessible example: WSDL list for AIDA services http://ws.adaptivedisclosure.org/ (these services “annotate”) • Human-readable service info: http://xml.ddbj.nig.ac.jp/wsdl/index.jsp • But not machine-readable
Outline • Vision – an e-science virtual laboratory • Some definitions • Some requirements • Essential concepts of semantic web • facets for interfaces • Conclusions
The Vision: Scientist as knowledge worker • For Knowledge Workers: • Knowledge is the data (i.e. rules, relations, properties, hypotheses, etc.) • For Today's Biologist: • Numbers, sequences, organisms(!), and images are the data • Manipulate knowledge instead of data • Find support for relations between concepts instead of discovering table and column names and numbers. • In the virtual laboratory, everything is a resource that can be described and manipulated with semantics
User • ....? • End users – scientists using our applications • API users – programmers extending and using our code • System administrators – setting up services, grids etc. • Other classes... • If you’re not sure which one someone means please shout and ask them! Slide courtesy of Tom Oinn, OMII-EBI Workshop
Service Oriented Architecture (SOA) • A way of doing computing where services are somehow combined to perform some overall function • Implies a communication framework between the services • Used because it’s easier to reconfigure the arrangement of a set of services than to rewrite a script • Services as LEGO bricks Slide courtesy of Tom Oinn, OMII-EBI Workshop
Grid • Not just Globus, or EGEE, or Naregi... • No such thing as ‘the grid’ • Unlike ‘the internet’ which does exist! • We mean : • A computational facility, normally comprising multiple computers, which provides some combination of compute and data storage capacity and which can abstract over its inner workings in some fashion • Very loose definition! • Can be part of a Service Oriented Architecture Slide courtesy of Tom Oinn, OMII-EBI Workshop
Knowledge “data”, “information”, “facts”, “knowledge” • Knowledge is a statement that can be tested for truth (by a machine).
RDF : a web format for knowledge RDF is a W3C language to express statements. RDF Triple: Subject Predicate Object Graph of Knowledge: Node Edge Node
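As a rough illustration of the triple idea, the sketch below stores statements as (subject, predicate, object) tuples in plain Python rather than in a real RDF library. The `ex:` identifiers and the helper names `match` and `holds` are assumptions made up for this example; they do not come from the slides or from any W3C vocabulary.

```python
# Toy triple store: each statement is a (subject, predicate, object) tuple.
# All identifiers are illustrative placeholders, not a real dataset.
triples = {
    ("ex:p53", "rdf:type", "ex:Protein"),
    ("ex:p53", "ex:participatesIn", "ex:Apoptosis"),
    ("ex:Apoptosis", "rdf:type", "ex:BiologicalProcess"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def holds(graph, s, p, o):
    """Test a single statement for truth against the graph."""
    return (s, p, o) in graph
```

Calling `match(triples, s="ex:p53")` walks the edges leaving one node of the graph, while `holds` is the machine-testable truth check mentioned on the previous slide.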
OWL : The Web Ontology Language A W3C standard for ontology representation based on description logic.
Resources are shared on the web • Shared: • CPU time • network bandwidth • memory • storage space • But also: • Data • Knowledge • Services
Computational experiment: what we want to do with the resources Database Database Computational experiment in workflow environment ... Database
What are the tasks? • Search – discovering resources that match our needs • Workflow composition • Data integration • Enactment/Deployment • Access control • Registry of a resource
Issues raised by computational experimentation • How will we find relevant data? • How will we automatically integrate such data into our experiment? • How will we find appropriate services? • How will we integrate our results as usable data for a new (computational) experiment? • -> annotation
Finding the stone… Where is the piece that is red, has a triangular top, and was previously used to build a roof?
Computational Experiments Anticipated needs of the data consumer • Data integration - combining different types of data • Data annotation: beyond formats • Not only: • Data types (integer, string, etc.) • But also: • Data semantics: What do the data represent? • Determined by the experimental design • Provenance: What has been done to the data? • Description of the procedure(s) that produced/transformed the data • Discover and enact appropriate (web) services with appropriate data • Reuse results from a computational experiment as data in another computational experiment • derived data is “tagged” and put into the repository
Anticipated needs of the data supplier (and consumer) • Data in: • Simple submission/registration of data to e-science repository • Semi-automatic annotation • Data out: • Easy search and retrieval of previous datasets (my personal and my group’s data) • Easy search and retrieval of relevant datasets from public repository • Combining data: • Different types and different sources • Example: Intersecting views of data • data mapped to physical or semantic space (Examples follow..)
The Semantic Gap Application Middleware Resources User
The Model in the middle My Model Model Model Application Middleware Resources User
Why semantic annotation? We want annotation to be “machine-readable”: • Free text – arbitrary text tags generated by users won’t always match up • Simplest problem: Finding a “named” object • Synonyms - Different names exist for the same object in different contexts and roles. • Homonyms - The same name is used for different objects. • Which name should I use? • Standardized vocabulary list • can only find literal matches • Example: Using data types to search for services will find too many! • Semantic tags • allow searching for similar items: • “Find items like this one.” • allow searching with a description: • “Find items with these properties.” • semantic description of service (SA-WSDL) as well as data (OWL)
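The naming problem on this slide (different names for the same object) can be sketched in a few lines. The example below contrasts a literal tag match with a lookup that first resolves each label to a shared concept identifier. The mapping `LABEL_TO_CONCEPT`, the `concept:` identifiers and the dataset names are hypothetical; a real system would take them from an ontology or a curated vocabulary.

```python
# Hypothetical mapping from free-text labels to a shared concept identifier.
LABEL_TO_CONCEPT = {
    "p53": "concept:TP53",
    "tp53": "concept:TP53",
    "tumor protein p53": "concept:TP53",
}

# Resources tagged by different users with free-text labels (illustrative).
RESOURCES = {
    "dataset-1": "p53",
    "dataset-2": "TP53",
    "dataset-3": "tumor protein p53",
    "dataset-4": "histone H3",
}


def literal_search(label):
    """Find resources whose tag is exactly the given string."""
    return sorted(r for r, tag in RESOURCES.items() if tag == label)


def semantic_search(label):
    """Find resources whose tag resolves to the same concept as the label."""
    concept = LABEL_TO_CONCEPT.get(label.lower())
    if concept is None:
        return []
    return sorted(
        r for r, tag in RESOURCES.items()
        if LABEL_TO_CONCEPT.get(tag.lower()) == concept
    )
```

With these toy tags, `literal_search("p53")` returns only the one dataset tagged with that exact string, whereas `semantic_search("p53")` also returns the datasets tagged with the other two names.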
What is an ontology? Definitions: • A collection of things that are defined in terms of their properties and relations to other things. • A specification of a conceptualization that is designed for reuse across multiple applications and implementations (Gruber ’93, ’95; Guarino ’96; Guarino and Giaretta ’95) General applications: • Searching for objects that are resources, documents, concepts, experimental data, or collections of these things. • Knowledge capture • Example: Biological model with hypothetical knowledge Common applications in bioinformatics: • Annotation of database entries (e.g. gene products) • Categorization of clustered elements (e.g. genes)
Inheritance in ontologies • Often represented as DAGs (Directed Acyclic Graphs) or hierarchies (trees) • Power of inheritance • Subsumption relations (ISA) apply transitivity to create inheritance of class and properties downward along chains in the hierarchy. • Use an element as a metadata tag for semantic annotation (ontotag) • An ontotag serves as a pointer into a “semantic space” [Figure: class hierarchy with nodes Animal, Bird, Mammal, Robin, Heron, Penguin]
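A minimal sketch of how transitive ISA relations produce inheritance, using the class names from this slide. The edges are an assumption, since the original figure is not preserved: Robin, Heron and Penguin are placed under Bird. The property names are invented for illustration. The sketch uses a single parent per class (a tree); a DAG would need a set of parents per class.

```python
# Direct ISA edges (child -> parent). Edge placement is assumed, because the
# slide's figure did not survive extraction.
ISA = {
    "Bird": "Animal",
    "Mammal": "Animal",
    "Robin": "Bird",
    "Heron": "Bird",
    "Penguin": "Bird",
}

# Properties asserted directly on a class (illustrative values only).
PROPERTIES = {
    "Animal": {"is_living"},
    "Bird": {"has_feathers"},
}


def ancestors(cls):
    """Follow ISA edges upward; transitivity yields every superclass."""
    chain = []
    while cls in ISA:
        cls = ISA[cls]
        chain.append(cls)
    return chain


def inherited_properties(cls):
    """Collect properties from the class itself and from all superclasses."""
    props = set(PROPERTIES.get(cls, set()))
    for parent in ancestors(cls):
        props |= PROPERTIES.get(parent, set())
    return props
```

Tagging a resource with `Penguin` therefore also makes it findable through `Bird` and `Animal`, which is what lets an ontotag act as a pointer into a semantic space rather than as a bare string.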
Gene Ontology Mouse p53: {List of GO identifiers} Process: apoptosis, DNA damage response, signal transduction by p53 class mediator... Component: cytoplasm, cytosol... Function: DNA binding, protein binding... • Cluster of genes X from micro array analysis • Collection of {List of GO identifiers} per gene in cluster • Most prevalent GO identifiers: • Apoptosis, Cytosol, Protein Binding • Significant relationships between GO classes (e.g. cell death and DNA damage response)
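Finding the most prevalent annotations in a cluster can be sketched as a simple count. The gene names and per-gene term lists below are placeholders, not real annotation data, and `most_prevalent` is a hypothetical helper. A real analysis would use GO identifiers and a statistical enrichment test rather than raw counts.

```python
from collections import Counter

# Hypothetical cluster: gene names and term lists are placeholders.
cluster_annotations = {
    "geneA": ["apoptosis", "cytosol", "protein binding"],
    "geneB": ["apoptosis", "cytosol", "DNA binding"],
    "geneC": ["apoptosis", "protein binding", "cytoplasm"],
}


def most_prevalent(annotations, n=3):
    """Count in how many genes each term occurs and return the top n."""
    counts = Counter(
        term for terms in annotations.values() for term in set(terms)
    )
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```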
Semantic annotation - ontotags Evidence Ontology Provenance Author Gene Ontology Metadata
Resource management use case: data integration • Finding a basis for relation [Figure labels: Hypothesis; Epigenetic Mechanisms; Transcription; “There is a relation”; Chromatin; Transcription Factors; Histone Modification; Transcription Factor Binding Sites; Classes; Instances; Common Domain: position] KSinBIT’06
Scenario: A Use Case is born • E-scientist explains benefits of semantic web to (wet lab) biologist • Biologist wants to see a demonstration with actual data • => Use Case: Find evidence of a relation between transcription and histone modifications • Our approach: Annotate data with our own semantic types so that we can issue a query using our own terms KSinBIT’06
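One way to picture such a query, assuming (as the preceding slide suggests) that genomic position is the common domain: data items are annotated with semantic types from our own model, and candidate evidence for a relation is any pair of items of the two types that overlap in position. The `Region` class, the type names and the coordinates are all hypothetical, and overlap is only one possible basis for a relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    semantic_type: str  # a term from our own model (hypothetical names)
    chromosome: str
    start: int
    end: int  # inclusive


def overlaps(a, b):
    """Two regions overlap if they share a chromosome and a position."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def related(regions, type_a, type_b):
    """Return pairs (a, b) of the given semantic types that overlap."""
    return [
        (a, b)
        for a in regions if a.semantic_type == type_a
        for b in regions if b.semantic_type == type_b
        if overlaps(a, b)
    ]


# Made-up coordinates, for illustration only.
regions = [
    Region("HistoneModification", "chr1", 100, 200),
    Region("TranscriptionFactorBindingSite", "chr1", 150, 160),
    Region("TranscriptionFactorBindingSite", "chr2", 150, 160),
]
```

The query is phrased in the terms of the model (`HistoneModification`, `TranscriptionFactorBindingSite`) and not in the table or column names of any particular database.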
E-science perspective on data integration: From cartoon to model to semantic data integration [Figure labels: Biologist readable model; Computer readable model; Biological concepts (‘myModel’); Data] KSinBIT’06
Some of the pieces we need • knowledge representation – triples • pointing at things: EPRs and URIs, not just the things but the statements about the things • unification and reasoning • annotation: linking knowledge to resources
Computational experiment • Some provenance should be added by the module/service itself [Figure labels: Database (three instances)]
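A sketch of what “provenance added by the service itself” could look like: a decorator wraps each processing step so that every result carries a record of the step and parameters that produced it. The decorator `with_provenance`, the record fields and the two toy services are assumptions for illustration, not part of the AIDA toolbox or any workflow system.

```python
import functools
from datetime import datetime, timezone


def with_provenance(service):
    """Wrap a service so each result records how it was produced."""
    @functools.wraps(service)
    def wrapper(data, **params):
        result = service(data["value"], **params)
        record = {
            "service": service.__name__,
            "parameters": params,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        # Append to the existing history instead of overwriting it.
        return {
            "value": result,
            "provenance": data.get("provenance", []) + [record],
        }
    return wrapper


@with_provenance
def normalize(values, scale=1.0):
    """Toy service: rescale values so they sum to `scale`."""
    total = sum(values)
    return [scale * v / total for v in values]


@with_provenance
def threshold(values, cutoff=0.0):
    """Toy service: keep values at or above the cutoff."""
    return [v for v in values if v >= cutoff]


raw = {"value": [1.0, 1.0, 2.0], "provenance": []}
out = threshold(normalize(raw, scale=1.0), cutoff=0.5)
```

After the two calls, `out["provenance"]` lists the `normalize` step followed by the `threshold` step, so derived data can be placed in a repository together with a description of what was done to it.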
The AIDA toolbox for knowledge extraction and knowledge management • reusable components to enhance science
Living examples: dynamic interfaces • http://aida.science.uva.nl:9999/search/AID • Yahoo Pipes interface to AIDA MEDLINE search: http://pipes.yahoo.com/pipes/pipe.info?_id=cv7nIBpw3BGw4NOLJphxuA • MeSH facet interface from Exhibit: http://aida.science.uva.nl:9999/search/json_test.html • W3C Health Care and Life Sciences KB (unofficial URL): http://www.w3.org/2001/sw/hcls/notes/kb/ • http://esw.w3.org/topic/HCLS/Banff2007Demo
Conclusions • The Web is a collection of resources: resource sharing • Disclosure of semantic models can greatly enhance resource sharing and resource management • Semantic annotation can be applied to any type of resource: data and (web)services. • Semantic annotation and provenance can be added by the (web)services themselves. • Need text mining for web services (to support semantic annotation) • Need web services for text mining
The End “Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.” – Henri Poincaré, Science and Hypothesis, 1905 http://adaptivedisclosure.org