Semantic Data Integration in myGrid and ourGrid (SEEK) National e-Science Centre e-Science Institute, Edinburgh May 14th, 2004
Plan of the Day
• 9:00–10:30 SEEK Data Integration & Semantic Extensions
• 10:30–11:00 BREAK
• 11:00–12:30 myGrid Data Integration & Semantic Extensions
• 12:30–13:45 LUNCH
• 13:45–15:45 Interoperable Semantic Registration, Mediation, Workflows
• 15:45–16:00 BREAK
• 16:00–17:00 Plenary Session
SEEK Data Integration & Semantic Extensions
Shawn Bowers (SDSC/UCSD), Bertram Ludaescher (SDSC/UCSD), the SEEK KR-SMS Team, and the GEON KR Team
http://seek.ecoinformatics.org
Purpose / Goals
• Link up:
  • … [on] data / services with “semantics”
  • … to do semantic data & service integration
  • also: an e-Science “Sister Project” to facilitate knowledge exchange & collaboration between UK- and US-based projects (where is the web/wiki page?)
• Specifically:
  • What approaches to expressing the semantics of data, services, and workflows do we all use?
  • How can we make them interoperable?
  • … keeping in mind …
  • What problem is it that the XYZ solution solves?
SEEK: Science Environment for Ecological Knowledge (our focus: one specific problem, DILS’04)
• Domain Science Driver
  • Ecology (LTER), biodiversity, …
• Analysis & Modeling System (AMS): Kepler
  • Design & execution of ecological models & analysis
  • End (& power) user focus
  • {application, upper}-ware
• Semantic Mediation System (SMS): Sparrow Toolkit
  • Data integration of hard-to-relate sources and processes
  • Semantic types and ontologies
  • upper middleware
• EcoGrid
  • Access to ecology data and tools
  • {middle, under}-ware architecture
Heterogeneous Data Integration
• Requires advanced metadata and processing
• Attributes must be semantically typed
• Collection protocols must be known
• Units and measurement scale must be known
• Measurement relationships must be known
  • e.g., that ArealDensity = Count/Area
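The last requirement can be made concrete with a small sketch. It assumes that each measurement carries a semantic type and a unit; the `Measurement` class, the type names, and the tiny unit table are hypothetical illustrations, not SEEK's actual representation. Knowing that ArealDensity = Count/Area lets a system derive comparable densities from sources that report area in different units, and reject nonsensical combinations:

```python
from dataclasses import dataclass

# Hypothetical unit table: conversion factors from each area unit to m^2.
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}

@dataclass(frozen=True)
class Measurement:
    semantic_type: str  # hypothetical type names: "Count", "Area", "ArealDensity"
    value: float
    unit: str           # "1" marks a dimensionless count

def areal_density(count: Measurement, area: Measurement) -> Measurement:
    """Derive ArealDensity = Count / Area, normalised to individuals per m^2."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError(f"ArealDensity needs Count/Area, got "
                        f"{count.semantic_type}/{area.semantic_type}")
    if area.unit not in AREA_TO_M2:
        raise ValueError(f"unknown area unit: {area.unit}")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    return Measurement("ArealDensity", count.value / area_m2, "1/m2")

# Two hypothetical sites report plot areas in different units.
site_a = areal_density(Measurement("Count", 50, "1"), Measurement("Area", 0.5, "ha"))
site_b = areal_density(Measurement("Count", 30, "1"), Measurement("Area", 3000, "m2"))
print(site_a.value, site_b.value)  # 0.01 0.01
```

Because the relationship is checked against semantic types rather than column names, swapping the arguments raises a `TypeError` instead of silently producing a meaningless number.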
A Neuroscientist’s Information Integration Problem (Biomedical Informatics Research Network, http://nbirn.net)
• Sources: sequence info (CaPROT), protein localization (NCMIR), morphometry (SYNAPSE), neurotransmission (SENSELAB)
• What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
• “Complex Multiple-Worlds” Mediation
A Home Buyer’s Information Integration Problem
• Sources: realtor, school rankings, crime stats, demographics
• What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
• “Multiple-Worlds” Mediation
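A minimal sketch of why this is “multiple-worlds” mediation: the four sources describe different kinds of things (houses, schools, crime, population) and connect only through a shared key. All data, the neighborhood key, the 67th-percentile cut-off standing in for “upper third”, and the diversity-index threshold are made up for illustration; a real mediator would also have to reconcile how each source names and delimits neighborhoods.

```python
# Hypothetical in-memory stand-ins for the four sources. They describe
# different "worlds" and meet only through the neighborhood name.
LISTINGS = [  # realtor
    {"id": "H1", "price": 450_000, "baths": 2, "beds": 3, "hood": "Northside"},
    {"id": "H2", "price": 520_000, "baths": 3, "beds": 4, "hood": "Northside"},
    {"id": "H3", "price": 390_000, "baths": 2, "beds": 2, "hood": "Riverside"},
    {"id": "H4", "price": 410_000, "baths": 1, "beds": 2, "hood": "Hilltop"},
]
SCHOOL_PERCENTILE = {"Northside": 80, "Riverside": 50, "Hilltop": 90}  # school rankings
CRIME_RATE = {"Northside": 2.1, "Riverside": 4.0, "Hilltop": 1.5}       # crime stats
DIVERSITY_INDEX = {"Northside": 0.7, "Riverside": 0.6, "Hilltop": 0.3}  # demographics

def find_houses(max_price=500_000, min_baths=2, min_beds=2,
                min_school_pct=67, min_diversity=0.5):
    """Join listings with the neighborhood-level sources, then filter."""
    avg_crime = sum(CRIME_RATE.values()) / len(CRIME_RATE)
    hits = []
    for house in LISTINGS:
        hood = house["hood"]
        if (house["price"] < max_price
                and house["baths"] >= min_baths
                and house["beds"] >= min_beds
                and SCHOOL_PERCENTILE.get(hood, 0) >= min_school_pct
                and CRIME_RATE.get(hood, float("inf")) < avg_crime
                and DIVERSITY_INDEX.get(hood, 0.0) >= min_diversity):
            hits.append(house["id"])
    return hits

print(find_houses())  # ['H1']
```

The `.get` defaults make a house drop out whenever one of the other worlds has no record for its neighborhood, which is one (conservative) choice among several a mediator could make.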
An Online Shopper’s Information Integration Problem
• Sources: addall.com, amazon.com, barnes&noble.com, A1books.com, half.com
• El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
• “One-World” Mediation
Standard (XML-Based) Mediator Architecture
[Figure: a USER/Client poses a query Q(G(S1, …, Sk)) against the integrated global (XML) view G. The MEDIATOR holds the integrated view definition of G(…) over S1(…), …, Sk(…) and exchanges (XML) queries and results with wrappers, implemented as web services, each of which exposes an (XML) view of one source S1, S2, …, Sk.]
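The architecture can be sketched in a few lines for the one-world El Cheapo case. This sketch swaps the XML views and web-service wrappers for plain Python records and functions; the store names, prices, and the `Offer` schema are hypothetical. Each wrapper hides one source's native format, the global view G is simply their union (all sources describe the same kind of thing), and the query Q(G) picks the cheapest offer that arrives in time:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass(frozen=True)
class Offer:
    """One tuple of the integrated global view G (hypothetical schema)."""
    source: str
    title: str
    price: float
    shipping: float
    delivery_days: int

TITLE = "Tractatus Logico-Philosophicus"

# Hypothetical wrappers: each turns one source's native format into Offers.
def wrap_store_a() -> Iterable[Offer]:
    for title, price, ship, days in [(TITLE, "12.50", "3.99", 5)]:  # prices as strings
        yield Offer("store_a", title, float(price), float(ship), days)

def wrap_store_b() -> Iterable[Offer]:
    for r in [{"name": TITLE, "cost_cents": 899, "ship_cents": 299, "eta": 10}]:
        yield Offer("store_b", r["name"], r["cost_cents"] / 100,
                    r["ship_cents"] / 100, r["eta"])

def wrap_store_c() -> Iterable[Offer]:
    yield Offer("store_c", TITLE, 14.00, 0.0, 3)

WRAPPERS: list[Callable[[], Iterable[Offer]]] = [wrap_store_a, wrap_store_b, wrap_store_c]

def global_view() -> list[Offer]:
    """G(S1, ..., Sk): in this one-world case, just the union of all sources."""
    return [offer for wrap in WRAPPERS for offer in wrap()]

def cheapest_within(title: str, max_days: int) -> Optional[Offer]:
    """Q(G): cheapest offer (price + shipping) that arrives in time."""
    candidates = [o for o in global_view()
                  if o.title == title and o.delivery_days <= max_days]
    return min(candidates, key=lambda o: o.price + o.shipping, default=None)

best = cheapest_within(TITLE, max_days=7)
print(best.source, best.price + best.shipping)  # store_c 14.0
```

Store B is cheapest overall but too slow, so the delivery constraint, not just price, decides the answer; relaxing `max_days` to 14 selects it instead.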
Information Integration: Problems and “Solutions”
• Goals:
  • reconciling S4 heterogeneities (system, syntax, structure, semantics)
  • “gluing” together multiple data sources
  • bridging information and knowledge gaps computationally
• System aspects: “Grid” infrastructure
  • authentication, single sign-on, …
  • distributed computation
  • web services, WSDL/SOAP, …
  • sources = functions, files, databases, …
• Syntax & structure: (XML-based) database mediators
  • wrapping, restructuring
  • distributed (XML) queries and views
  • sources = (XML) databases
• Semantics: model-based/semantic mediators
  • conceptual models, declarative views
  • ontologies, description logics (OWL, RDF, …)
  • sources = knowledge bases (DB + CMs + ICs)
Exercise: Classify (system, syntax, structure, semantics, something else …)
• “9:00” vs. “9am” vs. “21:00” vs. “9 ct”
• “3 miles” (land | sea) (here: UK | US | elsewhere) (now | elsewhen) …
• “picea rubens” (name vs. concept … in biological taxonomies)
• …
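One way to approach the first item: “9:00” and “9am” differ only in syntax and normalise to the same value, “21:00” denotes a different time altogether, and “9 ct” is only interpretable with knowledge of the academic convention cum tempore (start a quarter past the hour). The normaliser below is a hypothetical sketch; the accepted surface forms and the minutes-since-midnight representation are illustrative choices.

```python
import re

def to_minutes(expr: str) -> int:
    """Normalise a clock-time expression to minutes after midnight.

    Hypothetical forms handled: "HH:MM" (24-hour clock), "Ham"/"Hpm"
    (12-hour clock), and "H ct" / "H c.t." (cum tempore: 15 minutes past H).
    """
    s = expr.strip().lower()
    if m := re.fullmatch(r"(\d{1,2}):(\d{2})", s):
        return int(m.group(1)) * 60 + int(m.group(2))
    if m := re.fullmatch(r"(\d{1,2})\s*(am|pm)", s):
        hour = int(m.group(1)) % 12  # 12am is midnight, 12pm is noon
        return (hour + (12 if m.group(2) == "pm" else 0)) * 60
    if m := re.fullmatch(r"(\d{1,2})\s*c\.?t\.?", s):
        return int(m.group(1)) * 60 + 15  # convention knowledge, not syntax
    raise ValueError(f"unrecognised time expression: {expr!r}")

for e in ["9:00", "9am", "21:00", "9 ct"]:
    print(e, to_minutes(e))
```

The first two branches are purely syntactic rewriting; the third encodes domain knowledge, which is exactly what separates a syntactic from a semantic heterogeneity in the exercise.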
Different Types of “Ontologies” and Representations
• The term is overloaded/used sloppily for a …
  • “napkin drawing”, “concept space” (e.g., in PPT)
  • labeled graph, semantic network, concept map (e.g., in RDF)
  • controlled vocabulary (structured or flat)
  • database schema (relational, XML, …)
  • conceptual schema (ER, UML, …)
  • thesaurus (synonyms, broader term/narrower term)
  • taxonomy
  • formal ontology, e.g., in [Description] Logic (e.g., in OWL)
    • “formalization of a specification”
• An ontology may …
  • constrain possible interpretations of terms
  • specify a theory by defining and relating concepts of a domain of interest
  • theory = set of logic models (= “allowed/intended interpretations” of symbols)
Community-Based Ontology Development
[Figure: draft of a geochemistry ontology developed by scientists]
• Current concept maps and emerging ontologies
  • in GEON: Igneous Rocks/Plutons, Seismology, Geochemistry, …
  • in SEEK: Taxon, Units, Measurements, …?
Creating and Sharing Concept Maps (here: Seismology concept map, Cmap tool; Kai Lin, GEON)
• Lock up scientists for 2+ days
• Add CS/KRDB types
• Create concept maps
• Refine
• Iterate: from napkin drawings, to concept maps, to ontologies
Graph (RDF) Queries on Ontologies
[Figure: ontology visualization, the RQL query “Show all products”, and the query results; prototype: Kai Lin, GEON]
Ontologies: Cui bono?
• What are ontologies used for?
  • Conceptual models of a domain or application (communication means, system/database design, …)
  • Classification of …
    • concepts (taxonomy) and
    • data/object instances based on properties and concept definitions
  • Analysis of ontologies, e.g.,
    • graph queries (reachability, path queries, …)
    • reasoning (concept subsumption, consistency checking, …)
  • Targets for semantic data registration
  • Conceptual indexes and views for “smart” operations
Using ontologies for …
• Smart data discovery
• Smart service discovery
• Smart (data) querying
• Smart data integration (declarative)
• Smart workflow planning (execution!?) (procedural)
• Here: def_macro “smart” := (ontology|semantics)-(based|enhanced|enabled)
Specifically in SMS …
• “smart data discovery”, e.g., …
  • asking for A, retrieve B’s too, since B isa A
• “smart connections”, e.g., …
  • data/source binding to AMS (Kepler) services (actors)
  • service-to-service semantic (and structural?) type checking
  • service-to-service & data-to-service “gluing” (insert structural transformations and unit conversions; suggest services based on parameter chasing (parameter ontologies))
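“Asking for A, retrieve B’s too, since B isa A” amounts to answering a query over the reflexive-transitive closure of the isa hierarchy. A minimal sketch, assuming a toy ontology given as explicit isa pairs and a toy registry of datasets, each registered to one concept (all names are hypothetical); a real system would typically obtain subsumptions from an OWL reasoner rather than from a hand-written edge list:

```python
from collections import defaultdict

# Hypothetical ontology fragment: (child, parent) means child isa parent.
ISA = [("Rodent", "Mammal"), ("Rat", "Rodent"), ("Mouse", "Rodent"),
       ("Mammal", "Animal"), ("Bird", "Animal")]

# Hypothetical datasets, each semantically registered to one concept.
DATASETS = {"rat_cerebellum.csv": "Rat", "mouse_counts.csv": "Mouse",
            "sparrow_sightings.csv": "Bird", "mammal_survey.csv": "Mammal"}

def subconcepts(concept: str) -> set[str]:
    """All concepts B with B isa* concept (reflexive and transitive)."""
    children = defaultdict(list)
    for child, parent in ISA:
        children[parent].append(child)
    seen, stack = {concept}, [concept]
    while stack:  # depth-first walk down the hierarchy
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def discover(concept: str) -> list[str]:
    """Asking for A also returns data registered to any B isa A."""
    wanted = subconcepts(concept)
    return sorted(name for name, c in DATASETS.items() if c in wanted)

print(discover("Rodent"))  # ['mouse_counts.csv', 'rat_cerebellum.csv']
```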
… specifically in SMS … (cont’d)
• “smart data integration”, e.g., …
  • concept-based instance classification and data enumeration (as part of integrated/mediated views)
  • discovery and use of new join relations across sources
  • rewriting queries (against which SEEK/EcoGrid/EML schemas??) using ontologies & integrity constraints
  • generation of feasible distributed query plans in the presence of access patterns (web services), views, and integrity constraints (ICs)
• Need for “semantic registration/annotation”
  • linking data structures/objects to conceptual structures
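Concept-based instance classification inside a mediated view can be sketched as follows. Assumptions, all hypothetical: concepts are defined as predicates over instance properties (here silica content, with illustrative cut-offs rather than an authoritative rock classification), and two sources record that property under different names and scales. The view first maps both sources onto one schema, then classifies and enumerates instances by concept:

```python
from typing import Callable, Iterator

# Hypothetical concept definitions: each concept is a predicate over an
# instance's properties (silica content in weight percent). The cut-offs
# are illustrative only.
CONCEPTS: dict[str, Callable[[dict], bool]] = {
    "Ultramafic": lambda r: r["sio2"] < 45,
    "Mafic": lambda r: 45 <= r["sio2"] < 52,
    "Intermediate": lambda r: 52 <= r["sio2"] < 63,
    "Felsic": lambda r: r["sio2"] >= 63,
}

# Two hypothetical sources describing the same property differently.
SOURCE_1 = [{"sample": "NV-01", "SiO2_pct": 72.0},
            {"sample": "NV-02", "SiO2_pct": 49.5}]
SOURCE_2 = [{"id": "BC-17", "silica": 0.58}]  # a fraction, not a percentage

def integrated_samples() -> Iterator[dict]:
    """Mediated view: map both sources onto one (sample, sio2) schema."""
    for r in SOURCE_1:
        yield {"sample": r["sample"], "sio2": r["SiO2_pct"]}
    for r in SOURCE_2:
        yield {"sample": r["id"], "sio2": r["silica"] * 100}

def classify(instance: dict) -> list[str]:
    """Return every concept whose definition the instance satisfies."""
    return [name for name, holds in CONCEPTS.items() if holds(instance)]

def instances_of(concept: str) -> list[str]:
    """Enumerate the integrated instances that fall under a concept."""
    return [r["sample"] for r in integrated_samples() if concept in classify(r)]

print(instances_of("Felsic"))  # ['NV-01']
```

Because membership is computed from the definitions, a new source only needs a mapping onto the shared schema; no source has to store concept labels itself.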
Things to “Register”
• Data files (individual files)
  • shapefile as a blob (+ file type)
• Collections (of files; nested; e.g., satellite data)
• Databases (have a schema and can be queried)
  • shapefile with its schema registered
• Ontologies
• Services (web + grid services)
• Other/external applications
Ontologies and Data Management (watch out for Semantic Data Registration later)
[Figure: design artifacts (conceptual models, schemas, metadata, data) use concepts from an ontology, explicitly or implicitly]
A Multi-Hierarchical Rock Classification “Ontology” (GSC)
[Figure: classification hierarchies by genesis, fabric, composition, and texture]
Application Example: Geologic Map Integration
[Figure: geologic maps (Nevada) integrated via domain knowledge and knowledge representation (ontologies!?); annotation: “+/- a few hundred million years”]
• “Semantic registration” of shapefiles to a shared ontology
• concept-based queries; also allows …
• … viewing of British-registered USGS data through Canadian eyes
Example: Smart Connections [DILS’04]
• Services can be semantically compatible but structurally incompatible
[Figure: a desired connection from port Ps of a source service to port Pt of a target service. Via the ontologies (OWL), the semantic types are compatible, SemanticType(Ps) ⊑ SemanticType(Pt), but the structural types are incompatible, StructuralType(Ps) ⋠ StructuralType(Pt).]
Example: Smart Connections [DILS’04]
[Figure: SemanticType(Ps) ⊑ SemanticType(Pt) via the ontologies (OWL). Registration mappings (output for Ps, input for Pt) link each structural type to its semantic type; from them a correspondence between StructuralType(Ps) and StructuralType(Pt) is derived and used to generate a transformation that realizes the desired connection from the source service to the target service.]
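Generating such a transformation can be sketched under strong simplifying assumptions: structural types are flat records, the registration mappings assign each field of Ps and Pt an ontology concept, and a single-inheritance hierarchy stands in for OWL subsumption. All field and concept names are hypothetical; a real framework would also have to handle nested structures and unit conversions.

```python
# Hypothetical ontology: concept -> direct superconcept (None at the top).
SUPER = {"RatAbundance": "Abundance", "Abundance": "Measurement",
         "PlotID": "Location", "Location": None, "Measurement": None}

def subsumed(c, d) -> bool:
    """c ⊑ d: walk up the (hypothetical) single-inheritance hierarchy."""
    while c is not None:
        if c == d:
            return True
        c = SUPER.get(c)
    return False

# Hypothetical registration mappings: structural field -> concept.
SOURCE_REG = {"plot": "PlotID", "rats": "RatAbundance"}   # output port Ps
TARGET_REG = {"where": "Location", "count": "Abundance"}  # input port Pt

def generate_transformation(src_reg: dict, tgt_reg: dict):
    """Derive field correspondences and return a record-rewriting function.

    Each target field is matched to a source field whose concept is
    subsumed by the target field's concept; fail if there is none.
    """
    correspondence = {}
    for t_field, t_concept in tgt_reg.items():
        matches = [s for s, s_concept in src_reg.items()
                   if subsumed(s_concept, t_concept)]
        if not matches:
            raise TypeError(f"no source field is compatible with {t_field!r}")
        correspondence[t_field] = matches[0]

    def transform(record: dict) -> dict:
        return {t: record[s] for t, s in correspondence.items()}
    return transform

to_target = generate_transformation(SOURCE_REG, TARGET_REG)
print(to_target({"plot": "P7", "rats": 12}))  # {'where': 'P7', 'count': 12}
```

The connection is accepted because every target concept is matched by a more specific source concept (RatAbundance ⊑ Abundance, PlotID ⊑ Location), even though no field names coincide.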
The Sparrow Toolkit (Origins)
• Annoyance with ugly, user-unfriendly XML syntaxes (e.g., OWL in XML, rules in XML, … anything in XML)
• Note: others got annoyed too, but we didn’t know [OWL Concrete Abstract Syntax, Bechhofer et al.]
• (well, we knew about Triple, but that’s only RDF …)
• Instead, use a lean syntax (how XML should have been):
    owl employee isa person and worksfor some employer.
    owl mother eqv person and female and hasChild some person.
    rdf john, worksfor, 'IBM'.
• … these are both human- and machine-readable
• … in fact, the language was invented around the corner …
• … and this is the “parser”:
    :- op(1100, fx, owl), op(1100, fx, rdf).
    :- op(600, xfx, isa), op(600, xfx, eqv).
    :- op(550, xfy, or), op(500, xfy, and), op(350, fx, not).
    :- op(400, xfy, some), op(400, xfy, only).
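To illustrate what the four op/3 directives buy, here is a hypothetical Python re-implementation of just enough operator-precedence parsing to read the owl statements into nested tuples. It mirrors the Prolog conventions (lower priority binds tighter; xfy is right-associative; xfx does not chain), but it is only an illustration: Sparrow itself simply relies on the Prolog reader, and the rdf form is not covered here.

```python
import re

# Operator table mirroring the op/3 directives (hypothetical re-implementation).
INFIX = {"isa": (600, "xfx"), "eqv": (600, "xfx"),
         "or": (550, "xfy"), "and": (500, "xfy"),
         "some": (400, "xfy"), "only": (400, "xfy")}
PREFIX = {"not": 350}  # fx
TOKEN = re.compile(r"\s*(?:([A-Za-z_]\w*)|([()]))")

def tokenize(text: str) -> list[str]:
    text, toks, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad input at {text[pos:]!r}")
        toks.append(m.group(1) or m.group(2))
        pos = m.end()
    return toks

class Parser:
    def __init__(self, text: str):
        self.toks, self.pos = tokenize(text), 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def parse(self, max_prec: int = 1200):
        """Return (term, priority) with priority <= max_prec."""
        tok = self.peek()
        self.pos += 1
        if tok == "(":
            left, _ = self.parse(1200)
            left_prec = 0
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
        elif tok in PREFIX and PREFIX[tok] <= max_prec:
            prec = PREFIX[tok]
            arg, _ = self.parse(prec - 1)  # fx: argument must bind tighter
            left, left_prec = (tok, arg), prec
        elif tok is None or tok in INFIX or tok in PREFIX or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        else:
            left, left_prec = tok, 0
        # Precedence climbing over the infix operators.
        while (op := self.peek()) in INFIX:
            prec, kind = INFIX[op]
            if prec > max_prec or left_prec >= prec:
                break
            self.pos += 1
            right, _ = self.parse(prec if kind == "xfy" else prec - 1)
            left, left_prec = (op, left, right), prec
        return left, left_prec

def parse_owl(statement: str):
    """Parse 'owl <axiom>.' into nested tuples (op, left, right)."""
    s = statement.strip()
    if not (s.startswith("owl ") and s.endswith(".")):
        raise SyntaxError("expected 'owl ... .'")
    p = Parser(s[4:-1])
    term, _ = p.parse()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {p.peek()!r}")
    return term

print(parse_owl("owl employee isa person and worksfor some employer."))
# ('isa', 'employee', ('and', 'person', ('some', 'worksfor', 'employer')))
```

Because `isa` is declared xfx, a statement such as `owl a isa b isa c.` is rejected, exactly as the Prolog reader would reject it.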
Sparrow (a poor man’s OWL tool …) Simple ASCII-based RDF and OWL entry and manipulation
Sparrow Toolkit
• Much more than a lean syntax for OWL & RDF
• Syntax transformation services:
  • RDF, OWL, … → Sparrow → RDF, OWL, LaTeX, FO/LeanTap, …
• Semantic registration services
  • Semantic annotation language
• Reasoning services
  • classification, consistency checking, conversion, query rewriting, …
• Will be provided in Kepler
  • e.g., as actors, but also as type extensions
Sparrow: The Name
• “A poor man’s OWL”
  • or: what XML really should look like
• “Lieber den Spatz in der Hand als die Taube auf dem Dach”
  • Better a sparrow in the hand than a pigeon/dove on the roof
• Also: In Memoriam:
Some work in progress … [short-paper SSDBM’04]
References
• SMS:
  • S. Bowers and B. Ludäscher. An Ontology Driven Framework for Data Transformation in Scientific Workflows. International Workshop on Data Integration in the Life Sciences (DILS), LNCS, Leipzig, Germany, March 2004.
  • S. Bowers, K. Lin, and B. Ludäscher. On Integrating Scientific Resources through Semantic Registration. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, 21–23 June 2004.
  • S. Bowers and B. Ludäscher. Towards a Generic Framework for Semantic Registration of Scientific Data. Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, 2003.
  • A. Nash and B. Ludäscher. Processing First-Order Queries under Limited Access Patterns. Proc. 23rd ACM Symposium on Principles of Database Systems (PODS'04), Paris, France, June 2004, to appear.
  • A. Nash and B. Ludäscher. Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. 9th Intl. Conference on Extending Database Technology (EDBT'04), LNCS, Heraklion, Crete, Greece, March 2004.
  • B. Ludäscher and A. Nash. Web Service Composition Through Declarative Queries: The Case of Conjunctive Queries with Union and Negation. Research abstract (poster), 20th Intl. Conference on Data Engineering (ICDE'04), IEEE Computer Society, Boston, April 2004.
  • Teaching: graduate class CSE-291, Ontologies in Data and Process Integration: http://www.sdsc.edu/~ludaesch/CSE-291-Spring-04/ (Bertram; guest lectures by Shawn)
  • …
References
• Kepler:
  • I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. Kepler: An Extensible System for Design and Execution of Scientific Workflows. 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, 21–23 June 2004.
  • I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher, and S. Mock. Kepler: Towards a Grid-Enabled System for Scientific Workflows. Workshop on Workflow in Grid Systems, Global Grid Forum (GGF10), Berlin, Germany, March 2004.
  • I. Altintas, E. Jaeger, K. Lin, B. Ludäscher, and A. Memon. A Web Service Composition and Deployment Framework for Scientific Workflows. Intl. Conference on Web Services (ICWS), San Diego, California, July 2004.
  • Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher (presenter), and Steve Mock. Kepler: Towards a Grid-Enabled System for Scientific Workflows. Talk, Workflow in Grid Systems (GGF10), Berlin, March 9, 2004.
  • Efrat Jaeger. Kepler/GEON User Manual.
  • Kim Baldridge, Jerry Greenberg, Wibke Sudholt, Karan Bhatia, Stephen Mock, Ilkay Altintas, Cline Amoreira, Yohan Potier, Mucaehl Taufer. The Computational Chemistry Prototyping Environment.
  • …