1. Semantic Mediation of Scientific Data via Logic-Based Data Federation Software
Amarnath Gupta
Bertram Ludäscher
Reagan Moore
San Diego Supercomputer Center
University of California, San Diego
2. Information Integration / Mediation
Goal: combine data from different sources such that the integrated whole is more than the sum of its isolated parts
=> SDSC/CSE MIX project (Mediation of Information in XML)
Standard Scenarios:
C2B, e.g. comparison shopping:
AddAll := IntegratedView(amazon, barnes&noble, ...)
B2B, e.g. marketplaces:
Virt_Market := IntegratedView(supplier_1, ... supplier_n)
C2M, e.g. home-buyer:
Full_Picture := IntegratedView(Realtor, Crime, Schools, ...)
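The IntegratedView(...) notation in the scenarios above can be read as a function from a set of sources to one combined collection. A minimal sketch under stated assumptions: the slides do not define IntegratedView, so integrated_view, the Source type, the sample records, and cheapest are all hypothetical names and made-up data used only for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: a source is any callable returning records as dicts.
Source = Callable[[], Iterable[dict]]


def integrated_view(sources: Dict[str, Source]) -> List[dict]:
    """Union the records of all sources, tagging each with its origin."""
    view = []
    for name, fetch in sources.items():
        for record in fetch():
            view.append({**record, "source": name})
    return view


# Made-up sample data standing in for two bookseller sources.
def amazon() -> List[dict]:
    return [{"isbn": "111", "price": 30.0}]


def barnes_and_noble() -> List[dict]:
    return [{"isbn": "111", "price": 28.5}]


add_all = integrated_view({"amazon": amazon, "barnes&noble": barnes_and_noble})


def cheapest(view: List[dict], isbn: str) -> dict:
    """A query that only makes sense over the integrated whole."""
    offers = [r for r in view if r["isbn"] == isbn]
    return min(offers, key=lambda r: r["price"])
```

The comparison query in cheapest is the "more than the sum of its parts" step: neither source alone can answer it.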
3. MIX Mediation Challenges
MIX Mediator Architecture (middleware)
wrappers: wrap different data into common format (XML)
mediator: combines sources’ XML views into IntegratedView
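The wrapper/mediator split can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration of the idea only, not the MIX implementation: wrap_csv, wrap_records, mediate, the element names, and the sample data are all assumptions.

```python
import copy
import csv
import io
import xml.etree.ElementTree as ET


def wrap_csv(text: str, root_tag: str, row_tag: str) -> ET.Element:
    """Wrapper for a CSV source: one XML element per row."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(text)):
        item = ET.SubElement(root, row_tag)
        for field_name, value in row.items():
            ET.SubElement(item, field_name).text = value
    return root


def wrap_records(records, root_tag: str, row_tag: str) -> ET.Element:
    """Wrapper for a source that already yields dict-like records."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, row_tag)
        for field_name, value in record.items():
            ET.SubElement(item, field_name).text = str(value)
    return root


def mediate(views: dict) -> ET.Element:
    """Mediator: combine the sources' XML views into one IntegratedView."""
    integrated = ET.Element("IntegratedView")
    for name, view in views.items():
        for item in view:
            tagged = copy.deepcopy(item)  # leave the source view untouched
            tagged.set("source", name)
            integrated.append(tagged)
    return integrated


# Made-up data for two differently formatted sources.
realtor_xml = wrap_csv("address,price\n1 Main St,500000\n", "listings", "listing")
crime_xml = wrap_records([{"zip": "92093", "incidents": 12}], "reports", "report")
full_picture = mediate({"Realtor": realtor_xml, "Crime": crime_xml})
```

Once every source is exposed as XML, the mediator needs no format-specific code; all source-specific handling stays in the wrappers.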
MIX Mediator Components
declarative mediator view definition language:
XMAS (XML Matching And Structuring) language, algebra, and first prototype ~ 1999 [SIGMOD99,EDBT00,...]
query composition and rewriting esp. with limited source capabilities
on-demand (“lazy”) query processing of virtual XML docs (DOM-VXD)
Blended Browsing and Querying user interface (BBQ)
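On-demand ("lazy") processing of a virtual document means that source data is fetched only when the client navigates to it. A rough sketch using Python generators, assuming a much simpler interface than DOM-VXD: source_elements, virtual_document, and fetch_log are hypothetical names, and the elements are plain strings.

```python
from itertools import islice
from typing import Iterator

fetch_log = []  # records which source elements were actually pulled


def source_elements(name: str, n: int) -> Iterator[str]:
    """Simulated source: produces each element only when asked for it."""
    for i in range(n):
        fetch_log.append((name, i))
        yield f"<item source='{name}' id='{i}'/>"


def virtual_document(*sources: Iterator[str]) -> Iterator[str]:
    """Virtual view over all sources; nothing is materialized up front."""
    for source in sources:
        yield from source


doc = virtual_document(source_elements("s1", 1000), source_elements("s2", 1000))

# The client browses only the first three elements of the virtual document.
first_three = list(islice(doc, 3))
```

After this runs, fetch_log holds three entries, all from s1; the second source is never contacted because the client never navigated that far.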
4. New MIX Challenges from Scientific Applications
Complex Data (S2S)
SDSC’s Scientific Data Applications (current/planned, e.g. Neurosciences: SciDAC/SDM, NCMIR, NIH BIRN, Earth sciences, ...) show that syntactic/structural integration is insufficient for ...
Complex Multiple-World Mediation Problems:
complex, disjoint, seemingly unrelated data
“hidden semantics” in complex, indirect relationships
=> Semantic (aka Model/Knowledge-Based) Mediation
lift mediation to the level of conceptual models (CMs)
use domain experts’ knowledge formalized as rules over CMs
=> Specialized Extensions
temporal, geospatial, statistical, DQ/accuracy... operations
=> Extend Mediation Scope and Power via Deductive Rules
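To illustrate how a deductive rule over a conceptual model can expose a "hidden" relationship, here is a small fixpoint computation in Python. The rule, the relation names part_of and located_in, and the sample facts are illustrative assumptions, not taken from the slides or from any registered source.

```python
def infer_locations(part_of, located_in):
    """Apply the rule
         located_in(X, W) :- located_in(X, P), part_of(P, W)
    repeatedly until no new facts are derived (naive fixpoint)."""
    derived = set(located_in)
    changed = True
    while changed:
        changed = False
        for thing, place in list(derived):
            for part, whole in part_of:
                if place == part and (thing, whole) not in derived:
                    derived.add((thing, whole))
                    changed = True
    return derived


# Illustrative facts: domain knowledge supplied by an expert ...
part_of = {
    ("dendritic_spine", "dendrite"),
    ("dendrite", "purkinje_cell"),
    ("purkinje_cell", "cerebellum"),
}
# ... and a single fact as one source might report it.
located_in = {("calbindin", "dendritic_spine")}

facts = infer_locations(part_of, located_in)
```

A query for proteins located in the cerebellum now finds the calbindin record, although no source states that relationship directly; it follows only from the rule plus the domain knowledge.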
5. A Neuroscience Question
protloc = NCMIR, Excel + images
morphometry (measurement) = NCMIR, Excel + txt + images
neurotrans (stimulate then electrical responses, probes) = RDB, SENSELAB, Yale
CaBP (chemical structure, PDB links, function of CaBP, found-in...) = Web, Vanderbilt U
Expasy (Protein-info as Sequence data) = Web, Europe
6. Example for Formalizing Domain Knowledge: Domain Map (Ontology) for SYNAPSE and NCMIR
7. Domain Map Refinement
8. Semantic Annotation Tool for Domain Scientists
9. Extended Mediator Architecture for Semantic Mediation
10. ANATOM Domain Map with Registered Data
11. Query Processing
12. Mediator System Architecture
13. Mediation Services: Source Registration (System Issues)
14. Mediation Services: Source Registration (Semantics Issues)
Domain Map Registration
provide concept space/ontology
… as a private object (“myANATOM”)
… merge with others (give “semantic bridges”)
… and check for conflicts
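The domain map registration steps above (private map, merge via semantic bridges, conflict check) can be sketched as follows. This assumes a domain map is a set of (concept, relation, concept) edges and that a "conflict" means two concepts end up each being a subconcept of the other; both are simplifying assumptions, as are all function and concept names.

```python
def merge_domain_maps(shared, private, bridges):
    """Merge a private map into a shared one.

    bridges maps private concept names to the shared concepts they
    correspond to (the "semantic bridges")."""
    renamed = {
        (bridges.get(a, a), rel, bridges.get(b, b)) for a, rel, b in private
    }
    return set(shared) | renamed


def find_conflicts(edges):
    """Report concept pairs that are 'isa' of each other after merging."""
    isa = {(a, b) for a, rel, b in edges if rel == "isa"}
    return sorted((a, b) for a, b in isa if a < b and (b, a) in isa)


# Illustrative maps: the private one states its hierarchy the other way round.
shared_map = {("neuron", "isa", "cell"), ("purkinje_cell", "isa", "neuron")}
my_anatom = {("my_cell", "isa", "my_purkinje")}
bridges = {"my_cell": "neuron", "my_purkinje": "purkinje_cell"}

merged = merge_domain_maps(shared_map, my_anatom, bridges)
conflicts = find_conflicts(merged)
```

Here the bridges turn the private edge into neuron isa purkinje_cell, which contradicts the shared map, so the pair is reported instead of being merged silently.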
Conceptual Model Registration
schema: classes, associations, attributes
domain constraints
“put data into context” (linking data to the domain map)
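Conceptual model registration can be pictured as a source handing the mediator a schema, its domain constraints, and anchors that link its classes to domain map concepts. A minimal sketch; the ConceptualModel fields, the register and sources_for functions, and the sample schema are hypothetical and far simpler than what a real mediator would need.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualModel:
    source: str
    classes: dict                  # class name -> list of attribute names
    associations: list             # (class, role, class) triples
    constraints: list = field(default_factory=list)  # record -> bool checks
    anchors: dict = field(default_factory=dict)      # class -> domain concept


registry = {}


def register(cm: ConceptualModel, domain_concepts: set) -> None:
    """Accept a model only if every anchor names a known domain concept."""
    unknown = sorted(set(cm.anchors.values()) - set(domain_concepts))
    if unknown:
        raise ValueError(f"unknown domain concepts: {unknown}")
    registry[cm.source] = cm


def sources_for(concept: str) -> list:
    """Which registered sources hold data about this domain concept?"""
    return sorted(
        name for name, cm in registry.items() if concept in cm.anchors.values()
    )


# Illustrative registration; class and attribute names are made up.
domain_concepts = {"dendritic_spine", "purkinje_cell"}
ncmir = ConceptualModel(
    source="NCMIR",
    classes={"SpineMeasurement": ["length", "volume"]},
    associations=[("SpineMeasurement", "measured_in", "Image")],
    constraints=[lambda record: record["length"] > 0],
    anchors={"SpineMeasurement": "dendritic_spine"},
)
register(ncmir, domain_concepts)
```

The anchors are what "put data into context": a query phrased in domain map terms can be routed to the right sources through sources_for.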
15. Mediation Services: Client Registration
16. Other Existing Infrastructure
Transparent Access to Remote Data Collections: Storage Resource Broker (SRB) and Metadata Catalog (MCAT)
“Production-Level” Software
PPDG: interface to LBNL Storage Manager, collection creation, replication management
Use of manual and automatic wrapper technology (Minerva, Roadrunner, V. Crescenzi, Universita di Roma Tre)
=> XWrap Elite
17. SRB and the Particle Physics Data Grid
18. Year 1 Deliverables
define interface metadata format (Critchlow)
extend XWrap to generate wrappers using the interface metadata description instead of requiring human interaction (GT)
develop a canonical XML-based query and response format as a dynamic interface between query engine and wrappers (Critchlow, GT, SDSC)
communication via agent protocols? How about using digital library infrastructure (e.g. Simple Digital Library Interoperability Protocol, SDLIP)
use extended XWrap to create wrappers for the genomics domain for evaluation (GT)
extend the SDSC query and metadata architecture to interoperate with the LLNL DataFoundry (SDSC, Critchlow)
... interoperation at the wrapper level: Minerva wrappers, XWrap
19. References
Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, April 2001.
Model-Based Information Integration in a Neuroscience Mediator System, B. Ludäscher, A. Gupta, M. E. Martone, demonstration track, 26th Intl. Conference on Very Large Databases (VLDB), Cairo, Egypt, September 2000.
Knowledge-Based Integration of Neuroscience Data Sources, A. Gupta, B. Ludäscher, M. E. Martone, 12th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany, IEEE Computer Society, July 2000.