Model-Based Mediation with Domain Maps
Bertram Ludäscher*, Amarnath Gupta*, Maryann E. Martone+
* San Diego Supercomputer Center (SDSC)
+ National Center for Microscopy and Imaging Research (NCMIR)
University of California, San Diego (UCSD)
Overview
• Motivation
• Problem with current Mediator Architecture
• Complex Scientific Multiple-World Scenarios
• Model-Based Mediation Architecture
• Lifting from XML to level of Conceptual Models (CMs)
• Formal Framework
• Domain Maps (DMs)
• Generic Conceptual Model (GCM)
• Integrated View Definition
• Example Query Evaluation
• Open Issues
A Standard Mediator Architecture (MIX: Mediation of Information using XML, SDSC/UCSD)
[Figure: a user query is posed against an integrated XML view (XML Q/A). The MIX mediator answers it using an Integrated View Definition written in XMAS/XQuery and exchanges XML Q/A with wrappers over the data sources (WWW, DB, files; Lab1, Lab2, Lab3).]
The Problem: Complex Multiple-World Scenarios
• Current Integration Issues
  • Structural/Schema Conflicts
    • common semistructured data model (XML)
    • schema transformations/integration (XML queries & transforms)
  • Limited Query Capabilities
    • capability-based rewriting (e.g., TSIMMIS)
    • ...
  • BUT scenarios are “one-world” (amazon.com vs. bn.com) or simple multiple-world (home buyer)
• Problem: No Support for Semantic Mediation
  • “complex multiple-world” scenarios (Neuroscience, Geoscience):
    • complex, disjoint, seemingly unrelated data
    • “hidden semantics” in complex, indirect relationships
A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Figure: three wrapped sources, protein localization (NCMIR), morphometry (SYNAPSE), and neurotransmission (SENSELAB), sit below a mediator, an integrated view definition, and an integrated view that are all marked “???”.]
Hidden Semantics: Protein Localization (NCMIR)
[Figure labels: Purkinje cell layer of cerebellar cortex; molecular layer of cerebellar cortex; fragment of dendrite]
    <protein_localization>
      <neuron type="purkinje cell" />
      <protein channel="red">
        <name>RyR</name>
        ….
      </protein>
      <region h_grid_pos="1" v_grid_pos="A">
        <density>
          <structure fraction="0.8">
            <name>spine</name>
            <amount name="RyR">0</amount>
          </structure>
          <structure fraction="0.2">
            <name>branchlet</name>
            <amount name="RyR">30</amount>
          </structure>
Hidden Semantics: Morphometry (SYNAPSE)
• A branch level beyond 4 is a branchlet
• The spine must be dendritic because Purkinje cells don’t have somatic spines
    <neuron name="purkinje cell">
      <branch level="10">
        <shaft> … </shaft>
        <spine number="1">
          <attachment x="5.3" y="-3.2" z="8.7" />
          <length>12.348</length>
          <min_section>1.93</min_section>
          <max_section>4.47</max_section>
          <surface_area>9.884</surface_area>
          <volume>7.930</volume>
          <head>
            <width>4.47</width>
            <length>1.79</length>
          </head>
        </spine>
        …
Approach: Model-Based Mediation
• Complex Multiple Worlds Integration Problem
  • terms not directly joinable
  • complex, indirect associations
  • unstated, “hidden” semantics (not just schema conflicts)
• Missing “Semantic Link”
  => how to define complex, indirect semantic links?
  => lift mediation to the level of conceptual models (CMs)
  => domain expert’s knowledge formalized as rules over CMs
  => Model-Based Mediation
XML-Based vs. Model-Based Mediation
[Figure: two stacks shown side by side.
 XML-based: Raw Data, then XML Elements, then XML Models. Structural constraints (DTDs): parent, child, sibling, ...; e.g. A = (B*|C),D and B = ... No domain constraints. Integrated-DTD := XQuery(Src1-DTD,...)
 Model-based: Raw Data, then (XML) Objects, then Conceptual Models. Classes, relations, is-a, has-a, ... (classes C1, C2, C3 and a relation R). Logical domain constraints (IF ... THEN ...) come from the DOMAIN MAP. Integrated-CM := CM-QL(Src1-CM,...)]
Extended Mediator Architecture
• Wrappers export Conceptual Models (CMs)
  • facts & rules for classes, relationships, ICs, ...
  • source data is “put into context” (“aboutness” index) by linking to domain maps (DMs)
• Mediator employs CMs and DMs
  • ... to define complex semantic relationships on the formalized domain knowledge
• Generic Conceptual Model (GCM)
  • as a common target CM
  • minimal requirements/core expressions:
    • instance(O,C), subclass(C1,C2)
    • method_type(C,M,C’), method_value(O,M,R)
    • relation_type(R,A1/C1,...,An/Cn)
    • relation_value(R,a1,...,an)
• Expressiveness, Extensibility
  • allow inductive properties (inheritance, closures, ...)
  • employ a declarative rule language (e.g. F-Logic)
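The GCM core expressions above can be pictured as plain fact tables. The following is a minimal Python sketch, not the mediator's actual representation: the predicate names come from the slide, while the sample facts, the single-inheritance dictionary encoding, and the helper names `is_instance` and `well_typed` are assumptions made for illustration.

```python
# Hypothetical encoding of GCM core expressions as Python dictionaries.
# Assumption: the subclass hierarchy is acyclic and single-inheritance.
subclass = {"purkinje_cell": "neuron", "neuron": "cell"}   # subclass(C1, C2)
instance = {"n1": "purkinje_cell", "d1": "dendrite"}       # instance(O, C)
method_type = {("neuron", "has_part"): "dendrite"}         # method_type(C, M, C')
method_value = {("n1", "has_part"): "d1"}                  # method_value(O, M, R)


def is_instance(obj, cls):
    """Inheritance: O belongs to C if its class is C or a subclass of C."""
    current = instance.get(obj)
    while current is not None:
        if current == cls:
            return True
        current = subclass.get(current)
    return False


def well_typed(obj, method):
    """Check a method_value fact against the declared method_type facts."""
    result = method_value[(obj, method)]
    return any(
        m == method and is_instance(obj, c) and is_instance(result, c2)
        for (c, m), c2 in method_type.items()
    )
```

Here `is_instance("n1", "cell")` holds through two subclass steps, which is the kind of inductive property the slide asks the rule language to support.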
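Model-Based Mediator Architecture
[Figure: the USER/Client exchanges CM queries and results (in XML) with the mediator's CM (integrated view). The Mediator Engine combines the Domain Map (DM) and the Integrated View Definition (IVD) over GCM representations of the source models CM S1, CM S2, and CM S3, using FL rule processing, LP rule processing, graph processing, and the XSB engine. A CM plug-in and a logic API (capabilities) sit between the mediator and the sources. Each source S1, S2, S3 is reached through an XML-Wrapper and a CM-Wrapper.]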
Formalizing Domain Knowledge: Domain Map for SYNAPSE and NCMIR
Domain expert knowledge:
• Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines.
• Dendritic spines are ion (calcium) regulating components.
• Spines have ion binding proteins.
• Neurotransmission involves ionic activity (release).
• Ion-binding proteins control ion activity (propagation) in a cell.
• Ion-regulating components of cells affect ionic activity (release).
[Figure: domain map with the equivalent Description Logic facts]
• A domain map comprises
  • Description Logic facts ...
    • concepts ("classes")
    • roles ("associations")
  • derived properties ...
    • ... expressed as logic rules (e.g. F-logic)
Domain Map Refinement
In addition to registering (“hanging off”) data, a source may also refine the mediator’s domain map ...
... source can register new concepts at the mediator ...
Definition of Integrated Views (Deja Vu?) ...
• XML/CM-2-FL Translators
    <!ELEMENT Studies (Study)*>
    <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
    <!ELEMENT experiments (experiment)*>
    <!ELEMENT experiment (description, instrument, parameters)>

    studyDB[studies =>> study].
    study[study_id => string; … animal => animal;
          experiments =>> experiment; experimenters =>> string].
    …
• Specification of Domain Knowledge
  • Subclasses
      mushroom_spine :: spine
  • Data Classification
      DERIVE S:mushroom_spine FROM S:spine[head -> _; neck -> _].
  • Integrity Constraints
      ic1(S):ALERT[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
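The data classification rule and the integrity constraint above can be read procedurally. The sketch below is one such reading in Python rather than F-logic: the `Spine` record, its numeric `head` and `neck` fields, and the function names are hypothetical, and the constraint is interpreted as firing when both head and neck are undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spine:
    ident: str
    head: Optional[float] = None  # hypothetical head measurement
    neck: Optional[float] = None  # hypothetical neck measurement


def classify(spine):
    """Derive mushroom_spine membership when head and neck are both defined."""
    classes = {"spine"}
    if spine.head is not None and spine.neck is not None:
        classes.add("mushroom_spine")
    return classes


def check_ic1(spine):
    """Return an alert record when both head and neck are undefined."""
    if spine.head is None and spine.neck is None:
        return {"type": "invalid spine", "object": spine.ident}
    return None
```

For example, `classify(Spine("s1", head=4.47, neck=1.2))` yields both classes, while `check_ic1(Spine("s3"))` returns an alert for the object `s3`.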
... Definition of Integrated Views (Multiple Sources)
• Integrated View Definition
    DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
    FROM I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
             anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],   % from PROLAB
         AS..segments..features[name -> Feature_name; value -> Value],
         NAE:neuro_anatomic_entity[name -> Anatom;                                   % from ANATOM
             located_in ->> {Brain_region}].
• Schema Reasoning & Dynamic Classes
  • TAXON DB Schema
      taxon[subspecies => string; species => string; genus => string; …
            phylum => string; kingdom => string; superkingdom => string].
  • TAXON Rank Hierarchy
      subspecies::species::genus:: … kingdom::superkingdom
  • Create Classes from TAXON data
      DERIVE T:TR, TR::TR1 FROM T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
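The "Create Classes from TAXON data" rule turns attribute values into classes. A rough Python rendering, under several assumptions, follows: the `RANKS` list contains only the ranks the slide names (the elided ones are left out), the sample row is illustrative, and `derive_classes` is a hypothetical helper rather than part of the prototype.

```python
# Ranks named on the slide, lowest first; the elided ranks are omitted.
RANKS = ["subspecies", "species", "genus", "phylum", "kingdom", "superkingdom"]


def derive_classes(rows):
    """For each row T and each rank pair lower::higher with values TR, TR1,
    derive instance(T, TR) and subclass(TR, TR1)."""
    instance_of = set()
    subclass_of = set()
    for row_id, row in rows.items():
        present = [row[rank] for rank in RANKS if row.get(rank)]
        for i, lower in enumerate(present):
            for higher in present[i + 1:]:
                instance_of.add((row_id, lower))
                subclass_of.add((lower, higher))
    return instance_of, subclass_of


# Illustrative row only; not taken from the TAXON database.
rows = {"t1": {"species": "Rattus norvegicus", "genus": "Rattus", "phylum": "Chordata"}}
instance_of, subclass_of = derive_classes(rows)
```

After this step, a query about "other rodents" could in principle walk the derived `subclass_of` edges instead of comparing rank strings directly.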
Query Evaluation Example
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
• push selection
    @SENSELAB: X1 := select output from parallel fiber;
• determine source context
    @MEDIATOR: X2 := “hang off” X1 from Domain Map;
• compute region of interest (here: downward closure)
    @MEDIATOR: X3 := subregion-closure(X2);
• push selection
    @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
• compute protein distribution
    @MEDIATOR: X5 := compute aggregate(X4);
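The five plan steps can be mimicked with in-memory stand-ins for the two sources and the domain map. This is a toy sketch under stated assumptions: the record layouts, the `HAS_A` fragment, the `target` field, and all function names are invented for illustration; only the RyR amounts for spine and branchlet echo the NCMIR example earlier in the deck.

```python
# Hypothetical stand-ins for SENSELAB, the domain map, and NCMIR.
SENSELAB = [
    {"compartment": "parallel fiber", "target": "purkinje_dendrite"},
    {"compartment": "soma", "target": "axon"},
]
HAS_A = {"purkinje_dendrite": ["spine", "branchlet"]}
NCMIR = [
    {"protein": "RyR", "structure": "spine", "amount": 0},
    {"protein": "RyR", "structure": "branchlet", "amount": 30},
    {"protein": "RyR", "structure": "soma", "amount": 5},
]


def select_output(records, compartment):
    return [r for r in records if r["compartment"] == compartment]


def hang_off(records):
    """Map source records to the domain map concepts they are anchored at."""
    return {r["target"] for r in records}


def subregion_closure(concepts, has_a):
    """Downward closure: the given concepts plus everything reachable via has_a."""
    seen, stack = set(), list(concepts)
    while stack:
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(has_a.get(concept, []))
    return seen


def select_prot_data(records, regions, protein):
    return [r for r in records if r["protein"] == protein and r["structure"] in regions]


def aggregate(records):
    totals = {}
    for r in records:
        totals[r["structure"]] = totals.get(r["structure"], 0) + r["amount"]
    return totals


x1 = select_output(SENSELAB, "parallel fiber")   # @SENSELAB
x2 = hang_off(x1)                                # @MEDIATOR
x3 = subregion_closure(x2, HAS_A)                # @MEDIATOR
x4 = select_prot_data(NCMIR, x3, "RyR")          # @NCMIR
x5 = aggregate(x4)                               # @MEDIATOR
```

The point of the sketch is the ordering: the closure computed at the mediator (X3) becomes the selection condition pushed to the second source (X4), so the soma record never reaches the aggregation.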
ANATOM Domain Map with Registered Data
[Figure: ANATOM DATA]
Deductive Closure of “has_a” with “tc(is_a)” (YES: real recursive views!! ;-)
[Figure: ANATOM CLOSURE]
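One plausible reading of this closure is: first take the transitive closure of is_a, let every concept inherit the has_a edges of its superclasses, then close has_a transitively. The Python sketch below follows that reading; the slide does not spell out the rules, so the exact semantics, the function names, and the sample edges are assumptions.

```python
def transitive_closure(pairs):
    """Naive fixpoint: keep adding (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def has_a_star(is_a, has_a):
    """Close has_a under inheritance along tc(is_a), then under transitivity."""
    isa_star = transitive_closure(is_a)
    inherited = set(has_a) | {
        (x, z) for (x, y) in isa_star for (y2, z) in has_a if y == y2
    }
    return transitive_closure(inherited)


# Illustrative edges only.
is_a = {("purkinje_cell", "spiny_neuron"), ("spiny_neuron", "neuron")}
has_a = {("neuron", "dendrite"), ("dendrite", "branch"), ("branch", "spine")}
closure = has_a_star(is_a, has_a)
```

With these edges the closure contains `("purkinje_cell", "spine")`, a fact that neither relation states directly.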
Interactive Queries
[Figure: KIND01]
Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky)
[Figure: PROTLOC-AxioMap]
Conclusions and Outlook
• Model-based Mediation Architecture
  • for complex multiple-worlds scenarios (Neuroscience, ...)
  • sources export CMs (data “lifted” to conceptual level)
  • mediator employs DMs (“semantic road map”)
• Simple Prototype based on XSB/FLORA
  • source and result data situated in DM context
  • domain scientists are excited ...
• Some Open Issues
  • striking the right balance between complexity and expressiveness of DMs (e.g. subsumption and satisfiability of DMs should be decidable)
  • query processing/optimization
  • modeling query capabilities
  • semantic annotation tools for “dumb” sources
  • re-implement ... *sigh* ...
  • ...
ANATOM Domain Map
[Figure: ANATOM]
Model-Based Mediation with DOMAIN MAPS (DMs)
• “Semantic Road Maps” for situating source data
  => navigational aid (browsing source classes at the conceptual level)
  => basis for integrated views across multiple worlds
  => link points (concepts) and labeled arcs (roles)
  => formal semantics (in FL and/or DLs)
• Example: ANATOM DM
  = anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
  => from syntactic equality to semantic joins
    Integrated-CM(Z1,...) :=
      get X1,... from Src1;
      get X2,... from Src2;
      LINK (Xi, Yj);
      Zj = CM-QL(X1,...,Y1,...)
    LINK(X,Y):
      X.zip = Y.zip
      X.addr in Y.zip
      X.zip overlaps Y.county
      ...
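The step from syntactic equality to semantic joins amounts to making LINK a pluggable predicate. A small Python sketch of that idea follows; the record layouts, the `ZIP_COUNTIES` lookup, and all names are hypothetical, and the data is toy data chosen only to make the two joins differ.

```python
# Toy records from two hypothetical sources.
HOMES = [{"id": "h1", "zip": "92093"}, {"id": "h2", "zip": "10001"}]
SCHOOLS = [
    {"id": "s1", "zip": "92093", "county": "San Diego"},
    {"id": "s2", "zip": "94105", "county": "San Francisco"},
    {"id": "s3", "zip": "92037", "county": "San Diego"},
]
# Hypothetical lookup: counties that each ZIP code area overlaps.
ZIP_COUNTIES = {"92093": {"San Diego"}, "10001": {"New York"}}


def link_same_zip(x, y):
    """Syntactic link: X.zip = Y.zip."""
    return x["zip"] == y["zip"]


def link_zip_overlaps_county(x, y):
    """Semantic link: X.zip overlaps Y.county."""
    return y["county"] in ZIP_COUNTIES.get(x["zip"], set())


def semantic_join(xs, ys, link):
    """Pair up records from both sources that satisfy the LINK predicate."""
    return [(x["id"], y["id"]) for x in xs for y in ys if link(x, y)]


syntactic = semantic_join(HOMES, SCHOOLS, link_same_zip)
semantic = semantic_join(HOMES, SCHOOLS, link_zip_overlaps_county)
```

The equality join finds only the pair that shares a ZIP code, while the overlap-based join also finds `("h1", "s3")`, which no string comparison on the ZIP field would produce.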
Example Query Evaluation (I)
• Example: protein_distribution
  • given: organism, protein, brain_region
• ANATOM DM:
  • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
• Source PROLAB:
  • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
• Mediator:
  • aggregate over all parents up to brain_region
  • report distribution
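The mediator's last step, aggregating over all parents up to brain_region, can be sketched as a roll-up along has_a. The Python below assumes a tree-shaped hierarchy stored as child-to-parent links and uses a sum as the aggregate; the slide fixes neither choice, and the `PARENT` table, the amounts, and `roll_up` are illustrative only.

```python
# Hypothetical child -> parent links along has_a (a tree is assumed).
PARENT = {
    "spine": "dendrite",
    "branchlet": "dendrite",
    "dendrite": "purkinje_cell",
    "purkinje_cell": "cerebellum",
}


def roll_up(amounts, parent, brain_region):
    """Add each measured amount to the entity itself and to every ancestor,
    stopping once brain_region has been credited."""
    totals = {}
    for entity, amount in amounts.items():
        node = entity
        while node is not None:
            totals[node] = totals.get(node, 0) + amount
            if node == brain_region:
                break
            node = parent.get(node)
    return totals


distribution = roll_up({"spine": 0, "branchlet": 30}, PARENT, "purkinje_cell")
```

Because the walk stops at the requested region, `cerebellum` receives no total here even though it is an ancestor in the table.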
Interactive Queries
[Figure: KIND]
Summary & Outlook: Federation of Brain Data
[Figure: MODEL-BASED Mediation at the center, federating: surface atlas (Van Essen Lab); stereotaxic atlas (LONI); MCell (CNL, Salk); CCB (Montana SU); NCMIR (UCSD). Result panels, in slide order: Result (XML/XSLT), PROTLOC, Result (VML), ANATOM.]