Model-Based Information Integration in a Neuroscience Mediator System
Bertram Ludaescher, Amarnath Gupta, Maryann E. Martone
University of California San Diego
A Standard Mediator Architecture (MIX -- Mediation of Information using XML)
[Figure labels: USER-Query; XML Q/A; INTEGRATED VIEW; MIX MEDIATOR; XML Integrated View Definition; XML Q/A between the mediator and three Wrappers; Data Sources: WWW, DB, Files (Lab1, Lab2, Lab3)]
VLDB2000, Cairo
Integration Issues
• SEMANTIC Integration ???
• SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML)
  • Integrated Views (Src-XML => Intgr-XML)
  • Schema Integration (DTD => DTD)
  • Wrapping, Data Extraction (Text => XML)
• SYSTEM Integration
  • Distributed Query Processing
  • SRB/MCAT
  • storage, query capabilities, protocols & services
  • TCP/IP, HTTP, CORBA
Integration Issues: Mediating across Multiple Worlds
• Structural Integration
  => common semistructured data model (XML)
  => XML queries & transformations to resolve schema conflicts
• Limited Query Capabilities
  => mediator is aware of the query capabilities (QCs) exported by wrappers
• ...
• Semantic Integration
  • most work deals with issues for “one-world” scenarios (e.g., amazon.com vs. bn.com)
  • what if data comes from a “multiple-world” scenario (like Neuroscience), where data objects from different sources are not even similar, and only the hidden semantics (known to the domain expert) provides the “semantic link”?
A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
[Figure labels: ??? Integrated View ???; ??? Integrated View Definition ???; ??? Mediator ???; four Wrappers over the sources: protein localization, morphometry, neurotransmission, Web (CaBP, Expasy)]
Hidden Semantics: Protein Localization
[Figure labels: Purkinje Cell layer of Cerebellar Cortex; Molecular layer of Cerebellar Cortex; Fragment of dendrite]
  <protein_localization>
    <neuron type="purkinje cell" />
    <protein channel="red">
      <name>RyR</name>
      ….
    </protein>
    <region h_grid_pos="1" v_grid_pos="A">
      <density>
        <structure fraction="0.8">
          <name>spine</name>
          <amount name="RyR">0</amount>
        </structure>
        <structure fraction="0.2">
          <name>branchlet</name>
          <amount name="RyR">30</amount>
        </structure>
Hidden Semantics: Morphometry
Annotations on the data:
• branch level="10": a branch level beyond 4 is a branchlet
• spine: must be dendritic, because Purkinje cells don’t have somatic spines
  <neuron name="purkinje cell">
    <branch level="10">
      <shaft> … </shaft>
      <spine number="1">
        <attachment x="5.3" y="-3.2" z="8.7" />
        <length>12.348</length>
        <min_section>1.93</min_section>
        <max_section>4.47</max_section>
        <surface_area>9.884</surface_area>
        <volume>7.930</volume>
        <head>
          <width>4.47</width>
          <length>1.79</length>
        </head>
      </spine>
      …
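The two annotations on this slide are exactly the kind of knowledge a mediator has to be told, because nothing in the XML states it. A minimal Python sketch of applying them while reading a morphometry record follows. The sample document is a well-formed, shortened stand-in for the slide's fragment, and the function name `classify_spines` and the output field names are illustrative assumptions, not part of the original system.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the slide's morphometry fragment.
MORPHOMETRY_XML = """
<neuron name="purkinje cell">
  <branch level="10">
    <spine number="1">
      <length>12.348</length>
      <volume>7.930</volume>
    </spine>
  </branch>
</neuron>
"""

# Hidden semantics from the slide: a branch level beyond 4 is a branchlet.
BRANCHLET_MIN_LEVEL = 4


def classify_spines(xml_text):
    """Return one record per spine, annotated with the hidden semantics."""
    neuron = ET.fromstring(xml_text)
    is_purkinje = neuron.get("name") == "purkinje cell"
    records = []
    for branch in neuron.findall("branch"):
        level = int(branch.get("level"))
        structure = "branchlet" if level > BRANCHLET_MIN_LEVEL else "branch"
        for spine in branch.findall("spine"):
            records.append({
                "spine": spine.get("number"),
                "structure": structure,
                # Purkinje cells have no somatic spines, so a spine on a
                # Purkinje cell must sit on a dendrite.
                "compartment": "dendrite" if is_purkinje else "unknown",
                "volume": float(spine.findtext("volume")),
            })
    return records


spines = classify_spines(MORPHOMETRY_XML)
print(spines)
```

Once the records carry "branchlet" and "dendrite" explicitly, they share vocabulary with the protein localization source, which reports amounts per "spine" and "branchlet".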
The Problem
• Multiple Worlds Integration
  • compatible terms not directly joinable
  • complex, indirect associations among schema elements
  • unstated integrity constraints
• Why not just use Ontologies?
  • typical ontologies associate terms along a limited number of dimensions
• What’s needed?
  • a “theory” under which non-identical terms can be “semantically joined”
  => lift mediation to the level of conceptual models (CMs)
  => domain knowledge, ICs become rules over CMs
  => Model-Based Mediation
XML-Based vs. Model-Based Mediation
• XML-based mediation (XML Models)
  • Integrated-DTD := XML-QL(Src1-DTD,...)
  • No Domain Constraints
  • Structural Constraints (DTDs): Parent, Child, Sibling, ... (e.g., A = (B*|C),D; B = ...)
  • Raw Data => XML Elements
• Model-based mediation (Conceptual Models)
  • Integrated-CM := CM-QL(Src1-CM,...)
  • DOMAIN MAP
  • Logical Domain Constraints (IF ... THEN ...)
  • Classes, Relations, is-a, has-a, ... (classes C1, C2, C3; relation R)
  • Raw Data => (XML) Objects
Extended Mediator Architecture
=> Wrappers export Conceptual Models (CMs), i.e., facts + rules for classes, relationships, ICs, ...
=> Mediator imports CMs from sources, auxiliary knowledge bases, and domain maps (DMs)
=> a generic conceptual model (GCM, a subset of F-logic), extensible via rules = common target CM language
=> new CMs can be plugged in by specifying them in GCM + F-logic rules
=> prototype implementation in FLORA:
  • global-as-view approach
  • compiler: F-logic => XSB-Prolog
  • top-down evaluation => virtual (demand-driven) views
  • external interfaces (XML, RDBs, DM visualization, ...)
Model-Based Mediator Architecture
[Figure labels: USER/Client; CM Queries & Results (exchanged in XML); Mediator Engine (FL rule proc., LP rule proc., Graph proc., XSB Engine) with CM (Integrated View), Domain Map (DM), Integrated View Definition (IVD), GCM, and CM Plug-In; Logic API (capabilities); CM S1, CM S2, CM S3, each exported by a CM-Wrapper on top of an XML-Wrapper over the sources S1, S2, S3]
Definition of Integrated Views ...
• XML-2-FL and CM-2-FL Translators
    <!ELEMENT Studies (Study)*>
    <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
    <!ELEMENT experiments (experiment)*>
    <!ELEMENT experiment (description, instrument, parameters)>

    studyDB[studies =>> study].
    study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
    …
• Specification of Domain Knowledge
  • Subclasses
      mushroom_spine :: spine
  • Rules
      S:mushroom_spine IF S:spine[head->_; neck->_].
  • Integrity Constraints
      ic1(S):alert[type->"invalid spine"; object->S] IF S:spine[undef->>{head, neck}].
• Integrated View Definition
    protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value) IF
      I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
        anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}],
      NAE:neuro_anatomic_entity[name->Anatom; located_in->>{Brain_region}],
      AS..segments..features[name->Feature_name; value->Value].
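The domain-knowledge part of this slide (a derived subclass plus an integrity constraint) can be mimicked outside F-logic. The Python sketch below is only an illustration under stated assumptions: spines are plain dictionaries, an absent or `None` property counts as undefined, and ic1 is read as firing when both head and neck are undefined. The function names are hypothetical.

```python
def classify_spine(spine):
    """Rule: a spine with both a head and a neck is a mushroom_spine."""
    if spine.get("head") is not None and spine.get("neck") is not None:
        return "mushroom_spine"
    return "spine"


def check_ic1(spine):
    """Integrity constraint ic1, under the assumed reading: alert when
    both head and neck are undefined. Returns an alert dict or None."""
    undefined = {p for p in ("head", "neck") if spine.get(p) is None}
    if undefined == {"head", "neck"}:
        return {"type": "invalid spine", "object": spine["id"]}
    return None


# Hypothetical sample objects; the head values echo the morphometry slide.
spines = [
    {"id": "s1", "head": {"width": 4.47, "length": 1.79}, "neck": {"length": 2.0}},
    {"id": "s2", "head": {"width": 3.10, "length": 1.20}},
    {"id": "s3"},
]

classes = {s["id"]: classify_spine(s) for s in spines}
alerts = [a for a in (check_ic1(s) for s in spines) if a is not None]
print(classes)
print(alerts)
```

The rule adds class membership and the constraint only reports violations. Neither changes the source data, which matches the slides' description of views that are virtual and evaluated on demand.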
... Definition of Integrated Views (Multiple Sources)
• Creating Mediated Classes (at the mediator: union over all classes)
    animal[M=>R] IF S:source, S.animal[M=>R].
• Association rule
    X[taxon->T] IF X:'PROLAB'.animal[name->N], words(N,[W1,W2|_]), T:'TAXON'.taxon[genus->W1; species->W2].
• Reasoning with Schema
  • Schema
      taxon[subspecies=>string; species=>string; genus=>string; … phylum=>string; kingdom=>string; superkingdom=>string].
      subspecies::species::genus:: … kingdom::superkingdom
  • Class creation by schema reasoning
      T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank->TR, Taxon_Rank1->TR1], Taxon_Rank::Taxon_Rank1.
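The association rule links a PROLAB animal to a TAXON entry by reading the first two words of the animal's name as genus and species. A rough Python equivalent is sketched below. The record layout, the sample data, and the function name `link_animals_to_taxa` are assumptions made for illustration. Matching is exact, as in the rule.

```python
def link_animals_to_taxa(animals, taxa):
    """Map each animal name to the taxon whose (genus, species) equals
    the first two words of the name; unmatched animals are skipped."""
    by_binomial = {(t["genus"], t["species"]): t for t in taxa}
    links = {}
    for animal in animals:
        words = animal["name"].split()
        if len(words) < 2:
            continue  # words(N, [W1, W2 | _]) needs at least two words
        taxon = by_binomial.get((words[0], words[1]))
        if taxon is not None:
            links[animal["name"]] = taxon
    return links


# Hypothetical sample data standing in for the PROLAB and TAXON sources.
prolab_animals = [
    {"name": "Rattus norvegicus albino"},
    {"name": "Mus musculus"},
    {"name": "unknown"},
]
taxon_entries = [
    {"genus": "Rattus", "species": "norvegicus", "family": "Muridae"},
    {"genus": "Homo", "species": "sapiens", "family": "Hominidae"},
]

links = link_animals_to_taxa(prolab_animals, taxon_entries)
print(links)
```

Extra words after the binomial (here "albino") are ignored, which is what the `|_` tail in the rule's list pattern expresses.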
Model-Based Mediation with DOMAIN MAPS (DMs)
• “Semantic Road Maps” for situating source data
  => navigational aid (browsing source classes at the conceptual level)
  => basis for integrated views across multiple worlds
  => link points (concepts) and labeled arcs (roles)
  => formal semantics (in FL and/or DLs)
• Example: ANATOM DM
  = anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
  => from syntactic equality to semantic joins
    LINK(X,Y): X.zip = Y.zip
               X.addr in Y.zip
               X.zip overlaps Y.county
               ...
    Integrated-CM(Z1,...) := get X1,... from Src1; get X2,... from Src2; LINK(Xi, Yj); Zj = CM-QL(X1,...,Y1,...)
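The step from syntactic equality to semantic joins amounts to making the join condition a pluggable LINK predicate. The sketch below illustrates that idea in Python with the slide's zip/county example. The lookup table `ZIP_TO_COUNTIES`, the sample records, and all function names are hypothetical.

```python
def semantic_join(xs, ys, link):
    """Pair every x with every y for which the LINK predicate holds."""
    return [(x, y) for x in xs for y in ys if link(x, y)]


def zip_equal(x, y):
    """Syntactic link: X.zip = Y.zip."""
    return x.get("zip") is not None and x.get("zip") == y.get("zip")


# Hypothetical domain knowledge: which counties each zip code overlaps.
ZIP_TO_COUNTIES = {"92093": {"San Diego"}, "92672": {"Orange", "San Diego"}}


def zip_overlaps_county(x, y):
    """Semantic link: X.zip overlaps Y.county, decided via the table above."""
    return y.get("county") in ZIP_TO_COUNTIES.get(x.get("zip"), set())


src1 = [{"id": "a", "zip": "92093"}, {"id": "b", "zip": "92672"}]
src2 = [{"id": "p", "county": "San Diego"}, {"id": "q", "county": "Orange"}]

syntactic = semantic_join(src1, src2, zip_equal)
semantic = semantic_join(src1, src2, zip_overlaps_county)
print([(x["id"], y["id"]) for x, y in semantic])
```

With plain equality the two sources share no attribute and the join is empty. The domain knowledge behind `zip_overlaps_county` is what makes the records joinable, which is the role the domain map plays for anatomical terms.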
ANATOM Domain Map
[Figure: ANATOM]
ANATOM Domain Map with Registered Data
[Figure: ANATOM DATA]
Deductive Closure of “has_a” with “tc(is_a)”: (YES -- Real Recursive Views!! ;-)
[Figure: ANATOM CLOSURE]
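The closure named on this slide is a recursive view, so it is naturally computed as a fixpoint. The Python sketch below shows one plausible reading, stated here as an assumption: has_a_star contains every has_a edge, is inherited downward along the transitive closure of is_a, and is itself transitive. The edges are a toy example, not the actual ANATOM domain map.

```python
def transitive_closure(pairs):
    """Naive fixpoint: add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def has_a_star(has_a, is_a):
    """Close has_a under tc(is_a) inheritance and under transitivity."""
    isa_tc = transitive_closure(is_a)
    result = set(has_a)
    while True:
        # Inheritance: X is_a* Z and Z has_a_star Y  =>  X has_a_star Y
        new = {(x, y) for (x, z) in isa_tc for (z2, y) in result if z == z2}
        # Transitivity: X has_a_star Z and Z has_a_star Y  =>  X has_a_star Y
        new |= {(x, y) for (x, z) in result for (z2, y) in result if z == z2}
        if new <= result:
            return result
        result |= new


# Toy edges (hypothetical).
is_a = {("purkinje_cell", "neuron")}
has_a = {("neuron", "dendrite"), ("dendrite", "spine")}

closure = has_a_star(has_a, is_a)
print(sorted(closure))
```

A top-down engine such as the XSB-based prototype would evaluate only the part of this closure that a query needs. The bottom-up loop here is just the simplest way to show the result.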
Example Query Evaluation (I)
• Example: protein_distribution
  • given: organism, protein, brain_region
• ANATOM DM:
  • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
• Source PROLAB:
  • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
• Mediator:
  • aggregate over all parents up to brain_region
  • report distribution
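The three stages of this evaluation (traverse the domain map, join with source data, aggregate upward) can be sketched in a few lines of Python. Everything concrete below is an assumption for illustration: the has_a edges, the flat record layout standing in for PROLAB, and the use of a sum as the aggregate.

```python
from collections import defaultdict


def descendants(has_a, root):
    """All entities reachable from root along has_a edges."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in has_a.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def protein_distribution(has_a, records, organism, protein, brain_region):
    """Total protein amount at and below each entity under brain_region."""
    # 1. Domain map: collect all anatomical entities under brain_region.
    entities = descendants(has_a, brain_region) | {brain_region}
    # 2. Source: join the records with those entities, filtered by
    #    protein and organism.
    direct = defaultdict(float)
    for r in records:
        if (r["organism"] == organism and r["protein"] == protein
                and r["anatom"] in entities):
            direct[r["anatom"]] += r["amount"]
    # 3. Mediator: aggregate each entity's own amount with its descendants'.
    return {
        e: sum(direct.get(d, 0.0) for d in descendants(has_a, e) | {e})
        for e in entities
    }


# Toy domain map and source records (hypothetical).
has_a = {
    "cerebellar_cortex": ["purkinje_cell_layer", "molecular_layer"],
    "molecular_layer": ["dendrite"],
    "dendrite": ["spine", "branchlet"],
}
records = [
    {"organism": "rat", "protein": "RyR", "anatom": "branchlet", "amount": 30.0},
    {"organism": "rat", "protein": "RyR", "anatom": "spine", "amount": 0.0},
    {"organism": "mouse", "protein": "RyR", "anatom": "branchlet", "amount": 5.0},
]

totals = protein_distribution(has_a, records, "rat", "RyR", "cerebellar_cortex")
print(totals)
```

In the mediator the first stage runs against the domain map and the second against a source wrapper. Here both are plain in-memory structures, so only the control flow carries over.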
Interactive Queries (I)
[Figure: KIND]
Example Query Evaluation (II)
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
  @SENSELAB: X1 := select output from parallel fiber;
  @MEDIATOR: X2 := “hang off” X1 from Domain Map;
  @MEDIATOR: X3 := subregion-closure(X2);
  @NCMIR:    X4 := select PROT-data(X3, Ryanodine Receptors);
  @MEDIATOR: X5 := compute aggregate(X4);
Interactive Queries (II)
[Figure: KIND01]
Resulting Sub DOMAIN MAP “Browser”
[Figure: PROTLOC]
Computed Protein Localization Data
[Figure: PROTLOC]
Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky)
[Figure: PROTLOC-AxioMap]
Summary & Outlook: Federation of Brain Data
[Figure labels: MODEL-BASED Mediation; Surface atlas, Van Essen Lab; stereotaxic atlas, LONI; MCell, CNL, Salk; CCB, Montana SU; NCMIR, UCSD; Result (XML/XSLT); PROTLOC; Result (VML); ANATOM]