
eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics

Ispra, Italy, January 17, 2006. Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California. Tel: +1 510-495-2905, bebargmeyer@lbl.gov.


Presentation Transcript


  1. eXtended Metadata Registry (XMDR): Interagency/International Cooperation on Ecoinformatics. Ispra, Italy, January 17, 2006. Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California. Tel: +1 510-495-2905, bebargmeyer@lbl.gov

  2. XMDR Project Collaboration
     • Collaborative, interagency effort
     • EPA, USGS, NCI, Mayo Clinic, DOD, LBNL … & others
     • Draws on and contributes to Interagency/International Cooperation on Ecoinformatics
     • Involves Ecoterm, international, national, state, local government agencies, other organizations as content providers and potential users
     • Interacts with many organizations around the world through ISO/IEC standards committees

  3. XMDR Project Results: Bootstrapping Semantic Computing
     • Design for next generation metadata registries, expressed as a standard
     • XMDR Prototype, open source software
     • Content loaded in prototype: millions of concepts, terms, and relations between concepts
     • Demonstrations for healthcare and the environment

  4. Metadata Registry Extensions
     • Register (and manage) any semantics that are useful for managing data.
       • E.g., this may include registering not only permissible values (concepts) and definitions, but also the full concept systems in which the permissible values are found.
       • E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies, …
     • Support traditional data management and data administration
     • Lay the foundation for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, semantics-based workflows, Semantic Web, …

  5. Where have we been? Where are we planning to go? [Diagram labels: Semantics; system manuals; data dictionaries; 11179 E1; 11179 E2; 11179 E3; Data Standards; Data Management/Data Administration; XMDR Project; terminologies, ontologies; XML & related standards; Semantic Web; semantic grids; semantics services (SSOA); data + ontology lifecycle management; complex semantics management; data engineering]

  6. XMDR Draws Together [Diagram labels: Users; Terminology; Metadata Registries; Metadata Registry; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; 11179 Metadata Registry. Triangle labels: Concept; Referent; “Rose”, “ClipArt”; Refers To; Symbolizes; Stands For]

  7. Concept System Store
     • Concept systems: keywords, controlled vocabularies, thesauri, taxonomies, ontologies, axiomatized ontologies (essentially graphs: node-relation-node + axioms)
     [Diagram labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]

  8. Management of Concept Systems
     • Concept system: registration, harmonization, standardization, acceptance (vetting), mapping (correspondences)
     [Diagram labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]

  9. Life Cycle Management
     • Life cycle management: data and concept systems (ontologies)
     [Diagram labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]

  10. Grounding Semantics
      • Metadata Registries; Semantic Web; Ontologies
      • RDF triples: Subject (node URI), Verb (relation URI), Object (node URI)
      [Diagram labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]
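
The subject/verb/object structure named on this slide can be sketched in a few lines. This is a minimal illustration in plain Python rather than Jena; the `ex:` URIs and the `match` helper are hypothetical and are not XMDR or SWEET identifiers.

```python
# Each triple is (subject URI, verb/relation URI, object URI).
# All URIs below are invented for illustration only.
triples = {
    ("ex:Marsh", "rdfs:subClassOf", "ex:Wetland"),
    ("ex:Swamp", "rdfs:subClassOf", "ex:Wetland"),
    ("ex:Wetland", "rdfs:subClassOf", "ex:LandRegion"),
}


def match(triples, s=None, v=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (v is None or t[1] == v)
        and (o is None or t[2] == o)
    )


# Which nodes are declared direct subclasses of ex:Wetland?
subclasses = [t[0] for t in match(triples, v="rdfs:subClassOf", o="ex:Wetland")]
print(subclasses)  # ['ex:Marsh', 'ex:Swamp']
```

Because every statement has the same three-part shape, a single pattern-matching function can serve any relation stored in the graph.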

  11. XMDR Prototype Architecture: Initial Modules
      • Modules: External Interface; Registry; RegistryStore; WritableRegistryStore; AuthenticationService; RetrievalIndex; LogicBasedIndex; FullTextIndex; MetadataValidator; MappingEngine; Ontology Editor; 11179 OWL Ontology
      • Technologies named: Java; Subversion; Jena; Xerces; OWI KS; Racer; Lucene; Protege
      • Diagram legend: Composition (tight ownership); Generalization; Aggregation (loose ownership)

  12. XMDR Prototype Architecture: Initial Implemented Modules
      • Modules: External Interface; Registry; RegistryStore; WritableRegistryStore; RetrievalIndex; LogicBasedIndex; FullTextIndex; Ontology Editor; 11179 OWL Ontology
      • Technologies named: Java; Subversion; Jena; Racer, etc.; Lucene; Protege
      • Diagram legend: Composition (tight ownership); Generalization; Aggregation (loose ownership)

  13. UML is Used for 11179 Metamodel, XMDR uses OWL, RDF & XML Schema [Diagram labels: 11179 Relational Schema; Relational Metadata; UML 11179 Metamodel; Types & Cardinalities; OWL; XMDR Ontology & annotations; XMDR XML Schema; Trang; XMDR's Relax NG Schema; Triples: binary labeled relationships; RDF Spec; What things go in own files?; Which property direction stored?; Sequential ordering of properties; XML Schema Language spec; XML Objects]

  14. Refined XMDR Subclasses Improve Organization & Enable Inference

  15. XMDR Example Content Loaded from Diverse Sources via LexGrid & XSLT
      • Content loaded to date: 2.7 million triples
      [Diagram labels: Original Source A; LexGrid Source A; XSLT script; Concept System A; A Concepts; A Relationships; Harold Solbrig (Mayo Clinic)]

  16. XMDR Content List (partial)
      • NBII Biocomplexity Thesaurus
      • NCI Thesaurus (National Cancer Institute Thesaurus)
      • NCI Data Elements (National Cancer Institute Data Standards Registry)
      • UMLS (non-proprietary portions)
      • GEMET (General Multilingual Environmental Thesaurus)
      • EDR Data Elements (Environmental Data Registry)
      • USGS Geographic Names Information System (GNIS)
      • HL7 Terminology, Data Elements
      • Mouse Anatomy
      • GO (Gene Ontology)
      • EPA Web Registry Controlled Vocabulary
      • BioPAX Ontology
      • NASA SWEET Ontologies
      • …

  17. NASA-JPL Semantic Web for Earth and Environmental Terminology (SWEET)
      • SWEET written in OWL ontology language (W3C)
      • Can view with Internet Explorer 5+, Netscape 7+, etc.
      • Can also use OWL-specific tools (e.g., SWOOP, Protégé)
      • Terms in other taxonomies can be mapped to SWEET using
        • Global Change Master Directory (GCMD)
        • CF Standard Names
      • http://sweet.jpl.nasa.gov/ontology/
      Ontology areas listed on the slide:
      • Earth Realms
      • Physical Phenomena (any transient feature)
      • Physical Processes
      • Physical Properties
      • Physical Substances
      • Sun Realms
      • Biosphere Data
      • Data Centers
      • Human Activities
      • Material Things
      • Numerics
      • Sensors
      • Space
      • Time
      • Units

  18. Content Loaded from EPA EDR and NASA SWEET Ontology [Diagram labels: SWEET (OWL); EDR; java; XMDR files; XMDR files (ontologies); concepts & relationships; XMDR ontology]

  19. What happens to XMDR files before they can be used for text searching or inference?
      • Inputs: all XMDR files, i.e. Concept System A (A Concepts, A Relationships), Concept System B (B Concepts, B Relationships), etc.
      • Jena: each system (A, B, … etc.) is loaded individually into its own model (Model A, Model B, XMDR Ontology, … etc.); inference queries (Jena) run over the union of all models
      • Lucene: the files are indexed into Lucene indexes; text queries (Lucene) run over those indexes
      • Search/query results are sets of URLs for the XMDR files pictured above
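
The loading flow on this slide can be sketched as follows. This is a simplified stand-in written in plain Python, assuming a set of triples in place of a Jena model and a token-to-URL dictionary in place of a Lucene index; the file URLs, labels, and triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical XMDR files: URL -> (concept system, label text, triples).
files = {
    "xmdr/A/wetland.xml": ("A", "Wetland area",
                           [("a:Wetland", "rdfs:subClassOf", "a:Landform")]),
    "xmdr/A/marsh.xml": ("A", "Salt marsh",
                         [("a:Marsh", "rdfs:subClassOf", "a:Wetland")]),
    "xmdr/B/wetland.xml": ("B", "Wetland habitat",
                           [("b:Wetland", "rdfs:subClassOf", "b:Habitat")]),
}

models = defaultdict(set)      # one model (set of triples) per concept system
text_index = defaultdict(set)  # token -> file URLs (stand-in for a text index)

for url, (system, text, triples) in files.items():
    models[system].update(triples)        # each system is loaded individually
    for token in text.lower().split():
        text_index[token].add(url)        # index the label text by token

# Graph queries run over the union of all per-system models.
union_model = set().union(*models.values())


def text_query(term):
    """Return the sorted file URLs whose label text contains the term."""
    return sorted(text_index.get(term.lower(), set()))


print(len(union_model))       # 3
print(text_query("wetland"))  # ['xmdr/A/wetland.xml', 'xmdr/B/wetland.xml']
```

Keeping one model per concept system lets a single system be reloaded without rebuilding the others, while queries still see the union.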

  20. How to Search/Query Complex Concepts & Relationships [Diagram labels: New Proposed Objects; Current 11179 Objects]

  21. XMDR RDF Graph Query Facilities Complement Text Query Capabilities
      • SQL-like queries
        • e.g., names of ontologies in a registry
      • Span items that are only indirectly connected
        • e.g., data elements associated with a conceptual domain
      • Expand queries to subsumed classes in hierarchy
        • e.g., ConceptualDomain includes EnumeratedConc..
      • Transitivity
        • e.g., all subclasses subsumed by a higher order class
        • e.g., all superclasses (ancestors) of a particular class
      • Least common ancestor
        • e.g., closest subsuming concept for 2 concepts

  22. Example Subclass Queries: (Inference with Transitivity)
      • Environmental:
        • What are all the (sub)types of Wetland (in SWEET)?
          RDQL:
            SELECT ?x
            WHERE (?x rdfs:subClassOf earthrealm:Wetland)
            USING earthrealm FOR <http://sweet.jpl.nasa.gov/ontology/earthrealm.owl#>
      • Health:
        • Find all the types of "Lung Carcinoma"
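
The transitive step behind a subclass query like this one can be sketched in plain Python (not RDQL or Jena). The class names and edges below are hypothetical and are not taken from SWEET; `all_subclasses` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical direct subClassOf edges, written as (child, parent).
edges = [
    ("Marsh", "Wetland"),
    ("Swamp", "Wetland"),
    ("SaltMarsh", "Marsh"),
    ("Wetland", "LandRegion"),
]

children = defaultdict(set)
for child, parent in edges:
    children[parent].add(child)


def all_subclasses(cls):
    """Collect direct and indirect subclasses by walking edges downward."""
    found, stack = set(), [cls]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:  # also guards against cycles
                found.add(child)
                stack.append(child)
    return sorted(found)


print(all_subclasses("Wetland"))  # ['Marsh', 'SaltMarsh', 'Swamp']
```

`SaltMarsh` appears in the result even though it is only linked to `Wetland` through `Marsh`, which is the effect of treating the subclass relation as transitive.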

  23. More Complex “Sibling” Queries: Concepts with Multiple Ancestors
      • Health:
        • Find all the siblings of Breast Neoplasm
        • Note: This is complex, since Breast Neoplasm has two parents, Neoplasm by Site and Breast Disorder. The result would include both the by-site neoplasms (such as Eye Neoplasm and Respiratory System Neoplasm) and the Breast Disorder siblings (such as Non-Neoplastic Breast Disorder).
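
A sibling query over a concept with more than one parent can be sketched as below, in plain Python. The edge list is a hypothetical fragment built only from the concept names mentioned on this slide; it is not an extract of the actual terminology, and `siblings_by_parent` is an illustrative helper.

```python
# Hypothetical is-a edges, written as (child, parent).
edges = [
    ("Breast Neoplasm", "Neoplasm by Site"),
    ("Breast Neoplasm", "Breast Disorder"),
    ("Eye Neoplasm", "Neoplasm by Site"),
    ("Respiratory System Neoplasm", "Neoplasm by Site"),
    ("Non-Neoplastic Breast Disorder", "Breast Disorder"),
]


def siblings_by_parent(concept):
    """Group the other children of each parent of the given concept."""
    parents = sorted(p for c, p in edges if c == concept)
    return {
        parent: sorted(c for c, p in edges if p == parent and c != concept)
        for parent in parents
    }


result = siblings_by_parent("Breast Neoplasm")
for parent, sibs in result.items():
    print(parent, "->", sibs)
```

Grouping the siblings by parent keeps the two families apart, so a caller can tell which siblings come from the by-site branch and which from the disorder branch.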

  24. Least Common Ancestor Queries: (Inference with Transitivity)
      • Health:
        • Find the least common ancestor of "Morphine Sulfate" and "Acetaminophen".
        • The least common ancestor should be Analgesic Agent (with multiple intervening concepts).
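
One way to compute a least common ancestor is sketched below in plain Python. Two assumptions are made: "closest" is taken to mean the common ancestor with the smallest total number of is-a steps from both concepts, and the intervening concepts in `parents` are invented placeholders, not the real hierarchy between these terms.

```python
from collections import deque

# Hypothetical is-a hierarchy: concept -> list of parents.
# Only the two leaf names and "Analgesic Agent" come from the slide.
parents = {
    "Morphine Sulfate": ["Morphine"],
    "Morphine": ["Opioid Analgesic"],
    "Opioid Analgesic": ["Analgesic Agent"],
    "Acetaminophen": ["Non-Opioid Analgesic"],
    "Non-Opioid Analgesic": ["Analgesic Agent"],
    "Analgesic Agent": ["Pharmacologic Substance"],
}


def ancestor_distances(concept):
    """Breadth-first walk upward; returns {ancestor: fewest is-a steps}."""
    dist, queue = {concept: 0}, deque([concept])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def least_common_ancestor(a, b):
    """Return the shared ancestor with the smallest combined distance."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = set(da) & set(db)
    if not common:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(common, key=lambda c: (da[c] + db[c], c))


print(least_common_ancestor("Morphine Sulfate", "Acetaminophen"))  # Analgesic Agent
```

The breadth-first walk handles concepts with several parents, which matters in hierarchies where a concept can sit under more than one ancestor.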

  25. Searching caDSR for Data Elements via Concepts and Vice-Versa
      • Common Data Elements (CDEs) are "connected" to concepts through the Object Class and Property of the CDE.
      • A query such as this should look for the CDE's Object Class derivation rule and select only those data elements associated with those object classes.
      • Alternatively, query the caDSR Concept Class and find all related Object Classes (OCs) where the concept is flagged as "primary concept", then get all the Data Elements by leveraging the ISO 11179 relationships: an Object Class has related Data Element Concepts (DECs), and DECs have related Data Elements (DEs).
      • Concepts can also be associated with Value Meanings. So: search the Concept Class with a concept code, find all related Value Meanings, find all Value Domains that use those Value Meanings, then find all Data Elements that use those Value Domains.
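
The two traversal paths described on this slide can be sketched with small in-memory tables. This is plain Python over invented records; the identifiers, field names, and helper functions are hypothetical and do not reflect the caDSR schema or API.

```python
# Hypothetical registry records keyed by invented identifiers.
object_classes = {
    "OC1": {"primary_concept": "C100"},
    "OC2": {"primary_concept": "C200"},
}
data_element_concepts = {
    "DEC1": {"object_class": "OC1"},
    "DEC2": {"object_class": "OC2"},
}
value_meanings = {"VM1": {"concept": "C100"}}
value_domains = {
    "VD1": {"value_meanings": ["VM1"]},
    "VD2": {"value_meanings": []},
}
data_elements = {
    "DE1": {"dec": "DEC1", "value_domain": "VD2"},
    "DE2": {"dec": "DEC2", "value_domain": "VD1"},
    "DE3": {"dec": "DEC2", "value_domain": "VD2"},
}


def via_object_class(concept):
    """Concept -> Object Classes -> Data Element Concepts -> Data Elements."""
    ocs = {oc for oc, r in object_classes.items()
           if r["primary_concept"] == concept}
    decs = {dec for dec, r in data_element_concepts.items()
            if r["object_class"] in ocs}
    return sorted(de for de, r in data_elements.items() if r["dec"] in decs)


def via_value_meaning(concept):
    """Concept -> Value Meanings -> Value Domains -> Data Elements."""
    vms = {vm for vm, r in value_meanings.items() if r["concept"] == concept}
    vds = {vd for vd, r in value_domains.items()
           if vms & set(r["value_meanings"])}
    return sorted(de for de, r in data_elements.items()
                  if r["value_domain"] in vds)


print(via_object_class("C100"))   # ['DE1']
print(via_value_meaning("C100"))  # ['DE2']
```

The same concept code reaches different data elements depending on the path taken, which is why the slide treats the two routes as separate searches.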

  26. Comparison of Different Reasoners (on 2.7m triples)

  27. Challenges and Future Goals for XMDR Prototype
      • Scalability & performance
      • Tools
        • RDF tool adaptation for metadata registries
      • User-friendly interface
        • Form interface for registration & uploading metadata
      • References to externally maintained sources
        • Data, ontologies, terminologies
      • Evaluate alternative technologies
        • For different modules
      • Demonstrate for key use cases and ecoinformatics applications

  28. Challenges and Future Goals (cont.)
      • Progress proposals through standards committees
      • Harmonization with W3C and OMG standards
      • Incorporate Common Logic, Web Services, etc.
      • Ontology Lifecycle Management (OLM)
      • Improve link of concepts to data
      • Generate schemas from axiomatized ontologies

  29. Ecoinformatics Challenges
      • How does this fit into the research, development, and demonstration activities of the Interagency/International Cooperation on Ecoinformatics?
      • Should this be a part of the EU-US collaborative R&D?
