eXtended Metadata Registry (XMDR)
Interagency/International Cooperation on Ecoinformatics
Ispra, Italy, January 17, 2006
Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California
Tel: +1 510-495-2905
bebargmeyer@lbl.gov
XMDR Project Collaboration
• Collaborative, interagency effort
• EPA, USGS, NCI, Mayo Clinic, DOD, LBNL, and others
• Draws on and contributes to the Interagency/International Cooperation on Ecoinformatics
• Involves Ecoterm and international, national, state, and local government agencies, plus other organizations, as content providers and potential users
• Interacts with many organizations around the world through ISO/IEC standards committees
XMDR Project Results: Bootstrapping Semantic Computing
• Design for next-generation metadata registries, expressed as a standard
• XMDR Prototype, open source software
• Content loaded in prototype: millions of concepts, terms, and relations between concepts
• Demonstrations for healthcare and the environment
Metadata Registry Extensions
• Register (and manage) any semantics that are useful for managing data
  • E.g., this may include registering not only permissible values (concepts) and their definitions, but also the full concept systems in which the permissible values are found
  • E.g., one may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies, …
• Support traditional data management and data administration
• Lay the foundation for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, semantics-based workflows, Semantic Web, …
Where have we been? Where are we planning to go?
[Figure: roadmap of semantics management. Labels: Semantics; System manuals; Data dictionaries; Data Standards; Data Management/Data Administration; Data engineering; 11179 E1; 11179 E2; 11179 E3; XMDR Project; Terminologies, ontologies; XML & related standards; Complex semantics management; Data + ontology lifecycle management; Semantics services (SSOA); Semantic grids; Semantic Web]
XMDR Draws Together Terminology Metadata Registries
[Figure labels, semiotic triangle: CONCEPT; Refers To; Symbolizes; Stands For; Referent; “Rose”, “ClipArt”]
[Figure labels, registry diagram: Users; Metadata Registry; Terminology; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; 11179 Metadata Registry]
Concept System Store
[Figure labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]
Concept systems (essentially graphs: node-relation-node + axioms):
• Keywords
• Controlled Vocabularies
• Thesauri
• Taxonomies
• Ontologies
• Axiomatized Ontologies
Management of Concept Systems
[Figure labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]
Concept system:
• Registration
• Harmonization
• Standardization
• Acceptance (vetting)
• Mapping (correspondences)
Life Cycle Management
[Figure labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; ISO/IEC 11179 Metadata Registry]
Life cycle management: data and concept systems (ontologies)
Grounding Semantics
[Figure labels: Users; Metadata Registry; Concept System; Thesaurus; Themes; Ontology; GEMET; Data Standards; Structured Metadata; Metadata Registries; Semantic Web; Ontologies; ISO/IEC 11179 Metadata Registry]
RDF triples:
• Subject (node URI)
• Verb (relation URI)
• Object (node URI)
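The subject/verb/object structure named on this slide can be illustrated with a minimal sketch in plain Python. This is not the XMDR prototype's code: the `http://example.org/concepts#` namespace, the `instanceOf` relation, and the sample triples are assumptions made up for illustration. Only the `rdfs:subClassOf` URI is a real W3C identifier.

```python
from typing import Iterator, List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str  # node URI
    verb: str     # relation URI
    obj: str      # node URI


RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/concepts#"  # hypothetical namespace, not an XMDR URI

# Illustrative triples only; they are not taken from the XMDR content.
TRIPLES: List[Triple] = [
    Triple(EX + "Thesaurus", RDFS_SUBCLASS_OF, EX + "ConceptSystem"),
    Triple(EX + "Ontology", RDFS_SUBCLASS_OF, EX + "ConceptSystem"),
    Triple(EX + "GEMET", EX + "instanceOf", EX + "Thesaurus"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          verb: Optional[str] = None,
          obj: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in store:
        if subject is not None and t.subject != subject:
            continue
        if verb is not None and t.verb != verb:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


# Which nodes are declared as subclasses of ConceptSystem?
subclasses = sorted(t.subject for t in
                    match(TRIPLES, verb=RDFS_SUBCLASS_OF, obj=EX + "ConceptSystem"))
```

Because every node and relation is a URI, triples from different concept systems can share one store and be matched with the same pattern.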
XMDR Prototype Architecture: Initial Modules
[Figure labels, in source order: External Interface; RegistryStore; Registry; Java; WritableRegistryStore; Subversion; AuthenticationService; RetrievalIndex; MetadataValidator; Jena, Xerces; LogicBasedIndex; FullTextIndex; Jena, OWI KS Racer; Lucene; MappingEngine; Ontology Editor; 11179 OWL Ontology; Protege]
[Legend: Composition (tight ownership); Generalization; Aggregation (loose ownership)]
XMDR Prototype Architecture: Initial Implemented Modules
[Figure labels, in source order: Ontology Editor; Protege; External Interface; RegistryStore; Registry; Java; WritableRegistryStore; Subversion; RetrievalIndex; LogicBasedIndex; FullTextIndex; 11179 OWL Ontology; Jena; Racer, etc.; Lucene]
[Legend: Composition (tight ownership); Generalization; Aggregation (loose ownership)]
UML is Used for 11179 Metamodel, XMDR uses OWL, RDF & XML Schema
[Figure labels: 11179 Relational Schema; Relational Metadata; UML 11179 Metamodel; Types & Cardinalities; OWL; XMDR Ontology & annotations; XMDR XML Schema; Trang; XMDR’s Relax NG Schema; Triples: binary labeled relationships; RDF Spec; What things go in own files?; Which property direction stored?; Sequential ordering of properties; XML Schema Language spec; XML Objects]
Refined XMDR Subclasses Improve Organization & Enable Inference
XMDR Example Content Loaded from Diverse Sources via LexGrid & XSLT
[Figure labels: Original Source A; LexGrid Source A; XSLT script; Harold Solbrig (Mayo Clinic); Concept System A; A Concepts; A Relationships]
Content loaded to date: 2.7 million triples
XMDR Content List (partial)
• NBII Biocomplexity Thesaurus
• NCI Thesaurus (National Cancer Institute Thesaurus)
• NCI Data Elements (National Cancer Institute Data Standards Registry)
• UMLS (non-proprietary portions)
• GEMET (General Multilingual Environmental Thesaurus)
• EDR Data Elements (Environmental Data Registry)
• USGS Geographic Names Information System (GNIS)
• HL7 Terminology, Data Elements
• Mouse Anatomy
• GO (Gene Ontology)
• EPA Web Registry Controlled Vocabulary
• BioPAX Ontology
• NASA SWEET Ontologies
• …
NASA-JPL Semantic Web for Earth and Environmental Terminology
SWEET ontologies:
• Earth Realms
• Physical Phenomena (any transient feature)
• Physical Processes
• Physical Properties
• Physical Substances
• Sun Realms
• Biosphere
• Data
• Data Centers
• Human Activities
• Material Things
• Numerics
• Sensors
• Space
• Time
• Units
About SWEET:
• SWEET written in OWL ontology language (W3C)
• Can view with Internet Explorer 5+, Netscape 7+, etc.
• Can also use OWL-specific tools (e.g., SWOOP, Protégé)
• Terms in other taxonomies can be mapped to SWEET using
  • Global Change Master Directory (GCMD)
  • CF Standard Names
• http://sweet.jpl.nasa.gov/ontology/
Content Loaded from EPA EDR and NASA SWEET Ontology
[Figure labels: SWEET (OWL); EDR; java; XMDR files; XMDR files (ontologies); concepts & relationships; XMDR ontology]
What happens to XMDR files before they can be used for text searching or inference?
• All XMDR files: Concept System A (A Concepts, A Relationships), Concept System B (B Concepts, B Relationships), etc.
• Each system (A, B, … etc.) is loaded individually
• Jena: Model A, Model B, the XMDR Ontology, … etc. form a union of all models, which serves inference queries (Jena)
• Lucene: Lucene indexes serve text queries (Lucene)
• Search/query results are sets of URLs for the XMDR files listed above
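The load-then-query flow on this slide can be sketched in plain Python. This is a stand-in under stated assumptions, not the prototype's Jena or Lucene code: the file URLs, texts, triples, and every function name below are hypothetical. A set of tuples plays the role of a Jena model, and a small inverted index plays the role of a Lucene index.

```python
from collections import defaultdict

# Each hypothetical XMDR file: (file URL, free text, triples it contributes).
SYSTEM_A = [
    ("http://example.org/xmdr/A/wetland",
     "Wetland: land saturated with water",
     [("A:Marsh", "subClassOf", "A:Wetland")]),
]
SYSTEM_B = [
    ("http://example.org/xmdr/B/marsh",
     "Marsh: a wetland dominated by grasses",
     [("B:Marsh", "relatedTo", "A:Wetland")]),
]


def load_model(files):
    """Load one concept system into its own model of (s, p, o, source URL)."""
    return {(s, p, o, url) for url, _text, triples in files for (s, p, o) in triples}


def union_of(*models):
    """Combine the individually loaded models into one queryable union."""
    return set().union(*models)


def build_text_index(*systems):
    """Map each lower-cased token to the URLs of the files that contain it."""
    index = defaultdict(set)
    for files in systems:
        for url, text, _triples in files:
            for token in text.lower().replace(":", " ").split():
                index[token].add(url)
    return index


def text_query(index, word):
    """Text search: the set of file URLs whose text contains the word."""
    return set(index.get(word.lower(), set()))


def graph_query(model, predicate, obj):
    """Graph search: URLs of files asserting (?, predicate, obj)."""
    return {url for _s, p, o, url in model if p == predicate and o == obj}


all_models = union_of(load_model(SYSTEM_A), load_model(SYSTEM_B))
text_index = build_text_index(SYSTEM_A, SYSTEM_B)
```

Both query paths return sets of file URLs, mirroring the slide's statement that search/query results are sets of URLs for XMDR files.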
How to Search/Query Complex Concepts & Relationships
[Figure legend: New Proposed Objects; Current 11179 Objects]
XMDR RDF Graph Query Facilities Complement Text Query Capabilities
• SQL-like queries
  • e.g., names of ontologies in a registry
• Span items that are only indirectly connected
  • e.g., data elements associated with a conceptual domain
• Expand queries to subsumed classes in hierarchy
  • e.g., ConceptualDomain includes EnnumeratedConc..
• Transitivity
  • e.g., all subclasses subsumed by a higher order class
  • e.g., all superclasses (ancestors) of a particular class
• Least common ancestor
  • e.g., closest subsuming concept for 2 concepts
Example Subclass Queries: (Inference with Transitivity)
• Environmental:
  • What are all the (sub)types of Wetland (in SWEET)?
    RDQL:
    SELECT ?x
    WHERE (?x rdfs:subClassOf earthrealm:Wetland)
    USING earthrealm FOR <http://sweet.jpl.nasa.gov/ontology/earthrealm.owl#>
• Health:
  • Find all the types of "Lung Carcinoma"
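What "inference with transitivity" adds to a subclass query can be shown with a small sketch. A plain pattern match on asserted subClassOf statements finds only the direct subtypes, while following the relation transitively also finds subtypes of subtypes. The class names below are illustrative placeholders, not the actual contents of the SWEET earthrealm ontology, and the function is a hypothetical helper rather than part of any RDF toolkit.

```python
from collections import defaultdict

# Asserted (child, parent) subClassOf pairs. Illustrative names only.
SUBCLASS_OF = [
    ("Marsh", "Wetland"),
    ("Swamp", "Wetland"),
    ("SaltMarsh", "Marsh"),
    ("Wetland", "LandRegion"),
]


def all_subclasses(pairs, root):
    """Return every class reachable from root by following subClassOf downward."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)

    found = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:  # guard against cycles and repeated visits
                found.add(child)
                stack.append(child)
    return found


direct = {child for child, parent in SUBCLASS_OF if parent == "Wetland"}
inferred = all_subclasses(SUBCLASS_OF, "Wetland")
```

Here `direct` holds only the asserted subtypes, while `inferred` also includes `SaltMarsh`, which is reachable only through `Marsh`.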
More Complex “Sibling” Queries: Concepts with Multiple Ancestors
• Health:
  • Find all the siblings of Breast Neoplasm
  • Note: This is complex, since Breast Neoplasm has two parents, Neoplasm by Site and Breast Disorder. The query would return both the by-site neoplasms, such as Eye Neoplasm and Respiratory System Neoplasm, and the Breast Disorder siblings, such as Non-Neoplastic Breast Disorder.
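The sibling query described above can be sketched as follows, using only the concepts named on the slide. The dictionary layout and the `siblings` helper are assumptions for illustration. A real terminology would hold many more children under each parent.

```python
# Concept -> set of direct parents, limited to the concepts named on the slide.
PARENTS = {
    "Breast Neoplasm": {"Neoplasm by Site", "Breast Disorder"},
    "Eye Neoplasm": {"Neoplasm by Site"},
    "Respiratory System Neoplasm": {"Neoplasm by Site"},
    "Non-Neoplastic Breast Disorder": {"Breast Disorder"},
}


def siblings(parents, concept):
    """Group the siblings of a concept by the parent they share with it."""
    grouped = {}
    for parent in parents.get(concept, set()):
        grouped[parent] = {
            other
            for other, other_parents in parents.items()
            if parent in other_parents and other != concept
        }
    return grouped


breast_neoplasm_siblings = siblings(PARENTS, "Breast Neoplasm")
```

Grouping the result by parent keeps the two sibling families apart, which matters when a concept has more than one ancestor line.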
Least Common Ancestor Queries: (Inference with Transitivity)
• Health:
  • "Morphine Sulfate" and "Acetaminophen"
  • The least common ancestor should be Analgesic Agent (with multiple intervening concepts).
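One way to compute a least common ancestor in a hierarchy where concepts may have several parents is sketched below. The slide names only the two drugs and Analgesic Agent. The intervening and root concepts here are placeholders invented for the sketch, and "closest" is assumed to mean the smallest combined number of steps from both concepts.

```python
from collections import deque

# Concept -> direct parents. Names starting with "Placeholder" are invented
# stand-ins for the intervening concepts, which the slide does not list.
PARENTS = {
    "Morphine Sulfate": ["Placeholder Opioid Class"],
    "Placeholder Opioid Class": ["Analgesic Agent"],
    "Acetaminophen": ["Placeholder Non-Opioid Class"],
    "Placeholder Non-Opioid Class": ["Analgesic Agent"],
    "Analgesic Agent": ["Placeholder Root"],
}


def ancestor_distances(parents, start):
    """Breadth-first walk upward: each reachable concept -> fewest steps from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    return distance


def least_common_ancestors(parents, a, b):
    """Return the shared ancestors with the smallest combined distance to a and b."""
    dist_a = ancestor_distances(parents, a)
    dist_b = ancestor_distances(parents, b)
    common = set(dist_a) & set(dist_b)
    if not common:
        return set()
    best = min(dist_a[c] + dist_b[c] for c in common)
    return {c for c in common if dist_a[c] + dist_b[c] == best}


lca = least_common_ancestors(PARENTS, "Morphine Sulfate", "Acetaminophen")
```

The function returns a set because a multi-parent hierarchy can have more than one equally close subsuming concept.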
Searching caDSR for Data Elements via Concepts and Vice-Versa
• Common Data Elements (CDEs) are 'connected' to concepts through the Object Class and Property of the CDE.
• A query such as this should look for the CDE's Object Class derivation rule and select only those data elements associated with those object classes.
• Alternatively, you could query the caDSR Concept Class and find all related Object Classes (OCs) where the concept was flagged as "primary concept", then get all the Data Elements by leveraging the ISO 11179 relationships: e.g., an Object Class has related Data Element Concepts (DECs), and DECs have related Data Elements (DEs).
• Concepts can also be associated with Value Meanings. So: search the Concept Class with the concept code, find all related Value Meanings, find all Value Domains that used the Value Meaning, and find all Data Elements that used the Value Domain.
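The two traversal paths described above can be sketched as chains of lookups. This is not the caDSR API: every identifier (`C0001`, `VM-1`, `OC-1`, and so on), every mapping, and the `follow` helper are hypothetical and exist only to show the hop-by-hop shape of the query.

```python
# Hypothetical registry links, one mapping per ISO 11179 relationship.
CONCEPT_TO_VALUE_MEANINGS = {"C0001": {"VM-1"}}
VALUE_MEANING_TO_VALUE_DOMAINS = {"VM-1": {"VD-1", "VD-2"}}
VALUE_DOMAIN_TO_DATA_ELEMENTS = {"VD-1": {"DE-10"}, "VD-2": {"DE-11", "DE-12"}}

PRIMARY_CONCEPT_TO_OBJECT_CLASSES = {"C0001": {"OC-1"}}
OBJECT_CLASS_TO_DECS = {"OC-1": {"DEC-1"}}
DEC_TO_DATA_ELEMENTS = {"DEC-1": {"DE-20"}}


def follow(start_ids, *hops):
    """Follow a chain of relationships, carrying a set of identifiers per hop."""
    frontier = set(start_ids)
    for hop in hops:
        frontier = {target for source in frontier for target in hop.get(source, set())}
    return frontier


# Path 1: concept -> Value Meanings -> Value Domains -> Data Elements.
via_value_meanings = follow(
    {"C0001"},
    CONCEPT_TO_VALUE_MEANINGS,
    VALUE_MEANING_TO_VALUE_DOMAINS,
    VALUE_DOMAIN_TO_DATA_ELEMENTS,
)

# Path 2: primary concept -> Object Classes -> DECs -> Data Elements.
via_object_classes = follow(
    {"C0001"},
    PRIMARY_CONCEPT_TO_OBJECT_CLASSES,
    OBJECT_CLASS_TO_DECS,
    DEC_TO_DATA_ELEMENTS,
)
```

Reversing each mapping would support the "vice-versa" direction, from a data element back to its concepts.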
Challenges and Future Goals for XMDR Prototype
• Scalability & performance
• Tools
  • RDF tool adaptation for metadata registries
• User-friendly interface
  • Form interface for registration & uploading metadata
• References to externally maintained sources
  • Data, ontologies, terminologies
• Evaluate alternative technologies
  • For different modules
• Demonstrate for key use cases and ecoinformatics applications
Challenges and Future Goals (cont.)
• Progress proposals through standards committees
• Harmonization with W3C and OMG standards
• Incorporate Common Logic, Web Services, etc.
• Ontology Lifecycle Management (OLM)
• Improve link of concepts to data
• Generate schemas from axiomatized ontologies
Ecoinformatics Challenges
• How does this fit into the research, development, and demonstration activities of the Interagency/International Cooperation on Ecoinformatics?
• Should this be a part of the EU-US collaborative R&D?