XMDR Project Overview
Frank Olken & Kevin D. Keck
{olken,kdkeck}@lbl.gov
Lawrence Berkeley National Laboratory
Presentation to Open Metadata Forum
Kobe, Japan
March 21, 2006
9th Open Forum on Metadata Registries, Kobe, Japan

XMDR means: Extended Metadata Registry

The Cast
• Bruce Bargmeyer (LBNL) = Principal Investigator
• Kevin Keck (LBNL) = architect & stds. (design)
• Frank Olken (LBNL) = content characterization & stds. (design)
• John McCarthy (LBNL) = prototype development (management)
• Karlo Berket (LBNL) = prototype development
• Harold Solbrig (Mayo) = content preprocessing via LexGrid, stds.
• Gayle Hodge (USGS) = content characterization, acquisition
• Denise Warzel (NCI) = content acquisition, standards, design
• Larry Fitzwater (EPA) = program mgt. (vision, direction)
• Nancy Lawler (DOD) = program mgt. (vision, direction)
• Sam Chance (DOD) = program mgt. (vision, direction)

Organizational Cast
• Lawrence Berkeley National Laboratory
• Environmental Protection Agency
• National Cancer Institute
• Mayo Clinic
• United States Geological Survey
• Department of Defense

Goals
• Assist revisions of ISO/IEC 11179 Metadata Registry Standard to encompass additional semantic descriptions and resources
  • Vocabularies, thesauri, etc.
  • Ontologies
  • Relationships
  • Semantic types
• Design and implement prototype Extended Metadata Registry
• Load metadata content into prototype
• Demonstrate prototype

Why Metadata Registries?
• Facilitate reuse / standardization / integration / exchange of data
• Design time:
  • Database / messaging / application / forms designers
  • Data warehouse design
• Run-time:
  • Query formulation / optimization
  • Federated data query optimization / processing
  • Extract, Transform, Load (ETL) of data warehouses
  • Semantic services, composition, workflows, ...
• Users
  • Finding, understanding data
  • Understanding data entry forms

Why Standards?
• Developing metamodel to serve as design for next generation metadata registries
• Evolve ISO/IEC 11179 Metadata Registry Standard
  • Edition 2 (current)
    • UML modeling, relational DB technology implementation
  • Edition 3 (new)
    • UML + OWL (Web Ontology Language) / MOF (Meta Object Facility) / CL (Common Logic) modeling
    • Add support for ontologies

More on Why MDR Standards?
• MDR Standards
  • Can improve metadata creation practice
  • Can improve metadata and data reuse
  • Facilitate MDR adoption by organizations
  • Facilitate MDR interoperability
  • Facilitate MDR software marketing
  • Facilitate MDR procurement
  • Facilitate alignment / mapping among metadata schemas, ...

Proposed Changes to ISO/IEC 11179
• Support for ontologies, etc.
• More formal modeling of relationships
• Semantic types (?)

Changes to ISO/IEC 11179 Std.
• Add support for ontologies, vocabularies
  • Add ontologies
  • Add predicates (logical formulae)
  • Add axioms (asserted to be true)
• Add support for modularization of ontologies
  • Add inclusion mechanisms for concept systems and ontologies
  • Assert axioms in context of containing ontology

Why add support for ontologies?
• More precise specification of data semantics (than natural language definitions)
• Machine processing of semantic specifications of data
  • Classification, subsumption testing, alignment, spatial and temporal reasoning
• Reusable semantic specifications for subject domains
• Conceptual data models to facilitate data integration
• Encoding of much current work on data semantics and terminologies as ontologies
• Useful for machine learning

Issues in Including Ontologies in ISO/IEC 11179
• Lack of agreement on logical formalisms
  • FOL, description logic (which?), ...
• Hence, MDR std. must be agnostic among logic formalisms
• Poses difficulties for:
  • Standards specification
  • MDR implementation
  • MDR interoperability
• See work of OMG Ontology Definition Metamodel (ODM) standard

Changes to ISO/IEC 11179 Std.
• Formalize specification of semantic relationships
  • Refinement of Edition 2 Classification Schemes
  • Add relationships (types), roles, links (instances) among concepts
  • Specify attributes of relationships
    • Reflexivity, irreflexivity, symmetry, anti-symmetry, transitivity
  • To support inference across semantic relationships
    • e.g., transitive closure over is-a, part-of, ...

Relationship Modeling in ISO/IEC 11179 Edition 3
• Edition 2 has classification schemes and specialized relationships among various metamodel entities
• Proposed for Edition 3
  • Binary and N-ary semantic relationships among concepts (a.k.a. relations)
  • Treat data element concept, conceptual value domain, value meaning, etc. as subtypes of concept
  • More detailed characterization of relationships:
    • Roles / links
    • Reflexivity, symmetry, anti-symmetry, transitivity, ...

Why care about relationship characterization?
• Who cares about reflexivity, irreflexivity, symmetry, transitivity?
• Answer: need this information for inference on semantic relationships (usually binary)
• Example: Does it make sense to compute transitive closure?
  • Is-a: transitive
  • Part-of: sometimes transitive
  • Equals: transitive, symmetric
  • Similar: usually symmetric, typically not transitive

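The point about relationship properties can be illustrated with a minimal Python sketch. It is not taken from the XMDR prototype: the `Relationship` class, the `infer_links` function, and the sample concepts are all hypothetical names chosen for illustration. The sketch derives extra links only when a relationship is declared transitive or symmetric, so a closure is computed for is-a but not for similar.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A binary semantic relationship with declared properties (hypothetical model)."""
    name: str
    transitive: bool = False
    symmetric: bool = False
    links: set = field(default_factory=set)  # asserted (source, target) concept pairs


def infer_links(rel):
    """Return the asserted links plus those implied by the declared properties."""
    closure = set(rel.links)
    while True:
        new = set()
        if rel.symmetric:
            # a R b implies b R a
            new |= {(b, a) for (a, b) in closure}
        if rel.transitive:
            # a R b and b R d imply a R d
            new |= {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


# Illustrative data only; these concepts are not from any loaded XMDR content.
is_a = Relationship("is-a", transitive=True,
                    links={("beagle", "dog"), ("dog", "mammal"), ("mammal", "animal")})
similar = Relationship("similar", symmetric=True,
                       links={("red", "orange"), ("orange", "yellow")})

is_a_closure = infer_links(is_a)
similar_closure = infer_links(similar)
```

Because is-a is declared transitive, `is_a_closure` contains the derived link from beagle to animal. Because similar is declared symmetric but not transitive, `similar_closure` gains the reversed links but no link between red and yellow.
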
Semantic Types for ISO/IEC 11179
• ISO/IEC 11179 Edition 2 has “datatypes”
  • Associated with “value domain”
  • i.e., datatypes are an aspect of representation, NOT semantics
• Semantic Types
  • Concern meaning rather than representation
  • Uses:
    • Constraints over relationship roles
    • Attribute of concepts, conceptual value domains, ...
  • Ubiquitous in ontologies, schemas, ...

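One of the uses listed above, constraints over relationship roles, can be sketched in Python as follows. Everything here is an assumption made for illustration: the type hierarchy, the concept names, and the `ROLE_TYPES` table are hypothetical and are not part of ISO/IEC 11179 or the XMDR prototype.

```python
# Hypothetical semantic type hierarchy: child type -> parent type.
TYPE_PARENT = {
    "Organ": "AnatomicalStructure",
    "Tissue": "AnatomicalStructure",
    "AnatomicalStructure": "PhysicalObject",
    "Disease": "Process",
}

# Hypothetical assignment of a semantic type to each concept.
CONCEPT_TYPE = {
    "liver": "Organ",
    "hepatic lobule": "Tissue",
    "hepatitis": "Disease",
}

# Hypothetical role constraints: relationship -> (required source type, required target type).
ROLE_TYPES = {
    "part-of": ("AnatomicalStructure", "AnatomicalStructure"),
}


def is_subtype(semantic_type, ancestor):
    """Walk up the type hierarchy; True if semantic_type equals or descends from ancestor."""
    while semantic_type is not None:
        if semantic_type == ancestor:
            return True
        semantic_type = TYPE_PARENT.get(semantic_type)
    return False


def link_is_well_typed(relationship, source, target):
    """Check that both ends of a link satisfy the semantic types required by its roles."""
    source_required, target_required = ROLE_TYPES[relationship]
    return (is_subtype(CONCEPT_TYPE[source], source_required)
            and is_subtype(CONCEPT_TYPE[target], target_required))
```

Under these assumed tables, `link_is_well_typed("part-of", "hepatic lobule", "liver")` holds, while a part-of link from hepatitis to liver is rejected because a Disease is not an AnatomicalStructure.
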
Some Issues for Semantic Types
• Alternative approaches:
  • Build semantic types into 11179 metamodel
  • Reuse relationships for semantic type specifications
  • Treat semantic types as unary predicates in ontologies + axioms
• Should we have a standard set of semantic types (at least base types)?
  • Yes, for interoperability
  • No, for flexibility
• Collection types, type constructors?

Why Construct A Prototype?
• To explore alternative revisions to ISO/IEC 11179
• To demonstrate that proposed revisions to ISO/IEC 11179 Metadata Registry Std. are:
  • Feasible
  • Useful
• To experiment with alternative architectures / technologies for constructing extended metadata registries
  • Text retrieval engines - Lucene
  • Inference engines - Jena, Kowari (?), ...
  • Service oriented architecture (SOA)
• To facilitate deployment of revised ISO/IEC Metadata Registries
  • Example implementation
  • Open Source Code!

Why Content?
• Content characterization assists in shaping revisions to ISO/IEC 11179
• Content characterization assists in selection of content to load
• Content ingestion, installation, and querying provide a means to exercise the prototype
  • Testing
  • Demonstration
  • Performance evaluation
  • Utility evaluation

Metadata Content Activities
• Content Characterization
  • e.g., graph theoretic characterization
• Content Acquisition
• Content Preprocessing
  • Into standard formats for loading (H. Solbrig)
• Content Loading
• Content Querying

Desiderata for Content Selection
• Accessibility
  • Licensing, source cooperation, unclassified
  • Documentation, familiarity to XMDR collaborators
• Funder interest
• Diversity of metadata types, subject areas
• Diverse graph structures (of semantic relationships)
• OWL encodings available
• Moderate size
• Opportunities for mappings among metadata sets
• Multi-linguality

Content Characterization
• Provenance: Name, source, contact, ...
• Type of metadata:
  • Thesauri, ontology, ISO/IEC 11179 metadata registry, ...
• Graph Characterization
  • Tree, faceted classification, partial order (directed acyclic graph), cyclic graph, ...
• Size: # concepts, # links, # bytes
• Definitions?
• File Formats
  • OWL encoding?
• Multilingual
• Availability / licensing issues

Why Graph-theoretic Content Characterization?
• Important structural taxonomy
• Impacts:
  • Expressivity required of registry
  • Content representation, index structures
  • Search, matching algorithms
  • Computational complexity of search, matching, ...
  • Inference algorithms
  • Computational complexity of inference
  • Design / implementation / performance of metadata registries

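A graph-theoretic characterization of this kind might be sketched in Python as below. This is an illustrative assumption, not the method used in the XMDR project: the `characterize` function and its three labels are hypothetical, and it covers only the tree / directed acyclic graph / cyclic graph distinction. Cycles are detected with Kahn's topological sort.

```python
from collections import defaultdict, deque


def characterize(links):
    """Classify a collection of (child, parent) links as 'tree', 'dag', or 'cyclic'.

    Forests with several roots are reported as 'dag' in this simplified sketch.
    """
    nodes = set()
    parents = defaultdict(set)
    children = defaultdict(set)
    for child, parent in links:
        nodes.update((child, parent))
        parents[child].add(parent)
        children[parent].add(child)

    # Kahn's algorithm: repeatedly remove nodes whose parents have all been removed.
    remaining = {node: len(parents[node]) for node in nodes}
    queue = deque(node for node in nodes if remaining[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)

    if removed < len(nodes):
        return "cyclic"  # some nodes could never be removed, so a cycle exists

    roots = [node for node in nodes if not parents[node]]
    if len(roots) == 1 and all(len(parents[node]) <= 1 for node in nodes):
        return "tree"  # acyclic, single root, at most one parent per node
    return "dag"


# Illustrative data only.
tree_links = {("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")}
dag_links = tree_links | {("dog", "pet"), ("pet", "animal")}
cyclic_links = {("a", "b"), ("b", "a")}
```

The distinction matters for the impacts listed above: a tree admits simple path-based indexing, a DAG requires handling multiple parents, and a cyclic graph rules out algorithms that assume a topological order.
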
Loaded content metadatasets
• National Cancer Institute Thesaurus (NCIT)
• Defense Technical Information Center (DTIC) Thesaurus
• General Multilingual Environmental Thesaurus (GEMET)
• Adult Mouse Anatomical Dictionary
• EPA Terms of the Environment
• ISO 3166 Country Codes
• ISO 4217 Currency Codes

Other Metadatasets of Interest
• NCI Cancer Data Standards Repository (caDSR)
• EPA Environmental Data Registry (EDR)
• NLM Unified Medical Language System (UMLS)
• USGS Geographic Names Information System (GNIS)
• Integrated Taxonomic Information System (ITIS)
• NBII Biocomplexity Thesaurus
• ISO 639 Language Identifiers
• Logical Observation Identifiers Names and Codes (LOINC)
• Getty Thesaurus of Geographic Names (TGN)
• NASA Semantic Web for Earth and Environmental Terminology (SWEET)
• Dublin Core Metadata (?)

Conclusions
• XMDR Activities
  • ISO/IEC 11179 Revisions
    • Support for ontologies, etc.
    • Relationships
    • Semantic types
  • Prototype Development
  • Content (characterization, loading, query)
  • Prototype testing, performance evaluation, demos

Coming in Second Part of Talk (Kevin Keck):
• Detailed discussion of the architecture and technology of the prototype ...

Acknowledgements
• Financial support from U.S. Dept. of Defense, U.S. Environmental Protection Agency
• In-kind contributions from U.S. National Cancer Institute, Mayo Clinic, U.S. Geological Survey
• Support from program managers: Nancy Lawler (DOD) and Sam Chance (DOD)
• Comments on drafts of this talk by John L. McCarthy

Contact Information:
• Project:
  • http://xmdr.org/
• Frank Olken:
  • Lawrence Berkeley National Laboratory
  • Email: olken@lbl.gov
  • Tel: 510-486-5891
  • URL: http://www.lbl.gov/~olken