This presentation explores the need for rich metadata semantics, for both human and computer understanding, and the complicated process of mapping metadata schemas to ontologies. It discusses semantic approaches for metadata schemas and the benefits of adding formal semantics to enable discovery, queries, mediation/linking, and reasoning. Examples of semantic annotation and of an ontological schema are provided, along with the opportunities and challenges in using metadata semantics.
Metadata and Semantics Research Conference (since 2005)
RDA Metadata Semantics: Rich metadata semantics are needed for human AND computer understanding, but mapping metadata schemas to ontologies can be a complicated procedure...
Gary Berg-Cross, SOCoP; RDA US Advisory Committee
Outline of Topics
• Metadata: many standards and some ambitious MD requirements
• RDA metadata-semantics discussions & background
• Rich metadata semantics are needed for human AND computer understanding
• Semantic approaches are needed for MD schemas
• Adding formal semantics to metadata schemas for discovery, queries, mediation/linking and reasoning can be a complicated procedure...
• Illustrating 2 semantic approaches
  • Semantic annotation
  • Example of an ontological schema
• Are we ready for metadata semantics to be widely used?
  • Where are the opportunities?
  • Can we agree on common or domain principles (like modularity or building blocks) or some formal semantic requirements?
Recap on (Richer) Metadata Type Structure (includes Linked Data)
• Different types or degrees of semantics may be appropriate for different tasks.
• LOD needs semantics for context...
• CERIF provides “much richer metadata than the standards used commonly with LOD and so improves greatly the experience of the end user (or the advantages of providing metadata.)”
From: "The potential of metadata for linked open data and its value for users and publishers" by Anneke Zuiderwijk, Keith Jeffery, Marijn Janssen
Metadata & Standards Evolution
• From file-system names/types and descriptions of DB fields to MD schemas for exchange
• Dublin Core: attaching categorical tags and descriptions via an MD schema
• An attempt to make data more understandable to humans: capture agreed-upon MD that affords understanding
• The MD effort now requires many interacting pieces, including Metadata Application Profiles and workflow-like entities
Strategy of a “Modular” Theory of General and Domain-Specific MD (and Ontologies)
• Trans-domain (general consensus) metadata: ID, time, ...; ISO MD_Keywords types: discipline, place, stratum, temporal, theme
• Standardized geo-specific, BioMed-specific and EarthScience-specific metadata modules, “harmonized” and packaged together to support interoperability (independent?)
• Modules should be easier to create, validate, understand and maintain.
• They may be substituted for one another, and used and reused for composition.
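The following Python sketch is one rough way to picture the modular strategy. It treats each metadata module as a set of required and optional fields. It then composes a trans-domain core with a domain module and validates a record against the composed profile. The module names and field names (`identifier`, `bounding_box`, `crs`, etc.) are hypothetical. A real profile would take them from ISO 19115 or a community application profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataModule:
    """A metadata module: a name plus the fields it requires or allows."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()


# Hypothetical trans-domain core: ID, title and time are required; the ISO
# MD_Keywords types from the slide are allowed as optional fields.
CORE = MetadataModule(
    "core",
    required=frozenset({"identifier", "title", "time"}),
    optional=frozenset({"discipline", "place", "stratum", "temporal", "theme"}),
)

# Hypothetical geo-specific module; field names are illustrative only.
GEO = MetadataModule("geo", required=frozenset({"bounding_box", "crs"}))


def compose(*modules: MetadataModule) -> MetadataModule:
    """Package modules into one profile; a field required anywhere stays required."""
    required = frozenset().union(*(m.required for m in modules))
    optional = frozenset().union(*(m.optional for m in modules)) - required
    return MetadataModule("+".join(m.name for m in modules), required, optional)


def validate(record: dict, profile: MetadataModule) -> list:
    """List missing required fields and fields the profile does not define."""
    fields = set(record)
    problems = [f"missing required field: {f}" for f in sorted(profile.required - fields)]
    allowed = profile.required | profile.optional
    problems += [f"unknown field: {f}" for f in sorted(fields - allowed)]
    return problems


profile = compose(CORE, GEO)
print(profile.name, sorted(profile.required))
print(validate({"identifier": "x1", "title": "Roads", "time": "2014"}, profile))
```

Because each module is validated as data rather than hard-coded, another domain module (e.g. a BioMed one) could be substituted without touching the core.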
There are specific “standards” in domains
• General MD:
  • [ISO 19115:2003] Geographic information -- Metadata
  • [ISO 19115-2:2009] Geographic information -- Metadata -- Part 2: Extensions for imagery and gridded data
• Other MD: OGC object types (axis, axisDirection, datum, dataType, derivedCRSType, documentType, ellipsoid, featureType, group, meaning, ...)
• In OGC’s O&M model, Earth observations generate “products” that have metadata.
• These are organized into a metadata profile, expressed as a schema.
Goals: support bridging heterogeneity to achieve interoperability; support data integration.
Some Metadata Challenges (Earth Science, from Ilya Zaslavsky, CINERGI* pipeline)
Common deficiencies in existing metadata descriptions:
• Different metadata models and profiles
• Different requirements for mandatory and optional fields (Dublin Core vs ISO)
• Different meanings of fields, and different initial purposes/emphases of data collection
• Different local interpretations of how these fields should be filled out (e.g., “authors” and “contacts” are often mixed up)
• Different classifications of resource types
  • Common resource types are: Organization, Webpage, Collection, Dataset (EPOS: Users, SW services, computing services)
• Titles may be non-descriptive:
  • insufficiently unique (“Roads”)
  • meaningful, but with opaque naming patterns (e.g., “AXXX34nn1”)
• Keywords:
  • may be missing or too domain-specific
  • may lack references to a thesaurus/CV, or be free-form text
• Missing info, such as abstract, contact (e.g., a contact field saying “call”), location, time without reference, wrong URL
• Grouping: a range of metadata records from a single source may be very similar (differing only in one parameter, e.g., location); they may be better discovered as a group of records
• Duplicates: several metadata records from different catalogs may point to the same physical dataset (or have overlapping subsets of distributions). A provenance issue?
* Community Inventory of EarthCube Resources for Geosciences Interoperability (http://workspace.earthcube.org/cinergi)
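Several of these deficiencies can be screened for automatically. The following is a minimal Python sketch that assumes records are flat dictionaries. The field names, the one-word-title heuristic and the small `THESAURUS` set are hypothetical stand-ins and are not part of the CINERGI pipeline.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary standing in for a real thesaurus/CV.
THESAURUS = {"streamflow", "water temperature", "dissolved organic matter"}


def check_record(rec: dict) -> list:
    """Flag a few of the common deficiencies listed on the slide."""
    issues = []
    title = rec.get("title", "").strip()
    # Crude heuristic: one-token titles ("Roads", "AXXX34nn1") are rarely descriptive.
    if len(title.split()) < 2:
        issues.append("title may be non-descriptive")
    if not rec.get("abstract"):
        issues.append("abstract missing")
    keywords = rec.get("keywords", [])
    if not keywords:
        issues.append("keywords missing")
    else:
        free_text = [k for k in keywords if k.lower() not in THESAURUS]
        if free_text:
            issues.append("keywords not in thesaurus: " + ", ".join(free_text))
    return issues


def group_similar(records, varying="location", ignore=("id",)):
    """Group records that are identical except for one varying field."""
    skip = {varying, *ignore}
    groups = defaultdict(list)
    for rec in records:
        key = tuple(sorted((k, str(v)) for k, v in rec.items() if k not in skip))
        groups[key].append(rec)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(records, url_field="url"):
    """Map each distribution URL shared by several records to those record ids."""
    by_url = defaultdict(list)
    for rec in records:
        url = rec.get(url_field, "").strip().rstrip("/")  # minimal normalization only
        if url:
            by_url[url].append(rec.get("id"))
    return {url: ids for url, ids in by_url.items() if len(ids) > 1}
```

These checks only flag candidates. Deciding whether two records really describe the same dataset is the provenance question raised above.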
Are We Ready to Break the MD Bottleneck, Make Up for Deficiencies & Satisfy Ambitious MD Requirements?
• Provide the possibility to link metadata.
• Recommend/advise linking with certain other datasets.
• Warn if linking two datasets does not make sense.
• Use a good URI strategy.
• Use identifiers (but which?).
• Use well-accepted vocabularies.
• Use well-accepted thesauri (ontologies?).
• Warn about linking when datasets have temporal aspects.
• Provide advice.
• Monitor links between data and make sure that they are still up to date.
• Make sure that linking is not just spatial; link to other domains as well.
Metadata should:
• be easy to add, discover, download, access & exchange
• support bridging heterogeneity to achieve interoperability
• support data integration
• support linking of data
• be consistent & support interpretation of data
• be sustainable
• have a suitable representation for search, browsing & query
• bridge different MD models, e.g., ISO vs DC, whose fields may have different meanings
But researchers do not see value in metadata & its management tools (e.g., relational databases, wikis, etc.), and there is a perceived cost of adding and maintaining metadata.
So how do we satisfy these requirements and create quality MD and/or extend it?
Drawn in large part from RDA MD discussions and from the work of Anneke Zuiderwijk, Keith Jeffery, Marijn Janssen and: Duval, Erik, et al. "Metadata principles and practicalities." D-Lib Magazine 8.4 (2002): 16.
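One common way to bridge ISO and DC is a crosswalk table. The following sketch assumes simplified, flattened ISO 19115-style keys and maps them to Dublin Core elements. These keys are hypothetical dotted paths, not the real XML element paths. Fields whose meaning does not line up are kept aside rather than forced into a DC element.

```python
# Simplified, flattened ISO 19115-style keys (hypothetical dotted paths, not the
# real XML paths) mapped to Dublin Core elements. Illustration only.
ISO_TO_DC = {
    "identificationInfo.citation.title": "title",
    "identificationInfo.abstract": "description",
    "identificationInfo.descriptiveKeywords": "subject",
}


def crosswalk(iso_record: dict):
    """Return (dc_record, unmapped); DC elements are repeatable, so values are lists."""
    dc, unmapped = {}, {}
    for key, value in iso_record.items():
        element = ISO_TO_DC.get(key)
        if element is None:
            # No safe equivalent: keep it rather than guess, e.g. dateStamp
            # (date of the metadata record) or pointOfContact (not necessarily a creator).
            unmapped[key] = value
        elif isinstance(value, list):
            dc.setdefault(element, []).extend(value)
        else:
            dc.setdefault(element, []).append(value)
    return dc, unmapped
```

This sketch deliberately leaves `dateStamp` and `pointOfContact` unmapped. In ISO 19115, `dateStamp` dates the metadata record rather than the dataset. A point of contact is not necessarily a creator, which is the "authors" vs "contacts" confusion seen in practice.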
Broad View of Metadata (Schema) Status & Argument for More Semantics
Richness issue:
• Even when done well, simple annotations and structured metadata are not rich enough to support ad hoc use, and certainly not reasoning based on meaning.
• There are many MD schemas, and a broad challenge is to link/integrate them.
• “Metadata schemas are created for resources’ identification and description and - most of the times - they do not express rich semantics. Even though the meaning of the metadata information can be processed by humans and its relationship to the described resource can be understood, for machine processing the actual relationships are frequently not obvious. In contrast to metadata schemas, ontologies provide rich constructs to express the meaning of data”
  Stasinopoulou, Thomais, et al. "Ontology-based metadata integration in the cultural heritage domain." Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. Springer Berlin Heidelberg, 2007. 165-175.
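The following sketch makes the gap between flat tags and ontological constructs concrete. It performs the simplest kind of reasoning based on meaning: transitive subclass inference. This is in the spirit of RDFS `subClassOf`, but it is not an RDFS or OWL reasoner. The class hierarchy and record types are hypothetical. A plain string match on "Dataset" misses the `TimeSeries` record, while the inferred query finds it.

```python
# Hypothetical class hierarchy (child -> direct parents), echoing the
# resource types listed among the metadata challenges.
SUBCLASS_OF = {
    "Dataset": {"Resource"},
    "TimeSeries": {"Dataset"},
    "Webpage": {"Resource"},
}

# Hypothetical records, each tagged with its most specific type.
RECORD_TYPES = {"rec1": "Dataset", "rec2": "TimeSeries", "rec3": "Webpage"}


def superclasses(cls: str) -> set:
    """All direct and indirect superclasses of cls (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls: str) -> list:
    """Records whose type is cls or any subclass of cls."""
    return sorted(r for r, t in RECORD_TYPES.items() if t == cls or cls in superclasses(t))


def tag_match(cls: str) -> list:
    """What a flat string match on the type tag finds, for comparison."""
    return sorted(r for r, t in RECORD_TYPES.items() if t == cls)


print(tag_match("Dataset"), instances_of("Dataset"))
```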
RDA Background & Outreach on Semantics
• A growing interest in the topic of semantic interoperability.
• The centrality of semantic issues was noted, for example, following the 1st Plenary.
• Semantic issues and technologies are already part of the discussion on the RDA Forum.
Research communities need to adopt and deploy technologies that help them get the most from their data, understand context, and infer meaning. The semantic web community has much to contribute to an enabling global infrastructure, and it would be great to see greater involvement in the RDA. (Fran Berman, Professor of Computer Science, RPI; Chair of the Research Data Alliance/U.S.)
• RDA should take on this issue, but how? And who will participate?
RDA Metadata and Semantics Intersect
• Data Foundations and Terminology (WG & IG)
• Data in Context IG
• Data Fabric IG
• Geospatial IG
• Marine Data Harmonization IG (ISO 19115 etc.)
• Broker IG
• Research Data Provenance ...
• Semantic Interoperability BoF at RDA P3
  • 3 presentations to illustrate key concepts of SI & use of ontologies (Gary Berg-Cross & Yann Le Franc)
  • Discussed Ontology Design Patterns and lightweight methods
• EUON effort
  • What is a quality ontology?
  • 1st European Ontology Network (EUON) Workshop, co-located at P4: http://www.eudat.eu/euon/euon-2014-workshop
The Need for Some Semantics is (Somewhat) Understood
• MD needs to be a first-class, processable system, like a conceptual model: easier to use and manage, and following efforts to make data more understandable by computers.
• Semantics helps address:
  • what MD annotations mean
  • what the shared meanings are
  • what the assumptions are, such as relations between MD items
  • how links to other data can be included
Source: Principles and Foundations of Ontologies and Semantic Grids, Session 48, July 15th, 2009, Oscar Corcho (Universidad Politécnica de Madrid), http://www.slideshare.net/ISSGC/session-48-principles-of-semantic-metadata-management
How Do We Add Semantics to MD? It Depends on Intended Use. Example of Semantic Annotations (HTML -> RDFa)
• For data description and context, the added semantics can be like a formal conceptual model.
• For search, it can be like a better annotation of keywords using RDF.
• Start with a collection of XHTML attributes in a web page.
• Embed RDF annotations in the web page using, e.g., the DC and FOAF vocabularies, which are easy to use for most simple annotations (e.g., creator, title, contact info).
From: Introduction to Semantic Technologies, Ontologies and the Semantic Web, Aug 2010
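The following is a minimal sketch of the HTML -> RDFa idea, using only Python's standard library. `annotate` renders Dublin Core fields as RDFa `property` attributes using the RDFa 1.1 `prefix` syntax. `ToyRDFaExtractor` reads them back as (subject, property, value) triples. The extractor is a toy that handles one `about` subject with flat literal properties, and it is not a conformant RDFa processor. The FOAF prefix is only declared, because describing a contact person would need nested subjects. The subject IRI and field values are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Vocabulary namespaces: Dublin Core elements 1.1 and FOAF.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def annotate(subject: str, fields: dict) -> str:
    """Render fields as XHTML spans carrying RDFa 1.1 'property' attributes."""
    prefix_attr = " ".join(f"{p}: {iri}" for p, iri in PREFIXES.items())
    spans = "\n".join(
        f'  <span property="{escape(prop)}">{escape(value)}</span>'
        for prop, value in fields.items()
    )
    return f'<div about="{escape(subject)}" prefix="{prefix_attr}">\n{spans}\n</div>'


class ToyRDFaExtractor(HTMLParser):
    """Collect (subject, property IRI, literal) triples from a flat snippet.

    Handles a single 'about' subject with literal 'property' children only;
    this is not a conformant RDFa processor.
    """

    def __init__(self):
        super().__init__()
        self.subject = None
        self.pending = None  # expanded property IRI waiting for its text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            prefix, _, local = a["property"].partition(":")
            # Expand known prefixes; leave anything else unchanged.
            self.pending = PREFIXES[prefix] + local if prefix in PREFIXES else a["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


snippet = annotate("http://example.org/dataset/42",
                   {"dc:title": "Streamflow & stage", "dc:creator": "A. Hydrologist"})
parser = ToyRDFaExtractor()
parser.feed(snippet)
parser.close()
for triple in parser.triples:
    print(triple)
```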
Beyond Vocabularies: Good Semantics Needs Appropriate Conceptualization of Properties
• Connect properties like stream flow, level, pollutants, evapotranspiration etc. in a schema.
[Figure: ontology schema linking Water Body, Water Density (Water Density Unit, Grams/cm3), Area, Area Quantity (Real Number, Sq Miles) and Chesapeake Bay via the properties hasFeature, hasConstituent, hasLayer, hasDensityUnit, hasQuantity, hasValue, hasUnit and IsA]
• For connecting to Chem/BioChem ontologies, there might be sub-categories of physical features for elements: optical, hardness, color.
• See the Dumontier Lab ontologies, which represent bio-scientific concepts and relations: http://dumontierlab.com/?page=ontologies
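A schema like the one in the figure can be read as a small set of (subject, predicate, object) triples. The following Python sketch encodes one possible reading of the figure's labels and shows three operations: pattern matching, path following, and inheriting features through `isA`. The exact edges and the simplified unit nodes are assumptions. No real area value is included, because `RealNumber` only marks the value type.

```python
# One possible reading of the figure's labels as (subject, predicate, object)
# triples; the exact edges and the simplified unit nodes are assumptions.
SCHEMA = [
    ("ChesapeakeBay", "isA", "WaterBody"),
    ("WaterBody", "hasFeature", "WaterDensity"),
    ("WaterDensity", "hasUnit", "GramsPerCm3"),
    ("WaterBody", "hasFeature", "Area"),
    ("Area", "hasQuantity", "AreaQuantity"),
    ("AreaQuantity", "hasValue", "RealNumber"),  # value type, not an actual value
    ("AreaQuantity", "hasUnit", "SqMiles"),
]


def match(triples, s=None, p=None, o=None):
    """Triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def follow(triples, start, path):
    """Nodes reached from start by following the predicates in path, in order."""
    nodes = {start}
    for predicate in path:
        nodes = {o for n in nodes for (_, _, o) in match(triples, s=n, p=predicate)}
    return nodes


def features_of(triples, node):
    """Features stated for node or inherited from the classes it directly isA."""
    sources = {node} | follow(triples, node, ["isA"])
    return {f for c in sources for f in follow(triples, c, ["hasFeature"])}


print(follow(SCHEMA, "Area", ["hasQuantity", "hasUnit"]))
print(features_of(SCHEMA, "ChesapeakeBay"))
```

The point of the property chain (`hasQuantity` then `hasUnit`) is that a unit is attached to a quantity rather than to a bare keyword. This is what lets a query ask "in what unit is this area expressed?"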
Ontology Design Patterns (ODPs) of Semantic Trajectory: Hydro/Ocean Observations as Annotations
• ODPs (aka microtheories) are small, modular & coherent schemas.
• Relatively autonomous, but conceivably composable with other schemas.
• Environmental observations fit into this schema.
• Fixes may be hydrometric feature observations, and may be at some PoI (and offset fix) for some point or period of time, denoting important activities.
• Observations, including time-series sets, might be applied to something like streamflow or temperature plots, a pollution plume, or data from an ocean glider.
• You may query the schema, e.g.: “Show locations within Gulf of Mexico fishing area with colored dissolved organic matter”
[Figure: Hydro Obs/Device; Hydro Object or moving device; hydro variable & attribute/data or value type of interest; paths & POIs have geometries, including polygon areas]
Source: A Geo-Ontology Design Pattern for Semantic Trajectories, COSIT 2013, Yingjie Hu et al.
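A query like the Gulf of Mexico example combines a thematic filter (the observed variable) with a spatial filter (inside a polygon area). The following sketch assumes hypothetical `Fix` records with a (lon, lat) position and uses a standard ray-casting point-in-polygon test. It ignores map projections and polygon holes. In practice, the area passed in would come from a real fishing-area geometry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    """Hypothetical observation fix of a moving device at a point in time."""
    device: str
    time: str
    position: tuple  # (lon, lat) in degrees; projection effects are ignored
    variable: str
    value: float


def point_in_polygon(pt, polygon):
    """Ray casting (even-odd rule); polygon is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locations_with(fixes, area, variable, min_value=0.0):
    """Positions inside area where variable was observed above min_value."""
    return sorted({f.position for f in fixes
                   if f.variable == variable
                   and f.value > min_value
                   and point_in_polygon(f.position, area)})
```

In an ODP-based system, the variable name and area would be resolved through the schema (Hydro Var, POI geometries) rather than passed as raw strings and coordinate lists as done here.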
Are we ready for metadata semantics to be widely used?
• How do we bring current MD practice and semantic practice together?
• What is a practical MD vision of this enhanced MD?
• Where are the opportunities?
  • E.g., is semantic annotation the sweet spot?
  • Do we just expand MD tags to semantic annotations, and if so, how?
  • What about ontology design patterns (ODPs)? Where are they useful?
• Thoughts on where to add semantics and its technology to MD in the data/MD cycle?
  • How does it affect how data/MD repositories function?
• Some/considerable confusion about how MD should be integrated into information systems.
• Can we agree on common or domain principles (like modularity or building blocks), practices and tools to employ?