In Search of What Some of It Means

Peter Fox (RPI) discusses the challenges of accessing and integrating scientific data from various sources, as well as the importance of metadata in facilitating its use.

Presentation Transcript


  1. In Search of What Some of It Means. RDA Semantics and Metadata Workshop, Feb 23, 2015. Peter Fox (RPI), pfox@cs.rpi.edu, Tetherless World Constellation

  2. Metadata and documentation

  3. Not more code!

  4. Spectral synthesis components and flow

  5. Getting the metadata?

  6. What I wanted (~1994-96)
      Scientists should be able to access a global, distributed knowledge base of scientific data that:
      • appears to be integrated
      • appears to be locally available
      But… data is obtained by multiple means (instruments, models, analysis) using various protocols, in differing vocabularies, under (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed. And it is almost always created to facilitate its generation, not its use.
      And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, and inflexible, unsustainable implementation technology…

  7. What I was doing…
      pro read_spec, spectra_name, description, auxiliary_info, model_size, $
          mu_size, wave_size, model, smodel, mu, wave0, wavelength, intensity, $
          brightness_temperature, index1, index2, percent
      ncopts = 0
      description_start=0
      description_edges=80
      i=0
      j=0
      k=0
      ; Construct the DB filename
      ncid=ncdf_open(string(getenv("SPECTRA")))
      inq_struct=ncdf_inquire(ncid)
      ; /* get dimension info */
      tmp_id=ncdf_dimid(ncid, "comment_dim")
      ncdf_diminq, ncid, tmp_id, dummy, comment_dim
      tmp_id=ncdf_dimid(ncid, "mu_dim")
      ncdf_diminq, ncid, tmp_id, dummy, mu_dim
      tmp_id=ncdf_dimid(ncid, "wave_dim")
      ncdf_diminq, ncid, tmp_id, dummy, wave_dim
      tmp_id=ncdf_dimid(ncid, "model_dim")
      ncdf_diminq, ncid, tmp_id, dummy, model_dim
      tmp_id=ncdf_dimid(ncid, "smodel_dim")
      ncdf_diminq, ncid, tmp_id, dummy, smodel_dim
      tmp_id=ncdf_dimid(ncid, "item_dim")
      ncdf_diminq, ncid, tmp_id, dummy, item_dim

  8. What I was doing… etc.
      start=intarr(1)
      edges=intarr(1)
      start(0)=0
      edges(0)=model_size
      tmp_id=ncdf_varid(ncid, "mu_size")
      ncdf_varget, ncid, tmp_id, mu_size, OFFSET=start, COUNT=edges
      tmp_id=ncdf_varid(ncid, "model")
      ncdf_varget, ncid, tmp_id, model, OFFSET=start, COUNT=edges
      start=intarr(2)
      edges=intarr(2)
      start(0)=0
      edges(0)=smodel_dim
      start(1)=0
      edges(1)=model_size
      tmp_id=ncdf_varid(ncid, "smodel")
      ncdf_varget, ncid, tmp_id, smodel, OFFSET=start, COUNT=edges
      tmp_id=ncdf_varid(ncid, "description")
      ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=comment_dim, description
      ; IDs for variables
      tmp_id=ncdf_varid(ncid, "spectra_name")
      ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=comment_dim, spectra_name
      tmp_id=ncdf_varid(ncid, "auxiliary_info")
      ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=comment_dim, auxiliary_info
      tmp_id=ncdf_varid(ncid, "model_size")
      ncdf_varget, ncid, tmp_id, OFFSET=0, COUNT=item_dim, model_size

  9. What does It all Mean?

  10. Some version of this… ~Metadata?
      [Figure: ecosystem diagram. Labels: Experience, Data, Information, Knowledge; Creation, Gathering, Presentation, Organization, Integration, Conversation; Context]

  11. It and Meaning
      • It = things that matter
      • Context
      • Meaning = duh -> semantics
      • Relations!! Real ones!
      • But it was more than that, though that often comes later…
          • Syntax (structure/form)
          • Semantics (meaning)
          • Pragmatics (use)

  12. Metadata-Information-Knowledge Ecosystem
      [Figure: ecosystem diagram. Labels: Experience, Metadata, Information, Knowledge; Creation, Gathering, Formalization, Organization, Integration, Shared Conceptualization; Context]

  13. Provenance
      • Origin or source from which something comes, intention for use, who/what it was generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in sufficient detail to allow reproducibility
      • Provenance: metadata in a given context! Swallow that.
      • Knowledge provenance: meaning and relations in multiple contexts!

  14. Perfect is the enemy of the good… (thanks Voltaire)

  15. Origins…
      • In 2000-2001 the need to capture and preserve knowledge in science data became very clear, but the barriers were high
      • In 2004 we started a virtual observatory project based on semantic technologies
      • Use-case driven: in solar and solar-terrestrial physics, with an emphasis on instrument-based measurements and real data pipelines; we needed implementations
      • We knew we also needed integration and provenance (but that came later)
      • We aimed to push semantics into our systems to build new ‘prototypes’, but we ‘failed’ ;-)

  16. In 2004
      • OWL became a W3C Recommendation!!
      • Protégé 2.x and the Protégé-Java-OWL API
      • SWOOP was a viable editor
      • Jena and the Jena API were in good shape
      • Pellet worked
      • SPARQL was still a twinkle in the RDF working group’s eye
      • Semantics were still the realm of computer scientists

  17. Design and Development
      • We made a conscious decision to develop only the ontologies required to answer specific use cases and to migrate metadata
      • Both Classes AND Properties (uh-oh…)
      • We made a conscious effort to use whatever ontologies were available (cf. trends in metadata… nuff said)
      • We were pretty sure that rules would be needed (complex logic or late semantic binding)
      • We ignored query (see implementation)

  18. Use Case example
      • Plot the neutral temperature from the Millstone-Hill Fabry-Perot, operating in the non-vertical mode during January 2000, as a time series.
      • Meanings and relations
      • Objects = Things!
      • Neutral temperature is a (temperature is a) parameter
      • Millstone Hill is a (ground-based observatory is a) observatory
      • Fabry-Perot is an interferometer is an optical instrument is an instrument
      • Non-vertical mode is an instrument operating mode
      • January 2000 is a date-time range
      • Time is an independent variable/coordinate
      • Time series is a data plot is a data product
      • Metadata just appeared everywhere…
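The “is a” chains on this slide form a small class hierarchy. A question like “is the Millstone Hill Fabry-Perot an instrument?” is answered by walking up that chain.

The Python sketch makes some simplifying assumptions:
- The term strings and the single-parent dictionary are hand-transcribed from the slide for illustration. They are not the project's actual ontology.
- Instance-of and subclass-of are treated the same way. A real OWL model would keep them distinct.

```python
# Hypothetical is-a table transcribed by hand from the use-case slide.
# Each key maps to its direct parent; instances and classes are mixed
# here for simplicity (an OWL model would separate rdf:type and subClassOf).
IS_A = {
    "neutral temperature": "temperature",
    "temperature": "parameter",
    "Millstone Hill": "ground-based observatory",
    "ground-based observatory": "observatory",
    "Fabry-Perot": "interferometer",
    "interferometer": "optical instrument",
    "optical instrument": "instrument",
    "non-vertical mode": "instrument operating mode",
    "January 2000": "date-time range",
    "time": "independent variable",
    "time series": "data plot",
    "data plot": "data product",
}


def ancestors(term: str) -> list[str]:
    """Walk the is-a chain upward and return every ancestor, nearest first."""
    chain = []
    seen = {term}
    while term in IS_A:
        term = IS_A[term]
        if term in seen:  # guard against an accidental cycle in the table
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def is_a(term: str, cls: str) -> bool:
    """True if `term` is `cls` itself or `cls` appears among its ancestors."""
    return term == cls or cls in ancestors(term)


if __name__ == "__main__":
    print(ancestors("Fabry-Perot"))  # ['interferometer', 'optical instrument', 'instrument']
    print(is_a("Millstone Hill", "observatory"))  # True
    print(is_a("neutral temperature", "instrument"))  # False
```

Because the use case's terms are linked this way, a query for “instrument” data can also find Fabry-Perot data without anyone listing it explicitly.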

  19. Semantics: modern informatics enables a new scale-free** framework approach
      • Use cases
      • Stakeholders
      • Distributed authority
      • Access control
      • Ontologies
      • Maintaining identity

  20. Semantics between 2004 and 2009
      • Ontologies were needed for data integration, provenance, and mediation for data mining
      • Protégé 3.x and then 4.0 came out
      • SWOOP development was interrupted
      • Cmap added OWL predicate support*
      • SPARQL became a W3C Recommendation
      • Triple stores exploded in use and capability
      • Linked Open Data started to take off
      • Pellet 2.0 came out
      • I used the “M” word less frequently!

  21. Working with knowledge: Expressivity, Implementability, Maintainability/Extensibility

  22. Working with semantics: Query, Inference, Rule execution
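This is a toy Python sketch over an in-memory set of triples, meant to make the three verbs on this slide concrete. It covers each one as follows:
- **Query:** a conjunctive pattern match, roughly what a SPARQL basic graph pattern does.
- **Inference:** two RDFS-style entailments (subclass transitivity and type inheritance).
- **Rule execution:** one made-up domain rule.

Everything is applied by naive forward chaining until a fixpoint is reached. The identifiers `MHFPI`, `ds1`, `measuredBy`, and `OpticalDataset` are hypothetical. A real deployment would use a triple store and a reasoner rather than this loop.

```python
# Hypothetical facts: the class names echo the use-case slide, but the
# instrument individual "MHFPI", dataset "ds1", and "measuredBy" are invented.
FACTS = {
    ("FabryPerot", "subClassOf", "Interferometer"),
    ("Interferometer", "subClassOf", "OpticalInstrument"),
    ("OpticalInstrument", "subClassOf", "Instrument"),
    ("MHFPI", "type", "FabryPerot"),
    ("ds1", "measuredBy", "MHFPI"),
}

# Each rule is (body patterns, head pattern); terms starting with "?" are variables.
RULES = [
    # Inference: subClassOf is transitive (RDFS-style entailment).
    ([("?a", "subClassOf", "?b"), ("?b", "subClassOf", "?c")], ("?a", "subClassOf", "?c")),
    # Inference: an instance of a class is also an instance of its superclasses.
    ([("?x", "type", "?a"), ("?a", "subClassOf", "?b")], ("?x", "type", "?b")),
    # Rule execution: a made-up domain rule, for illustration only.
    ([("?d", "measuredBy", "?i"), ("?i", "type", "OpticalInstrument")], ("?d", "type", "OpticalDataset")),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound to something else
                return None
        elif p != t:
            return None
    return new


def query(triples, patterns):
    """Conjunctive query: every binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in triples
            if (b2 := match(pattern, t, b)) is not None
        ]
    return bindings


def substitute(pattern, binding):
    """Replace bound variables in a pattern with their values."""
    return tuple(binding.get(term, term) for term in pattern)


def forward_chain(triples, rules):
    """Naive forward chaining: fire every rule until no new triples appear."""
    closure = set(triples)
    while True:
        derived = {
            substitute(head, b)
            for body, head in rules
            for b in query(closure, body)
        }
        if derived <= closure:
            return closure
        closure |= derived


if __name__ == "__main__":
    closure = forward_chain(FACTS, RULES)
    # Which datasets were measured by some kind of Instrument?
    print(query(closure, [("?d", "measuredBy", "?i"), ("?i", "type", "Instrument")]))
    # [{'?d': 'ds1', '?i': 'MHFPI'}]
```

The same query against the raw `FACTS` returns nothing, because `MHFPI` is only asserted to be a `FabryPerot`. The answer appears only after inference has run.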

  23. Semantics between 2009 and now
      • Semantic eScience Framework (SeSF)
      • Substantial knowledge provenance work
      • Data quality, uncertainty, and bias representations and applications (oh, these are in production at NASA)
      • Multi-sensor data synergy advisor
      • Applications:
          • Sea Ice, Carbon Observatory, Integrated Ecosystem Assessments, globalchange.gov, ocean.data.gov, energy.data.gov…

  24. Respect and Mediation … how

  25. Discovering new data

  26. NCA links to GCIS entities http://data.globalchange.gov

  27. Information model Ontology

  28. Core and Framework Semantics - Multi-tiered interoperability used by

  29. Closing thoughts
      • Go ahead, create all the metadata you want; we’ll “materialize” some of it into triples, based on semantics, for use!
      • Go ahead, create all the schemas and encodings you want, but remember: semantics now lives in an open world (some of it). You are not the only source of metadata. Not all of it is formal. Link over map.
      • Semantics make metadata useful, but we do not need all of your metadata.
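One way to read the first and last bullets: only the fields of a metadata record that map to an agreed meaning are turned into triples, and the rest is left alone. The Python sketch below assumes a hand-written, hypothetical field-to-property mapping (`FIELD_TO_PROPERTY`, with a made-up `ex:` namespace). It illustrates selective materialization only and is not the talk's actual pipeline.

```python
# Hypothetical mapping from one producer's metadata field names to shared
# vocabulary properties; the "ex:" prefix is a made-up namespace.
FIELD_TO_PROPERTY = {
    "instrument": "ex:measuredBy",
    "param": "ex:parameter",
    "start_time": "ex:startTime",
}


def materialize(subject, record, mapping):
    """Turn the mapped fields of a metadata record into (s, p, o) triples.

    Fields with no agreed meaning are returned separately instead of being
    forced into the graph.
    """
    triples, skipped = [], []
    for field, value in record.items():
        prop = mapping.get(field)
        if prop is None:
            skipped.append(field)  # no known semantics: leave it out of the graph
        else:
            triples.append((subject, prop, value))
    return triples, skipped


if __name__ == "__main__":
    # A hypothetical record; "internal_flag" has no mapping and is skipped.
    record = {
        "instrument": "Millstone Hill Fabry-Perot",
        "param": "neutral temperature",
        "start_time": "2000-01-01",
        "internal_flag": "7",
    }
    triples, skipped = materialize("ex:ds1", record, FIELD_TO_PROPERTY)
    for t in triples:
        print(t)
    print("not materialized:", skipped)
```

The function returns the skipped fields rather than silently dropping them, which keeps the choice visible. A field can be linked later, once its meaning is agreed.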

  30. Contact
      • pfox@cs.rpi.edu
      • http://tw.rpi.edu
      • @taswegian
