Data Standards at the IRI Data Library. M. Benno Blumenthal, Michael Bell, John del Corral, and Emily Grover-Kopec. International Research Institute for Climate and Society, Columbia University. http://iridl.ldeo.columbia.edu/
Current Data Exchange Standards • There are many of them • Some are flexible but semantically weak • Others are semantically specific but not sufficiently flexible We are working on this …
Data Library Overview [diagram] • Specialized Data Tools: Maproom • Generalized Data Tools: Data Viewer • Data Language • IRI Data Collection: Dataset, Variable, ivar (multidimensional) • URL/URI for data, calculations, figs, etc.
IRI Data Collection [diagram] • Economics, Public Health: “geolocated by entity” • Ocean/Atm: “geolocated by lat/lon”, multidimensional • GIS: “geolocation by vector object or projection metadata” • Also: spectral harmonics, equal-area grids, GRIB grid codes, climate divisions
IRI Data Collection, inputs [diagram] • spreadsheets, GRIB, netCDF, images, binary, shapefiles • Database Tables, queries • Servers: OpenDAP, THREDDS • images w/proj
IRI Data Collection, inputs and outputs [diagram] • Inputs: spreadsheets, GRIB, netCDF, images, binary, shapefiles; Database Tables, queries; Servers (OpenDAP, THREDDS); images w/proj • Calculations: “virtual variables” • Outputs: images, graphics, descriptive and navigational pages; Tables; Clients (OpenDAP, THREDDS); Data Files (netcdf, binary, images); OpenGIS WMS v1.3
OpenDAP • Very important to us because we can act as both a client and a server, and because it is flexible enough to represent all our calculations (“virtual variables”), i.e. a user can specify an analysis and export it • At the moment we cannot read shapefile data using it (and the serving of shapes over OpenDAP is consequently untested), but hopefully that is temporary • Impedance mismatch is low
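To make the client side concrete, here is a minimal sketch of how an OpenDAP (DAP2) client asks a server for a subset of one array variable: the subset is written as a constraint expression appended to the dataset URL. The dataset URL, the variable name `sst`, and the helper `dap2_subset_url` are assumptions for illustration only; they are not taken from the IRI Data Library.

```python
from urllib.parse import quote


def dap2_subset_url(dataset_url, variable, index_ranges, response="dods"):
    """Build a DAP2 request URL selecting a hyperslab of one array variable.

    index_ranges holds one (start, stride, stop) tuple per dimension, using
    zero-based, inclusive indices as in the DAP2 constraint-expression syntax.
    """
    if response not in {"dds", "das", "dods"}:
        raise ValueError(f"unsupported DAP2 response type: {response}")
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    # Keep the bracket/colon syntax readable; escape anything else unusual.
    constraint = quote(variable + hyperslab, safe="[]:.,")
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical dataset URL and variable name, for illustration only.
url = dap2_subset_url(
    "http://example.org/opendap/sst.nc",
    "sst",
    [(0, 1, 11), (20, 2, 40), (100, 2, 140)],
)
print(url)
```

Because the request is just a URL, the same mechanism could in principle address a stored variable or a calculation exposed by the server, which is the flexibility the slide refers to.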
Other Important Standards • netCDF • GRIB • GeoTIFF • Shapefiles vs. PostGIS in Postgres (OGC compliant)
Standards becoming important to us (we think) • OGC: GIS Conceptual Framework • OGC: WMS, WFS, WCS • These are designed to be partial: we will have many datasets/analyses that we cannot transfer using these protocols
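As an illustration of what such a protocol does cover, the following sketch assembles a WMS 1.3.0 GetMap request, which returns a rendered map image for a bounding box. The query parameter names are the standard GetMap ones; the endpoint, the layer name, and the helper `wms_getmap_url` are hypothetical.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height, image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so that the
    axis order is longitude, latitude.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": "CRS:84",
        "BBOX": f"{min_lon},{min_lat},{max_lon},{max_lat}",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
map_url = wms_getmap_url("http://example.org/wms", "sst", (-180, -90, 180, 90), 720, 360)
print(map_url)
```

The request describes a picture of one layer, not an arbitrary multidimensional analysis, which is one way to read the "partial" caveat above.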
Interoperability requires Semantics • Currently we have some numeric interoperability, but we have a long way to go for semantic interoperability
Standard Metadata [diagram] • Users • Tools • Standard Metadata Schema/Data Services • Datasets
Many Data Communities [diagram] • Five separate communities, each with its own Users, Tools, Standard Metadata Schema, and Datasets
Super Schema [diagram] • The five communities (Users, Tools, Standard Metadata Schema, Datasets) joined by a standard metadata schema
Super Schema: direct [diagram] • The five communities (Users, Tools, Standard Metadata Schema, Datasets) joined by a standard metadata schema/data service
Flaws • A lot of work • Super Schema/Service is the Lowest-Common-Denominator • Science keeps evolving, so that standards either fall behind or constantly change
RDF Standard Data Model Exchange [diagram] • The five communities (Users, Tools, Standard Metadata Schema, Datasets) and the standard metadata schema, linked by RDF
RDF Data Model Exchange [diagram] • The five communities (Users, Tools, Standard Metadata Schema, Datasets) and the standard metadata schema, with RDF layers around each Standard Metadata Schema
RDF Architecture [diagram] • queries against a collection of RDF • Virtual (derived) RDF
Why is this better? • Maps the original dataset metadata into a standard format that can be transported and manipulated • Still the same impedance mismatch when mapped to the least-common-denominator standard metadata, but • When a better standard comes along, the original complete-but-nonstandard metadata is already there to be remapped, and “late semantic binding” means everyone can use the new semantic mapping • Can use enhanced mappings between models that are close • EASIER: these are tools to enhance the mapping process
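The remapping argument can be sketched in a few lines. The original metadata is stored once as subject/predicate/object triples, and each semantic mapping is a separate set of statements that can be added or swapped later. All identifiers here (`local:varname`, `std:parameter`, `std2:observedProperty`, `mapsTo`, `view_through`) are made up for illustration and do not come from any real vocabulary.

```python
# Original, nonstandard dataset metadata kept as (subject, predicate, object)
# triples. Every name below is hypothetical.
original = {
    ("dataset1", "local:varname", "SSTA"),
    ("dataset1", "local:units", "degC"),
}

# Mappings are separate statements, so a later and better standard only needs
# a new mapping; the stored metadata is never rewritten.
mapping_v1 = {("local:varname", "mapsTo", "std:parameter")}
mapping_v2 = {
    ("local:varname", "mapsTo", "std2:observedProperty"),
    ("local:units", "mapsTo", "std2:unitOfMeasure"),
}


def view_through(metadata, mapping):
    """Return the metadata as seen through a mapping; unmapped terms are dropped."""
    rename = {s: o for (s, p, o) in mapping if p == "mapsTo"}
    return {(s, rename[p], o) for (s, p, o) in metadata if p in rename}


print(sorted(view_through(original, mapping_v1)))
print(sorted(view_through(original, mapping_v2)))
```

The first mapping loses the units (the lowest-common-denominator problem), while the second recovers them from the same untouched original metadata.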
Key Features of RDF/OWL • Web-based framework for writing down and interrelating semantic standards • Non-contextual modeling: data object relationships are stated explicitly, not inferred from context • Late semantic binding: semantics do not alter transport/storage; semantic mapping can be added later as scientific fields evolve • Not much track record, yet
RDF vs. XML Schema • RDF is usually transported as XML, so it is XML • But it differs from XML Schema in that the schema is not fixed beforehand • XML Schema: a prearranged exchange • RDF/XML: add to/query an information space
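The "information space" idea can be illustrated without any RDF library: statements from different sources are added to one set, and queries are patterns with wildcards. This is a toy sketch, not an RDF implementation; the identifiers (`ds:sst`, `ex:Dataset`, `ex:region`, `match`) are hypothetical.

```python
def match(store, pattern):
    """Yield triples matching a (subject, predicate, object) pattern.

    None in the pattern acts as a wildcard for that position.
    """
    for triple in store:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


store = set()

# A first source contributes some statements.
store |= {("ds:sst", "rdf:type", "ex:Dataset"), ("ds:sst", "ex:units", "degC")}

# A second source adds a property nobody agreed on in advance; nothing in the
# store has to change to accept it.
store |= {("ds:sst", "ex:region", "Pacific"), ("ds:malaria", "rdf:type", "ex:Dataset")}

datasets = sorted(s for s, _, _ in match(store, (None, "rdf:type", "ex:Dataset")))
print(datasets)
print(list(match(store, ("ds:sst", "ex:region", None))))
```

With a fixed schema, the `ex:region` statement would have required a schema revision agreed by both parties before it could be exchanged.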
Sample Tool: Faceted Search http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...