The GeoViQua project focuses on visualizing and promoting data quality in the Global Earth Observation System of Systems (GEOSS). It addresses the need for a global model for quality and aims to improve metadata records for better assessment of data fitness-for-use. The project also introduces a quality model that includes both quantitative and qualitative aspects of data quality.
GeoViQua: Advances in data quality disclosing Ivette Serral Center of Research in Ecology and Forestry Applications (CREAF) contact@geoviqua.org
QUAlity aware VIsualisation for the Global Earth Observation System of Systems • An FP7 project (2011-2014) devoted to showing the quality information embedded in GEOSS data • 10 partners, 7 countries
The problem • GEOSS data is handled through the GEOSS Common Infrastructure (GCI) • Is there quality information in the GCI? • There is some, in the form of ISO 19115 DQ elements and lineage • But not enough • The GCI does not follow a global model for quality • The GCI is exposed and searchable through the GEO Portal • In the GEO Portal search and results: • records are not ranked by quality • quality indicators are not easily comparable • spatially distributed uncertainty is not included
Community View Data Quality? • Many researchers refer to the ‘famous five’ as the common criteria for evaluating spatial data quality: • lineage; completeness; consistency; positional accuracy; and attribute accuracy. • Broad scientific acceptance of the common spatial quality elements does not apply to all cases for “fitness-for-use” evaluation • user requirements can go far beyond the widely accepted ‘famous five’. • We used semi-structured telephone and face-to-face interviews with a variety of geospatial data users and experts from a number of countries and application domains. More information at: http://www.geoviqua.org/Docs/SubmittedDeliverables/D2_1_GeoViQua.pdf
What about users? • Users are exceedingly interested in good quality metadata records • And information that can help to assess fitness-for-use of the data • Users find metadata records typically incomplete with essential data omitted • The process of dataset discovery and selection is more difficult • Users are also interested in ‘soft’ knowledge about data quality • Data providers’ comments on the overall quality of a dataset, known data errors, potential data usage • Peers’ reviews and recommendations (they contact their peers to obtain suggestions) • Dataset provenance, citation and licensing information • Citation is incomplete (lack of valid producer contact details), and licensing often missing • Citation: users rely on data from good reputation producers • Currently, some of these cannot be recorded in standard metadata • Users need to easily and systematically compare metadata records • Side-by-side visualisation of all metadata elements would allow geospatial datasets to be compared more effectively, • especially when datasets are very similar and differences are hard to distinguish
Quality model is much more than positional accuracy • There are many quantifiable aspects that can be recorded: • Consistency, completeness, positional, thematic and temporal accuracy… • There are many qualitative aspects that are needed: • Lineage (traceability), scientific papers, user feedback, data usage…
GeoViQua Data model treats statistical uncertainties
• Encoding with an UncertML normal distribution:
<gmd:DQ_QuantitativeAttributeAccuracy>
  <gmd:result>
    <gmd:DQ_QuantitativeResult>
      <gmd:valueType>
        <gco:RecordType xlink:href="http://www.uncertml.org/distributions/normal">
          Value of the vertical DEM accuracy
        </gco:RecordType>
      </gmd:valueType>
      <gmd:valueUnit>m</gmd:valueUnit>
      <gmd:value>
        <gco:Record>
          <un:NormalDistribution>
            <un:mean>1.2</un:mean>
            <un:variance>3.6</un:variance>
          </un:NormalDistribution>
        </gco:Record>
      </gmd:value>
    </gmd:DQ_QuantitativeResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>
• Conventional encoding with a single value:
<gmd:DQ_QuantitativeAttributeAccuracy>
  <gmd:result>
    <gmd:DQ_QuantitativeResult>
      <gmd:valueUnit>m</gmd:valueUnit>
      <gmd:value>
        <gco:Record>3.6</gco:Record>
      </gmd:value>
    </gmd:DQ_QuantitativeResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>
• Explicit recognition that errors acceptably fit a Normal distribution with mean 1.2
• An overall positive bias was observed (a difficult feature to convey by traditional means)
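As an illustration of what the distribution-based encoding makes possible, the sketch below reads the mean and variance from a normal-distribution fragment like the one on this slide and derives a standard deviation and an approximate 95% interval. It is only a sketch: the namespace URI bound to the `un` prefix and the function name `normal_summary` are assumptions, not part of the GeoViQua schema.

```python
import math
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "un" prefix; check the UncertML schema actually in use.
UN_NS = "http://www.uncertml.org/2.0"

FRAGMENT = f"""
<un:NormalDistribution xmlns:un="{UN_NS}">
  <un:mean>1.2</un:mean>
  <un:variance>3.6</un:variance>
</un:NormalDistribution>
"""


def normal_summary(xml_text):
    """Return mean, standard deviation and a ~95% interval for a normal distribution."""
    root = ET.fromstring(xml_text)
    ns = {"un": UN_NS}
    mean = float(root.find("un:mean", ns).text)
    variance = float(root.find("un:variance", ns).text)
    std = math.sqrt(variance)
    half_width = 1.96 * std  # two-sided 95% quantile of the standard normal
    return {"mean": mean, "std": std, "interval95": (mean - half_width, mean + half_width)}


summary = normal_summary(FRAGMENT)
print(summary)
```

A non-zero mean is what exposes the bias; a single accuracy value cannot carry that information.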
Two models on data quality are needed • Producer’s quality metadata • In the producer’s metadata records • Encoded in the classical ISO 19115/19139 • Some extensions required • Stored in the current catalogues (GEOSS Clearinghouse, etc.) • User’s quality metadata • In independent metadata repositories • Linked to the producer’s metadata by id • Future component of the GCI? • Contains comments, “like it”, star ratings, etc.
Advances in quality models: GVQ - producer quality model http://schemas.geoviqua.org/GVQ/3.1.0
Advances in quality models: GVQ - producer quality model • Publications. Based on ISO 19115 CI_Citation and extended with ISO 690 elements. Added to a number of quality elements within the metadata document. An existing DQ_ or MD_ element is extended to allow a ‘referenceDoc’ to be added. • Discovered issues. Added a discovered issue class (e.g., a problem which the producer has identified during generation of a dataset) to the DQ_DataQuality element. • Reference datasets used for evaluation. Added to the ‘dataEvaluation’ section of ISO 19157 to allow recording the reference dataset used to assess the quality indicator. • Traceability. Added a new ‘metaquality’ type to allow the lineage of a data quality assessment to be recorded, along with its representativity and coverage. This is a requirement of the QA4EO principles. More information: Lucy Bastin [l.bastin@aston.ac.uk] & a poster in this session room
Advances in quality models: GVQ - user quality model http://schemas.geoviqua.org/GVQ/3.1.0
Advances in quality models: GVQ - user quality model ISO 19115 only provides MD_Usage to report how users apply the dataset in their activities. This is insufficient for GEOSS needs, so GeoViQua has built this model from scratch. A user can submit a GVQ_FeedbackItem in the form of: • A user comment. • A rating mark. • A usage report supported by a citation of a report. • A link to external feedback (blog pages, Google Docs document, etc.). • A metadata override that amends a producer metadata value. • A quality label (GEO Label). • These items are related to a dataset through an identifier. More information: Lucy Bastin [l.bastin@aston.ac.uk] & a poster in this session room
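The idea of feedback items tied to a dataset through an identifier can be sketched as a small data structure. This is a minimal illustration only: the class `FeedbackItem`, its field names, and the sample identifiers are hypothetical and do not reproduce the GVQ schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class FeedbackItem:
    """One user feedback entry; field names are hypothetical, not the GVQ schema."""
    dataset_id: str                        # identifier linking to the producer record
    comment: Optional[str] = None          # free-text user comment
    rating: Optional[int] = None           # rating mark, e.g. a star count
    usage_citation: Optional[str] = None   # citation backing a usage report
    external_link: Optional[str] = None    # link to external feedback


def feedback_for(items, dataset_id):
    """Select the feedback items attached to one dataset identifier."""
    return [item for item in items if item.dataset_id == dataset_id]


def average_rating(items, dataset_id):
    """Average the rating marks for a dataset, or None if nobody rated it."""
    ratings = [i.rating for i in feedback_for(items, dataset_id) if i.rating is not None]
    return mean(ratings) if ratings else None


items = [
    FeedbackItem("dem-001", comment="Vertical bias near coastlines", rating=3),
    FeedbackItem("dem-001", rating=5),
    FeedbackItem("lc-042", external_link="https://example.org/review"),
]
print(average_rating(items, "dem-001"))
```

Keeping feedback in its own store, joined only by identifier, mirrors the separation between producer and user metadata described earlier in the deck.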
Advances in quality models: GVQ - user quality model • The GeoViQua Quality Model is explained in the GEOSS Best Practice Twiki: http://wiki.ieee-earth.org/GEOSS_Tutorials • It has been presented in the AIP5 session and it is a contribution to the GEOSS Standards and Interoperability Forum (SIF). More information: Anna Riverola [Anna.Riverola@uab.cat]
Advances in visualizing metadata quality information GeoViQua has developed the Q-Rubric tool, an extension of the earlier NOAA version • An XSLT tool that converts XML metadata files into an HTML scoring page. • Analyses each item of ISO quality metadata and rates it by presence/absence (attributing one point when the metadata exists, without penalizing missing information). • Helps users to evaluate how many metadata elements related to data quality are provided. • Adds two new information groups related to ISO quality: Quality and Usage. • The GEOSS representation style has been applied to the original Rubric tables.
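The presence/absence scoring rule can be illustrated outside XSLT. The sketch below is not the Q-Rubric stylesheet: it is a Python approximation that awards one point per quality-related ISO 19139 element found in a record and never subtracts points. The selection of elements in `CHECKS` and the sample record are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"  # ISO 19139 gmd namespace

# Hypothetical selection of elements to look for; the real rubric checks many more.
CHECKS = {
    "Positional accuracy": "DQ_AbsoluteExternalPositionalAccuracy",
    "Completeness": "DQ_CompletenessOmission",
    "Logical consistency": "DQ_ConceptualConsistency",
    "Thematic accuracy": "DQ_ThematicClassificationCorrectness",
    "Lineage": "LI_Lineage",
    "Usage": "MD_Usage",
}

SAMPLE = f"""
<gmd:MD_Metadata xmlns:gmd="{GMD}">
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:report>
        <gmd:DQ_AbsoluteExternalPositionalAccuracy/>
      </gmd:report>
      <gmd:lineage>
        <gmd:LI_Lineage/>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>
"""


def score(xml_text):
    """Give one point per element that is present; missing elements cost nothing."""
    root = ET.fromstring(xml_text)
    present = {element.tag for element in root.iter()}
    results = {
        label: int(f"{{{GMD}}}{name}" in present)
        for label, name in CHECKS.items()
    }
    return results, sum(results.values())


results, total = score(SAMPLE)
for label, points in results.items():
    print(f"{label}: {points}")
print(f"Total: {total} of {len(CHECKS)}")
```

Because absence is not penalized, the total reads as a count of documented quality elements and not as a grade.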
Advances in visualizing metadata quality information Download it: http://www.geoviqua.org/docs/isoRubricQHTML.xsl • Some results from the GCI: • 97203 metadata records held in the Clearinghouse; 96867 analysed • 14.79% do not define the mandatory topic category • 80.63% do not have any quality element (of any class) • Quality: positional accuracy is the most populated class, with 37.77% documented, followed by completeness (36.06%) and logical consistency (18.79%). Only 0.50% document thematic accuracy. • Lineage: 35.27% do not have any lineage sub-element defined. • Usage: 0.60% of elements documented. • Conclusions: • Metadata providers do not comply with the ISO Core Mandatory elements. Many topic categories reach only 75% completeness. • This impacts metadata search engines for data discovery requests. More information: Alaitz Zabala [Alaitz.Zabala@uab.cat]
Advances in visualizing quality information I Integrating UncertWeb project proposals: Use NetCDF-U • The Network Common Data Form (NetCDF) is one of the primary methods of self-documenting data storage and access in the international geosciences research and education community and beyond. • NetCDF-U Conventions are used to formally qualify the uncertainty information in geospatial data encoded in the netCDF-3 format, by means of concepts from the UncertML best practice of the UncertWeb project. • NetCDF-U Conventions are designed to be fully compatible with the netCDF Climate and Forecast Conventions, the de-facto standard for a large amount of data in the Fluid Earth Science community. • It is now a discussion paper in OGC.
Advances in visualizing quality information I • Much of the data involved in the GeoViQua scenarios is encoded in NetCDF. • An open file format. • Gives strength and freedom to encode metadata. • GeoViQua is developing tools for reading and writing NetCDF-U files and for importing from and exporting to other raster formats. [Figures: a NetCDF file opened with the NASA software Panoply; a NetCDF file exported to an IMG file and opened with the new tool] More information: Victor Zaldo [v.zaldo@creaf.uab.cat]
Advances in visualizing quality information II Integration of Quality Information with OGC Web Map Service: WMS-Q • WMS 1.3.0 does not currently support the integration of quality information well. • The current WMS does not define how a data layer can be semantically associated with its corresponding uncertainty layers. • The WMS-Q specification is proposed to stay, as far as possible, within the bounds of the WMS 1.3.0 specification, requiring as few extensions as possible. • To integrate dataset-level quality information into the WMS, we propose to slightly expand the “Type” attribute of the “MetadataURL” element with “unstructured” and “other-structured” options. • We also propose adding a “description” element to the “MetadataURL” element. • Pixel-level uncertainty information can be encoded using the NetCDF Uncertainty Conventions (NetCDF-U). • Work tested in the OGC interoperability experiment OWS-9 More information: Jon Blower [j.d.blower@reading.ac.uk]
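To make the proposed MetadataURL extension concrete, the sketch below builds such an element with Python's standard library. It is a guess at the shape of the proposal, not the WMS-Q specification: the `Description` element name and its position, the helper `metadata_url`, and the example URL are all assumptions. Only the two extra type values come from the slide.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

# The two additional type values named in the proposal.
PROPOSED_TYPES = {"unstructured", "other-structured"}


def metadata_url(url, kind, description, fmt="text/html"):
    """Build a MetadataURL element carrying a proposed type and a description.

    The Description child is an assumed spelling of the proposed element.
    """
    if kind not in PROPOSED_TYPES:
        raise ValueError(f"unsupported type: {kind}")
    element = ET.Element("MetadataURL", {"type": kind})
    ET.SubElement(element, "Format").text = fmt
    ET.SubElement(
        element,
        "OnlineResource",
        {f"{{{XLINK}}}type": "simple", f"{{{XLINK}}}href": url},
    )
    ET.SubElement(element, "Description").text = description
    return element


element = metadata_url(
    "https://example.org/quality-report.html",  # placeholder URL
    "unstructured",
    "Validation report for this layer",
)
print(ET.tostring(element, encoding="unicode"))
```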
Advances in visualizing quality information III Preliminary results from experiments with colour coding: • Quality should be intercomparable, i.e. the saturation should be intuitively comparable even across hues/categories. Perceptual colour models make this possible. • Hue represents category, and saturation represents the "Purity for the parcel enrichment" (in percent) or the certainty. [Figure: maps dated 16.12.2006 and 22.03.07, annotated "Nearly uncertain in both campaigns" and "Gain in certainty"] More information: Simon Thum [simon.thum@igd.fraunhofer.de]
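The hue-for-category, saturation-for-certainty mapping can be sketched in a few lines. Note the substitution: this example uses the plain HSV model from Python's standard library, which is not perceptually uniform, so it shows the mapping but not the intercomparability that a perceptual colour model would give. The category names and hue values are hypothetical.

```python
import colorsys

# Hypothetical categories and hues (0.0 to 1.0 around the colour wheel).
CATEGORY_HUES = {"cereal": 0.15, "forest": 0.33, "water": 0.60}


def certainty_colour(category, certainty_percent):
    """Map a category to hue and a certainty (0 to 100) to saturation; return 8-bit RGB."""
    saturation = max(0.0, min(1.0, certainty_percent / 100.0))
    red, green, blue = colorsys.hsv_to_rgb(CATEGORY_HUES[category], saturation, 1.0)
    return tuple(round(channel * 255) for channel in (red, green, blue))


for certainty in (0, 50, 100):
    print(certainty, certainty_colour("forest", certainty))
```

With this mapping, a fully uncertain pixel fades to white whatever its category, and the colour becomes more vivid as certainty grows.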
Advances in visualizing quality information IV Creation of a “Carbon Atlas” portal • Combines web mapping with the comparison of models, including their uncertainty, using ncWMS (server) and OpenLayers (client). 1. Model-to-model comparison. ncWMS is a Web Map Service for geospatial data stored in CF-compliant NetCDF files (developed and maintained by the Reading e-Science Centre).
Advances in visualizing quality information IV 2. Creation of a comparison map (based on the IPCC’s visualization method): pixel colour = difference between models, pattern = percentage of agreement between models. This requires adding to the ncWMS server the ability to associate a pattern with a raster. More information: Pascal Evano [p.evano@creaf.uab.cat]
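One possible reading of the comparison map is sketched below: per pixel, a difference value to drive the colour and an agreement percentage to drive the pattern overlay. The definitions used here are assumptions for illustration only, since the slide does not define them: difference is taken as the range across models, agreement as the share of models whose sign matches the sign of the model mean, and the 80% pattern threshold is arbitrary.

```python
def comparison_map(models, threshold=80.0):
    """Compute per-pixel difference, agreement (%) and a pattern mask.

    models: list of equally sized 2-D grids (lists of rows of floats).
    threshold: assumed agreement level above which the pattern is drawn.
    """
    count = len(models)
    rows, cols = len(models[0]), len(models[0][0])
    difference = [[0.0] * cols for _ in range(rows)]
    agreement = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [model[r][c] for model in models]
            average = sum(values) / count
            # Colour channel: spread between the most extreme models.
            difference[r][c] = max(values) - min(values)
            # Pattern channel: share of models agreeing with the sign of the mean.
            same_sign = sum(1 for v in values if (v >= 0) == (average >= 0))
            agreement[r][c] = 100.0 * same_sign / count
    pattern = [[value >= threshold for value in row] for row in agreement]
    return difference, agreement, pattern


# Three hypothetical one-row model outputs with two pixels each.
models = [
    [[1.0, -2.0]],
    [[3.0, 1.0]],
    [[2.0, -1.0]],
]
difference, agreement, pattern = comparison_map(models)
print(difference, agreement, pattern)
```

The two outputs map naturally onto two rendering layers, which is why the server needs a way to pair a pattern layer with a raster layer.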
Advances in applied scenarios II Uncertainty assessment for continuous and categorical variables • Continuous variables: uncertainty of citizen-contributed weather data relative to the official Met Office data. More information: Dan Cornford [D.Cornford@aston.ac.uk] • Categorical variables: spatialized quality indicators derived from a satellite image classification, at global, local and pixel uncertainty levels. Several statistical classification methods are used. More information: Eva Sevillano [Eva.Sevillano@uab.cat] [Figure: classification with categories Cat1, Cat2 and Cat3; fidelity; probability of success (%)]
Advances in including data quality in search • Quality search integrated in the EuroGEOSS Discovery and Access Broker to be applied to the GEO Portal.
Advances in including data quality in search • Retrieve quality information embedded in Metadata More information: Lorenzo Bigagli [lorenzo.bigagli@pin.unifi.it]
Advances in labelling the quality: the GEO Label • What is it? • The GEO Label is intended to “assist the user to assess the scientific relevance, quality, acceptance and societal needs of the components” (ST-09-02 Task Team, 2010). • Purposes? • be a quality indicator for GEOSS geospatial data and datasets; • improve user recognition and trust in datasets that carry a GEO label; • assist in searching by providing users with visual clues of dataset quality and relevance; and • increase visibility of EO data. • GEO label development: • The GeoViQua project is currently undertaking research to define and evaluate the concept of a GEO label. • The development is carried out in three phases. [Figure: the three phases, marked "Done!" or "In progress!"]
Advances in labelling the quality: the GEO Label • Phase I Study: • Overall, GEO label questionnaire results show that users and producers agree on the benefits of introducing a GEO label, with no distinct difference between user and producer views. • The majority of respondents support an all-in-one drill-down interrogation facility as the key GEO label function. • Phase II Study: • The GEO labels will be a graphical representation generated individually for each dataset in the GEOSS (or other data portals and clearinghouses) based on the quality information that is available for that dataset. • A second online questionnaire-based survey to identify the designs that convey quality information to users in the most efficient and comprehensible way. • Currently: • At this stage we are analysing the GEO label study II results to fully define and establish a GEO label that meets the needs of the geodata user community. • Phase III: we will create physical prototypes which will be used in a human subject study. More information: Victoria Lush [lushv@aston.ac.uk]
The future • Many possibilities have been shown. • The project now enters a development phase in which the concepts presented and the prototypes need to be developed. • Move the GeoViQua Quality Model towards broader adoption. • Develop a user feedback system prototype. • Test search and visualization developments in a GEO Portal replica (ESA contribution). • Work with the GEO Architecture committees to move some of these contributions towards adoption in the GCI.
GeoViQua: Advances in data quality disclosing Thanks! Ivette Serral Center of Research in Ecology and Forestry Applications (CREAF) contact@geoviqua.org