GeoViQua: Advances in data quality disclosing

The GeoViQua project focuses on visualizing and promoting data quality in the Global Earth Observation System of Systems (GEOSS). It addresses the need for a global model for quality and aims to improve metadata records for better assessment of data fitness-for-use. The project also introduces a quality model that includes both quantitative and qualitative aspects of data quality.

Presentation Transcript


  1. GeoViQua: Advances in data quality disclosing. Ivette Serral, Center of Research in Ecology and Forestry Applications (CREAF), contact@geoviqua.org

  2. QUAlity aware VIsualisation for the Global Earth Observation System of Systems. GeoViQua is an FP7 project (2011-2014) devoted to showing the quality information embedded in GEOSS data. 10 partners, 7 countries.

  3. The problem. GEOSS data is handled through the GEOSS Common Infrastructure (GCI). • Is there quality information in the GCI? • There is some, in the form of ISO 19115 DQ elements and lineage • But not enough • The GCI does not follow a global model for quality. The GCI is exposed and searchable through the GEO Portal. • GEO Portal search results are not ranked by quality • Quality indicators are not easily comparable • Spatially distributed uncertainty is not included

  4. Community view of data quality • Many researchers refer to the ‘famous five’ as the common criteria for evaluating spatial data quality: lineage, completeness, consistency, positional accuracy and attribute accuracy. • Broad scientific acceptance of these common spatial quality elements does not mean they cover every “fitness-for-use” evaluation: user requirements can go far beyond the widely accepted ‘famous five’. • We used semi-structured telephone and face-to-face interviews with a variety of geospatial data users and experts from a number of countries and application domains. More information at: http://www.geoviqua.org/Docs/SubmittedDeliverables/D2_1_GeoViQua.pdf

  5. What about users? • Users are very interested in good-quality metadata records and in information that helps them assess the fitness-for-use of the data. • Users find that metadata records are typically incomplete, with essential data omitted, which makes dataset discovery and selection more difficult. • Users are also interested in ‘soft’ knowledge about data quality: • Data providers’ comments on the overall quality of a dataset, known data errors and potential data usage • Peers’ reviews and recommendations (users contact their peers to obtain suggestions) • Dataset provenance, citation and licensing information • Citation is incomplete (no valid producer contact details) and licensing information is often missing • Users rely on data from producers with a good reputation • Currently, some of this information cannot be recorded in standard metadata. • Users need to compare metadata records easily and systematically: side-by-side visualisation of all metadata elements would allow geospatial datasets to be compared more effectively, especially when datasets are very similar and differences are hard to distinguish.

  6. Quality model is much more than positional accuracy • There are many quantifiable aspects that can be recorded: • Consistency, completeness, positional, thematic and temporal accuracy… • There are many qualitative aspects that are needed: • Lineage (traceability), scientific papers, user feedback, data usage…

  7. GeoViQua Data model treats statistical uncertainties
     Encoding with an UncertML normal distribution:
       <gmd:DQ_QuantitativeAttributeAccuracy>
         <gmd:result>
           <gmd:DQ_QuantitativeResult>
             <gmd:valueType>
               <gco:RecordType xlink:href="http://www.uncertml.org/distributions/normal">
                 Value of the vertical DEM accuracy
               </gco:RecordType>
             </gmd:valueType>
             <gmd:valueUnit>m</gmd:valueUnit>
             <gmd:value>
               <gco:Record>
                 <un:NormalDistribution>
                   <un:mean>1.2</un:mean>
                   <un:variance>3.6</un:variance>
                 </un:NormalDistribution>
               </gco:Record>
             </gmd:value>
           </gmd:DQ_QuantitativeResult>
         </gmd:result>
       </gmd:DQ_QuantitativeAttributeAccuracy>
     Encoding with a single value only:
       <gmd:DQ_QuantitativeAttributeAccuracy>
         <gmd:result>
           <gmd:DQ_QuantitativeResult>
             <gmd:valueUnit>m</gmd:valueUnit>
             <gmd:value>
               <gco:Record>3.6</gco:Record>
             </gmd:value>
           </gmd:DQ_QuantitativeResult>
         </gmd:result>
       </gmd:DQ_QuantitativeAttributeAccuracy>
     • Explicit recognition that errors acceptably fit a Normal distribution with mean 1.2 • An overall positive bias was observed (a difficult feature to convey by traditional means)
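
To make the difference between the two encodings concrete, here is a minimal Python sketch of what a consumer could derive from the distribution-based record. It uses only the mean (1.2 m) and variance (3.6 m²) quoted on the slide; the function name `error_model` and the choice of a 95% interval are illustrative assumptions, not part of the GeoViQua model.

```python
import math
from statistics import NormalDist

# Values quoted on the slide for the vertical DEM error (metres, metres squared).
MEAN_M = 1.2
VARIANCE_M2 = 3.6


def error_model(mean: float, variance: float) -> NormalDist:
    """Build a normal error model from a mean and a variance (hypothetical helper)."""
    return NormalDist(mu=mean, sigma=math.sqrt(variance))


dem_error = error_model(MEAN_M, VARIANCE_M2)

# Central 95% interval of the vertical error (the 95% level is an assumption).
low = dem_error.inv_cdf(0.025)
high = dem_error.inv_cdf(0.975)

# Probability that an error is positive, which reflects the positive bias.
p_positive = 1.0 - dem_error.cdf(0.0)

print(f"standard deviation: {dem_error.stdev:.3f} m")
print(f"95% interval: [{low:.3f}, {high:.3f}] m")
print(f"P(error > 0): {p_positive:.3f}")
```

A record that stores only the single value 3.6 cannot support any of these derived quantities, because the bias (the non-zero mean) is lost.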

  8. Two models on data quality are needed • Producer’s quality metadata • In the producer’s metadata records • Encoded in classical ISO 19115/19139 • Some extensions required • Stored in the current catalogues (GEOSS Clearinghouse, etc.) • User’s quality metadata • In independent metadata repositories • Linked to the producer’s metadata by id • Future component of the GCI? • Contains comments, “likes”, star ratings, etc.

  9. Advances in quality models: GVQ - producer quality model http://schemas.geoviqua.org/GVQ/3.1.0

  10. Advances in quality models: GVQ - producer quality model • Publications. Based on ISO 19115 CI_Citation and extended with ISO 690 elements. Added to a number of quality elements within the metadata document: an existing DQ_ or MD_ element is extended to allow a ‘referenceDoc’ to be added. • Discovered issues. A discovered issue class (e.g., a problem which the producer has identified during generation of a dataset) is added to the DQ_DataQuality element. • Reference datasets used for evaluation. Added to the ‘dataEvaluation’ section of ISO 19157 to allow recording the reference dataset used to assess the quality indicator. • Traceability. A new ‘metaquality’ type is added to allow the lineage of a data quality assessment to be recorded, along with its representativity and coverage. This is a requirement of the QA4EO principles. More information: Lucy Bastin [l.bastin@aston.ac.uk] & a poster in this session room

  11. Advances in quality models: GVQ - user quality model http://schemas.geoviqua.org/GVQ/3.1.0

  12. Advances in quality models: GVQ - user quality model. ISO 19115 only provides MD_Usage to report how users apply the dataset in their activities. This is insufficient for GEOSS needs, so GeoViQua has built this model from scratch. A user can submit a GVQ_FeedbackItem in the form of: • A user comment. • A rating mark. • A usage report supported by a citation of a report. • A link to external feedback (blog pages, Google Docs documents, etc.). • A metadata override that amends a producer metadata value. • A quality label (GEO Label). • These items are related to a dataset through an identifier. More information: Lucy Bastin [l.bastin@aston.ac.uk] & a poster in this session room
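
The idea of feedback items stored apart from the producer record and tied to it only by a dataset identifier can be sketched as follows. This is a simplified illustration, not the GVQ schema: the class names `FeedbackItem` and `FeedbackStore`, their fields, and the sample identifier are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackItem:
    """One piece of user feedback (hypothetical, simplified fields)."""
    dataset_id: str                      # identifier linking to the producer record
    comment: Optional[str] = None        # free-text user comment
    rating: Optional[int] = None         # rating mark, e.g. 1 to 5 stars
    external_link: Optional[str] = None  # link to external feedback


class FeedbackStore:
    """In-memory stand-in for an independent user metadata repository."""

    def __init__(self) -> None:
        self._by_dataset = defaultdict(list)

    def submit(self, item: FeedbackItem) -> None:
        self._by_dataset[item.dataset_id].append(item)

    def items_for(self, dataset_id: str) -> list:
        return list(self._by_dataset.get(dataset_id, []))

    def mean_rating(self, dataset_id: str) -> Optional[float]:
        ratings = [i.rating for i in self.items_for(dataset_id) if i.rating is not None]
        return sum(ratings) / len(ratings) if ratings else None


store = FeedbackStore()
store.submit(FeedbackItem("dataset-001", comment="Coastline is shifted", rating=3))
store.submit(FeedbackItem("dataset-001", rating=5))
print(store.mean_rating("dataset-001"))
```

Keeping the store keyed by identifier means the producer's metadata record never has to be modified when users add feedback.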

  13. Advances in quality models: GVQ - user quality model • The GeoViQua Quality Model is explained in the GEOSS Best Practice Twiki: http://wiki.ieee-earth.org/GEOSS_Tutorials • It has been presented in the AIP5 session and it is a contribution to the GEOSS Standards and Interoperability Forum (SIF). More information: Anna Riverola [Anna.Riverola@uab.cat]

  14. Advances in visualizing metadata quality information. GeoViQua has developed the Q-Rubric tool, an extension of NOAA’s earlier version • An XSLT tool that converts XML metadata files into an HTML scoring page. • Analyses every ISO quality metadata element and rates it by presence/absence (awarding one point when the metadata exists, without penalizing missing information). • Helps users to evaluate how many metadata elements related to data quality are provided. • Adds two new information groups related to ISO quality: Quality and Usage. • The GEOSS representation style has been applied to the original Rubric tables.
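
The presence/absence scoring rule can be illustrated with a short Python sketch. This is not the Q-Rubric XSLT itself: the list of checked elements is a small hypothetical sample, the function name `rubric_score` is invented for illustration, and the sample record is made up.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"  # ISO 19139 gmd namespace

# A small, illustrative subset of quality-related elements to look for.
CHECKED_ELEMENTS = [
    "DQ_AbsoluteExternalPositionalAccuracy",
    "DQ_CompletenessOmission",
    "LI_Lineage",
    "MD_Usage",
]


def rubric_score(xml_text: str):
    """Award one point per element that is present; absence costs nothing."""
    root = ET.fromstring(xml_text)
    present = {
        name: root.find(f".//{{{GMD}}}{name}") is not None
        for name in CHECKED_ELEMENTS
    }
    return sum(present.values()), present


# Made-up record containing a completeness report and a lineage element.
SAMPLE = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}">
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:report><gmd:DQ_CompletenessOmission/></gmd:report>
      <gmd:lineage><gmd:LI_Lineage/></gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>"""

score, present = rubric_score(SAMPLE)
print(score, "of", len(CHECKED_ELEMENTS), "checked elements present")
```

Because missing elements are not penalized, the score only ever counts what a record provides, which matches the rating rule described on the slide.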

  15. Advances in visualizing metadata quality information. Download it: http://www.geoviqua.org/docs/isoRubricQHTML.xsl • Some results from the GCI: • 97203 metadata records held in the Clearinghouse; 96867 analysed • 14.79% do not define the mandatory topic category • 80.63% do not have any quality element (of any class) • Quality: positional accuracy is the most populated class, with 37.77% documented, followed by completeness (36.06%) and logical consistency (18.79%). Only 0.50% relates to thematic accuracy. • Lineage: 35.27% do not have any lineage sub-element defined. • Usage: 0.60% of elements documented. • Conclusions: • Metadata providers do not comply with the ISO Core Mandatory elements. Many topic categories reach only 75% completeness. • This affects metadata search engines handling data discovery requests. More information: Alaitz Zabala [Alaitz.Zabala@uab.cat]

  16. Advances in visualizing quality information I. Integrating UncertWeb project proposals: use NetCDF-U. The Network Common Data Form (NetCDF) is one of the primary methods of self-documenting data storage and access in the international geosciences research and education community and beyond. The NetCDF-U Conventions are used to formally qualify the uncertainty information in geospatial data encoded in the netCDF-3 format, by means of concepts from the UncertML best practice of the UncertWeb project. They are designed to be fully compatible with the netCDF Climate and Forecast Conventions, the de facto standard for a large amount of data in the Fluid Earth Science community. NetCDF-U is now a discussion paper in OGC.

  17. Advances in visualizing quality information I • Many of the data involved in the GeoViQua scenarios are encoded in NetCDF, • an open file format • that offers power and flexibility for encoding metadata. GeoViQua is developing tools for reading and writing NetCDF-U files and for importing from and exporting to other raster formats. Figure captions: NetCDF file opened with the NASA software Panoply; NetCDF file exported to an IMG file and opened with the new tool. More information: Victor Zaldo [v.zaldo@creaf.uab.cat]

  18. Advances in visualizing quality information II. Integration of quality information with the OGC Web Map Service: WMS-Q • WMS 1.3.0 does not currently support the integration of quality information well. • In particular, it does not define how a data layer can be semantically associated with its corresponding uncertainty layers. • The WMS-Q specification is proposed to stay, as far as possible, within the bounds of the WMS 1.3.0 specification, requiring as few extensions as possible. • To integrate dataset-level quality information into WMS, we propose slightly extending the “Type” attribute of the “MetadataURL” element with “unstructured” and “other-structured” options. • We also propose adding a “description” element to the “MetadataURL” element. • Pixel-level uncertainty information can be encoded using the NetCDF Uncertainty Conventions (NetCDF-U). • The work was tested in the OGC interoperability experiment OWS-9. More information: Jon Blower [j.d.blower@reading.ac.uk]
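
As a rough illustration of the proposed MetadataURL extension, the Python sketch below builds such an element with the standard library. Several details are assumptions: the slide does not say where the new “description” element sits, so it is placed here as a child of MetadataURL; the WMS namespace is omitted for brevity; the attribute is written in lowercase as `type`; and the URL and function name `quality_metadata_url` are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def quality_metadata_url(href: str, mime_type: str, description: str,
                         structured: bool = False) -> ET.Element:
    """Build a MetadataURL element using the proposed extra type values."""
    # "unstructured" and "other-structured" are the options named on the slide.
    type_value = "other-structured" if structured else "unstructured"
    element = ET.Element("MetadataURL", {"type": type_value})
    ET.SubElement(element, "Format").text = mime_type
    ET.SubElement(element, "OnlineResource", {
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": href,
    })
    # Assumed placement of the proposed "description" element.
    ET.SubElement(element, "description").text = description
    return element


metadata_url = quality_metadata_url(
    "http://example.org/quality/report.html",  # hypothetical URL
    "text/html",
    "Validation report for this layer",
)
print(ET.tostring(metadata_url, encoding="unicode"))
```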

  19. Advances in visualizing quality information III. Preliminary results from experiments with colour coding: • Quality should be intercomparable, i.e. the saturation should be intuitively comparable even across hues/categories. Perceptual colour models make this possible. • Hue represents category, and saturation represents the "Purity for the parcel enrichment" (in percent) or the certainty. Figure labels: "Nearly uncertain in both campaigns"; "Gain in certainty"; campaign dates 16.12.2006 and 22.03.07. More information: Simon Thum [simon.thum@igd.fraunhofer.de]
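
The hue-for-category, saturation-for-certainty mapping can be sketched in a few lines of Python. Note the simplification: this uses the standard library's HSV conversion, which is not a perceptual colour model, so saturation steps here are not guaranteed to look equally strong across hues as the slide recommends. The category names, hue values and function name `certainty_colour` are hypothetical.

```python
import colorsys

# Hypothetical categories, each assigned a hue in the range [0, 1).
CATEGORY_HUES = {"cat1": 0.0, "cat2": 1 / 3, "cat3": 2 / 3}


def certainty_colour(category: str, certainty_percent: float) -> tuple:
    """Map a category to a hue and a certainty (0 to 100) to saturation; return 8-bit RGB."""
    if not 0 <= certainty_percent <= 100:
        raise ValueError("certainty_percent must be between 0 and 100")
    hue = CATEGORY_HUES[category]
    saturation = certainty_percent / 100.0
    red, green, blue = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(255 * channel) for channel in (red, green, blue))


fully_certain = certainty_colour("cat1", 100)
fully_uncertain = certainty_colour("cat1", 0)
print(fully_certain, fully_uncertain)
```

With this mapping, a completely uncertain pixel is white whatever its category, and colour strengthens as certainty grows. A perceptual colour space would replace the HSV step in a production tool.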

  20. Advances in visualizing quality information IV. Creation of a “Carbon Atlas” portal, combining web mapping with the comparison of models, including their uncertainty, through ncWMS (server) and OpenLayers (client). 1. Comparison of models with one another. ncWMS is a Web Map Service for geospatial data stored in CF-compliant NetCDF files (developed and maintained by the Reading e-Science Centre).

  21. Advances in visualizing quality information IV 2. Creation of a comparison map (based on the IPCC’s visualization method): pixel colour = difference between models; pattern = percentage of models that agree. This requires adding to the ncWMS server the ability to associate a pattern with a raster. More information: Pascal Evano [p.evano@creaf.uab.cat]
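
The two layers of such a comparison map can be sketched in plain Python. The slide does not define how agreement is measured, so this example assumes agreement means the share of models that have the majority sign at a pixel; the function names and the small grids are made up for illustration.

```python
def difference_map(model_a, model_b):
    """Per-pixel difference between two model grids (drives the pixel colour)."""
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(model_a, model_b)
    ]


def sign_agreement_percent(models):
    """Per-pixel percentage of models sharing the majority sign (drives the pattern).

    Assumption: agreement is judged on the sign of each value; zeros count
    for neither sign.
    """
    n_models = len(models)
    n_rows, n_cols = len(models[0]), len(models[0][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            positive = sum(1 for m in models if m[r][c] > 0)
            negative = sum(1 for m in models if m[r][c] < 0)
            row.append(100.0 * max(positive, negative) / n_models)
        result.append(row)
    return result


# Made-up 2x2 grids for three models.
m1 = [[1, -1], [2, 0]]
m2 = [[2, 1], [3, 0]]
m3 = [[-1, 1], [4, 0]]

diff = difference_map(m1, m2)
agreement = sign_agreement_percent([m1, m2, m3])
print(diff)
print(agreement)
```

A renderer would then colour each pixel from the difference grid and overlay a pattern wherever the agreement grid crosses a chosen threshold.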

  22. Advances in applied scenarios II. Uncertainty assessment for continuous and categorical variables • Continuous variables: uncertainty of citizen weather data relative to the official Met Office data. More information: Dan Cornford [D.Cornford@aston.ac.uk] • Categorical variables: spatialized quality indicators derived from a satellite image classification, at global, local and pixel level. Several statistical classification methods are used. More information: Eva Sevillano [Eva.Sevillano@uab.cat] Figure labels: Cat1, Cat2, Cat3 (classification); Fidelity; Probability of success (%).

  23. Advances in including data quality in search • Quality search integrated in the EuroGEOSS Discovery and Access Broker to be applied to the GEO Portal.

  24. Advances in including data quality in search • Retrieve quality information embedded in Metadata More information: Lorenzo Bigagli [lorenzo.bigagli@pin.unifi.it]

  25. Advances in labelling the quality: the GEO Label • What is it? • The GEO Label is intended to “assist the user to assess the scientific relevance, quality, acceptance and societal needs of the components” (ST-09-02 Task Team, 2010). • Purposes? • be a quality indicator for GEOSS geospatial data and datasets; • improve user recognition and trust in datasets that carry a GEO label; • assist in searching by providing users with visual clues of dataset quality and relevance; and • increase visibility of EO data. • GEO label development: • The GeoViQua project is currently undertaking research to define and evaluate the concept of a GEO label. • The development is carried out in three phases (status markers on the slide: “Done!”, “In progress!”).

  26. Advances in labelling the quality: the GEO Label • Phase I Study: • Overall, GEO label questionnaire results show that users and producers agree on the benefits of introducing a GEO label, with no distinct difference between user and producer views. • The majority of respondents support an all-in-one drill-down interrogation facility as the key GEO label function. • Phase II Study: • The GEO labels will be a graphical representation generated individually for each dataset in the GEOSS (or other data portals and clearinghouses), based on the quality information that is available for that dataset. • A second online questionnaire-based survey identifies the designs that convey quality information to users in the most efficient and comprehensible way. • Currently: • We are analysing the results of GEO label study II to fully define and establish a GEO label that meets the needs of the geodata user community. • Phase III: we will create physical prototypes which will be used in a human subject study. More information: Victoria Lush [lushv@aston.ac.uk]

  27. The future • Many possibilities have been shown. • The project now enters a development phase in which the concepts presented and the prototypes need to be developed. • Move the GeoViQua Quality Model towards broader adoption. • Develop a user feedback system prototype. • Test search and visualization developments in a GEO Portal replica (ESA contribution). • Work with the GEO architecture committees to move some of these contributions towards adoption in the GCI.

  28. GeoViQua: Advances in data quality disclosing. Thanks! Ivette Serral, Center of Research in Ecology and Forestry Applications (CREAF), contact@geoviqua.org
