EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Nathalie Delattre, QKEN meeting, Brussels, 5-7 May 2010.

Presentation Transcript


  1. EuroRegionalMap: Best practices in quality assessment for a pan-European dataset. Nathalie Delattre, QKEN meeting, Brussels, 5-7 May 2010

  2. Items: ERM presentation; Best practices in quality control; Quality issues; Expectations; Debate

  3. 1. ERM: presentation

  4. Project status: consolidation phase 2007-2010. Eurostat contract, in accordance with EC contract No. 2006/S 174-185902: to provide a yearly update of the ERM data with European coverage; to improve the level of harmonisation of ERM in data content and selection criteria; to upgrade ERM according to Eurostat specifications oriented towards spatial analysis.

  5. Evolution of ERM towards EC requirements

  6. ERM level of progress • Release 2.2 (Jan 2008): 31 countries (EU26, 4 EFTA, Moldova); Croatia: administrative boundaries • Release 3.0 (Jan 2009): + Croatia: railway network; + Isle of Man; + Faeroe Islands; no update or improvement from Italy (VMap data sources); no data from Bulgaria • Release 3.1 (Jan 2010): administrative boundaries, transport, settlements, names • Release 3.2 (Dec 2010): Hydro + Bulgaria

  7. Production workflow. Production phase: countries' data production and their own quality control. Validation phase: RC quality control; countries' corrections and edge-matching; RC quality control (also on edge-matching); countries' last corrections. Deliverables: national component of the ERM data, draft version (GDB or shapefiles); validation report; national component of the ERM data, draft version; validation report; national components of the ERM data, final version; metadata + lineage files; final reception (sending approval).

  8. Integration phase. Task (PM): data integration into a seamless coverage, finishing the edge-matching in cross-border areas by integrating the duplicated features located on international boundaries into one single feature. Deliverables: ERM data set in File GDB; metadata for ERM. Task (PM): specific processes for specific features requested by the EC: adding a land mask feature; merging the ferry lines into a seamless and consistent network usable for spatial analysis; setting up UIC codes for railways. Deliverables: ERM data set in File GDB, fit for the EC; metadata for Eurostat; quality assessment report.

  9. 2. Quality control: best practices for a pan-European dataset

  10. Quality control. Validation process: checking conformity with the ERM specifications. Quality assessment process: reporting on data content and on data harmonisation in selection criteria.

  11. Minimum requirements (validation specifications) • Compliance with the ERM specifications: data model, topology, allowed attribute values, selection criteria, geometrical resolution • Coherence and consistency of features and attributes: homogeneity of attribute values in a feature network, consistency between themes, cross-border continuity between neighbouring countries

  12. Validation process, to ensure the best data quality: ERM data production, validation by the producer, validation by the RC and, if errors exist, a report on the validation results.

  13. Validation deliverables. Documentation ("My ERM documentation"): D41_ERMSpecificationDC_v43.pdf, D51_DataValidationSpecifications_V40.pdf, D52_DataValidationSpecifications_MinReq_v12.pdf, ICC_ERM_ValidationReport_template.xls, ERM_v31_Validation_Tools_v10.xls, …

  14. Quality indicators in metadata. Metadata for discovery (ISO 19115 standard): ERM_Metadata_partners_template.xls. Lineage files (data quality): ERM Lineage Template.doc, ERM_Lineage.xls.

  15. Quality indicators. Existence (ID1): presence or absence of a feature or attribute. Definition: the feature or attribute exists in the real-world context and has been captured (presence) or not captured (absence) in the ERM data set. Values: presence: ID1 = 1; absence: ID1 = 0; N_A: ID1 = -1 (the feature or attribute does not exist in the real-world context).
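The ID1 rule can be sketched as a small function. This is a minimal illustration, not an ERM tool: the function name and its inputs are hypothetical, and it assumes the producer has already judged whether the phenomenon is applicable (it exists in the real-world context and, as in the Spain foreshore example, falls within the selection criteria) and has counted what was captured.

```python
def existence_indicator(applicable: bool, captured_count: int) -> int:
    """Hypothetical helper returning the existence indicator ID1.

    applicable: False when the feature/attribute does not exist in the
        real-world context (e.g. coastline in Austria) or falls outside
        the selection criteria; this is the N_A case.
    captured_count: number of features/attribute values captured.
    """
    if not applicable:
        return -1  # N_A
    return 1 if captured_count > 0 else 0  # presence / absence


# Austria: coastline does not exist
print(existence_indicator(False, 0))  # -1
# Spain: shoreline exists but was not captured
print(existence_indicator(True, 0))   # 0
```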

  16. Existence for Austria • Hierarchical levels 4 and 5 do not exist in Austria: ID1 = -1 • Foreshore and coastline do not exist in Austria: ID1 = -1

  17. Existence for Spain • Foreshore does not fall within the selection criteria: ID1 = -1 • Shoreline exists but has not been captured: ID1 = 0

  18. Quality indicators (2). Completeness (ID2), a group of indicators: selection compliancy (ID2.1) for features; data completeness (ID2.2) for attributes. Selection compliancy: features are captured for the entire territory and in accordance with the portrayal and selection criteria of the specifications. Values: ID2.1 = 1 (fully compliant); ID2.1 = 0 (not fully compliant).

  19. Quality indicators (3). Completeness (ID2), a group of indicators: selection compliancy (ID2.1) for features; data completeness (ID2.2) for attributes. Data completeness: percentage of the populated attributes holding real values (null values such as UNK or N_P are not counted). Value: %. Example for RTN: number of features with RTN <> [UNK] = 34000; total number of features = 45000; ID2.2 = ROUNDUP(34000 / 45000 × 100) = 76%.
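The ID2.2 arithmetic can be reproduced in a few lines. This sketch assumes attribute values arrive as a plain list of strings and that UNK and N_P are the only null codes (the slide says "like", so the real list may be longer); the function name is hypothetical. Integer ceiling division stands in for ROUNDUP so that floating-point error cannot push an exact percentage up by one.

```python
# Null codes named on the slide; the full ERM list may contain more (assumption).
NULL_VALUES = {"UNK", "N_P"}


def data_completeness(values: list) -> int:
    """ID2.2: percentage of features whose attribute holds a real value, rounded up."""
    total = len(values)
    if total == 0:
        raise ValueError("no features to assess")
    real = sum(1 for v in values if v is not None and v not in NULL_VALUES)
    # Ceiling of real * 100 / total using integer arithmetic only.
    return -(-real * 100 // total)


# Slide example: 34000 of 45000 RTN values are known.
rtn_values = ["E45"] * 34000 + ["UNK"] * 11000
print(data_completeness(rtn_values))  # 76
```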

  20. Example: Completeness for road and island

  21. Metadata on information not provided

  22. Quality tools

  23. Python scripts in the ERM Toolbox • Edgematching: Check Edgematching for lines, Check Edgematching for points • ERM QC: Check Multipart, Feature Statistics, Item Statistics, Populate Symbol Number, Summary Statistics, Test ASCII fields • Export: Export to Shape
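To illustrate what an edge-matching check for points does, the sketch below flags boundary endpoints from one country that have no counterpart in the neighbouring country within a tolerance. It is not the ERM Toolbox script: the function name, the tolerance and the assumption that the endpoints touching the international boundary have already been extracted (in a projected CRS, in metres) are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def unmatched_endpoints(side_a: List[Point], side_b: List[Point],
                        tolerance: float) -> List[Point]:
    """Return endpoints of side_a with no endpoint of side_b within tolerance."""
    unmatched = []
    for p in side_a:
        # Brute-force search; a spatial index would be used on real data.
        if not any(math.dist(p, q) <= tolerance for q in side_b):
            unmatched.append(p)
    return unmatched


# Road ends on the boundary, as digitised by two neighbouring producers.
country_a = [(0.0, 0.0), (1000.0, 0.0)]
country_b = [(0.5, 0.0)]
print(unmatched_endpoints(country_a, country_b, tolerance=1.0))  # [(1000.0, 0.0)]
```

Running the check in both directions (A against B, then B against A) catches breaks on either side of the boundary.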

  24. Statistics tools. Feature Statistics: the number of features per feature class. Use: QA (presence of feature classes and country codes); supports filling in the metadata (lineage.doc).
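A Feature Statistics style count is essentially a group-by on feature class and country code. In this sketch the input is assumed to be (feature class, country code) pairs already read from the data set; the function names are hypothetical, and RoadL and CoastL are used only as example class names alongside WatrcrsL.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def feature_statistics(records: Iterable[Tuple[str, str]]) -> Counter:
    """Count features per (feature class, country code) pair."""
    return Counter(records)


def missing_classes(counts: Counter, expected: Iterable[str], country: str) -> List[str]:
    """Expected feature classes with no feature for the given country."""
    present = {fc for (fc, code) in counts if code == country}
    return sorted(set(expected) - present)


records = [("WatrcrsL", "AT"), ("WatrcrsL", "AT"), ("RoadL", "AT"), ("RoadL", "ES")]
counts = feature_statistics(records)
print(counts[("WatrcrsL", "AT")])                                      # 2
print(missing_classes(counts, ["WatrcrsL", "RoadL", "CoastL"], "AT"))  # ['CoastL']
```

A missing class is not automatically an error: for Austria, an absent CoastL corresponds to ID1 = -1 rather than 0.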

  25. Statistics tools • AllStatistics: • ID1 = the existence of the feature and attribute {0, 1} • ID2 = the completeness of the feature and attribute {0, …, 100} • Use: supports filling in the metadata (lineage.xls)

  26. Statistics tools. GeomStat: the number of features per unit area (10 km², 100 km², etc.). Use: QA, density of features, as a basis for harmonising selection criteria between countries. [Map: density of WatrcrsL (natural) features on a 10 km grid for CZ, SK, HU, RO and MD.]
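The density statistic behind GeomStat can be approximated by binning features into a square grid. This is a sketch under stated assumptions: each feature is reduced to one representative point (for lines such as WatrcrsL, for example a midpoint), coordinates are in metres in a projected equal-area CRS (for example ETRS89-LAEA, EPSG:3035) so that every cell covers the same area, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def grid_density(points: Iterable[Tuple[float, float]], cell_size_m: float) -> Counter:
    """Count representative points per square grid cell of side cell_size_m."""
    counts = Counter()
    for x, y in points:
        # Column/row index of the cell containing the point.
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        counts[cell] += 1
    return counts


points = [(500.0, 500.0), (9999.0, 1.0), (10001.0, 0.0)]
print(grid_density(points, cell_size_m=10_000))  # Counter({(0, 0): 2, (1, 0): 1})
```

Comparing cell counts on either side of a border then shows where national selection criteria diverge.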

  27. Statistics tools GeomStat

  28. Geometry tools. MinVertexDistance: checks the minimum allowed distance between vertices (50 m). Use: QA, data quality requirements. [Example: WatrcrsL vertices 46 m apart; correction needed!]
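The 50 m rule can be checked by measuring the distance between consecutive vertices along each line. The sketch assumes planar coordinates in metres and checks only consecutive vertex pairs, which is one reading of "distance between vertices"; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def short_segments(vertices: Sequence[Tuple[float, float]],
                   min_distance: float = 50.0) -> List[Tuple[int, float]]:
    """Return (index, distance) for each consecutive vertex pair closer than min_distance."""
    issues = []
    for i in range(len(vertices) - 1):
        d = math.dist(vertices[i], vertices[i + 1])
        if d < min_distance:
            issues.append((i, d))  # vertices i and i+1 need correction
    return issues


# A watercourse whose first two vertices are 46 m apart, as on the slide.
line = [(0.0, 0.0), (46.0, 0.0), (146.0, 0.0)]
print(short_segments(line))  # [(0, 46.0)]
```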

  29. Quality issues

  30. Quality requirements: 1. Compliancy with a standard (ERM specifications); 2. Topological errors: a usable topological network; 3. Completeness in attributes, e.g. name completion; 4. Data harmonisation between countries: in selection criteria, in classification, in geometrical accuracy (vertex density).

  31. Quality issues: Transport. Heterogeneity in the national classification of roads (primary, secondary, etc.)

  32. Quality issues: Hydro • Heterogeneity in selection criteria

  33. Quality issues: Hydro • Name completion (non-named rivers selected in blue)

  34. Quality issues: Hydro • River hierarchical level: must be consistent at European level (rivers with a national hierarchical level shown in blue)

  35. Expectations

  36. Expectations • Need for a quality control manager to: assess the quality of the data; suggest new methodology and improvements in quality control tools; provide a quality assessment report for each release • ESDIN framework (the near future for ERM): what kind of quality data model for pan-European products? What kind of validation tools and quality control? • Commitment of the Quality KEN? Support welcome: which kind?

  37. Debate: quality data model? For which kind of data? • Quality control applicable to base-level datasets: related to real-world phenomena • Quality control applicable to generalised and derived datasets (at medium-scale level)? Added factor: selection criteria • Quality control applicable to pan-European datasets? Added factor: harmonisation between countries
