EuroRegionalMap: Best practices in quality assessment for a pan-European dataset
Nathalie Delattre
QKEN meeting, Brussels, 5-7 May 2010

Items
• ERM: presentation
• Best practices in quality control
• Quality issues
• Expectation-Debate

1. ERM: presentation
Project status: consolidation phase 2007-2010
Eurostat contract to provide a yearly update of the ERM data for a European coverage, in accordance with EC contract No. 2006/S 174-185902:
• to improve the level of harmonisation of ERM in data content and selection criteria
• to upgrade ERM according to Eurostat specifications oriented towards spatial analysis
ERM level of progress
• Release 2.2 (Jan 2008)
  • 31 countries: EU26, 4 EFTA, Moldova
  • Croatia: administrative boundaries
• Release 3.0 (Jan 2009)
  • + Croatia: railway network
  • + Isle of Man
  • + Faeroe Islands
  • No update or improvement from Italy (VMap data sources)
  • No data from Bulgaria
• Release 3.1 (Jan 2010)
  • Adm, transports, settlements, names
• Release 3.2 (Dec 2010)
  • Hydro + Bulgaria
Production work flow

Production phase
• Countries: data production, own quality control

Validation phase
• RC: quality control
• Countries: corrections and edge matching
• RC: quality control (also on edge-matching)
• Countries: last corrections

Deliverables (in order)
• National component of the ERM data: draft version (GDB or shapefiles)
• Validation report
• National component of the ERM data: draft version
• Validation report
• National components of the ERM data: final version
• Metadata + lineage files
• Final reception (sending approval)
Integration phase

PM: data integration into a seamless coverage
• Task: finishing the edge-matching in cross-border areas by integrating the duplicated features located on international boundaries into one single feature
• Deliverables: ERM data set in File GDB; metadata for ERM

PM: specific processes for specific features requested by the EC
• Tasks: adding the land mask feature; merging the ferry lines into a seamless and consistent network usable for spatial analysis; setting up UIC codes for railways
• Deliverables: ERM data set in File GDB, fit for EC; metadata for Eurostat; quality assessment report
2. Quality control : best practices for a pan-European dataset
Quality control
• Validation process: checking conformity with the ERM specifications
• Quality assessment process: reporting on data content and on data harmonisation in selection criteria
Minimum Requirements: Validation specifications
• Compliance with the ERM Specifications
• Data model
• Topology
• Allowed attribute values
• Selection criteria
• Geometrical resolution
• Coherence and consistency of features and attributes
• Homogeneity of attribute values in a feature network
• Consistency between themes
• Cross-border continuity between neighbouring countries
Validation process
• To ensure best data quality
• Flow: ERM data production → validation by producer → validation by RC
• If errors exist: report about validation results
Validation deliverables
Documentation (My ERM documentation):
• D41_ERMSpecificationDC_v43.pdf
• D51_DataValidationSpecifications_V40.pdf
• D52_DataValidationSpecifications_MinReq_v12.pdf
• ICC_ERM_ValidationReport_template.xls
• ERM_v31_Validation_Tools_v10.xls
• …
Quality indicators in Metadata
• Metadata for discovery (standard ISO 19115): ERM_Metadata_partners_template.xls
• Lineage files (data quality): ERM Lineage Template.doc, ERM_Lineage.xls
Quality indicators
Existence (ID1) = presence/absence of a feature or attribute
Definition: the feature or attribute information exists in the real-world context and has been captured (presence) or not captured (absence) in the ERM data set.
Values:
• Presence: ID1 = 1
• Absence: ID1 = 0
• N_A: ID1 = -1 (the feature/attribute does not exist in the real-world context)
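The three ID1 values can be written as a small decision rule. This is only a sketch: the function name `existence_indicator` and its two boolean arguments are hypothetical and are not taken from the ERM toolbox.

```python
def existence_indicator(exists_in_real_world: bool, captured: bool) -> int:
    """Return ID1: 1 = presence, 0 = absence, -1 = N_A (hypothetical helper)."""
    if not exists_in_real_world:
        # Nothing to capture in the real-world context: not applicable.
        return -1
    # The feature/attribute exists; report whether it was captured.
    return 1 if captured else 0


print(existence_indicator(True, True))    # 1  (presence)
print(existence_indicator(True, False))   # 0  (absence)
print(existence_indicator(False, False))  # -1 (N_A)
```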
Existence for Austria
• Hierarchical levels 4 and 5 do not exist in Austria: ID1 = -1
• Foreshore and coastline do not exist in Austria: ID1 = -1
Existence for Spain
• Foreshore does not meet the selection criteria: ID1 = -1
• Shoreline exists but has not been captured: ID1 = 0
Quality indicators (2)
Completeness (ID2): group of indicators
• Selection compliancy (ID2.1) for features
• Data completeness (ID2.2) for attributes
Selection compliancy: features are captured for the entire territory and in accordance with the portrayal and selection criteria of the specifications.
Values:
• ID2.1 = 1 (fully compliant)
• ID2.1 = 0 (not fully compliant)
Quality indicators (3)
Completeness (ID2): group of indicators
• Selection compliancy (ID2.1) for features
• Data completeness (ID2.2) for attributes
Data completeness: percentage of the populated attributes holding real values (null values such as UNK or N_P are not counted).
Value: %
Example for RTN:
• Number of features with RTN <> [UNK] = 34000
• Total number of features = 45000
• ID2.2 = ROUNDUP((34000 / 45000) × 100) = 76%
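The ID2.2 arithmetic can be reproduced in a few lines. The sketch below assumes that attribute values arrive as a plain list of strings and that only `UNK` and `N_P` count as null; the function name and the sample value `"E40"` are made up for illustration.

```python
NULL_VALUES = {"UNK", "N_P"}  # null markers named on the slide


def data_completeness(values):
    """Return ID2.2: percentage of non-null values, rounded up to an integer."""
    total = len(values)
    if total == 0:
        raise ValueError("no features to evaluate")
    populated = sum(1 for v in values if v not in NULL_VALUES)
    # Integer ceiling division avoids floating-point rounding surprises.
    return (populated * 100 + total - 1) // total


# 34000 populated RTN values out of 45000 features, as in the slide example.
rtn_values = ["E40"] * 34000 + ["UNK"] * 11000
print(data_completeness(rtn_values))  # 76
```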
Python Scripts in ERM Toolbox
• Edgematching
  • Check Edgematching for lines
  • Check Edgematching for points
• ERM QC
  • Check Multipart
  • Feature Statistics
  • Item Statistics
  • Populate Symbol Number
  • Summary Statistics
  • Test ASCII fields
• Export
  • Export to Shape
Statistics tools
Feature Statistics
• Output: the number of features per feature class
• Use: QA of the presence of feature classes and country codes; supports filling in the metadata (lineage.doc)
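A count of this kind could be sketched as follows. This is not the toolbox script: the record layout (dictionaries with `feature_class` and `country` keys) and the sample records are assumptions made for the example.

```python
from collections import Counter


def feature_statistics(features):
    """Count features per (feature class, country code) pair."""
    return Counter((f["feature_class"], f["country"]) for f in features)


# Hypothetical sample records; real input would come from the ERM data set.
sample = [
    {"feature_class": "WatrcrsL", "country": "CZ"},
    {"feature_class": "WatrcrsL", "country": "CZ"},
    {"feature_class": "WatrcrsL", "country": "SK"},
]
stats = feature_statistics(sample)
for (feature_class, country), count in sorted(stats.items()):
    print(feature_class, country, count)
```

A pair that is missing from the result has a count of zero, which is the signal that a feature class is absent for a given country.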
Statistics tools
AllStatistics
• ID1 = the existence of the feature and attribute {0, 1}
• ID2 = the completeness of the feature and attribute {0, ..., 100}
• Use: supports filling in the metadata (lineage.xls)
Statistics tools
GeomStat
• Output: the number of features per unit area (10 km², 100 km², etc.)
• Use: QA of the density of features, as a basis for harmonising selection criteria between countries
[Figure: density of WatrcrsL (Natural) features compared across CZ, SK, HU, RO and MD; values shown on the map: 15, 3, 12, 32]
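The density measure itself is simple arithmetic. The sketch below is an assumption about how such a figure could be computed, not the GeomStat implementation; the function name, the country codes `AA` and `BB`, and the counts and areas are invented for illustration.

```python
def features_per_unit_area(feature_count, area_km2, unit_km2=100.0):
    """Return the number of features per `unit_km2` square kilometres."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return feature_count * unit_km2 / area_km2


# Hypothetical (feature count, area in km²) per country.
coverage = {"AA": (1500, 50000.0), "BB": (6400, 20000.0)}
for code, (count, area) in sorted(coverage.items()):
    print(code, features_per_unit_area(count, area))
```

Large differences in density between neighbouring countries with similar terrain would point to differing selection criteria rather than to real-world differences.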
Geometry tools
MinVertexDistance
• Checks the minimum allowed distance between vertices (50 m)
• Use: QA of data quality requirements
[Figure: WatrcrsL example with two vertices 46 m apart; correction needed]
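A check of this kind can be sketched in plain Python. The example assumes that vertex coordinates are already in a projected system measured in metres; the function name `short_segments` and the sample line are hypothetical, and the actual toolbox script may work differently.

```python
import math


def short_segments(vertices, min_distance=50.0):
    """Return (index, distance) for consecutive vertices closer than min_distance."""
    flagged = []
    for i in range(len(vertices) - 1):
        (x1, y1), (x2, y2) = vertices[i], vertices[i + 1]
        distance = math.hypot(x2 - x1, y2 - y1)
        if distance < min_distance:
            flagged.append((i, distance))
    return flagged


# Hypothetical watercourse line: the first two vertices are 46 m apart.
line = [(0.0, 0.0), (46.0, 0.0), (200.0, 0.0)]
print(short_segments(line))  # [(0, 46.0)]
```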
Quality requirements
1. Compliance with a standard (ERM specifications)
2. Topological errors: usable topological network
3. Completeness in attributes (e.g. name completion)
4. Data harmonisation between countries
  • in selection criteria
  • in classification
  • in geometrical accuracy (vertex density)
Quality issues: Transport
• Heterogeneity in the national classification of roads (primary, secondary, etc.)
Quality issues: Hydro
• Heterogeneity in selection criteria
Quality issues: Hydro
• Name completion (unnamed rivers are selected in blue)
Quality issues: Hydro
• River hierarchical level: must be consistent at European level (rivers with a national hierarchical level are shown in blue)
Expectations
Need for a quality control manager to:
• assess the quality of the data
• suggest new methodology and improvements in quality control tools
• provide a quality assessment report for each release
ESDIN framework (the near future for ERM):
• What kind of quality data model for the pan-European products?
• What kind of validation tools and quality control?
Commitment of the Quality KEN? Support is welcome; which kind?
Debate: quality data model? For which kind of data?
• Quality control applicable to base-level datasets: related to real-world phenomena
• Quality control applicable to generalised and derived datasets (at medium-scale level)? Added factor of selection criteria
• Quality control applicable to pan-European datasets? Added factor of harmonisation between countries