GIS Data Quality: Producing better data quality through robust business processes • BrightStar Training • Kim Ollivier
Schedule Day 2 (suggested break times) • Start: 9:00 • Session 1 (90 min) • Morning tea: 10:30 to 10:45 • Session 2 (105 min) • Lunch: 12:30 to 1:30 • Session 3 (90 min) • Afternoon tea: 3:00 to 3:15 • Session 4 (105 min) • Finish: 5:00 • Each session will have an exercise or interactive discussion
Topics • Metadata • Designing rules • Data warehouse and ETL • Feature maintenance
Metadata • Data model • Business rules, relations, state • Subclasses (lookup tables) • GIS Metadata NZGLS and ISO XML • Readme.txt or readme.html
Metadata • Which standard? • ISO 19115, NZGMS • Australia: asdd.ga.gov.au
Examine Metadata • Geospatial metadata • Benefit to users or producer? • How do we collect it? • Standardisation or not? • metadata\topo250k_metadata.html • metadata\DCW_DQ_Project.htm • metadata\meta.html
Morning Tea
Data Quality Rules • Attribute domain constraints • Relational integrity rules • Rules for historical data • Rules for state-dependent objects • General dependency rules • Spatial feature rules
A GIS Data Quality System (diagram) • Stages: Assess, Improve, Prevent, Recognise, Monitor • Components: Data Quality Assessment • Data Profiling • Data Cleaning • Monitoring Data Integration Interfaces • Ensuring Quality of Data Conversion and Consolidation • Building Data Quality Metadata Warehouse • Recurrent Data Quality Assessment
Assessing Quality • Project steps • Required roles • Defining the objectives • Designing rules • Scorecard and Metadata • Frequency of assessment
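A scorecard can be as simple as a pass rate per rule. The following is a minimal sketch, assuming records are plain dictionaries and each rule is a function returning True on a pass; the field names and rules are hypothetical and not taken from the course material.

```python
def scorecard(records, rules):
    """Return failures and pass rate for each named rule.

    rules maps a rule name to a predicate: record -> True if the record passes.
    """
    total = len(records)
    card = {}
    for name, rule in rules.items():
        failed = sum(1 for rec in records if not rule(rec))
        pass_rate = (total - failed) / total if total else 1.0
        card[name] = {"failed": failed, "pass_rate": pass_rate}
    return card


# Hypothetical sample records and rules, for illustration only.
records = [
    {"parcel_id": 1, "area_ha": 0.08},
    {"parcel_id": 2, "area_ha": 12.5},
    {"parcel_id": 3, "area_ha": None},
    {"parcel_id": 4, "area_ha": -3.0},
]
rules = {
    "area not null": lambda r: r["area_ha"] is not None,
    "area positive": lambda r: r["area_ha"] is None or r["area_ha"] > 0,
}
card = scorecard(records, rules)
```

Re-running the same scorecard at each assessment gives a trend rather than a single snapshot.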
Building Rules • Data profiling • Interview users • Examine data model • Data Gazing • Application v data matrix
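Data profiling usually starts with basic column statistics. This is a rough sketch, assuming rows are dictionaries; the `land_use` column and its values are made up for the example.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarise one column: row count, nulls, blanks, distinct and top values."""
    values = [row.get(column) for row in rows]

    def is_blank(v):
        return isinstance(v, str) and v.strip() == ""

    present = [v for v in values if v is not None and not is_blank(v)]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": sum(1 for v in values if v is None),
        "blanks": sum(1 for v in values if is_blank(v)),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample data.
sample = [
    {"land_use": "RES"},
    {"land_use": "RES"},
    {"land_use": "COM"},
    {"land_use": ""},
    {"land_use": None},
]
profile = profile_column(sample, "land_use")
```

Unexpected values in `most_common`, or a high null or blank count, are candidates for new rules.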
Attribute Domain Constraints • Lookup tables • Numeric ranges • Null values • Blank values • Format constraints • Precision • Complex domain constraints
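The constraint types listed above can each be expressed as a small test on a record. A minimal sketch follows; the lookup codes, the identifier format, the field names and the numeric limits are all assumptions chosen for illustration.

```python
import re

LAND_USE_CODES = {"RES", "COM", "IND", "RUR"}  # hypothetical lookup table
VAL_ID_FORMAT = re.compile(r"^\d{5}/\d{3}$")   # hypothetical format, e.g. 12345/001


def check_record(rec):
    """Return a list of domain-constraint violations for one record."""
    errors = []
    # Null and blank are different conditions and are reported separately.
    if rec.get("owner") is None:
        errors.append("owner: null")
    elif rec["owner"].strip() == "":
        errors.append("owner: blank")
    # Lookup table constraint.
    if rec.get("land_use") not in LAND_USE_CODES:
        errors.append("land_use: not in lookup table")
    # Numeric range, then precision (at most 4 decimal places).
    area = rec.get("area_ha")
    if area is None or not (0 < area <= 100000):
        errors.append("area_ha: out of range")
    elif round(area, 4) != area:
        errors.append("area_ha: more than 4 decimal places")
    # Format constraint.
    if not VAL_ID_FORMAT.match(rec.get("val_id") or ""):
        errors.append("val_id: bad format")
    return errors


good = {"owner": "Smith", "land_use": "RES", "area_ha": 12.5, "val_id": "12345/001"}
bad = {"owner": "  ", "land_use": "XXX", "area_ha": -1, "val_id": "12345-1"}
```

Here `check_record(good)` returns an empty list, while `check_record(bad)` reports one violation per field.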
Relational Integrity Rules • Identity rule • Reference rules • Cardinal rules • Inheritance rules
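Identity, reference and cardinal rules can be sketched as three small checks over parent and child tables. The parcel and title tables below are hypothetical, and the "exactly one title per parcel" cardinality is an assumption made only for the example.

```python
from collections import Counter


def identity_violations(rows, key):
    """Identity rule: key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def reference_violations(children, fk, parents, pk):
    """Reference rule: child rows whose foreign key has no parent."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in parent_keys]


def cardinality_violations(children, fk, parents, pk, min_n=1, max_n=None):
    """Cardinal rule: parents with too few or too many children."""
    counts = Counter(c[fk] for c in children)
    bad = []
    for p in parents:
        n = counts.get(p[pk], 0)
        if n < min_n or (max_n is not None and n > max_n):
            bad.append((p[pk], n))
    return bad


# Hypothetical sample tables.
parcels = [{"parcel_id": 1}, {"parcel_id": 2}, {"parcel_id": 3}]
titles = [
    {"title_id": "A", "parcel_id": 1},
    {"title_id": "B", "parcel_id": 1},
    {"title_id": "B", "parcel_id": 4},
]
```

In a database these checks would normally be SQL queries or declared constraints; the Python form is convenient for data that has not yet been loaded.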
Historical Data • Time dependent attribute • Value constraints • Rates of change • Volatility • Continuity • Granularity
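Continuity and rate-of-change rules can be tested by walking a time series in date order. This is a sketch under assumed thresholds (a maximum gap of 1100 days and a maximum change of 50% between consecutive values); the valuation history is invented for the example.

```python
from datetime import date


def check_history(history, max_rate=0.5, max_gap_days=1100):
    """Check a list of (date, value) pairs; return a list of (date, issue)."""
    issues = []
    ordered = sorted(history)
    for (d0, v0), (d1, v1) in zip(ordered, ordered[1:]):
        if d1 == d0:
            issues.append((d1, "duplicate date"))
            continue
        # Continuity: consecutive records should not be too far apart.
        if (d1 - d0).days > max_gap_days:
            issues.append((d1, "gap"))
        # Rate of change: relative change between consecutive values.
        if v0 and abs(v1 - v0) / abs(v0) > max_rate:
            issues.append((d1, "rate of change"))
    return issues


# Hypothetical valuation history for one property.
valuations = [
    (date(2001, 9, 1), 200000),
    (date(2004, 9, 1), 250000),
    (date(2010, 9, 1), 260000),
    (date(2013, 9, 1), 900000),
]
issues = check_history(valuations)
```

With this data the 2010 record is flagged as a gap and the 2013 record as an implausible rate of change.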
State-dependent Objects • State-transition models • States, terminators • Actions • Example states (from diagram): start, Active (A), On Leave (L), Retired (R), Terminated (T), Deceased (D)
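A state-transition model can be checked with a table of allowed transitions. The states below are the ones named on the slide; the allowed transitions themselves are assumptions for illustration, since the slide's diagram arrows are not recoverable here.

```python
# Allowed next states for each state. These transitions are assumed, not
# taken from the original diagram.
ALLOWED = {
    "start": {"A"},
    "A": {"L", "R", "T", "D"},
    "L": {"A", "R", "T", "D"},
    "R": {"A", "D"},
    "T": {"A", "D"},
    "D": set(),  # terminator: nothing may follow
}


def invalid_transitions(states):
    """Return (index, previous_state, state) for each disallowed transition."""
    bad = []
    previous = "start"
    for i, state in enumerate(states):
        if state not in ALLOWED.get(previous, set()):
            bad.append((i, previous, state))
        previous = state
    return bad
```

For example, `invalid_transitions(["A", "D", "A"])` flags the return to Active after Deceased.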
Event Histories • An object may have many events • Event Overlaps • Event Frequencies • Event Conditions
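Event overlaps can be found by sorting events by start and comparing each with those that begin before it ends. A minimal sketch, assuming each event is an `(id, start, end)` tuple with an exclusive end; the ownership periods are hypothetical.

```python
from datetime import date


def overlapping_events(events):
    """Return pairs of event ids whose [start, end) intervals overlap."""
    ordered = sorted(events, key=lambda e: e[1])
    overlaps = []
    for i, (id_a, _start_a, end_a) in enumerate(ordered):
        for id_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later events start even later, so none can overlap
            overlaps.append((id_a, id_b))
    return overlaps


# Hypothetical ownership periods for one parcel.
ownership = [
    ("own1", date(2000, 1, 1), date(2005, 6, 1)),
    ("own2", date(2005, 6, 1), date(2010, 1, 1)),
    ("own3", date(2009, 1, 1), date(2012, 1, 1)),
]
pairs = overlapping_events(ownership)
```

Events that merely touch (one ends on the day the next starts) are not reported, because the end is treated as exclusive.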
Spatial Rules • Projection, units • Dimensions 2D,3D,M,Z • point,line,poly • Precision • Topology
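Some spatial rules can be tested without a GIS library. The sketch below checks one polygon ring for dimension, closure, vertex count, extent and repeated vertices; the extent values are a hypothetical bounding box in projected metres, and full topology checks (gaps, overlaps, self-intersection) are out of scope.

```python
def check_ring(ring, extent, dims=2):
    """Return a list of rule violations for one polygon ring of coordinate tuples."""
    xmin, ymin, xmax, ymax = extent
    errors = []
    if any(len(pt) != dims for pt in ring):
        errors.append("wrong dimension")
    if len(ring) < 4:
        errors.append("too few vertices")
    if ring and tuple(ring[0]) != tuple(ring[-1]):
        errors.append("ring not closed")
    # Coordinates in the wrong projection or units usually fall outside the extent.
    if any(not (xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax) for pt in ring):
        errors.append("vertex outside extent")
    if any(a == b for a, b in zip(ring, ring[1:])):
        errors.append("duplicate consecutive vertex")
    return errors


EXTENT = (1000000, 4700000, 2100000, 6200000)  # hypothetical extent, metres

good_ring = [
    (1750000.0, 5920000.0),
    (1750100.0, 5920000.0),
    (1750100.0, 5920100.0),
    (1750000.0, 5920000.0),
]
# Degrees supplied where projected metres were expected.
bad_ring = [(174.76, -36.85), (174.77, -36.85), (174.77, -36.84)]
```

The extent test is a cheap way to catch features loaded in the wrong projection or units.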
Valuation Roll • Legacy structure, 50 years old • Variable maintenance standard • Valuer General audit (DQ spec)
Rules Exercise • Split into pairs • Examine sample DVR dataset • Devise some rules for each category • Verbal discussion with class
Lunch
Data Warehouse & ETL • Why not direct access to online DB? • Staging Area • Scripting tools • Trade-offs • KPI for project • better quality than source • better quality than target
ETL Extract • Extract
ETL Transform • The importance of primary keys
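One reason primary keys matter in the transform step is that they let source and target rows be matched, so changes can be classified as inserts, updates or deletes. A minimal sketch, assuming rows are dictionaries and a single-column key; the table contents are hypothetical.

```python
def diff_by_key(source, target, key):
    """Classify changes between source and target rows matched on a primary key.

    Returns (inserts, updates, deletes) as sorted lists of key values.
    """
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    if len(src) != len(source):
        raise ValueError("duplicate primary key in source")
    inserts = sorted(src.keys() - tgt.keys())
    deletes = sorted(tgt.keys() - src.keys())
    updates = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return inserts, updates, deletes


# Hypothetical source extract and current warehouse contents.
source = [
    {"id": 1, "name": "Main Rd"},
    {"id": 2, "name": "High St"},
    {"id": 4, "name": "New Lane"},
]
target = [
    {"id": 1, "name": "Main Rd"},
    {"id": 2, "name": "High Street"},
    {"id": 3, "name": "Old Rd"},
]
changes = diff_by_key(source, target, "id")
```

Without a stable key, the same comparison degrades to matching on attributes or geometry, which is slower and less reliable.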
ETL Load • Batch offline most common • Daily status usually enough
Safe Software FME • Examples
Afternoon Tea
Data Quality Team IT DQ Team Users
Maintenance of features • Time series important • Line/polygon features are not atomic • Splitting loses inheritance • Calculating depreciation • Direct editing bypasses business rules
Maintenance of the Quality • Gardening, not mountain climbing • Discussion of course topics
References • Jack E. Olson, Data Quality: The Accuracy Dimension • Ralph Kimball, The Data Warehouse ETL Toolkit
Please fill in evaluation forms
Finish