Beyond Metadata: Towards User-Centric Description of Data Quality Michael F. Goodchild University of California Santa Barbara
Metadata • Data about data • handling instructions • catalog entry • fitness for use • What is known about data quality • a measure of the success of spatial data quality research • much progress has been made • FGDC CSDGM 1994 • ISO 19115 2003 • DDI • EML
Two tests of success • Geobrowsers • Google Earth • geotagging • Wikimapia • Where 2.0
CSDGM, ISO 19115 • Do they match the state of research? • early 1990s • SDTS discussions of 1980s • the five-fold way • positional accuracy • attribute accuracy • logical consistency • completeness • lineage • Do they represent a user perspective? • committees staffed by data producers • production control mechanisms?
Producer or user? • Producer-centric • details of the production process: the measurement and compilation systems used • tests of data quality conducted under carefully controlled conditions • formal specifications of data set contents • User-centric • effects of uncertainties on specific uses of the data, from simple queries to complex analyses • simple descriptions of quality that are readily understood by non-expert users • tools to enable the user to determine the effects of quality on results
Increasing complexity • Self-documentation • notes to oneself • A colleague • brief description • Another discipline, language, culture • ideal metadata/data ratio?
[Figure: complexity of metadata plotted against social distance]
Seven issues • Areas in which research has moved beyond the standards • Accuracy of Spatial Databases 1989 • Measurements from Maps 1989 • 15 books • 1000 journal articles
1. Decoupling the representative fraction • Ratio of distance on the map to distance on the ground • no flat map of a curved surface can have a constant RF • RF as a surrogate • positional accuracy • spatial resolution • map content • RF undefined for digital data • inherited from source maps • extended by convention • aerial photographs (RF of the photographic plate) • digital orthoimagery (positional accuracy)
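To illustrate how the RF acts as a surrogate for positional accuracy or spatial resolution, the sketch below converts a scale denominator into a ground distance. The 0.5 mm map distance is an assumed rule-of-thumb value that does not appear in the slides, and the function name is hypothetical.

```python
def ground_distance_m(rf_denominator: float, map_distance_mm: float = 0.5) -> float:
    """Ground distance in metres for a distance measured on the map.

    map_distance_mm defaults to 0.5 mm, an ASSUMED rule-of-thumb figure
    (not from the slides) for the smallest distance resolvable on a map.
    """
    return rf_denominator * map_distance_mm / 1000.0


# For a 1:24,000 map, 0.5 mm on the map corresponds to 12 m on the ground.
print(ground_distance_m(24_000))
```

For digital data there is no map distance to measure, which is why the RF has to be replaced by explicit statements of accuracy and resolution.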
2. Accuracy or uncertainty? • Accuracy • a true value z exists • a measured value z* • error z*-z • RMSE • theory of measurement error • error propagation • Uncertainty • vagueness in definitions • no truth • perhaps a consensus? • lack of replicability • Change of paradigm around 1992
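The accuracy paradigm can be sketched in a few lines: given measured values z* and true values z, the RMSE summarizes the errors z* - z. The function name and the sample numbers are illustrative assumptions, not data from the talk.

```python
import math


def rmse(measured, truth):
    """Root mean square of the errors z* - z over paired values."""
    if len(measured) != len(truth) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(m - t) ** 2 for m, t in zip(measured, truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical elevations: four measurements of a true value of 100.
measured = [101.0, 99.0, 102.0, 98.0]
truth = [100.0, 100.0, 100.0, 100.0]
print(rmse(measured, truth))  # square root of 2.5
```

The calculation needs a `truth` argument. Under the uncertainty paradigm no such value exists, so this measure cannot be computed at all.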
3. Objects and fields • A fundamental distinction • 1992 • appears nowhere in the standards • Discrete object conceptualization • an empty table top • occupied by discrete, countable objects • points, lines, areas, volumes • Continuous field conceptualization • a mapping from location x to value z • a single-valued function of location
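One way to make the distinction concrete is to write the two conceptualizations as data types. This is a minimal sketch; the class, the type alias, and the tilted-plane elevation function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Location = Tuple[float, float]


@dataclass
class DiscreteObject:
    """A countable thing placed on an otherwise empty table top."""
    kind: str                  # "point", "line", "area", or "volume"
    geometry: List[Location]   # vertices describing the object
    attributes: Dict[str, str]


# A continuous field: a single-valued function from location x to value z.
Field = Callable[[Location], float]


def elevation(x: Location) -> float:
    """Hypothetical elevation field (a tilted plane), for illustration only."""
    return 100.0 + 0.5 * x[0] - 0.25 * x[1]


wells = [DiscreteObject("point", [(2.0, 4.0)], {"name": "well A"})]
print(len(wells))             # objects can be counted
print(elevation((2.0, 4.0)))  # a field has exactly one value at every location
```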
Separability • Phenomenon conceptualized as a field • impossible to separate positional and attribute accuracy • interval/ratio (elevation) • nominal (land cover class)
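A small worked example shows why the two kinds of accuracy cannot be separated for a field. It assumes a hypothetical one-dimensional elevation profile with constant slope: a shift in position shows up as a discrepancy in value.

```python
def elevation(x: float) -> float:
    """Hypothetical 1-D elevation profile with a constant slope of 0.5."""
    return 100.0 + 0.5 * x


recorded_at = 10.0      # location stored with the measurement
positional_shift = 2.0  # the value was actually sampled 2 units away

# The discrepancy in z caused purely by the shift in x.
value_error = elevation(recorded_at + positional_shift) - elevation(recorded_at)
print(value_error)
```

From the data alone, this discrepancy could equally be reported as a positional error of 2 units or as an attribute error in elevation.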
4. Granularity • Metadata definable at any level • individual vertex • point, line, area • layer • geodatabase • Metadata as a form of generalization • economies of scale • Spatial non-stationarity • Multiple lineages
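The economies of scale can be sketched as a lookup that falls back from the most specific level to the most general, so a statement made once at the layer level covers every feature that does not override it. The function, the key name, and the values are assumptions for illustration.

```python
def resolve_quality(key, vertex=None, feature=None, layer=None, geodatabase=None):
    """Return the most specific metadata value recorded for `key`, or None."""
    for level in (vertex, feature, layer, geodatabase):
        if level is not None and key in level:
            return level[key]
    return None


# Hypothetical metadata: the layer was digitized from a map, but one
# feature was later resurveyed and carries its own, better accuracy.
layer_meta = {"positional_rmse_m": 12.0}
resurveyed_feature = {"positional_rmse_m": 3.0}

print(resolve_quality("positional_rmse_m", feature=resurveyed_feature, layer=layer_meta))
print(resolve_quality("positional_rmse_m", feature={}, layer=layer_meta))
```

The overriding feature is a simple case of spatial non-stationarity and of multiple lineages within one layer.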
5. Collection-level metadata • Describing the properties of entire collections • The Geospatial One-Stop • www.geodata.gov • There will always be more than one one-stop • how to know where to look?
6. Spatial dependence • Tobler’s First Law • nearby things are more similar than distant things • applies to errors • relative accuracy almost always better than absolute accuracy • covariances as important as variances
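The link between covariance and relative accuracy follows from the variance of a difference: Var(e1 - e2) = s1^2 + s2^2 - 2 r s1 s2, where r is the correlation between the two errors. The sketch below applies that identity; the function name and the numbers are illustrative assumptions.

```python
import math


def relative_error_sd(sd_a: float, sd_b: float, correlation: float) -> float:
    """Standard deviation of the difference between two correlated errors."""
    variance = sd_a ** 2 + sd_b ** 2 - 2.0 * correlation * sd_a * sd_b
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


# Two nearby points, each with an absolute positional error of 10 m (1 sd).
print(relative_error_sd(10.0, 10.0, 0.0))  # independent errors
print(relative_error_sd(10.0, 10.0, 0.9))  # strongly correlated errors
```

With a correlation of 0.9 the error in the relative position of the two points is much smaller than the absolute error of either, which is the pattern the slide describes.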
Marginal or joint properties? • Visualization of marginal properties • Analytic functions respond to joint properties • slope • area • Joint properties must be described at a higher level • relative errors of vertex positions • described at level of vertex collection
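A small simulation illustrates why area responds to joint rather than marginal properties. In both cases below every vertex has the same marginal error (the same standard deviation); only the dependence between vertices differs. The square, the error size, and the function names are assumptions chosen for illustration.

```python
import random
import statistics


def shoelace_area(vertices):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulated_area_sd(vertices, sd, shared, trials=2000, seed=0):
    """Standard deviation of area under Gaussian vertex errors.

    shared=True  : all vertices receive the same error (perfect correlation).
    shared=False : every vertex receives its own independent error.
    """
    rng = random.Random(seed)
    areas = []
    for _ in range(trials):
        if shared:
            dx, dy = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
            noisy = [(x + dx, y + dy) for x, y in vertices]
        else:
            noisy = [(x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd))
                     for x, y in vertices]
        areas.append(shoelace_area(noisy))
    return statistics.pstdev(areas)


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
print(simulated_area_sd(square, 5.0, shared=True))   # effectively zero
print(simulated_area_sd(square, 5.0, shared=False))  # far larger
```

A per-vertex accuracy statement cannot distinguish these two cases, so the relative errors have to be described at the level of the vertex collection.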
Cross-correlation • How are errors on Layer 1 related to errors on Layer 2? • Error as an issue in interoperability • what happens if I superimpose these layers? • Two layers will almost never fit • depends on lineage of each • how bad is the misfit? • will it affect my analysis? • Binary metadata • the ability of a pair of data sets to interoperate • not available from either’s unary metadata • If GIS is about overlay • then binary metadata are essential
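Binary metadata can be sketched as a value stored against a pair of data sets rather than against either one. The example assumes that matched points representing the same real-world features are available in both layers; the layer names, coordinates, and function name are hypothetical.

```python
import math


def pairwise_misfit_rmse(points_a, points_b):
    """RMS distance between matched points from two layers."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty, equally long lists of matched points")
    squared = [(xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical matched road intersections as they appear in each layer.
roads = [(0.0, 0.0), (10.0, 0.0)]
parcels = [(3.0, 4.0), (13.0, 4.0)]

# The key is the unordered PAIR of layers, not either layer alone.
binary_metadata = {
    frozenset({"roads", "parcels"}): pairwise_misfit_rmse(roads, parcels)
}
print(binary_metadata[frozenset({"parcels", "roads"})])
```

Here each layer could be internally consistent and still be offset from the other, which is why the misfit cannot be derived from the unary metadata of either layer.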
The way forward • Reopen the metadata debate • an unpopular move • it’s hard enough to persuade people to provide metadata • a standard before its time • standards should emerge only after research is complete • It’s our responsibility • the research task does not end with journal publication • metadata standards express the state of our research • Many other issues not related to data quality • possible allies