Explore the various sources of uncertainty in geospatial data, including missing data, measurement errors, and uncertain definitions, and learn how to effectively analyze and interpret geospatial data.
Thinking Critically about Geospatial Data Quality Michael F. Goodchild University of California Santa Barbara
Starting points • All geospatial data leave the user to some extent uncertain about the state of the real world • missing data • positional and attribute errors • uncertain definitions of classes and terms • missing metadata • projection unspecified • horizontal or vertical datum • cartographic license
Uncertainty is endemic • All geographic information leaves some degree of uncertainty about conditions in the real world • x • the Greenwich Meridian • the Equator • standard time • z • definitions of terms • errors of measurement
Starting points • Some applications will be impacted, some will not • for any given data set at least one application can be found where uncertainty matters • knowing whether data are fit for use • Our perspective on these issues has changed in the past two decades • from error to uncertainty • from top-down to bottom-up data production
Definitions of terms • Precision: • the number of significant digits used to report a measurement • should never be more than is justified by the accuracy of the measuring device • internal precision of the computer • single precision arithmetic 1 part in 10^7 • double precision 1 part in 10^14 • relative to the Earth's dimensions, single precision is about a meter resolution, double is about the size of an atom • no GIS should ever need more than single precision • a GIS's internal precision is effectively infinite • hard to persuade designers to drop those spurious digits
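The single- versus double-precision comparison can be checked roughly in code. This is a minimal sketch, not part of the original slides: it assumes NumPy and a rounded mean Earth radius of 6,371 km, and asks how far apart adjacent representable numbers are for a coordinate of that magnitude stored in metres.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres (not from the slides)

def coordinate_spacing(value_m, dtype):
    """Gap between value_m and the next larger number representable in dtype."""
    return float(np.spacing(dtype(value_m)))

single_gap = coordinate_spacing(EARTH_RADIUS_M, np.float32)
double_gap = coordinate_spacing(EARTH_RADIUS_M, np.float64)

print(f"float32 spacing at Earth scale: {single_gap} m")
print(f"float64 spacing at Earth scale: {double_gap:.2e} m")
```

Under these assumptions the single-precision gap is on the order of a metre and the double-precision gap is below a nanometre, which agrees with the slide to within an order of magnitude.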
Scale • Relationship between measurement of distance on a map and measurement on the ground • 1:24,000 is larger than 1:250,000 • A map used for digitizing or scanning always has a scale • a geospatial database never has scale • but we have complex conventions • Scale as a useful surrogate for map contents, resolution, accuracy • scale of original map a useful item of geospatial metadata
Resolution • The minimum distance over which change is recorded • 0.5mm for a paper map • Positional accuracy • set by national map accuracy standards to roughly 0.5mm
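To see what 0.5mm on paper means on the ground, the map distance can be multiplied by the scale denominator. The helper below is a hypothetical illustration (its name and signature are not from the slides), applied to the two scales mentioned on the Scale slide.

```python
def ground_distance_m(map_distance_mm, scale_denominator):
    """Convert a distance measured on the map (mm) to metres on the ground."""
    return map_distance_mm * scale_denominator / 1000.0

for denominator in (24_000, 250_000):
    metres = ground_distance_m(0.5, denominator)
    print(f"1:{denominator:,}: 0.5 mm on the map is {metres:g} m on the ground")
```

At 1:24,000 the 0.5mm figure corresponds to 12 m on the ground, and at 1:250,000 to 125 m, which is why the scale of the source map works as a surrogate for resolution and positional accuracy.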
Accuracy • The difference between a measurement and the truth • problems of defining truth for some geospatial variables, e.g. soil class • if two people classified the same site would they agree? • Uncertainty of definition is a form of inaccuracy, along with: • variation between observers or measuring instruments • temporal change • loss of information on e.g. projection, datum • transformation of datum • map registration • digitizing error • imperfect fit of the data model, e.g. heterogeneous polygons, transition zones instead of boundaries • fuzziness of many geographic concepts • transformation of coordinate system, projection, data model, e.g. raster/vector conversion
Truth • Often a source of higher accuracy • circularity - accuracy is the difference between a measurement and a source of higher accuracy? • What identifies a source as having higher accuracy? • larger scale • more recent • cost more, took longer to make, more careful • more accurate measuring instrument • certified by an expert • earlier in the chain reality-map-database (less processing)
The problem • Uncertainty is endemic in geospatial data • even a 1:1 mapping would not create a perfect representation of reality • All GIS products are therefore subject to uncertainty • what is the plus or minus on estimates of length, area, counts of objects, positions, attributes, viewsheds, buffer zones, ... • GIS products are often used in decision-making by people who do not have intimate knowledge of the methods used to collect, digitize or process the data • results are often presented and used visually rather than numerically • Computer (GIS) output carries a false sense of credibility
Topology vs geometry • Which property does this pole lie in? • Which side of the street is this house? • Do these two streets connect?
Data types • The area-class map • soil maps • vegetation cover type • land use • Boundaries surrounding areas of uniform type • what’s the accuracy issue?
The area-class map • Assigns every location x to a class • Mark and Csillag's term • c = f(x) • a nominal field (or perhaps ordinal) • classified scene • soil map, vegetation cover map, land use map • Need to model uncertainty in this type of map
Uncertainty modeling • Area-class maps are made by a long and complex process involving many stages, some partially subjective • Maps of the same theme for the same area will not be the same • level of detail, generalization • vague definitions of classes • variation among observers • measuring instrument error • different classifiers, training sites • different sensors
Error and uncertainty • Error: true map plus distortion • systematic measurements disturbed by stochastic effects • accuracy (deviation from true value) • precision (deviation from mean value) • variation ascribed to error • Uncertainty: differences reflect uncertainty about the real world • no true map • possible consensus map • combining maps can improve estimates
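The distinction between accuracy (deviation from the true value) and precision (deviation from the mean value) can be made concrete with a few repeated measurements. This is a sketch assuming NumPy; the readings and the "true" value of 100.0 are invented for illustration only.

```python
import numpy as np

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread, rmse) for repeated measurements of one quantity."""
    m = np.asarray(measurements, dtype=float)
    bias = m.mean() - true_value                     # accuracy: mean deviation from truth
    spread = m.std(ddof=1)                           # precision: scatter about the sample mean
    rmse = np.sqrt(np.mean((m - true_value) ** 2))   # combined effect of both
    return bias, spread, rmse

# Hypothetical readings: tightly clustered (precise) but offset from truth (inaccurate).
readings = [102.1, 101.9, 102.0, 102.2, 101.8]
bias, spread, rmse = accuracy_and_precision(readings, true_value=100.0)
print(f"bias={bias:.2f}, spread={spread:.2f}, rmse={rmse:.2f}")
```

The readings here agree closely with each other yet sit about two units from the assumed truth, so a small spread says nothing about accuracy. Where no true map exists, as in the uncertainty view, only the spread among maps can be computed.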
Models of uncertainty • Determine effects of uncertainty/variation/error on results of analysis • if there is known variation, the results of a single analysis cannot be claimed to be correct • uncertainty analysis an essential part of GIS • error model the preferred term
Traditional error analysis • Measurements subject to distortion • z' = z + δz • Propagate through transformations • r = f(z) • r + δr = f(z + δz) • But f is rarely known • complex compilation and interpretation • complex spatial dependencies between elements of resulting data set
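When f is known, the propagation step can be carried out either with a first-order approximation or by simulation. The sketch below assumes NumPy and a deliberately simple, hypothetical f (the area of a square whose side z is measured with error); the disturbance in z is called dz in the comments and is assumed Gaussian.

```python
import numpy as np

def f(z):
    """Hypothetical known transformation: area of a square with side z."""
    return z ** 2

def first_order_sd(z, sd_z, h=1e-6):
    """Approximate the SD of f(z + dz) as |f'(z)| * sd_z (central difference)."""
    derivative = (f(z + h) - f(z - h)) / (2 * h)
    return abs(derivative) * sd_z

rng = np.random.default_rng(0)
z, sd_z = 100.0, 0.5                      # assumed measurement and its error SD

# Monte Carlo check: perturb z many times and look at the spread of r = f(z + dz).
perturbed = z + rng.normal(0.0, sd_z, size=100_000)
mc_sd = f(perturbed).std(ddof=1)
analytic_sd = first_order_sd(z, sd_z)

print(f"first-order SD of r: {analytic_sd:.2f}")
print(f"Monte Carlo SD of r: {mc_sd:.2f}")
```

Both routes agree here only because f is a simple closed-form function of a single measurement. For compiled and interpreted maps, where f is rarely known and the elements are spatially dependent, the analytic route is usually unavailable.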
Spatial dependence • In true values z • In errors e • cov(e_i, e_j) a decreasing positive function of distance • geostatistical framework • Scale effects, generalization as convolutions of z
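One common way to express an error covariance that decreases with distance is an exponential model. The sketch below is an assumption for illustration, not the author's stated model: it uses NumPy, a one-dimensional transect, and invented values for the variance (`sill`) and decay distance (`range_m`), then draws one set of correlated errors through a Cholesky factor.

```python
import numpy as np

def exponential_covariance(coords, sill, range_m):
    """Covariance matrix with cov(e_i, e_j) = sill * exp(-d_ij / range_m)."""
    d = np.abs(coords[:, None] - coords[None, :])   # pairwise separations
    return sill * np.exp(-d / range_m)

rng = np.random.default_rng(1)
coords = np.arange(0.0, 500.0, 10.0)                # 50 points, 10 m apart
cov = exponential_covariance(coords, sill=4.0, range_m=100.0)

# Correlated errors: multiply independent standard normals by the Cholesky factor.
chol = np.linalg.cholesky(cov)
errors = chol @ rng.standard_normal(coords.size)

print("covariance at 0, 10, 100 m:", cov[0, 0], cov[0, 1].round(3), cov[0, 10].round(3))
```

Neighbouring points receive similar errors under this model, while points several hundred metres apart are nearly independent.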
If this were not true • If Tobler’s First Law of Geography did not apply to errors in maps • If errors were statistically independent • If relative errors were as large as absolute errors • Errors in derived products would be impossibly large • e.g. slope • e.g. length • Shapes would be unrecognizable
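The slope example can be worked through with two neighbouring elevations. If both carry errors with the same standard deviation and a given correlation, the error in their difference shrinks as the correlation rises. The numbers below (7 m elevation error, 30 m spacing, correlation of 0.95) are hypothetical and chosen only to illustrate the effect.

```python
import math

def slope_error_sd(sd_elev_m, correlation, spacing_m):
    """SD of the error in (z2 - z1) / spacing_m when both elevation errors
    have SD sd_elev_m and the given correlation with each other."""
    return sd_elev_m * math.sqrt(2.0 * (1.0 - correlation)) / spacing_m

independent = slope_error_sd(7.0, 0.0, 30.0)    # errors statistically independent
correlated = slope_error_sd(7.0, 0.95, 30.0)    # errors strongly autocorrelated

print(f"independent errors: slope error SD = {independent:.3f} (rise over run)")
print(f"correlated errors:  slope error SD = {correlated:.3f} (rise over run)")
```

With independent errors the slope uncertainty is about 0.33 (a 33 percent grade), which would swamp most real slopes. With strongly correlated errors it drops to about 0.07, because the relative error between neighbours is much smaller than the absolute error.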
Realization • A single instance from an error model • an error model must be stochastic • Monte Carlo simulation • The Gaussian distribution metaphor • scalar realizations • a Gaussian distribution for maps • an entire map as a realization
Model • {p_1, p_2, …, p_n} • correlation in neighboring cell outcomes • posterior probabilities equal to priors • 80% sand, 20% inclusions of clay • no knowledge of correlations
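A minimal sketch of what "no knowledge of correlations" implies: if each cell is drawn independently from the class probabilities, every realization is a whole map, but one with a salt-and-pepper texture. The code assumes NumPy, a hypothetical 50 by 50 raster, and the slide's 80% sand and 20% clay figures as the probability vector.

```python
import numpy as np

rng = np.random.default_rng(42)
classes = np.array(["sand", "clay"])
probs = np.array([0.8, 0.2])            # class probabilities from the slide's example

def draw_realization(shape, rng):
    """Draw one map realization, sampling every cell independently from probs."""
    idx = rng.choice(len(classes), size=shape, p=probs)
    return classes[idx]

# Monte Carlo: many equally probable maps, each summarized by its clay fraction.
realizations = [draw_realization((50, 50), rng) for _ in range(200)]
clay_fraction = np.array([(r == "clay").mean() for r in realizations])

print(f"mean clay fraction over realizations: {clay_fraction.mean():.3f}")
print(f"spread of clay fraction:              {clay_fraction.std(ddof=1):.4f}")
```

Each map reproduces the 20% clay proportion on average, but the clay cells are scattered individually rather than grouped into inclusions. Producing realistic inclusions would require the correlation between neighbouring cell outcomes that this simple version leaves out.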
Topographic data • Definition problems • sand dunes • trees • buildings • Classic measurement error model • measured elevation = truth + error • error spatially autocorrelated