
Thinking Critically about Geospatial Data Quality

Explore the various sources of uncertainty in geospatial data, including missing data, measurement errors, and uncertain definitions, and learn how to effectively analyze and interpret geospatial data.



Presentation Transcript


  1. Thinking Critically about Geospatial Data Quality • Michael F. Goodchild • University of California, Santa Barbara

  2. Starting points • All geospatial data leave the user to some extent uncertain about the state of the real world • missing data • positional and attribute errors • uncertain definitions of classes and terms • missing metadata • projection unspecified • horizontal or vertical datum • cartographic license

  3. Uncertainty is endemic • All geographic information leaves some degree of uncertainty about conditions in the real world • x • the Greenwich Meridian • the Equator • standard time • z • definitions of terms • errors of measurement

  4. Starting points • Some applications will be impacted, some will not • for any given data set at least one application can be found where uncertainty matters • knowing whether data are fit for use • Our perspective on these issues has changed in the past two decades • from error to uncertainty • from top-down to bottom-up data production

  5. http://www.thesalmons.org/lynn/wh-greenwich.html

  6. Definitions of terms • Precision: • the number of significant digits used to report a measurement • should never be more than is justified by the accuracy of the measuring device • internal precision of the computer • single precision arithmetic 1 part in 10^7 • double precision 1 part in 10^14 • relative to the Earth's dimensions, single precision is about a meter resolution, double is about the size of an atom • no GIS should ever need more than single precision • a GIS's internal precision is effectively infinite • hard to persuade designers to drop those spurious digits
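
As a rough illustration of the precision figures on this slide, the sketch below measures the gap between adjacent representable numbers near 40,000,000 m, a round figure assumed here for the Earth's circumference. The helper `float32_spacing` is hypothetical and written only for this example; `math.ulp` requires Python 3.9 or later.

```python
import math
import struct

EARTH_CIRCUMFERENCE_M = 4.0e7  # assumed round figure, meters


def float32_spacing(x: float) -> float:
    """Gap between x (rounded to single precision) and the next larger float32.

    Valid for positive, finite x; hypothetical helper for illustration only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - here


single_gap = float32_spacing(EARTH_CIRCUMFERENCE_M)  # meters
double_gap = math.ulp(EARTH_CIRCUMFERENCE_M)         # meters

print(f"single precision gap: {single_gap} m")
print(f"double precision gap: {double_gap:.3e} m")
```

At this magnitude single precision resolves meters while double precision resolves far below anything a survey can measure, so digits reported beyond the accuracy of the measuring device carry no information.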

  7. Scale • Relationship between measurement of distance on a map and measurement on the ground • 1:24,000 is larger than 1:250,000 • A map used for digitizing or scanning always has a scale • a geospatial database never has scale • but we have complex conventions • Scale as a useful surrogate for map contents, resolution, accuracy • scale of original map a useful item of geospatial metadata

  8. Resolution • The minimum distance over which change is recorded • 0.5 mm for a paper map • Positional accuracy • set by national map accuracy standards to roughly 0.5 mm
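
The link between scale and resolution can be made concrete with a short calculation: 0.5 mm on paper corresponds to a ground distance that grows with the scale denominator. This is a minimal sketch; the function name `ground_resolution_m` is an assumption made for the example.

```python
def ground_resolution_m(scale_denominator: float, map_resolution_mm: float = 0.5) -> float:
    """Ground distance (meters) corresponding to a distance on the paper map."""
    return map_resolution_mm * scale_denominator / 1000.0


for denominator in (24_000, 250_000):
    ground = ground_resolution_m(denominator)
    print(f"1:{denominator:,} -> {ground:.1f} m on the ground")
```

Under these assumptions a 1:24,000 map resolves about 12 m on the ground and a 1:250,000 map about 125 m, which is why the scale of the source map is a useful item of metadata even though the database itself has no scale.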

  9. Accuracy • The difference between a measurement and the truth • problems of defining truth for some geospatial variables, e.g. soil class • if two people classified the same site would they agree? • Uncertainty of definition is a form of inaccuracy, along with: • variation between observers or measuring instruments • temporal change • loss of information on e.g. projection, datum • transformation of datum • map registration • digitizing error • imperfect fit of the data model, e.g. heterogeneous polygons, transition zones instead of boundaries • fuzziness of many geographic concepts • transformation of coordinate system, projection, data model, e.g. raster/vector conversion

  10. Truth • Often a source of higher accuracy • circularity - accuracy is the difference between a measurement and a source of higher accuracy? • What identifies a source as having higher accuracy? • larger scale • more recent • cost more, took longer to make, more careful • more accurate measuring instrument • certified by an expert • earlier in the chain reality-map-database (less processing)

  11. The problem • Uncertainty is endemic in geospatial data • even a 1:1 mapping would not create a perfect representation of reality • All GIS products are therefore subject to uncertainty • what is the plus or minus on estimates of length, area, counts of objects, positions, attributes, viewsheds, buffer zones, ... • GIS products are often used in decision-making by people who do not have intimate knowledge of the methods used to collect, digitize or process the data • results are often presented and used visually rather than numerically • Computer (GIS) output carries a false sense of credibility

  12. Topology vs geometry • Which property does this pole lie in? • Which side of the street is this house? • Do these two streets connect?

  13. Data types • The area-class map • soil maps • vegetation cover type • land use • Boundaries surrounding areas of uniform type • what’s the accuracy issue?

  14. The area-class map • Assigns every location x to a class • Mark and Csillag term • c = f(x) • a nominal field (or perhaps ordinal) • classified scene • soil map, vegetation cover map, land use map • Need to model uncertainty in this type of map

  15. Uncertainty modeling • Area-class maps are made by a long and complex process involving many stages, some partially subjective • Maps of the same theme for the same area will not be the same • level of detail, generalization • vague definitions of classes • variation among observers • measuring instrument error • different classifiers, training sites • different sensors
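
One simple way to see that two maps of the same theme and area differ is to compare them cell by cell. The sketch below assumes both maps are rasters on the same grid; the class grids are made-up data and the function `agreement` is hypothetical.

```python
from collections import Counter

# Two hypothetical classified maps of the same 3 x 4 area (made-up data).
map_a = [
    ["sand", "sand", "clay", "clay"],
    ["sand", "sand", "sand", "clay"],
    ["loam", "loam", "sand", "clay"],
]
map_b = [
    ["sand", "sand", "sand", "clay"],
    ["sand", "loam", "sand", "clay"],
    ["loam", "loam", "sand", "sand"],
]


def agreement(a, b):
    """Return (fraction of cells with identical class, Counter of disagreements)."""
    pairs = [(ca, cb) for row_a, row_b in zip(a, b) for ca, cb in zip(row_a, row_b)]
    same = sum(1 for ca, cb in pairs if ca == cb)
    disagreements = Counter((ca, cb) for ca, cb in pairs if ca != cb)
    return same / len(pairs), disagreements


fraction, disagreements = agreement(map_a, map_b)
print(f"cells in agreement: {fraction:.2f}")
for (class_a, class_b), count in disagreements.items():
    print(f"map A says {class_a}, map B says {class_b}: {count} cell(s)")
```

The tally of disagreeing class pairs shows which classes the two sources confuse, which a single agreement percentage hides.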

  16. Error and uncertainty • Error: true map plus distortion • systematic measurements disturbed by stochastic effects • accuracy (deviation from true value) • precision (deviation from mean value) • variation ascribed to error • Uncertainty: differences reflect uncertainty about the real world • no true map • possible consensus map • combining maps can improve estimates
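
The distinction between accuracy (deviation from the true value) and precision (deviation from the mean value) can be sketched with repeated measurements of one quantity. The readings and the "true" elevation below are made-up values, and a known truth is itself an assumption of the error view.

```python
import statistics

true_elevation = 100.0  # assumed "true" value, meters (made-up)
measurements = [101.8, 102.1, 101.9, 102.3, 101.9]  # made-up repeated readings

mean_value = statistics.fmean(measurements)
bias = mean_value - true_elevation       # accuracy: deviation from the true value
spread = statistics.stdev(measurements)  # precision: deviation from the mean value

print(f"mean of readings: {mean_value:.2f} m")
print(f"bias (accuracy):  {bias:.2f} m")
print(f"sd (precision):   {spread:.2f} m")
```

In this made-up case the readings are precise (they cluster tightly) but inaccurate (they cluster about 2 m from the assumed truth).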

  17. Models of uncertainty • Determine effects of uncertainty/variation/error on results of analysis • if there is known variation, the results of a single analysis cannot be claimed to be correct • uncertainty analysis an essential part of GIS • error model the preferred term

  18. Traditional error analysis • Measurements subject to distortion • z' = z + δz • Propagate through transformations • r = f(z) • r + δr = f(z + δz) • But f is rarely known • complex compilation and interpretation • complex spatial dependencies between elements of resulting data set
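
When f is known, propagating a distortion in z to the result r is direct. The sketch below uses a hypothetical f (the area of a square parcel with side z), chosen only because it is simple; the slide's point is that for real GIS products f is rarely this accessible. It compares the exact change in r with the usual first-order approximation, the derivative of f at z times the distortion.

```python
def f(z: float) -> float:
    """Hypothetical transformation: area of a square parcel with side z (meters)."""
    return z * z


def propagate(func, z: float, dz: float) -> float:
    """Exact change in the result, dr = f(z + dz) - f(z)."""
    return func(z + dz) - func(z)


def propagate_first_order(func, z: float, dz: float, h: float = 1e-6) -> float:
    """Linear approximation dr ~ f'(z) * dz, with f' from a central difference."""
    derivative = (func(z + h) - func(z - h)) / (2.0 * h)
    return derivative * dz


z, dz = 100.0, 0.5  # measured side and its distortion, meters (made-up)
print(f"exact dr:       {propagate(f, z, dz):.2f} m^2")
print(f"first-order dr: {propagate_first_order(f, z, dz):.2f} m^2")
```

The two figures agree closely when the distortion is small relative to z; neither approach is available when f is a long chain of compilation and interpretation steps.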

  19. Spatial dependence • In true values z • In errors e • cov(e_i, e_j) a decreasing positive function of distance • geostatistical framework • Scale effects, generalization as convolutions of z

  20. If this were not true • If Tobler’s First Law of Geography did not apply to errors in maps • If errors were statistically independent • If relative errors were as large as absolute errors • Errors in derived products would be impossibly large • e.g. slope • e.g. length • Shapes would be unrecognizable
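
The effect on slope can be sketched with a small simulation along an elevation profile. A first-order autoregressive series stands in here for a geostatistical error model; that substitution, the 5 m error, the 10 m spacing and the correlation of 0.95 are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_errors(n: int, sigma: float, rho: float, rng: random.Random) -> list:
    """Errors along a profile as a first-order autoregressive series.

    rho is the correlation between errors at adjacent points (rho = 0 gives
    independent errors); the marginal standard deviation is sigma throughout.
    """
    errors = [rng.gauss(0.0, sigma)]
    innovation_sd = sigma * math.sqrt(1.0 - rho * rho)
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, innovation_sd))
    return errors


def slope_error_sd(errors: list, spacing: float) -> float:
    """Standard deviation of the error in slope between adjacent points."""
    slope_errors = [(b - a) / spacing for a, b in zip(errors, errors[1:])]
    return statistics.pstdev(slope_errors)


rng = random.Random(42)     # fixed seed so the sketch is repeatable
sigma, spacing = 5.0, 10.0  # made-up: 5 m elevation error, 10 m point spacing
independent = slope_error_sd(simulate_errors(20_000, sigma, 0.0, rng), spacing)
correlated = slope_error_sd(simulate_errors(20_000, sigma, 0.95, rng), spacing)

print(f"slope error sd, independent errors: {independent:.3f}")
print(f"slope error sd, correlated errors:  {correlated:.3f}")
```

With the same absolute error at every point, correlated errors largely cancel in the difference between neighbors, so the error in slope is several times smaller than it would be if the errors were independent.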

  21. Realization • A single instance from an error model • an error model must be stochastic • Monte Carlo simulation • The Gaussian distribution metaphor • scalar realizations • a Gaussian distribution for maps • an entire map as a realization
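
A Monte Carlo simulation in this sense draws many realizations from a stochastic error model and recomputes the product for each. The sketch below estimates the plus or minus on the area of a made-up square parcel. It assumes independent Gaussian errors of 1 m on every coordinate, a deliberate simplification, since errors in real data are spatially dependent.

```python
import random
import statistics


def shoelace_area(vertices) -> float:
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def realize(vertices, sigma: float, rng: random.Random):
    """One realization: every coordinate perturbed by independent Gaussian error."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in vertices]


parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]  # made-up, meters
rng = random.Random(7)  # fixed seed so the sketch is repeatable
areas = [shoelace_area(realize(parcel, 1.0, rng)) for _ in range(2_000)]

print(f"nominal area:   {shoelace_area(parcel):.0f} m^2")
print(f"simulated area: {statistics.fmean(areas):.0f} +/- {statistics.stdev(areas):.0f} m^2")
```

Each perturbed parcel is one realization; the spread of the recomputed areas is the uncertainty estimate. Swapping in a correlated error model changes only `realize`.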

  22. Model • {p_1, p_2, …, p_n} • correlation in neighboring cell outcomes • posterior probabilities equal to priors • 80% sand, 20% inclusions of clay • no knowledge of correlations
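
A minimal sketch of one realization from such a model, using the slide's 80% sand and 20% clay. With no knowledge of correlations, the simplest assumption is to draw every cell independently, which is what this example does; the grid size and function name are hypothetical.

```python
import random

CLASSES = ["sand", "clay"]
PROBABILITIES = [0.8, 0.2]  # from the slide: 80% sand, 20% inclusions of clay


def realize_class_map(rows: int, cols: int, rng: random.Random):
    """One realization in which every cell is drawn independently."""
    return [rng.choices(CLASSES, weights=PROBABILITIES, k=cols) for _ in range(rows)]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
realization = realize_class_map(50, 50, rng)
sand_share = sum(row.count("sand") for row in realization) / (50 * 50)

# Print a corner of the map: "." is sand, "#" is clay.
for row in realization[:5]:
    print("".join("." if c == "sand" else "#" for c in row[:40]))
print(f"share of sand cells: {sand_share:.2f}")
```

Independent draws reproduce the class proportions but scatter clay cells in a salt-and-pepper pattern rather than in contiguous inclusions, which is why correlation in neighboring cell outcomes matters to the model.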

  23. Topographic data • Definition problems • sand dunes • trees • buildings • Classic measurement error model • measured elevation = truth + error • error spatially autocorrelated
