Managing Error, Accuracy, and Precision In GIS
Importance of Understanding Error *Until recently, most people involved with GIS paid little attention to error *That situation has now changed dramatically *Error management is vital to the proper functioning of a GIS database, and accounts for a large percentage of the work in most GIS shops
Importance of Understanding Error *The key point is that awareness, scrutiny, and careful planning can minimize these errors and their associated effects on management and decision-making
Definitions for Understanding Error *Accuracy: the degree to which information on a map or in a digital database matches the true or accepted values -can vary greatly amongst datasets -very high accuracy can be expensive *Precision: refers to the level of measurement and “exactness” of description
Definitions for Understanding Error *Precision: refers to the level of measurement and “exactness” of description in a GIS -again, precision requirements vary greatly depending on the dataset -highly precise data can be much more expensive to create
Definitions for Understanding Error Accuracy vs. Precision . . .
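The distinction can be made concrete with a small sketch. It assumes the common statistical reading, where accuracy is how far repeated measurements sit from the true value (bias) and precision is how tightly they cluster (spread); the slides define precision more loosely as "exactness" of description, so treat this as one illustration rather than the definition used here. The function name and the readings are hypothetical.

```python
import math
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, rmse, spread) for repeated measurements of one quantity."""
    errors = [m - true_value for m in measurements]
    bias = statistics.mean(errors)                             # systematic offset: accuracy
    rmse = math.sqrt(statistics.mean([e * e for e in errors])) # overall closeness to truth
    spread = statistics.stdev(measurements)                    # repeatability: precision
    return bias, rmse, spread

# Hypothetical readings of a point whose true elevation is 100.0 m.
tight_but_offset = [105.1, 105.0, 104.9, 105.0]     # precise, not accurate
scattered_but_centered = [97.0, 103.0, 99.0, 101.0] # accurate on average, not precise

for name, readings in [("tight_but_offset", tight_but_offset),
                       ("scattered_but_centered", scattered_but_centered)]:
    bias, rmse, spread = accuracy_and_precision(readings, 100.0)
    print(f"{name}: bias={bias:.2f}  rmse={rmse:.2f}  spread={spread:.2f}")
```

The first set clusters tightly about 5 m away from the truth; the second averages to the truth but scatters widely.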
Types of Error Positional Accuracy and Precision *Refers to both horizontal and vertical positions *Don’t use/compute locational information at a level of detail beyond that for which the data were intended
Accuracy Standards for US NTS Maps
Scale        Tolerance
1:1,200      ± 3.33 feet
1:2,400      ± 6.67 feet
1:4,800      ± 13.33 feet
1:10,000     ± 27.78 feet
1:12,000     ± 33.33 feet
1:24,000     ± 40.00 feet
1:63,360     ± 105.60 feet
1:100,000    ± 166.67 feet
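The tolerances listed for each scale appear to follow the horizontal rule of the US National Map Accuracy Standards: 1/30 inch at map scale for scales larger than 1:20,000, and 1/50 inch for scales of 1:20,000 or smaller. The sketch below converts that map distance to ground feet; the function name is hypothetical, and the link to that standard is an inference from the numbers rather than something the slides state.

```python
def nmas_horizontal_tolerance_feet(scale_denominator):
    """Ground distance in feet for the map-distance tolerance at 1:scale_denominator."""
    # Assumed rule: 1/30 inch on the map for scales larger than 1:20,000,
    # otherwise 1/50 inch.
    divisor = 30 if scale_denominator < 20_000 else 50
    ground_inches = scale_denominator / divisor
    return ground_inches / 12  # 12 inches per foot

for denom in (1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    print(f"1:{denom:,}  ± {nmas_horizontal_tolerance_feet(denom):.2f} feet")
```

Running the loop reproduces the eight values in the list, which is a quick way to check a tolerance for a scale that is not listed.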
Types of Error Attribute Accuracy and Precision *Attribute (non-spatial) information can also be erroneous *Some layers can be more precise than others Conceptual Accuracy and Precision *Use of inappropriate categories, or misclassification *Ex.-not classifying voltage in your power lines layer would limit your ability to manage electrical utilities infrastructure
Sources of Error *Sources of error can be divided into three groups: -obvious sources of error -errors resulting from natural variations or from original measurements -errors arising through processing
Obvious Sources of Error *Age of Data -some data sources may be too old to be useful -past collection standards may no longer be acceptable -the landscape represented in the database could have changed dramatically over time (erosion/deposition, harvest, fire) -updating a database is by far the most common form of error management work
Obvious Sources of Error *Areal Cover -some datasets contain only part of the required information (vegetation and soils are common examples) -ex. FRI often contains no land cover information for wetland areas -some remote sensing data may be difficult to acquire in consistently cloudy regions
Obvious Sources of Error *Map Scale -always remember the implications of scale!!!! *Density of Observations -an insufficient number of observations may not provide the required level of resolution -ex. If you have a 40’ contour interval, you had better not be reporting on or making decisions about features that differ by only a few feet in elevation
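The contour-interval example can be turned into a rough screening check. The sketch below assumes the vertical uncertainty of contour-derived elevations is about half the contour interval; that threshold is a rule of thumb borrowed from common map accuracy practice, not something stated in the slides, and the function name is hypothetical.

```python
def supports_vertical_claim(elevation_difference_ft, contour_interval_ft):
    """Rough screen: is a claimed elevation difference larger than the
    vertical uncertainty implied by the contour interval?"""
    # Assumption: uncertainty is about half the contour interval.
    vertical_uncertainty_ft = contour_interval_ft / 2
    return abs(elevation_difference_ft) > vertical_uncertainty_ft

print(supports_vertical_claim(3, 40))   # False: 3 ft is well inside the 20 ft uncertainty
print(supports_vertical_claim(60, 40))  # True: 60 ft exceeds it
```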
Obvious Sources of Error *Relevance -surrogate data may be used to indirectly describe/classify/quantify features -Ex. We can create a forest polygon layer from classification of remotely sensed data. However, we are not classifying a “tree” as a tree. Rather, we are classifying the imagery based on spectral signatures, and those signatures can be related to tree species.
Obvious Sources of Error *Format -methods of formatting data can introduce errors -conversion of scale, projection, or datum, vectorization/rasterization, and pixel resolution are possible areas of format error -international mapping standards have not been established
Obvious Sources of Error *Accessibility -try getting a highway map of the former USSR in the Cold War days . . . Good Luck! *Cost -highly accurate, precise data is expensive!!!
Errors from Natural Variation or from Original Measurements *Positional Accuracy -many natural features do not exhibit “hard” boundaries like roads or boundary lines -examples include . . .?
Errors from Natural Variation or from Original Measurements *Positional Accuracy -many natural features do not exhibit “hard” boundaries like roads or boundary lines -examples include: -soils -vegetation communities -climate variables -drainage -biomes, etc.
Errors from Natural Variation or from Original Measurements *Accuracy of Content -qualitative accuracy refers to correct labelling/classification (Ex.-pine forest vs. spruce forest) -quantitative inaccuracies often occur from faulty equipment or poor readings -what forestry equipment could give you bad data? And how?
Errors Arising Through Processing *Numerical Errors -by far, the hardest errors to detect!!! -different (faulty) computer chips can compute differently, generating a different output (response) *Topological Errors -overlaying, or deriving/creating new variables based on other data can cause slivers, overshoots, and dangles
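One familiar kind of numerical error, related to but not the same as the faulty-chip case mentioned above, is rounding when coordinates are stored at limited floating-point precision. The sketch below round-trips a hypothetical UTM northing through 32-bit storage; the coordinate value and function name are made up for illustration.

```python
import struct

def as_float32(value):
    """Round-trip a Python float (64-bit) through 32-bit floating-point storage."""
    return struct.unpack("f", struct.pack("f", value))[0]

northing_m = 5_432_109.37        # hypothetical UTM northing in metres
stored = as_float32(northing_m)

print(stored)                    # the value actually kept in 32-bit storage
print(stored - northing_m)       # silent positional shift in metres
```

At this magnitude a 32-bit float can only represent values in steps of 0.5 m, so the stored northing shifts by roughly a decimetre without any warning, which is part of why such errors are hard to detect.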
Errors Arising Through Processing *Classification/Generalization Errors -classification inaccuracies/class merging -grouping data in different ways can lead to dramatically different results (Ex.-a study of cause of death amongst males would probably give quite different results if you used (amongst others) an 18-25 age group vs. an 18-50 age group)
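The effect of grouping can be sketched with a few lines of code. The ages below are invented purely for illustration, and the helper name is hypothetical; the point is only that the same records yield different summaries under different class boundaries.

```python
from collections import Counter

def count_by_group(ages, groups):
    """Count ages in each (label, low, high) group; bounds are inclusive."""
    counts = Counter()
    for age in ages:
        for label, low, high in groups:
            if low <= age <= high:
                counts[label] += 1
                break
    return dict(counts)

ages = [19, 22, 24, 31, 38, 44, 47, 49]            # made-up values
narrow = [("18-25", 18, 25), ("26-50", 26, 50)]    # finer classes
broad = [("18-50", 18, 50)]                        # merged class

print(count_by_group(ages, narrow))  # {'18-25': 3, '26-50': 5}
print(count_by_group(ages, broad))   # {'18-50': 8}
```

With the merged class, any pattern specific to the 18-25 group is no longer visible in the output.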
Errors Arising Through Processing *Geocoding/Digitizing Errors -what can cause digitizing errors?
Errors Arising Through Processing *Geocoding/Digitizing Errors -what can cause digitizing errors? -rasterizing will cause positional error
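The positional error from rasterizing can be estimated directly. The sketch below assumes a simple square grid whose origin is at (0, 0) by default and treats a rasterized point as sitting at the centre of the cell that contains it; the function names and the 30-unit cell size are hypothetical.

```python
import math

def snap_to_cell_center(x, y, cell_size, origin_x=0.0, origin_y=0.0):
    """Return the centre of the square grid cell that contains (x, y)."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    center_x = origin_x + (col + 0.5) * cell_size
    center_y = origin_y + (row + 0.5) * cell_size
    return center_x, center_y

def rasterization_shift(x, y, cell_size):
    """Distance a point moves when replaced by its cell centre."""
    center_x, center_y = snap_to_cell_center(x, y, cell_size)
    return math.hypot(center_x - x, center_y - y)

cell = 30.0
print(snap_to_cell_center(61.0, 89.0, cell))   # (75.0, 75.0)
print(rasterization_shift(61.0, 89.0, cell))   # shift for this point
print(cell * math.sqrt(2) / 2)                 # worst case: half the cell diagonal
```

Under these assumptions the shift can never exceed half the cell diagonal, so coarser cells mean larger possible positional error.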
Error, Error, Everywhere . . . How can we manage error? 1. Be aware of where error can be generated (everything discussed in this presentation) 2. Metadata, metadata, metadata . . . Fully understand all data compiled for your GIS, make notes of all work done with the data, and pass that information on to future users and include it with all GIS-generated output.