    1. November 1: Uncertainty, Error and Sensitivity. Reading: Longley, Ch. 15

    2. Data Quality: Measurement and Assessment. 1. Data Quality. What is quality? Quality is commonly used to indicate the superiority of a manufactured good or to indicate a high degree of craftsmanship or artistry. We might define it as the degree of excellence in a product, service or performance. In manufacturing, quality is a desirable goal achieved through management and control of the production process (statistical quality control) (Redman, 1992). Many of the same issues apply to the quality of databases, since a database is the result of a production process, and the reliability of that process imparts value and utility to the database.

    3. Why is there a concern for DQ? Increased data production by the private sector, where there are no required quality standards. In contrast, production of data by national mapping agencies (e.g., US Geological Survey, British Ordnance Survey) has long been required to conform to national accuracy standards (i.e., mandated quality control). Increased use of GIS for decision support, such that the implications of using low-quality data are becoming more widespread (including the possibility of litigation if minimum standards of quality are not attained). Increased reliance on secondary data sources, due to the growth of the Internet, data translators and data transfer standards. Thus, poor-quality data is ever easier to get.

    4. Who assesses DQ? Model 1. Minimum Quality Standards. This is a form of quality control where DQ assessment is the responsibility of the data producer. It is based on compliance testing strategies to identify databases that meet quality thresholds defined a priori. An example is NMAS, the National Map Accuracy Standards adopted by the US Geological Survey in 1946. This approach lacks flexibility; in some cases a particular test may be too lax while in others it may be too restrictive.

    5. Model 2. Metadata Standards. This model views error as inevitable and does not impose a minimum quality standard a priori. Instead, it is the consumer who is responsible for assessing fitness-for-use; the producer’s responsibility is documentation, i.e., “truth-in-labeling.” An example is SDTS, the Spatial Data Transfer Standard. This approach is flexible, but there is still no feedback from the consumer, i.e., there is a one-way information flow that inhibits the producer’s ability to correct mistakes.

    6. Model 3. Market Standards. This model uses a two-way information flow to obtain feedback from users on data quality problems. Consumer feedback is processed and analyzed to identify significant problems and prioritize repairs. An example is Microsoft’s Feedback Wizard, a software utility that lets users email reports of map errors. This model is useful in a market context in order to ensure that databases match users’ needs and expectations.

    7. A better definition of geographical data includes the three dimensions of space, time and theme (where-when-what). These three dimensions are the basis for all geographical observation (Berry, 1964; Sinton, 1978). Geographical data quality is likewise defined by space-time-theme. Data quality also comprises several components, such as accuracy, precision, consistency and completeness.

    8. Accuracy. Accuracy is the inverse of error. Many people equate accuracy with quality, but in fact accuracy is just one component of quality. The definition of accuracy is based on the entity-attribute-value model: entities are real-world phenomena, attributes are their relevant properties, and values are quantitative or qualitative measurements. An error is a discrepancy between the encoded and actual value of a particular attribute for a given entity. "Actual value" implies the existence of an objective, observable reality. However, reality may be: unobservable (e.g., historical data); impractical to observe (e.g., too costly); or perceived rather than real (e.g., subjective entities such as "neighborhoods").
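
As a minimal sketch of this entity-attribute-value framing (the parcel, attribute name and values below are hypothetical, not from the slides), error can be expressed as the signed discrepancy between the encoded value and a reference value:

```python
from dataclasses import dataclass

@dataclass
class AttributeValue:
    """One cell of the entity-attribute-value model."""
    entity: str      # real-world phenomenon, e.g. a land parcel
    attribute: str   # relevant property, e.g. "area_m2"
    encoded: float   # value stored in the database
    actual: float    # independently observed reference value

    @property
    def error(self) -> float:
        # Error: discrepancy between the encoded and actual value
        return self.encoded - self.actual

# Hypothetical parcel whose encoded area overstates the surveyed area by 5.5 m2
obs = AttributeValue("parcel_017", "area_m2", encoded=812.0, actual=806.5)
print(obs.error)   # 5.5
```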

    9. Spatial Accuracy. Spatial accuracy is the accuracy of the spatial component of the database. The metrics used depend on the dimensionality of the entities under consideration. For points, accuracy is defined in terms of the distance between the encoded location and the "actual" location. Error can be defined in various dimensions: x, y, z, horizontal, vertical and total. Metrics of error are extensions of classical statistical measures (mean error, RMSE or root mean squared error, inference tests, confidence limits, etc.) (American Society of Civil Engineers, 1983; American Society of Photogrammetry, 1985; Goodchild, 1991a). For lines and areas, the situation is more complex, because error is a mixture of positional error (error in locating well-defined points along the line) and generalization error (error in the points selected to represent the line) (Goodchild, 1991b).
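
A minimal sketch of the point-based metrics mentioned above, assuming horizontal error is measured as the Euclidean distance between each digitized point and a higher-accuracy reference position (the check-point coordinates are invented for illustration):

```python
import math

def horizontal_rmse(encoded, reference):
    """Root mean squared horizontal error between encoded and reference
    (x, y) coordinates for a set of well-defined check points."""
    assert len(encoded) == len(reference)
    squared = [(ex - rx) ** 2 + (ey - ry) ** 2
               for (ex, ey), (rx, ry) in zip(encoded, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points: digitized positions vs. surveyed positions (metres)
digitized = [(1001.2, 498.7), (1520.4, 760.1), (1987.9, 1022.5)]
surveyed  = [(1000.0, 500.0), (1521.0, 758.9), (1989.1, 1021.8)]
print(f"Horizontal RMSE: {horizontal_rmse(digitized, surveyed):.2f} m")
```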

    10. Precision Surveyors and others concerned with measuring instruments tend to define precision through the performance of an instrument in making repeated measurements of the same phenomenon. A measuring instrument is precise according to this definition if it repeatedly gives similar measurements, whether or not these are actually accurate. So a GPS receiver might make successive measurements of the same elevation, and if these are similar the instrument is said to be precise. Precision in this case can be measured by the variability among repeated measurements.
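
As a sketch of this idea (the elevation readings below are invented), precision can be summarized by the spread of repeated measurements, independently of whether their mean is close to the true value:

```python
import statistics

# Hypothetical repeated GPS readings of the same elevation (metres)
elevations = [251.8, 252.1, 251.6, 252.3, 251.9, 252.0]

mean_elev = statistics.mean(elevations)
spread = statistics.stdev(elevations)   # variability among repeated measurements

print(f"Mean elevation: {mean_elev:.2f} m")
print(f"Precision (std. dev.): {spread:.2f} m")
# A small standard deviation indicates a precise instrument,
# even if the mean is biased away from the true elevation (i.e. inaccurate).
```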

    12. Consistency Consistency refers to the absence of apparent contradictions in a database. Consistency is a measure of the internal validity of a database, and is assessed using information that is contained within the database. Consistency can be defined with reference to the three dimensions of geographical data. Spatial consistency includes topological consistency, or conformance to topological rules, e.g., all one-dimensional objects must intersect at a zero-dimensional object (Kainz, 1995). Temporal consistency is related to temporal topology, e.g., the constraint that only one event can occur at a given location at a given time (Langran, 1992). Thematic consistency refers to a lack of contradictions in redundant thematic attributes. For example, attribute values for population, area, and population density must agree for all entities.
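
A minimal sketch of a thematic-consistency check for the population/area/density example above (the records, attribute names and tolerance are assumptions for illustration, not part of the slides):

```python
# Records with redundant attributes: stored density should equal
# population / area within a small relative tolerance.
records = [
    {"id": "A", "population": 12000, "area_km2": 30.0, "density": 400.0},
    {"id": "B", "population": 8500,  "area_km2": 17.0, "density": 490.0},  # contradicts population/area
]

def thematic_inconsistencies(rows, rel_tol=0.001):
    """Return ids of entities whose stored density contradicts population / area."""
    bad = []
    for r in rows:
        implied = r["population"] / r["area_km2"]
        if abs(implied - r["density"]) > rel_tol * abs(implied):
            bad.append(r["id"])
    return bad

print(thematic_inconsistencies(records))   # -> ['B']
```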

    13. Completeness Completeness refers to a lack of errors of omission in a database. It is assessed relative to the database specification, which defines the desired degree of generalization and abstraction (selective omission).

    14. Incompleteness can be measured in space, time or theme. Consider a database of buildings in Minnesota that have been placed on the National Register of Historic Places as of the end of 1995. Spatial incompleteness: the list contains only buildings in Hennepin County (one county in Minnesota, rather than all of Minnesota). Temporal incompleteness: the list contains only buildings placed on the Register by June 30, 1995. Thematic incompleteness: the list contains only residential buildings.
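
A minimal sketch of how such a completeness check against the database specification might look (the specification, the three stand-in counties and the listing records are hypothetical; a real specification for the example above would cover all Minnesota counties and all building types on the Register):

```python
from datetime import date

# Hypothetical specification for the historic-buildings database described above
SPEC = {
    "counties": {"Hennepin", "Ramsey", "St. Louis"},   # stand-in for all MN counties
    "cutoff": date(1995, 12, 31),
    "building_types": {"residential", "commercial", "religious"},
}

listings = [
    {"county": "Hennepin", "listed_on": date(1995, 5, 4),  "type": "residential"},
    {"county": "Hennepin", "listed_on": date(1995, 6, 12), "type": "residential"},
]

def completeness_report(rows, spec):
    """Flag the space/time/theme dimensions in which the data fall short of the spec."""
    counties = {r["county"] for r in rows}
    types = {r["type"] for r in rows}
    latest = max(r["listed_on"] for r in rows)
    return {
        "spatial_gaps": spec["counties"] - counties,   # counties with no records at all
        "temporal_gap": latest < spec["cutoff"],       # rough proxy: records stop early
        "thematic_gaps": spec["building_types"] - types,  # building types never listed
    }

print(completeness_report(listings, SPEC))
```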

    15. Managing Uncertainty in GIS. Increasingly, concern is being expressed within the geographic information industry at our inability to deal effectively with uncertainty and to manage the quality of information outputs.

    16. This concern has resulted from: the requirement in some jurisdictions for mandatory data quality reports when transferring data; the need to protect individual and agency reputations, particularly when geographic information is used to support administrative decisions subject to appeal; the need to safeguard against possible litigation by those who allege they have suffered harm through the use of products of insufficient quality to meet their needs; and the basic scientific requirement of being able to describe how close information is to the truth it represents.

    17. We often forget that traditional hardcopy maps contained valuable forms of accuracy statement, such as reliability diagrams and estimates of positional error. Although these descriptors were imperfect, they at least represented an attempt by map makers to convey product limitations to map users. However, this approach assumed that users knew how far the maps could be trusted, and new users of this information are often unaware of the potential traps that lie in misuse of their data and the associated technology. The lack of accuracy estimates for digital data has the potential to harm the reputations of both individuals and agencies, particularly where administrative decisions are subject to judicial review.

    18. The era of consumer protection also has an impact upon the issue. While we would not think of purchasing a microwave oven or video recorder without an instruction booklet and a warranty against defects, it is still common for organizations to spend thousands of dollars purchasing geographic data without receiving any quality documentation. Finally, if the collection, manipulation, analysis and presentation of geographic information is to be recognized as a valid field of scientific endeavor, then it is inappropriate that GIS users remain unable to describe how close their information is to the truth it represents.

    20. A Strategy for Managing Uncertainty. In developing a strategy for managing uncertainty we need to take into consideration the core components of the uncertainty research agenda: developing formal, rigorous models of uncertainty; understanding how uncertainty propagates through spatial processing and decision making; communicating uncertainty to different levels of users in more meaningful ways; designing techniques to assess the fitness for use of geographic information and to reduce uncertainty to manageable levels for any given application; and learning how to make decisions when uncertainty is present in geographic information, i.e., being able to absorb uncertainty and cope with it in our everyday lives.

    22. Ideally, this prior knowledge permits an assessment of the final product quality specifications to be made before a project is undertaken; however, this may have to be decided later, once the level of uncertainty becomes known. Data, software, hardware and spatial processes are combined to provide the necessary information products. Assuming that uncertainty in a product can be detected and modeled, the next consideration is how the various uncertainties may best be communicated to the user. Finally, the user must decide what product quality is acceptable for the application and whether the uncertainty present is appropriate for the given task.

    23. There are two choices available here: either reject the product as unsuitable and select uncertainty reduction techniques to create a more accurate product, or absorb (accept) the uncertainty present and use the product for its intended purpose.

    24. Determining Product Quality. In determining the significant forms of uncertainty in a product, trade-offs in one area may have to be made at the expense of others to achieve better accuracy. For example, it may be acceptable that adjacent objects, such as a road, a railway and a river, are shown in their correct relative positions even though they have been displaced for cartographic enhancement. In other applications, attribute accuracy or logical consistency may be the dominant considerations, such as in emergency dispatch, where missing street addresses or non-intersecting road segments can cost lives.

    25. A well-known case described by Blakemore (1985) provides a good example of the lack of understanding of geographic data accuracy requirements. A file of boundary coordinates for British administrative districts was collected for a government department as a background for its thematic maps. The agency did not require highly accurate positional recording of the boundaries, and emphasis was placed more on attribute accuracy. However, when the data became extremely popular because of its extent and convenient format, secondary users soon started to experience problems with point-in-polygon searches after some locations '... seemed suddenly to be 2 or 3 km out into the North Sea'. Clearly, convenience can sometimes override data quality concerns.

    26. Where the data quality requirements are unknown, it can be a useful exercise to estimate costs for each combination of data collection and conversion technologies and procedures; users often take a pragmatic approach to the 'cost vs. accuracy' issue when forced to do so. Having decided which forms of uncertainty will have a significant impact upon the quality of a product, some form of uncertainty assessment and communication must be undertaken.

    27. Ultimately, however, it is the user who must decide whether the level of uncertainty present in a product is acceptable or not, which implies there is really no such thing as bad data - just data that does not meet a specific need. For example, a road centerline product digitized from source material at a scale of 1:25,000 would probably have poor positional accuracy for an urban utility manager, yet may be quite acceptable for a marketing agency planning advertising catchments. Similarly, street addresses associated with the road centerline segments would probably need to be error-free for an emergency dispatch system, whereas the marketer would probably be content if 80% were correct. So quality relates to the purpose for which the product is intended to be used - the essence of the 'fitness for use' concept.

    28. Uncertainty Reduction. Uncertainty is reduced by acquiring more information and/or improving the quality of the information available; however, it needs only to be reduced to a level tolerable to the decision maker. Methods for reducing uncertainty include: defining and standardizing technical procedures; improving education and training; collecting more or different data; increasing the spatial/temporal data resolution; field checking of observations; better data processing methods; and using better models and improving procedures for model calibration.
