Data Quality It’s not the things you don’t know that matter, it’s the things you know that aren’t so. Will Rogers, Famous Okie GI specialist. GIGO: garbage in, garbage out. ‘Cos it’s in the computer, don’t mean it’s right. “But there are also unknown unknowns: the ones we don't know we don't know.” Donald Rumsfeld
Murphy’s Laws of Mapmaking Cardinal Postulates • area desired by the user has not yet been mapped. • if mapped, area straddles zone boundaries--or at least map sheets • if on one sheet, sheet is scheduled for update next year; last update was 1901 Corollary for GIS • area desired by user is still in paper-map form • if in GIS, recorded with X-Y coordinates and straddles zone boundaries--or at least map tiles • if one tile, projection unknown and no information on date of creation and/or last update Conclusion: GIS is not a panacea!
Data Quality: How good is your data? • Scale • ratio of distance on a map to the equivalent distance on the earth's surface • Primarily an output issue; at what scale do I wish to display? • Precision or Resolution • the exactness of measurement or description • Determined by input; can output at lower (but not higher) resolution • Accuracy • the degree of correspondence between data and the real world • Fundamentally controlled by the quality of the input • Lineage • The original sources for the data and the processing steps it has undergone • Currency • the degree to which data represents the world at the present moment in time • Documentation or Metadata • data about data: recording all of the above • Standards • Common or “agreed-to” ways of doing things • Data built to standards is more valuable since it’s more easily shareable
Scale • ratio of distance on a map to the equivalent distance on the earth's surface. • Large scale --> large detail, small area covered (1”=200’ or 1:2,400) • Small scale --> small detail, large area (1:250,000) • A given object (e.g. land parcel) appears larger on a large scale map • scale can never be constant everywhere on a map because of map projection • problem is worst for small scale maps & certain projections (e.g. Mercator) • can be true from a single point to everywhere • can be true along a line, or a set of lines • on large scale maps, adjustments often made to achieve ‘close to true’ scale everywhere (e.g. State Plane and UTM systems) • scale representation • Verbal: one inch equals one statute mile (good for interpretation) • representative fraction (RF): 1:63,360 (good for measurement) (smaller fraction = smaller scale: 1:2,000,000 is smaller than 1:2,000) • scale bar (good if the map is enlarged/reduced) • use them all on a map! [Figure: scale bar in miles]
Scale Examples • Common Scales • 1:200 (1”=16.7ft) • 1:2,000 (1”=56 yards; 1cm=20m) • 1:20,000 (5cm=1km) • 1:24,000 (1”=2,000ft) • 1:25,000 (4cm=1km) • 1:50,000 (2cm=1km) • 1:62,500 (1.6cm=1km; 1”=.986mi) • 1:63,360 (1”=1mile; 1cm=.634km) • 1:100,000 (1”=1.58mi; 1cm=1km) • 1:500,000 (1”=7.9mi; 1cm=5km) • 1:1,000,000 (1”=15.8mi; 1cm=10km) • 1:7,500,000 (1”=118mi; 1cm=75km) • Large versus Small (really, relative to what’s available for a given area; Maling 1989) • large: above 1:12,500 • medium: 1:13,000 - 1:126,720 • small: 1:130,000 - 1:1,000,000 • very small: below 1:1,000,000 • Map sheet examples • 1:24,000: 7.5 minute USGS Quads (17 by 22 inches; 6 by 8 miles) • 1:7,500,000: US wall map (26 by 16 inches) • 1:20,000,000: US 8.5” X 11”
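The equivalences on this slide all follow from the representative fraction: one map unit corresponds to the RF denominator in the same ground unit. A minimal Python sketch of that arithmetic is given here; the function names are illustrative and do not come from the slides.

```python
# Convert a representative fraction (1:denominator) into ground distances.
# Function names are hypothetical; only the unit constants are fixed facts.

INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280
CM_PER_KM = 100_000


def ground_feet_per_map_inch(rf_denominator: float) -> float:
    """Ground distance in feet represented by one inch on the map."""
    return rf_denominator / INCHES_PER_FOOT


def ground_miles_per_map_inch(rf_denominator: float) -> float:
    """Ground distance in statute miles represented by one inch on the map."""
    return rf_denominator / (INCHES_PER_FOOT * FEET_PER_MILE)


def ground_km_per_map_cm(rf_denominator: float) -> float:
    """Ground distance in kilometres represented by one centimetre on the map."""
    return rf_denominator / CM_PER_KM


if __name__ == "__main__":
    for denominator in (24_000, 63_360, 100_000, 1_000_000):
        print(
            f"1:{denominator:,}: 1 in = {ground_miles_per_map_inch(denominator):.2f} mi, "
            f"1 cm = {ground_km_per_map_cm(denominator):.2f} km"
        )
```

Running a candidate scale through these functions is a quick way to check a verbal scale statement against its RF before putting both on a map.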
Scale, Resolution & Accuracy in GIS Systems • On paper maps, scale is hard to change, thus it generally determines resolution and accuracy, and consistent decisions are made for these. • A GIS is scale independent since output can be produced at any scale, irrespective of the characteristics of the input data (at least in theory) • in practice, an implicit range of scales or maximum scale for anticipated output should be chosen and used to determine: • what features to show • manholes only on large scale maps • how features will be represented • manhole a polygon at 1:50; cities a point at 1:1,000,000 • appropriate levels for accuracy and precision • Larger scale generally requires greater resolution • Larger scale necessitates a higher level of accuracy • GIS also helps with the generalization problem implicit in paper maps • A road drawn with 0.5 mm wide line (the smallest for decent visibility) • At 1:24,000 implies the road is 12 meters (about 39 feet) wide • At 1:250,000 implies the road is 125 meters (about 410 feet) wide • At least in a GIS you can store the true road width, but be careful with plots!
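The road-width figures in the generalization example come from multiplying the drawn line width by the scale denominator. A small Python sketch of that calculation follows; the function name is hypothetical.

```python
# Ground width implied by a line of a given drawn width at a given map scale.
# The function name is illustrative, not taken from the slides.

MM_PER_METER = 1000.0


def implied_ground_width_m(line_width_mm: float, rf_denominator: float) -> float:
    """Width on the ground (meters) that a drawn line appears to claim."""
    return line_width_mm * rf_denominator / MM_PER_METER


if __name__ == "__main__":
    for denominator in (24_000, 250_000):
        width = implied_ground_width_m(0.5, denominator)
        print(f"0.5 mm line at 1:{denominator:,} implies {width:g} m on the ground")
```

The same function shows why a plot from a GIS can mislead: the stored road width may be correct while the plotted symbol still implies a much wider feature.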
Precision or Resolution: it’s not the same as scale or accuracy! [Figure: pixel sizes of 1.6ft and 3.2ft] • Precision: the exactness of measurement or description • the “size” of the “smallest” feature which can be displayed, recognized, or described • Can apply to space, time (e.g. daily versus annual), or attribute (Douglas fir v. conifer) • for raster data, it is the size of the pixel (resolution) • e.g. for NTGISC digital orthos it is 1.6ft (half meter) • raster data can be resampled by combining adjacent cells; this decreases resolution but saves storage • e.g. 1.6 ft to 3.2 ft (1/4 storage); to 6.4 ft (1/16 storage) • resolution and scale • generally, increasing to larger scale allows features to be observed better and requires higher resolution • but, because of the human eye’s ability to recognize patterns, features in a lower resolution data set can sometimes be observed better by decreasing the scale (6.4 ft resolution shown at 1:400 rather than 1:200) • resolution and positional accuracy • you can see a feature (resolution), but it may not be in the right place (accuracy) • higher accuracy generally costs much more to obtain than higher resolution • accuracy cannot be greater (but may be much less) than resolution (e.g. if pixel size is one meter, then best accuracy possible is one meter)
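The storage figures for resampling follow from the pixel count falling with the square of the pixel size. The Python sketch below shows that ratio and one possible way of "combining adjacent cells". Averaging each block is an assumption made for illustration, since the slide does not say how cells are combined, and both function names are hypothetical.

```python
# Resampling a raster to a coarser pixel size: storage ratio and a simple
# block aggregation. Averaging is an assumed combining rule, not the slides'.

def relative_storage(original_pixel: float, new_pixel: float) -> float:
    """Fraction of the original storage needed after resampling."""
    factor = new_pixel / original_pixel
    return 1.0 / factor ** 2


def aggregate(grid: list[list[float]], k: int) -> list[list[float]]:
    """Combine each k-by-k block of cells into one cell holding the block mean."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(grid[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
            for c in range(0, cols - cols % k, k)
        ]
        for r in range(0, rows - rows % k, k)
    ]


if __name__ == "__main__":
    print(relative_storage(1.6, 3.2))  # pixel size doubled
    print(relative_storage(1.6, 6.4))  # pixel size quadrupled
    cells = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(aggregate(cells, 2))
```

Rows or columns left over when the grid size is not a multiple of `k` are dropped in this sketch; a production resampler would need an explicit rule for them.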
Accuracy: rests on at least four legs, not one! • Positional Accuracy (sometimes called Quantitative accuracy) • Spatial • horizontal accuracy: distance from true location • vertical accuracy: difference from true height • Temporal • difference from actual time and/or date • Attribute Accuracy or Consistency -- the validity concept in experimental design/stat. inf. • a feature is what the GIS/map purports it to be • a railroad is a railroad, and not a road • a soil sample agrees with the type mapped • Completeness -- the reliability concept from experimental design/stat. inf. • Are all instances of a feature the GIS/map claims to include, in fact, there? • Partially a function of the criteria for including features: when does a road become a track? • Simply put, how much data is missing? • Logical Consistency: are there contradictory relationships in the database? • Non-Spatial • Some crimes recorded at place of occurrence, others at place where report taken • Data for one country is for 2000, for another it’s for 2001 • Annual data series not taken on same day/month etc. (sometimes called lineage error) • Data uses different source or estimation technique for different years (again, lineage) • Spatial • Overshoots and gaps in road networks or parcel polygons
Sources of Error • Error is the inverse of accuracy. It is a discrepancy between the coded and actual values. • Sources • Inherent instability of the phenomenon itself • e.g. random variation of most phenomena (e.g. leaf size) • Measurement • e.g. surveyor or instrument error • Model used to represent data • e.g. choice of spheroid, or classification systems • Data encoding and entry • e.g. keying or digitizing errors • Data processing • e.g. single versus double precision; algorithms used • Propagation or cascading from one data set to another • e.g. using inaccurate layer as source for another layer • Example for Positional Accuracy • choice of spheroid and datum • choice of map projection and its parameters • accuracy of measured locations (surveying) of features on earth • media stability (stretching, folding, wrinkling of maps, photos) • human drafting, digitizing or interpretation error • resolution &/or accuracy of drafting/digitizing equipment • Thinnest visible line: 0.1-0.2 millimeters • At scale of 1:20,000 = about 6.6 - 13.1 feet (20,000 x 0.2 = 4,000mm = 4m = 13.1 feet) • registration accuracy of tics • machine precision: coordinate rounding error in storage and manipulation • other unknown
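The "single versus double precision" and "machine precision" items can be made concrete by round-tripping a coordinate through a 32-bit float. The Python sketch below uses a made-up northing value of the size typical for projected coordinates in meters; the value and the function name are assumptions for illustration only.

```python
# Coordinate rounding error from storing a large projected coordinate in
# single precision (32-bit) rather than double precision (64-bit).
import struct


def roundtrip_float32(value: float) -> float:
    """Store a value as an IEEE 754 32-bit float and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    northing_m = 3_650_123.456  # hypothetical northing in meters
    stored = roundtrip_float32(northing_m)
    print(f"original : {northing_m!r}")
    print(f"as 32-bit: {stored!r}")
    print(f"error    : {abs(stored - northing_m):.3f} m")
```

At this magnitude a 32-bit float can only represent values a quarter of a meter apart, so millimeter-level survey measurements are lost on storage, while a 64-bit float keeps them.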
Measurement of Positional Accuracy • usually measured by root mean square error (rmse): the square root of the average squared errors • $\mathrm{rmse} = \sqrt{\dfrac{e_1^2 + e_2^2 + e_3^2 + \dots + e_n^2}{n-1}}$, where $e_i$ is the distance (horizontally or vertically) between the true location of point $i$ on the ground and its location represented in the GIS • Usually expressed as a probability that no more than P% of points will be further than S distance from their true location. • Loosely we say that the rmse tells us how far recorded points in the GIS are from their true location on the ground, on average. • More correctly, based on the normal distribution of errors, 68% of points will be rmse distance or less from their true location, 95% will be no more than twice this distance, providing the errors are random and not systematic (i.e. the mean of the errors is zero) • e.g. for NTGISC digital orthos RMSE is 3.2 feet (one meter); for USGS Digital Ortho Quads the RMSE spec. is approx. 33 feet or 10 meters (but in reality much better) • with GPS, height is in practice 2 or 3 times less accurate than horizontal at high precision (officially the spec is 1.5, but data collection errors affect vertical the most)
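The rmse definition on this slide can be sketched in a few lines of Python. The default divisor of n-1 follows the slide; the `ddof` parameter and both function names are additions for illustration, included because other definitions divide by n instead.

```python
# Root mean square error of positional errors, following the slide's
# definition (sum of squared errors divided by n - 1, then square-rooted).
import math


def horizontal_error(true_xy: tuple[float, float], gis_xy: tuple[float, float]) -> float:
    """Straight-line distance between a point's true and recorded location."""
    return math.hypot(gis_xy[0] - true_xy[0], gis_xy[1] - true_xy[1])


def rmse(errors: list[float], ddof: int = 1) -> float:
    """Root mean square error; ddof=1 divides by n-1, ddof=0 divides by n."""
    n = len(errors)
    if n - ddof <= 0:
        raise ValueError("need more check points than ddof")
    return math.sqrt(sum(e * e for e in errors) / (n - ddof))


if __name__ == "__main__":
    # Hypothetical check points: (true location, location stored in the GIS).
    pairs = [((0.0, 0.0), (0.6, 0.8)), ((10.0, 5.0), (9.0, 5.0)), ((3.0, 3.0), (3.0, 5.0))]
    errors = [horizontal_error(true, recorded) for true, recorded in pairs]
    print(f"errors = {errors}")
    print(f"rmse   = {rmse(errors):.3f}")
```

If the errors are random with zero mean, roughly 68% of points should then lie within one rmse of their true location, which is the interpretation given on the slide.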
National Map Accuracy Standards: 1941/47 [Figure: tolerance of 1/50=.02” at scales of 1:20,000 and smaller, 1/30=.033” at larger scales] • established in 1941 by the US Bureau of the Budget (now OMB) for use with US Geological Survey maps (Maling, 1989, p. 146) • horizontal accuracy: not more than 10% of tested, ‘well defined’ points shall be more than the following distances from their true location: • 1:62,500: 1/50th of an inch (.02”) • 1:24,000: 1/40th of an inch (amended to 1/50=.02” in 1947) • 1:12,000: 1/30th of an inch (.033”) • Thus, on maps with a scale of 1:63,360 (1”=1 mile) 90% of points should be within 105.6 feet [(63,360 X .02)/12] of their true location. • on USGS quads with a scale of 1:24,000 (1”=2,000ft) 90% of points should be within 40 feet [(24,000 X .02)/12] of their true location. • on a map with a scale of 1:12,000 (1”=1,000ft), 90% of points should be within 33 feet (1,000 X .033), approx. 10 meters • gives rise to the loose, but often used, statement that the “NMAS is 10 meters” • Inadequate for the computer age • how many points? how to select them? • how to determine their ‘true’ location? • what about attribute completeness? • Unfortunately, the “new standard” doesn’t address all these issues either
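The ground tolerances quoted on this slide are the map tolerance in inches multiplied by the scale denominator and converted to feet. The Python sketch below reproduces that arithmetic and adds a simple check of the "not more than 10% of tested points" rule; the function names and the sample errors are hypothetical.

```python
# National Map Accuracy Standard arithmetic: map tolerance (inches) to ground
# tolerance (feet), plus the 90% rule applied to a list of measured errors.

INCHES_PER_FOOT = 12


def nmas_ground_tolerance_ft(rf_denominator: float, map_tolerance_in: float) -> float:
    """Ground distance in feet corresponding to a tolerance measured on the map."""
    return rf_denominator * map_tolerance_in / INCHES_PER_FOOT


def passes_nmas(errors_ft: list[float], tolerance_ft: float) -> bool:
    """True if no more than 10% of tested points exceed the ground tolerance."""
    exceeding = sum(1 for error in errors_ft if error > tolerance_ft)
    return exceeding / len(errors_ft) <= 0.10


if __name__ == "__main__":
    for denominator, tolerance in ((63_360, 0.02), (24_000, 0.02), (12_000, 1 / 30)):
        feet = nmas_ground_tolerance_ft(denominator, tolerance)
        print(f"1:{denominator:,} -> {feet:.1f} ft")
    sample_errors = [12.0, 8.5, 30.0, 41.0, 5.0, 22.0, 18.0, 9.0, 15.0, 27.0]  # made up
    print(passes_nmas(sample_errors, nmas_ground_tolerance_ft(24_000, 0.02)))
```

The sketch leaves open exactly the questions the slide raises: how many points to test, how to pick them, and where their 'true' locations come from.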
National Standard for Spatial Data Accuracy (NSSDA) 1998 • Geospatial Positioning Accuracy Standard (FGDC-STD-007), Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998 • “replacement” for National Map Accuracy Standard of 1941/47 • specifies a statistic and testing methodology for positional (horizontal and vertical) accuracy of maps and digital data • no single threshold metric to achieve (as with old Standard), but users encouraged to establish thresholds for specific applications • accuracy reported in ground units (not map units as in 1941 standard [1/30th inch]) • testing method compares data set point coordinate values with coordinate values from a higher accuracy source for readily visible or recoverable ground points • although it uses points, principles apply to all geospatial data including point, vector and raster objects • other standards for data content will adopt NSSDA for particular spatial objects • copies of the standard available at: http://www.fgdc.gov • Accuracy Standard has 7 parts, of which parts 4-7 apply to specific data types
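The slide does not spell out the NSSDA statistic. As commonly summarized from FGDC-STD-007.3-1998, accuracy is reported at the 95% confidence level as 1.7308 times the radial RMSE for horizontal position (when the x and y RMSE values are about equal) and 1.9600 times the RMSE for vertical position, with RMSE computed by dividing by n. The Python sketch below assumes those multipliers and normally distributed, unbiased errors. Treat it as an illustration and check the standard itself before reporting results; the function names are hypothetical.

```python
# NSSDA-style accuracy statistic at the 95% confidence level.
# Multipliers are assumed from FGDC-STD-007.3-1998; verify before use.
import math

HORIZONTAL_MULTIPLIER = 1.7308  # assumes RMSE_x is roughly equal to RMSE_y
VERTICAL_MULTIPLIER = 1.9600


def nssda_horizontal_accuracy(test_xy, reference_xy) -> float:
    """95% horizontal accuracy from data set points and higher-accuracy points."""
    n = len(test_xy)
    squared = sum(
        (xt - xr) ** 2 + (yt - yr) ** 2
        for (xt, yt), (xr, yr) in zip(test_xy, reference_xy)
    )
    rmse_r = math.sqrt(squared / n)
    return HORIZONTAL_MULTIPLIER * rmse_r


def nssda_vertical_accuracy(test_z, reference_z) -> float:
    """95% vertical accuracy from data set heights and higher-accuracy heights."""
    n = len(test_z)
    rmse_z = math.sqrt(sum((zt - zr) ** 2 for zt, zr in zip(test_z, reference_z)) / n)
    return VERTICAL_MULTIPLIER * rmse_z


if __name__ == "__main__":
    data_points = [(100.4, 200.3), (150.0, 249.2), (199.1, 300.5)]  # made up
    check_points = [(100.0, 200.0), (150.0, 250.0), (200.0, 300.0)]  # made up
    print(f"{nssda_horizontal_accuracy(data_points, check_points):.2f} ground units at 95%")
```

A real test needs far more than three check points (the standard calls for a minimum number of well-distributed points, which this sketch does not enforce), and the result is stated in ground units as the slide notes.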
GPS and Positional Accuracy • Global Positioning System satellite positioning with WAAS (wide area augmentation system) adjustment gives positional accuracy within about 3 meters (10ft). • This is more accurate than most printed maps and nautical charts! • It is also more accurate than most digital maps and charts since these often derive from paper maps and surveys conducted prior to GPS • Your integrated GPS/digital chart can show you nicely heading down the center of a channel, but positional inaccuracy in the chart can leave you grounded! [Figure: vessel position according to chart versus in reality]
Summary: Resolution, Scale, Accuracy & Storage: illustrating the relationship • Largest (maximum) scale for given pixel size. • Storage is for USGS 7.5 minute quad area (in Texas, a USGS quad is about 7 mi x 8.5 mi = 60 sq. miles; 16 quads for Dallas County) • Source: GPS Technology Corporation
Examples of Accuracy Go to quality_graphics.ppt
Lineage • identifies the original sources from which the data was derived • details the processing steps through which the data has gone to reach its current form • Both impact its accuracy • Both should be in the metadata, and are required by the Content Standard for Metadata (see below) • Michael Goodchild (the guru of GIS) advocates: • Measurement-based GIS, in which how data was collected and how measurements were made are a part of the record (as in surveying) • Coordinate-based GIS is the current approach, and it tracks none of this. (see Shi, Fisher and Goodchild, Spatial Data Quality, London: Taylor and Francis, 2002)
Currency: Is my data “up-to-date”? • data is always relative to a specific point in time, which must be documented. • there are important applications for historical data (e.g. analyzing trends), so don’t necessarily trash old data • “current” data requires a specific plan for on-going maintenance • may be continuous, or at pre-defined points in time. • otherwise, data becomes outdated very quickly • currency is not really an independent quality dimension; it is simply a factor contributing to lack of accuracy regarding • consistency: some GIS features do not match those in the real world today • completeness: some real world features are missing from the GIS database Many organizations spend substantial amounts acquiring a data set without giving any thought to how it will be maintained.
Standards: common “agreed-to” ways of doing things • May exist for: • Data itself [including process (the way it’s produced) and product (the outcome)] • Utilities Data Content Standard, FGDC-STD-010-2000 • Accuracy of data • Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998 • Documentation about the data (metadata) • Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998 • Transfer of data and its documentation • Spatial Data Transfer Standard (SDTS), FGDC-STD-002 • For symbology and presentation • Digital Geologic Map Symbolization • May address: • Content (what is recorded) • Format (how it’s recorded: file format, .tif, shapefile, etc) • May be a product of: • An organization’s internal actions [private or organization standards] • An external government body (Federal Geographic Data Committee) or third sector body (Open GIS Consortium) [public or de jure standards] • Laissez-faire market-place-forces leading to one dominant approach e.g. “Wintel standard” [industry or de facto standards] http://www.fgdc.gov/standards/standards.html
Who Sets Public Standards? • Federal Geographic Data Committee • Sets standards for geospatial data which all federal agencies are required to follow • Has representatives from most federal agencies • National Institute of Standards and Technology (NIST) sets federal gov. standards for other things (e.g. IT in general) • national standards bodies • American National Standards Institute (ANSI) • has the US’s single vote at ISO • InterNational Committee for Information Technology Standards (INCITS) handles IT standards for ANSI • Several FGDC standards have been submitted for approval • Most countries in the world have their equivalent to ANSI • international standards bodies • ISO (International Organization for Standardization) • other assorted vendor groups, professional associations, trade associations, and consortia • Open GIS Consortium (OGC) is the main player in GIS
The Process for Setting de jure standards! Source: URISA News Issue 197, Sept/Oct. 2003. Go to the following web site for an excellent overview of the standard-making process: http://www.fgdc.gov/publications/documents/standards/geospatial_standards_part1.html
Adopting Standards: What you should do • Data quality achieved by adoption and use of standards: Do it! • Common ways of doing things essential for using & sharing data internally and externally • only federal agencies are required to use FGDC standards; it’s optional for any others (e.g. state, local) • power of feds often results in adoption by everybody, although there are some noted failures (e.g. the OSI, GOSIP, & POSIX standards in computing failed and were withdrawn) • FGDC standards provide excellent starting point for local standards, and should be adopted unless there are compelling reasons otherwise • Standards for metadata (“documenting your data”) are the most important and should be first priority. • Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998 • If not this standard for metadata, adopt some standard!
Content Standards for Digital Geospatial Metadata: What and Why? • Metadata: describes the content, quality, format, source and other characteristics of data. • Major uses of metadata: • organize and maintain an organization's investment in data (help internal use of the data) • provide information to data catalogs and clearinghouses (help others find and evaluate the data) • provide information to aid data transfer and subsequent use by others (help others use the data)
Main Sections of the US Federal Content Standard for Digital Geospatial Metadata • Identification: Title? Area covered? Themes? Currency? Restrictions? • Data Quality (5 aspects): Positional & Attribute Accuracy? Completeness? Logical Consistency? Lineage? • Spatial Data Organization: Indirect? Vector? Raster? Type of elements? Number? • Spatial Reference: Projection? Grid system? Datum? Coordinate system? • Entity and Attribute Information: Features? Attributes? Attribute values? • Distribution: Distributor? Formats? Media? Online? Price? • Metadata Reference: Metadata currency? Responsible party? • For more info, go to: http://www.fgdc.gov/metadata/contstan.html • By law (Executive Order 12906, 1994), all federal agencies must document their data according to: Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
Minimum Documentation Requirements (If GIS data is in lat/long, must know datum. If GIS data is in XY, must know datum and projection info.) • geodetic datum name (e.g. NAD27), which implies: • ellipsoid/spheroid name (earth model) e.g. Clarke 1866 • point of origin (ties ellipsoid to earth) e.g. Meades Ranch • required for all GIS data bases and maps • projection name and its parameters and its measurement units (see terrestrial lecture for exact details) • Required for all maps since 2-D by nature • Required for GIS if data is in X-Y projected form • Source information • accuracy standard(s) to which built • author/publisher/creator name and/or data source • date(s) of data collection/update, and of map/GIS creation • Cartographers demand all maps have • north arrow • map scale • graticule indication • at least four latitude/longitude tic marks, with values in degrees • at least four X-Y tic marks, with values and units of measurement (feet, meters, etc.) [Figure: four + tic marks. Tic marks: points of positional reference used to relate map to ground or other map]
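The minimum documentation rules above lend themselves to an automated check on each data set's metadata record. The Python sketch below is one possible way to do that; every field name in it is hypothetical and would need to be mapped onto whatever metadata standard is actually in use.

```python
# Check a metadata record against the minimum documentation requirements.
# All field names are hypothetical placeholders, not FGDC element names.

REQUIRED_ALWAYS = ("datum", "source", "collection_date", "accuracy_standard")
REQUIRED_IF_PROJECTED = ("projection", "projection_parameters", "units")


def missing_documentation(record: dict, projected: bool) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    required = REQUIRED_ALWAYS + (REQUIRED_IF_PROJECTED if projected else ())
    return [field for field in required if not record.get(field)]


if __name__ == "__main__":
    record = {  # made-up example record
        "datum": "NAD27",
        "source": "digitized from paper quad",
        "collection_date": "1999",
        "accuracy_standard": "NSSDA",
    }
    print(missing_documentation(record, projected=False))
    print(missing_documentation(record, projected=True))
```

Data held in latitude/longitude passes with a datum alone, while projected X-Y data is flagged until the projection, its parameters, and its units are recorded.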
Texas Standards http://www.dir.state.tx.us/tgic/pubs/pubs.htm • Standards for digital spatial data (raster and vector) for State agencies in Texas were established in 1992 • http://www.dir.state.tx.us/tgic/pubs/gis-standards.htm • Currently (2004), being reviewed by the Texas Geographic Information Council (TGIC) for possible update • Apply to map scales of 1:24,000 and smaller (e.g., 1:100,000; 1:250,000). • Cover variety of issues including data layers, datum, projections, accuracy, metadata, etc. • Two major planning reports on GIS in state gov. in Texas are: • Digital Texas: 2002 Biennial Report on Geographic Information Systems Technology • http://www.dir.state.tx.us/tgic/pubs/digtex-lowres.pdf • Geographic Information Framework for Texas (1999) • http://www.dir.state.tx.us/tgic/pubs/gift99-small.pdf
Importance of Standards • Great Baltimore Fire of 1904 - fire engines from different regions responded, only to be found useless since their hose couplings were different sizes and did not fit Baltimore hydrants - fire burned over 30 hours and destroyed 1,526 buildings covering 17 city blocks. • Fire of 1923 - Fall River, MA was saved when over 20 neighboring fire departments responded to a town fire, since they had standardized on hydrant and hose coupling sizes. • 9/11: Response in NY and DC severely hampered by • incompatibilities between GIS data sets, and lack of data • Also, incompatibilities between communications systems • The most important standard? • Railroad track gauge - adopted by US, UK, Canada, and much of Europe. • South America still hampered by differing railroad gauges between countries.
The Best Time to Adopt a Standard? Now? Now? Before!
Appendix FGDC Standards (status as of March 2004) For latest, go to: http://www.fgdc.gov/standards/standards.html
FGDC: Metadata Standards • Content Standard for Digital Geospatial Metadata (version 2.0) (FGDC-STD-001-1998) • Content Standard for Digital Geospatial Metadata, Part 1: Biological Data Profile (FGDC-STD-001.1-1999) • Metadata Profile for Shoreline Data (FGDC-STD-001.2-2001) • Content Standard for Digital Geospatial Metadata: extension for remote sensing data (FGDC-STD-012-2002) • Encoding Standard for Geospatial Metadata (Draft) • Metadata Profile for Cultural and Demographic Data (dropped) • Current thrust is to integrate FGDC Metadata standards (and other FGDC standards eventually) into International Organization for Standardization (ISO) standards.
FGDC: Data Accuracy Standard • Geospatial Positioning Accuracy Standard (FGDC-STD-007) • Part 1: Reporting Methodology (FGDC-STD-007.1-1998) • Part 2: Geodetic Control Networks (FGDC-STD-007.2-1998) • Part 3: National Standard for Spatial Data Accuracy (FGDC-STD-007.3-1998) • Part 4: Architecture, Engineering, Construction, and Facilities Management (FGDC-STD-007.4-2002) • Part 5: Standard for Hydrographic Surveys and Nautical Charts (Review) • An umbrella incorporating several accuracy standards. • Part 3 is the general standard. • It essentially updates the National Map Accuracy Standard of 1941/47
FGDC: Data Content Standards • Facility ID Data Standard (Review) • Address Content Standard (Review) • US National Grid (FGDC-STD-011-2001) • Earth Cover Classification System (Draft) • Geologic Data Model (Draft) • Governmental Unit Boundary Data Content Standard (Draft) • Biological Nomenclature and Taxonomy Data Standard (Draft) • National Hydrography Framework Geospatial Data Content Standard (proposal) • Environmental Hazards Geospatial Data Content Standard (dropped) • NSDI Framework Data layers (under Review; see next slide) • Cadastral Data Content Standard (FGDC-STD-003) • Classification of Wetlands and Deep Water Habitats (FGDC-STD-004) • Vegetation Classification Standard (FGDC-STD-005) • Soils Geographic Data Standard (FGDC-STD-006) • Content Standard for Digital Orthoimagery (FGDC-STD-008-1999) • Content Standard for Remote Sensing Swath Data (FGDC-STD-009-1999) • Utilities Data Content Standard (FGDC-STD-010-2000) • NSDI Framework Transportation Identification Standard (Review) • Hydrographic Data Content Standard for Coastal and Inland Waterways (Review) • Content Standard for Framework Land Elevation Data (Review)
FGDC: Framework Data Standards • geodetic control, • elevation, • Orthoimagery • Hydrography (water) • Transportation • Cadastral (landownership) • governmental unit boundaries • establish data content requirements for the seven layers of geospatial data that comprise the National Spatial Data Infrastructure (NSDI), the base layers needed for any geographic area • Goals are to • Facilitate and promote exchange of framework layers between producers, consumers, and vendors thru a common content and way of describing that content • Lower the cost of data for everyone • For each layer, specifies an integrated application schema in Unified Modeling Language (UML) including feature types, attribute types, attribute domain, feature relationships, spatial representation, data organization, and metadata • no standard specified for data format, but an appendix describes a possible implementation using the Geography Markup Language (GML) Version 3.0, developed through the Open GIS Consortium, Inc. (OGC).
FGDC: Data Transfer Standards • Spatial Data Transfer Standard (SDTS) (FGDC-STD-002) • SDTS, Part 1: Logical Specification (FIPS PUB 173-1, July 1994) • SDTS, Part 2: Spatial Features (FIPS PUB 173-1, July 1994) • SDTS, Part 3: ISO 8211 Encoding (FIPS PUB 173-1, July 1994) • SDTS, Part 4: Topological Vector Encoding (FIPS PUB 173-1, July 1994) • SDTS, Part 5: Raster Profile and Extensions (FGDC-STD-002.5, 2000) • SDTS, Part 6: Point Profile (FGDC-STD-002.6, 2000) • SDTS, Part 7: Computer-Aided Design and Drafting (CADD) Profile (FGDC-STD-002.7, 2000) • One of the first of the FGDC standards (along with metadata). • Intended to facilitate transfers between different GIS systems. • Competitive pressures plus internal weaknesses hindered adoption.
FGDC: Data Symbology and Presentation Standards • Digital Geologic Map Symbolization, (Review)