GIS Engineering (Geographic Information System Engineering), Wuhan University, Guo Qing Sheng
Part 1: GIS Data Engineering. DATA SOURCE AND ERROR
Major GIS Data Sources • Maps • Drawings (sketch or engineering) • Aerial (or other) Photographs • Satellite Imagery • CAD data bases • Government & commercial spatial (GIS) data bases • Government & commercial attribute data bases • Paper records and documents
HOW TO GET DATA • the spatial component can be obtained by • remote sensing • photogrammetry • survey • the attribute component can be obtained by • remote sensing/photogrammetry • interviews • field visits • retyping from maps, plans or hardcopy files • copying from existing digital data
DATA SOURCES • available data • digital • map and plan • paper files • low cost • acquisition • remote sensing • photogrammetry • field survey • high cost
AVAILABLE DIGITAL DATA • the original format sometimes needs to be converted into the target format • the data may have been built for different purposes • the quality of the data may not be known
Background • Making or buying high-quality digital data is expensive, whether by: • Digitizing (tracing each line electronically) from paper maps • Digitizing on-screen • Using stereoplotters with aerial photographs • Interpreting and ground-truthing (field plots) satellite data
Background • Therefore, it is important to use what others have made whenever possible. • Fortunately, today many agencies are making free / low cost digital data available on the web. • However, one must always check the quality of the data.
DATA ENTRY • accounts for 75% of total implementation cost • most data entry methods require a lot of time • data sharing enables lower data costs by reusing existing data
Web search: http://www.gis.state.ga.us/ • Ortho-photograph: scanned aerial photo • Digital Elevation Model: raster • Population Data: vector • Roads and Streams: vector • Digital Raster Graphics (DRG): scanned USGS quadrangle maps • Point Data: vector
[Screenshot of web search results, with callouts marking elements of data quality, the document describing the quality of the data, and the ArcInfo/ArcView export file]
Glascock County DEM, raster format (DEM data found by web search)
Hydrography found by web search: vector data, coarser than the elevation data. The date of the hydrography data is important: old data will not include recent dams and other Corps modifications.
Glascock Hydrography, vector format (web search result)
Vector data, coarser than the DEM but finer than the hydrography. Here the publication date is deceiving: the actual date of the data is 1993.
Glascock Roads vector format
Other sources of digital data Federal Geographic Data Clearinghouse
National Spatial Data Infrastructure “Framework”: focus on seven themes of commonly used digital geographic data • Geodetic control • Digital orthoimagery • Elevation data • Transportation • Hydrography • Governmental Units • Cadastral (reference system and public parcels) plus standardized metadata (data describing data) for each
Vector/Raster Data Production • Digital Elevation Model (DEM) data and the new (2000) National Elevation Dataset (NED) • Digital Line Graph (DLG) data • Digital Orthophoto Map (DOM) and Digital Raster Graphics (DRG) raster data
Digital Chart of the World • spatial database of the world; first released circa 1992 • 1:1 million target mapping scale • US DoD project in cooperation with Canada, Australia, and the UK • 1.7 GB of data on 4 CD-ROMs (North America; Europe/Northern Asia; South America/Africa/Antarctica; Southern Asia/Australia); $200 cost • derived from DMA's 1:1 million scale Operational Navigational Chart (ONC) base maps • in Vector Product Format (VPF), but also available in most GIS vendor formats and in ASCII • the VPFVIEW 1.1 freeware for DOS and SunOS is available to view VPF • World Geodetic System 84 datum • content: airports, boundaries, coastal, contours, elevation, geographic names, international boundaries, land cover, ports, railroads, roads, surface and manmade features, topography, transmission lines, waterways • 1,000 ft contours with 250 ft supplements • 17 layers with 31 feature classes: Aeronautical Information, Cultural, Landmarks, Data Quality, Drainage, Supplemental Drainage, Utilities, Vegetation, Supplemental Hypsography, Land Cover, Ocean Features, Physiography, Political, Populated Places, Railroads, Roads, Transportation Structures • worldwide index with 100,000 place names
Pre-processing and Conversion: almost invariably required! • Maps and Drawings: digitizing, or scanning then raster-to-vector conversion • Aerial Photographs: photogrammetry/photo interpretation to extract features; digitizing or scanning to convert to digital; rectification and DTM (digital terrain model) to create digital orthos • Satellite Imagery: rectification and DTM to create digital orthos (if desired) • CAD Data Bases: translator software (pre-existing or custom-written) needed to convert to the required GIS format • GIS Data Bases: conversion between proprietary standards (ARC/INFO, Intergraph, AutoCAD, etc.); Spatial Data Transfer Standard • Attribute Databases: geocoding if microdata; conversion between geographic units (e.g. zip codes and census tracts); conversion between different databases • Records and Documents: OCR (optical character recognition) scanning or keyboarding, then the same as attribute databases
SPATIAL COMPONENT FROM MAPS AND PLANS • needs to be converted into digital format by • scanning • digitizing • keyboard entry of coordinates or field survey data • the quality of the data is known and controlled • the quality of scanned or digitized data depends heavily on the source maps and plans • keying in coordinates or survey data produces high-quality data
Producing Digital Data [figure: scanning, keyboard entry, digitizing]
Attribute Component [figure: Attribute #1, Attribute #2, Attribute #3, ..., Attribute #n]
Data Quality [figure: cost and quality]
Spatial Data Quality: Error and Uncertainty
Quality • What is quality? • Quality is commonly used to indicate the superiority of a manufactured good or to indicate a high degree of craftsmanship or artistry. We might define it as the degree of excellence in a product, service or performance. • In manufacturing, quality is a desirable goal achieved through management and control of the production process (statistical quality control). • Many of the same issues apply to the quality of databases, since a database is the result of a production process, and the reliability of the process imparts value and utility to the database.
Data Quality • Why is there concern for data quality? • Increased data production by the private sector, where there are no required quality standards. In contrast, production of data by national mapping agencies has long been required to conform to national accuracy standards (i.e., mandated quality control). • Increased use of GIS for decision support, such that the implications of using low-quality data are becoming more widespread (including the possibility of litigation if minimum standards of quality are not attained). • Increased reliance on secondary data sources, due to the growth of the Internet, data translators and data transfer standards. Thus, poor-quality data is ever easier to get.
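DATA QUALITY • misconception that data from a GIS is of higher quality • because GIS uses the latest technology • but the quality of GIS information depends on the quality of the data • ‘garbage in, garbage out’ • with conventional methods, users judge quality for themselves • with GIS? The Oxford Advanced Learner's Dictionary of Current English, 4th edition (1989), glosses this phrase as “(in computing) if you input wrong data, the output will also be wrong”; the Oxford Advanced Learner's English-Chinese Dictionary (4th edition, 1994), which is based on it, renders it in Chinese as “错进,错出” (“wrong in, wrong out”: if wrong data is input, the output will also be wrong data).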
Spatial Data Quality • Dimensions of geographical data quality: space, time, and theme • Recall the geographical data model from Lecture 04: $g(x, y, z) = (t, a_1, \ldots, a_n)$ • Matrix of geographical dimensions and quality components
SPATIAL ACCURACY • Precision - indicates how closely several positions fall in relation to each other • Accuracy - is a measure of the closeness of one or more positions to a position that's known and defined in terms of an absolute reference system.
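A small numeric illustration of the difference, as a sketch: here precision is measured as the RMS spread of repeated observations about their own mean, and accuracy as the RMS distance from a known reference position. The coordinates and the control point are hypothetical, and RMS distance is only one of several possible metrics.

```python
import math
from statistics import mean


def centroid(points):
    """Mean position of a set of (x, y) observations."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def precision(points):
    """RMS spread of the observations about their own mean.
    Needs no external reference: it only says how closely the
    positions fall in relation to each other."""
    cx, cy = centroid(points)
    return math.sqrt(mean((x - cx) ** 2 + (y - cy) ** 2 for x, y in points))


def accuracy(points, reference):
    """RMS distance of the observations from a known reference position."""
    rx, ry = reference
    return math.sqrt(mean((x - rx) ** 2 + (y - ry) ** 2 for x, y in points))


reference = (500.0, 500.0)  # hypothetical control point
# Tightly clustered but offset by about 3 units: precise, not accurate
biased = [(503.0, 500.1), (503.1, 499.9), (502.9, 500.0), (503.0, 500.0)]
# Scattered around the true position: less precise, but unbiased
noisy = [(501.5, 500.0), (498.5, 500.0), (500.0, 501.5), (500.0, 498.5)]

for name, pts in [("biased", biased), ("noisy", noisy)]:
    print(name, round(precision(pts), 2), round(accuracy(pts, reference), 2))
```

The biased set has a spread of about 0.1 but lies about 3 units from the control point; the scattered set has a spread and an RMS distance of 1.5. High precision alone does not guarantee high accuracy.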
Accuracy • Accuracy is the inverse of error • Many people equate accuracy with quality but in fact accuracy is just one component of quality. • Definition of accuracy is based on the entity-attribute-value model: • Entities = real-world phenomena • Attribute = relevant property • Values = Quantitative/qualitative measurements
Accuracy • An error is a discrepancy (a difference or inconsistency) between the encoded and actual value of a particular attribute for a given entity. “Actual value” implies the existence of an objective, observable reality. However, reality may be: • Unobservable (e.g., historical data) • Impractical to observe (e.g., too costly) • Perceived rather than real (e.g., subjective entities such as “neighbourhoods”)
Accuracy • We do not need an objective reality in order to assess accuracy, since all geographical data are collected with the aid of a model that specifies — implicitly or explicitly — the required level of abstraction and generalisation. • This is the database “specification”. • The specification serves as the standard against which accuracy is assessed. Thus the “actual” value is the value we would expect based on the specification. • Accuracy is always a relative measure, since it is always measured relative to the specification.
ERROR SOURCES (I) • data acquisition • device/instrument errors • data entry errors • image interpretation error • data conversion • instrument inaccuracies • device/instrument operator • manuscript used
ERROR SOURCES (II) • data storage • digital representation limits • disk storage limits, especially for huge raster formats • data processing • rounding error • digital representation • error propagation law • information derived by mathematical operations is no more accurate than the original information
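Two of these processing error sources can be shown in a few lines, as a sketch with hypothetical values. First, storing a large projected coordinate in 32-bit floating point (about 7 significant digits) rounds it. Second, the first-order law of propagation of independent random errors, applied to a rectangle's area $A = ab$, gives $\sigma_A^2 = (b\,\sigma_a)^2 + (a\,\sigma_b)^2$.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round-trip a 64-bit Python float through 32-bit storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Digital representation limit: a hypothetical northing in metres.
# Near 3.4 million, float32 values are spaced 0.25 m apart.
northing = 3_379_123.456
stored = to_float32(northing)
rounding_error = northing - stored


def area_with_sigma(a, sigma_a, b, sigma_b):
    """Area of a rectangle and its standard error by first-order error
    propagation, assuming independent random errors in the two sides:
    sigma_A^2 = (dA/da * sigma_a)^2 + (dA/db * sigma_b)^2,
    with dA/da = b and dA/db = a."""
    area = a * b
    sigma = math.sqrt((b * sigma_a) ** 2 + (a * sigma_b) ** 2)
    return area, sigma


# Hypothetical parcel: sides 100 m and 50 m, each measured to +/- 0.5 m
area, sigma_area = area_with_sigma(100.0, 0.5, 50.0, 0.5)

print(f"stored={stored}, rounding error={rounding_error:.3f} m")
print(f"area={area} m^2 +/- {sigma_area:.1f} m^2")
```

The derived area has a relative error of about 1.1%, larger than that of either side (0.5% and 1%), which illustrates why derived information is no more accurate than the original measurements.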
Quality of aerial photo data for spatial analyses • Georeference • Points in the photo are assigned real-world coordinates. • Coordinates are used to read maps and “tell” us where we are: • so many feet in the x-direction • so many feet in the y-direction • Photo distortion • the scale at the center of the photo does not equal the scale at the edges • the photo picks up some vertical distortion (relief displacement) due to terrain. • To make photo distances equal map distances, the distortion must be removed.
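Georeferencing is commonly done by fitting a transformation to ground control points (GCPs) whose photo and real-world coordinates are both known. The sketch below assumes a 6-parameter affine transformation fitted by least squares; the GCP values are hypothetical. An affine fit handles shift, scale, rotation and shear, but it does not remove terrain-induced distortion, which requires rectification with a DTM.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m @ p = v by Gaussian elimination
    with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    p = [0.0] * 3
    for r in range(2, -1, -1):
        p[r] = (a[r][3] - sum(a[r][c] * p[c] for c in range(r + 1, 3))) / a[r][r]
    return p


def fit_affine(photo_pts, ground_pts):
    """Least-squares affine fit via the normal equations:
    X = a0 + a1*x + a2*y,  Y = b0 + b1*x + b2*y.
    Needs at least three non-collinear GCPs."""
    rows = [(1.0, x, y) for x, y in photo_pts]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * g[0] for r, g in zip(rows, ground_pts)) for i in range(3)]
    aty = [sum(r[i] * g[1] for r, g in zip(rows, ground_pts)) for i in range(3)]
    return solve3(ata, atx), solve3(ata, aty)


def apply_affine(params, x, y):
    """Map photo coordinates (x, y) to real-world coordinates (X, Y)."""
    a, b = params
    return (a[0] + a[1] * x + a[2] * y, b[0] + b[1] * x + b[2] * y)


# Hypothetical GCPs: photo pixel (x, y) -> ground (feet east, feet north)
photo_gcps = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
ground_gcps = [(1000.0, 5000.0), (1200.0, 5000.0), (1000.0, 4800.0), (1200.0, 4800.0)]

params = fit_affine(photo_gcps, ground_gcps)
print(apply_affine(params, 50.0, 50.0))
```

With more than three GCPs the fit is overdetermined, and the residuals at the control points give a first indication of georeference quality.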
Quality of satellite data for spatial analyses. • Georeference quality • Classification quality • How many land cover types? • Agriculture or carrots, corn, and cotton? • How many ground-truthed points per type? • How well are ground-truthed points distributed throughout area?
Spatial Accuracy • Spatial accuracy is the accuracy of the spatial component of the database. The metrics used depend on the dimensionality of the entities under consideration. • For points, accuracy is defined in terms of the distance between the encoded location and “actual” location. • Error can be defined in various dimensions: x, y, z, horizontal, vertical, total. • Metrics of error are extensions of classical statistical measures (mean error, RMSE or root mean squared error, inference tests, confidence limits, etc.).
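The classical point metrics can be computed from a set of check points, as a sketch: each encoded position is compared with its "actual" position, assumed here to come from an independent source of higher accuracy. The mean error reveals systematic shifts, while the RMSE combines systematic and random error; the horizontal RMSE combines the x and y components. The coordinates are hypothetical.

```python
import math
from statistics import mean


def error_stats(encoded, actual):
    """Mean error and RMSE in x and y, plus horizontal RMSE, for matched
    lists of encoded and 'actual' (check point) coordinates."""
    dx = [e[0] - a[0] for e, a in zip(encoded, actual)]
    dy = [e[1] - a[1] for e, a in zip(encoded, actual)]
    rmse_x = math.sqrt(mean(d * d for d in dx))
    rmse_y = math.sqrt(mean(d * d for d in dy))
    return {
        "mean_x": mean(dx),  # systematic shift (bias) in x
        "mean_y": mean(dy),  # systematic shift (bias) in y
        "rmse_x": rmse_x,
        "rmse_y": rmse_y,
        "rmse_h": math.sqrt(rmse_x ** 2 + rmse_y ** 2),  # horizontal
    }


# Hypothetical encoded positions and surveyed check points (metres)
encoded = [(10.0, 20.0), (31.0, 40.0), (50.0, 58.0), (70.0, 81.0)]
actual = [(11.0, 20.0), (30.0, 41.0), (50.0, 60.0), (72.0, 80.0)]

stats = error_stats(encoded, actual)
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```

The same function, applied to a single coordinate or attribute, gives the vertical RMSE used for elevation and for quantitative thematic data.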
Spatial Accuracy • For lines and areas, the situation is more complex. This is because error is a mixture of positional error (error in locating well-defined points along the line) and generalisation error (error in the points selected to represent the line). • The epsilon band is usually used to define a zone of uncertainty around the encoded line, within which the “actual” line lies with some probability. • However, there is little agreement (and little empirical work) on the shape of the band, both planimetrically and in cross-section.
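The simplest version of an epsilon band treats it as a fixed-width buffer around the encoded polyline, as in the sketch below. This is an assumption: it ignores the open questions about the band's shape and assigns no probability, it only tests whether a point from the "actual" line falls within distance epsilon. The line and test points are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p, line):
    """Shortest distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def within_epsilon_band(p, line, epsilon):
    """True if p lies inside a band of half-width epsilon around the line."""
    return distance_to_line(p, line) <= epsilon


encoded_line = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
for point in [(5.0, 0.8), (5.0, 1.5), (11.0, 5.0)]:
    print(point, within_epsilon_band(point, encoded_line, 1.0))
```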
Temporal Accuracy • Temporal accuracy is the agreement between the encoded and “actual” temporal coordinates for an entity. • Temporal coordinates are often only implicit in geographical data, e.g., a time stamp indicating that the entity was valid at some time. Often this is applied to the entire database (e.g., a map dated “1995”).
Temporal Accuracy • Temporal accuracy is not the same as “database time”, which is the time the information was entered into the database. • Temporal accuracy is not the same as “currentness” (or up-to-dateness), which is actually an assessment of how well the database specification meets the needs of a particular application. • A database can be temporally accurate but still out of date; historical applications depend on such data.
Thematic Accuracy • Thematic accuracy is the accuracy of the attribute values encoded in a database. • The metrics used here depend on the measurement scale of the data: • Quantitative data (e.g., precipitation) can be treated like a z-coordinate (elevation) and assessed using metrics normally used for vertical error (such as the RMSE). • Qualitative data (e.g., land use/land cover) is normally assessed using a cross-tabulation of encoded and “actual” classes at a sample of locations. This produces a classification error matrix.
Thematic Accuracy • The element in row i, column j of the matrix is the number of sample locations assigned to class i but actually belonging to class j. • The sum of the main diagonal divided by the number of samples is a simple measure of overall accuracy. • An error of omission means a sample that has been omitted from its actual class. An error of commission means a sample that has been included in the wrong class. Every error of omission is also an error of commission.
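The error matrix measures above can be computed directly, as a sketch using the row/column convention just described (row = assigned class, column = actual class). Omission rates are read down the columns and commission rates along the rows. The classes and counts are hypothetical.

```python
def matrix_summary(matrix, classes):
    """matrix[i][j] = number of samples assigned to class i but actually in class j.
    Returns overall accuracy and per-class omission and commission rates."""
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[k][k] for k in range(n))
    overall = diagonal / total
    per_class = {}
    for k, name in enumerate(classes):
        row_total = sum(matrix[k])                       # samples assigned to k
        col_total = sum(matrix[i][k] for i in range(n))  # samples actually in k
        per_class[name] = {
            # samples actually in k that were put in some other class
            "omission": (col_total - matrix[k][k]) / col_total,
            # samples put in k that actually belong to another class
            "commission": (row_total - matrix[k][k]) / row_total,
        }
    return overall, per_class


classes = ["water", "forest", "urban"]
matrix = [
    [45, 4, 1],   # assigned water
    [2, 38, 7],   # assigned forest
    [0, 6, 44],   # assigned urban
]

overall, per_class = matrix_summary(matrix, classes)
print(f"overall accuracy = {overall:.3f}")
for name, rates in per_class.items():
    print(name, f"omission={rates['omission']:.3f}", f"commission={rates['commission']:.3f}")
```

Each off-diagonal count is an omission from its column's class and a commission into its row's class, so the total number of omission errors always equals the total number of commission errors (20 here).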
Resolution (Precision) • Resolution refers to the amount of detail that can be discerned in space, time or theme. • Resolution is always finite • Resolution is an aspect of the database specification • High resolution is not always better • Resolution is linked with accuracy, since the level of resolution affects the database specification against which accuracy is assessed. • Two databases with the same overall accuracy levels but different levels of resolution do not have the same quality; the database with the lower resolution has less demanding accuracy requirements.
Spatial Resolution • Spatial resolution is well-defined in the context of raster data, where it refers to the linear dimension of a cell.
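Because resolution is the linear dimension of a cell, refining it by a factor k multiplies the number of cells by roughly k², which is one reason high resolution is not always better. A sketch assuming square cells and a hypothetical 30 km by 20 km extent:

```python
import math


def raster_size(width_m, height_m, cell_size_m):
    """Columns, rows and total cells needed to cover an extent with
    square cells of the given linear dimension (partial cells rounded up)."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return cols, rows, cols * rows


for cell in (30, 10):
    print(cell, "m cells:", raster_size(30000, 20000, cell))
```

Going from 30 m to 10 m cells raises the cell count from 667,000 to 6,000,000, roughly nine times the storage for the same area.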