Geog 458: Map Sources and Errors. Uncertainty January 23, 2006. Outlines. Defining uncertainty How to calculate uncertainty? Nominal case: Confusion matrix Interval/ratio case: RMSE How to validate uncertainty? Internal validation: MAUP External validation: Conflation.
Geog 458: Map Sources and Errors. Uncertainty. January 23, 2006
Outline • Defining uncertainty • How to calculate uncertainty? • Nominal case: Confusion matrix • Interval/ratio case: RMSE • How to validate uncertainty? • Internal validation: MAUP • External validation: Conflation
1. Defining uncertainty • Definition of uncertainty • Discrepancy between reality and its representation • Different kinds of uncertainty • Vagueness: the representation does not fit the nature of the real phenomenon well (e.g. representing cities as a point layer, or soil types with crisp boundaries); a better human conceptualization is needed • Ambiguity: the representation is not universally agreed upon by users (e.g. placenames, occupation classification, indicators of environmental health); standardization is needed • Accuracy vs. precision • Accuracy: difference between the true values and those stored in the database • Precision: amount of detail present in the data
Questions • Diagnose each case below using {uncertainty, precision, positional accuracy, attribute accuracy, vagueness, ambiguity}. What is your prescription? • Longitude values in decimal degrees are stored as integers • Contour lines derived from a DEM do not line up well with the DRG • The map indicates that this road is bidirectional, but it turns out to be one-way • Implementing an intelligent geocoding system based on English prepositions (e.g. across, at, over) for international users • Is the boundary of Mt. Everest well delineated? Is this polygon boundary a good representation of Mt. Everest? • Which term is the broadest? How would you communicate these errors in your data quality report?
2. Calculating accuracy • Nominal case • Confusion matrix (a.k.a. misclassification matrix) • Interval/Ratio case • Root Mean Square Error (RMSE) • The confusion matrix is widely used to report attribute accuracy when the attribute is measured on a nominal scale • RMSE is widely used to report positional accuracy when position is measured on a numeric scale (e.g. x, y coordinates are metric)
Confusion Matrix • Table 6.2 (p. 138): evaluating the classification of land parcels; there are five land use codes, A to E • Rows and columns in the misclassification matrix • Each row corresponds to the class as recorded in the database • Each column corresponds to the class as recorded in the field • Correctly classified vs. incorrectly classified • Diagonal entries represent agreement between database and field • Off-diagonal entries represent disagreement between database and field • So how accurate would you say these data are? • Since 206 parcels (the sum of the diagonal entries) out of 304 are correctly classified, the overall accuracy is 206/304 ≈ 67.8%
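The diagonal-over-total calculation described on this slide can be sketched in a few lines of Python. This is a minimal illustration only: the function name `overall_accuracy` and the 3-class counts are hypothetical and are not the values of Table 6.2, which is not reproduced here.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix.

    Rows are the classes recorded in the database, columns are the
    classes recorded in the field, as on the slide.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix has no samples")
    # Diagonal entries are the cases where database and field agree.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical counts for three classes (NOT the data from Table 6.2).
example = [
    [40, 5, 5],
    [4, 30, 6],
    [1, 2, 7],
]

print(f"Overall accuracy: {overall_accuracy(example):.1%}")
```

Here 77 of the 100 hypothetical samples fall on the diagonal, so the script reports 77.0%.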
Confusion matrix: exercise • Let’s say you decide to write a test report on the attribute accuracy of a land use map • 100 reference points are selected to represent three classes: 49 points from natural, 28 points from agricultural, and 23 points from urban land use in your data • Field checks confirmed 41 points to be natural, 21 points to be agricultural, and 19 points to be urban • What is the overall accuracy of your data?
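One way to work the exercise, assuming that each "confirmed" point is a reference point whose field class matched its class in the data (that is, a diagonal entry of the confusion matrix):

```latex
\text{overall accuracy}
  = \frac{41 + 21 + 19}{49 + 28 + 23}
  = \frac{81}{100}
  = 81\%
```

Under the same assumption, the remaining 19 points (8 natural, 7 agricultural, 4 urban) are misclassified. The exercise does not say which classes they actually belong to, so the off-diagonal cells cannot be filled in.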
Root Mean Square Error • $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_i - a_i)^2}$ • where $c_i$ is the observed value, $a_i$ is the true value, and $n$ is the number of observations • RMSE is the square root of the mean of the squared differences between each observed value ($c_i$) and its corresponding true value ($a_i$) • Indicates how much the observed values deviate from the true values • In the case of positional accuracy, $a_i$ is derived from a source of higher accuracy
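The RMSE definition can be sketched in Python as follows. The function name `rmse` and the sample values are hypothetical, chosen only to keep the arithmetic easy to follow. They are not taken from the course data.

```python
import math


def rmse(observed, true):
    """Root mean square error between paired observed and true values."""
    if len(observed) != len(true) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    # Squared difference for each (observed, true) pair.
    squared = [(c - a) ** 2 for c, a in zip(observed, true)]
    # Mean of the squared differences, then the square root.
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and true values (illustration only).
observed = [2.0, 4.0, 6.0, 8.0]
true = [1.0, 5.0, 6.0, 10.0]

print(f"RMSE = {rmse(observed, true):.3f}")
```

When the error is already expressed as a distance between each test point and its control point, the same function can be called with the distances as `observed` and a list of zeros as `true`, since each distance is itself the difference to be squared.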
RMSE: exercise • Let’s say you decide to write a test report on the positional accuracy of NHPN data • You obtain data from sources with higher positional accuracy, such as geodetic points • 7 points (intersections) are selected to be compared to 7 corresponding control points • Distances for the 7 pairs are calculated as follows • What is the RMSE?
3. Validating accuracy • Internal validation • Examines the likely impacts of uncertainty upon operation results within GIS • What would be the effects of different data aggregation schemes on operation results?: MAUP • External validation • Validates the accuracy of test data in reference to external data sources • How accurate is this data set relative to reference data?: Conflation
Modifiable Areal Unit Problem • Quite simply, different aggregations yield different results • From Openshaw • Arises because geography sometimes does not have a natural unit of analysis • Population, vegetation • Remember that a census unit is an artificial boundary drawn for the purpose of enumeration • Space is used as a sampling scheme • Question of the optimal unit of analysis • Urban center boundary for analyzing urban activities • Metropolitan area for analyzing the spatial labor market
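The claim that different aggregations yield different results can be illustrated with a toy example. Everything below is hypothetical: six cells along a strip, invented case and population counts, and two arbitrary ways of grouping the same cells into three zones.

```python
def zone_rates(cases, population, zones):
    """Rate (cases / population) for each zone, where a zone is a list of cell indices."""
    rates = []
    for zone in zones:
        zone_cases = sum(cases[i] for i in zone)
        zone_pop = sum(population[i] for i in zone)
        rates.append(zone_cases / zone_pop)
    return rates


# Hypothetical cell-level data (illustration only).
cases = [10, 0, 0, 10, 10, 0]
population = [100, 100, 100, 100, 100, 100]

# Two zoning schemes over the same six cells, three zones each.
zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[0, 1, 2], [3, 4], [5]]

print("Zoning A:", [round(r, 3) for r in zone_rates(cases, population, zoning_a)])
print("Zoning B:", [round(r, 3) for r in zone_rates(cases, population, zoning_b)])
```

With these invented numbers, zoning A gives the same rate (0.05) in every zone, while zoning B gives rates of roughly 0.033, 0.1 and 0.0. The underlying cells are identical in both cases. Only the zone boundaries changed.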
Conflation • Describes the range of functions that attempt to overcome differences between datasets or to merge their contents, as with rubber-sheeting • Visual inspection of a spatial overlay of a TIGER file on GPS measurements • Lab2: working with data from different sources, conflating test data with data from an independent source (higher accuracy), visual inspection of positional accuracy, summarizing the positional accuracy of test data with RMSE