Using uncertainties, analysis and use of discrete entities Peter Fox GIS for Science ERTH 4750 (98271) Week 9, Tuesday, March 27, 2012
Contents • Using uncertainties • Regression • Projects
Using uncertainties in regressions • The regressions we've done so far using Excel did not take into account the uncertainties in the measurements. • Uncertainties in the data impact both the regression parameters and their uncertainties. • First, we discuss the general method of curve-fitting and then how we can include the data uncertainties. • Then we show how it is done in Excel.
Define… • N - the number of observed values • K - the number of unknowns (coefficients to be estimated) • O - the matrix containing the known y's (aka the observations); 1 column by N rows • M - the matrix containing the known x's; K columns by N rows • $M^T$ - transpose of M (swap rows and columns); N columns by K rows • P - the matrix containing the unknowns; 1 column by K rows
Equations • The linear equations can be written as $O = M P$. • The goal is to estimate the unknowns in matrix P. • If you have a set of equations where N = K (number of equations = number of unknowns), the solution is $P = M^{-1} O$ • where the superscript (-1) indicates the matrix inverse.
N>K • When we have more data than unknowns (N>K) and the data are subject to errors, we use the least-squares procedure. • It has the solution: $P = (M^T M)^{-1} M^T O$ • The uncertainties in the estimated unknowns are contained in the matrix $(M^T M)^{-1}$, which, scaled by the variance of the observations, is the parameter covariance matrix.
In Excel… • Excel can perform all the matrix operations needed so we can do the regression long-hand. • The functions are: • TRANSPOSE(data range) - transpose of a matrix • MMULT(matrix1, matrix2) - multiply two matrices together • MINVERSE(data range) - get inverse of a matrix
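The same long-hand regression can be sketched outside Excel. The following is a minimal NumPy illustration, not the course's worksheet: the data values are made up, and the mapping TRANSPOSE to `.T`, MMULT to `@`, and MINVERSE to `np.linalg.inv` is only meant to mirror the three steps named above.

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x, so the fit should return (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
O = (1.0 + 2.0 * x).reshape(-1, 1)           # observations: N rows, 1 column

# Design matrix M: a column of ones (intercept) and a column of x (slope).
M = np.column_stack([np.ones_like(x), x])    # N rows, K = 2 columns

MT = M.T                                     # TRANSPOSE
MTM_inv = np.linalg.inv(MT @ M)              # MINVERSE of an MMULT
P = MTM_inv @ MT @ O                         # unknowns: K rows, 1 column

intercept, slope = P.ravel()
```

Forming the inverse explicitly follows the formula on the slide; for larger or poorly conditioned problems, `np.linalg.lstsq(M, O, rcond=None)` is usually the more stable choice.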
Example • We see that the calculations match the results of the Excel trend-line function. • Regression with equal weighting of all data implied. http://escience.rpi.edu/gis/data/regression.xls
Including uncertainties • If we want to include uncertainties, we define 2 new matrices: • C - the covariance matrix containing the uncertainties (variances) in the observations; N columns by N rows • W - the weight matrix which is the inverse of the covariance matrix; N columns by N rows • The least-squares solution becomes: $P = (M^T W M)^{-1} M^T W O$
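A weighted fit can be sketched the same way. The example below is an illustration under stated assumptions: the data and the 1-sigma uncertainties are invented, and the observations are assumed to be uncorrelated, so C is diagonal. The last point is a deliberate outlier with three times the uncertainty of the others.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 15.0])      # hypothetical; last point is off the line y = 1 + 2x
sigma = np.array([1.0, 1.0, 1.0, 1.0, 3.0])   # hypothetical 1-sigma uncertainties

O = y.reshape(-1, 1)
M = np.column_stack([np.ones_like(x), x])

C = np.diag(sigma**2)          # covariance matrix (assumes uncorrelated observations)
W = np.linalg.inv(C)           # weight matrix

param_cov = np.linalg.inv(M.T @ W @ M)        # covariance of the estimated unknowns
P_weighted = param_cov @ M.T @ W @ O
P_equal = np.linalg.inv(M.T @ M) @ M.T @ O    # equal weighting, for comparison

std_errors = np.sqrt(np.diag(param_cov))      # standard errors of intercept and slope
```

With these numbers the equally weighted slope is pulled toward the outlier, while the weighted slope stays closer to 2, because the down-weighted point contributes less to the solution.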
Adjusted example • The final data point has little influence on the solution when its uncertainty is increased by a factor of 3. • Weighting (equal) also influences the standard errors in the estimated unknowns.
Unequal weighting http://escience.rpi.edu/gis/data/wtd_regression.xls
Discrete entities (GIS) • Entity • Has attributes, possibly derived from other attributes • Has location • Proximity / connectivity – topology • New attributes U = f (A1, A2, …) • The function f can be logical, (Boolean algebra, True or False), arithmetic, statistical, etc.
Boolean operations • Think of each attribute for the entities as a set. The condition of the query ‘select all where A1 = red’ is the same as the Boolean ‘A1 = red’, in which you would select the True results. • Or you could select ‘not (A1 = red)’, which gets everything except red. • For multiple sets (attributes) we might want the intersection or union of them. • This process can be used to re-classify your map, i.e., cut down on the number of discrete attributes.
Examples • U = (A1 = red ) AND (A2 = blue) gives intersection of sets, both conditions are required to be true • U = (A1 = red ) OR (A2 = blue) gives union of sets, either condition can be true • U = (A1 = red ) XOR (A2 = blue) gives set where A1 = red and A2 <> blue plus the set where A1 <> red and A2 = blue • The OR statement above would return A1 = red and A2 = any color plus the set where A1 = any color and A2 = blue
Soil example • Set A: soil type = ‘Oregon loam’ • Set B: pH >= 7.0 • A and B = all soils of OL with pH >= 7.0 • A or B = all soils of OL and all soil types with pH >= 7.0 • A xor B = all OL with pH < 7.0 and all soils of any type other than OL that have pH >= 7.0 • A not B = all OL with pH < 7.0
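The four combinations in the soil example map directly onto set operations. This is a small sketch using Python's built-in sets; the soil records and their ids are invented for illustration.

```python
# Hypothetical soil records: id -> (soil type, pH).
soils = {
    1: ("Oregon loam", 7.4),
    2: ("Oregon loam", 6.1),
    3: ("clay", 7.8),
    4: ("sand", 5.5),
}

# Set A: soil type = 'Oregon loam'; Set B: pH >= 7.0.
A = {i for i, (kind, ph) in soils.items() if kind == "Oregon loam"}
B = {i for i, (kind, ph) in soils.items() if ph >= 7.0}

a_and_b = A & B    # intersection: Oregon loam with pH >= 7.0
a_or_b = A | B     # union: all Oregon loam plus any soil with pH >= 7.0
a_xor_b = A ^ B    # in exactly one of the two sets
a_not_b = A - B    # Oregon loam with pH < 7.0
```

Record 4 belongs to neither set, so it appears in none of the results.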
Statistical operations to determine similar regions • Given a distribution of data points, we may want to collect them into a finite number of polygons where each polygon contains values within a specified range or with similar statistical distribution. • For example, your company has hired 5 ‘bill collectors’ and their methods of collection range from thumb-breaking to persistent whining. ;-)
Statistical operations to determine similar regions • You’d like to assign the collectors to different sales regions based on the history of compliance and you want to give each collector a similar area to cover. • Since you don’t want your thumb-breaker (e.g., Rocky) to deal with a large number of people who normally pay their bills, you want to assign him or her to the region with a large number of deadbeats and a small number of payers. • So you want the mean rate of non-compliance to be high but the variance to be low.
Or • Or perhaps you are interested in bio-diversity. • You want to divide your field area into a finite number of regions based on the diversity of the flora and see what regions support the largest variety. • But you may want to exclude plants that have only a few representatives in a given sector.
Or • Or given that you already have natural polygons, say counties for example, you may want to group the entities according to similar statistical distributions of the attribute of interest. • For example, for radon levels you could group the counties by similar mean counts or by similar variance in them. • There are many statistical operations you could use, depending on what your particular goal is. GIS for science!
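One way to group existing polygons by their statistics is to compute the mean and variance per polygon and bin on both. The sketch below assumes made-up radon readings, made-up cutoffs, and the population variance; the county names, `MEAN_CUTOFF`, `VAR_CUTOFF`, and `classify` are all hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical radon readings per county (invented numbers, arbitrary units).
readings = {
    "County A": [2.0, 2.2, 1.8, 2.0],
    "County B": [6.0, 6.2, 5.8, 6.0],
    "County C": [1.0, 8.0, 2.0, 7.0],
    "County D": [2.1, 1.9, 2.0, 2.0],
}

MEAN_CUTOFF = 4.0   # assumed threshold between "low" and "high" mean
VAR_CUTOFF = 1.0    # assumed threshold between "low" and "high" variance

def classify(values):
    """Label a polygon by its mean level and its spread."""
    level = "high mean" if mean(values) >= MEAN_CUTOFF else "low mean"
    spread = "high variance" if pvariance(values) >= VAR_CUTOFF else "low variance"
    return (level, spread)

# Collect counties that share the same (mean, variance) label.
groups = {}
for county, values in readings.items():
    groups.setdefault(classify(values), []).append(county)
```

The `("high mean", "low variance")` group plays the role of the region sought in the bill-collector example: a high rate with little spread around it.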
Buffering • Buffering generally involves operations that depend only on distance or proximity between entities. • However, we can derive other attributes and connectivity that depend on distance only. • Such topology can be simple or complex functions of distance (population density) • E.g. the errors in the red star sites and roads from last week
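The simplest buffer is a circle around a point: an entity is inside if its distance to the centre does not exceed the radius. The sketch below assumes planar (projected) coordinates in metres; the site names, coordinates, and `within_buffer` helper are hypothetical.

```python
import math

# Hypothetical sites in a projected coordinate system (units: metres).
sites = {
    "well_1": (100.0, 200.0),
    "well_2": (450.0, 220.0),
    "well_3": (130.0, 240.0),
}
center = (120.0, 210.0)   # buffer centre
radius = 50.0             # buffer distance in metres

def within_buffer(point, center, radius):
    """True if point lies inside or on the circular buffer."""
    return math.dist(point, center) <= radius

inside = sorted(name for name, xy in sites.items()
                if within_buffer(xy, center, radius))
```

Buffers around lines or polygons need a point-to-segment distance instead, and unprojected latitude/longitude data would need a geodesic distance rather than `math.dist`.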
Connectivity • Attributes relating to connectivity are generally in the database. • For example, the time it takes to drive a particular segment of road should be an attribute. • From such data, the time it takes to get from point A to point B can be calculated.
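If each road segment carries a drive-time attribute, the time from point A to point B is a shortest-path problem. The sketch below uses Dijkstra's algorithm, which is one common choice and not necessarily what any particular GIS package does; the network and its times are invented.

```python
import heapq

# Hypothetical road network: node -> list of (neighbor, minutes to drive the segment).
roads = {
    "A": [("C", 4.0), ("D", 10.0)],
    "C": [("A", 4.0), ("D", 3.0), ("B", 9.0)],
    "D": [("A", 10.0), ("C", 3.0), ("B", 2.0)],
    "B": [("C", 9.0), ("D", 2.0)],
}

def travel_time(graph, start, goal):
    """Smallest total drive time from start to goal, or None if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if node == goal:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale queue entry; a faster route was already found
        for neighbor, minutes in graph[node]:
            candidate = t + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None

a_to_b = travel_time(roads, "A", "B")
```

Here the fastest route is A to C to D to B, even though A connects to D directly.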
Using connectivity • We can make inferences about connectivity from data not specifically related to connectivity. • For example we can assume that travel time along a road path is given by the distance divided by some nominal speed, plus a delay for each traffic light along the way, and so on. • We could also make the assumed speed depend on whether the road is rural or urban, inferred from land-use data. • In addition, time of day could be factored in. • Connectivity data and GIS are now used frequently for guidance of emergency vehicles.
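The inferred travel time described above can be written as a small function. This is a sketch only: the nominal speeds, the per-light delay, and the route are assumed values chosen for illustration, and time-of-day effects are left out.

```python
# Assumed nominal speeds (km/h) by land use, and an assumed delay per traffic light.
NOMINAL_SPEED_KMH = {"urban": 40.0, "rural": 80.0}
LIGHT_DELAY_MIN = 0.5

def segment_minutes(length_km, land_use, n_lights=0):
    """Distance divided by nominal speed, plus a fixed delay per traffic light."""
    speed = NOMINAL_SPEED_KMH[land_use]
    return 60.0 * length_km / speed + LIGHT_DELAY_MIN * n_lights

# Hypothetical route: (length in km, land use, number of traffic lights).
route = [(2.0, "urban", 3), (8.0, "rural", 0)]
total_minutes = sum(segment_minutes(*segment) for segment in route)
```

For this route the urban segment contributes 4.5 minutes and the rural segment 6.0 minutes.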
Contouring • Contours are lines of equal value of a surface field. • They are easily calculated from the gridded data by finding where the contours intersect the sides of each grid element.
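The edge-intersection step can be sketched for a single grid cell. The code assumes the field varies linearly along each side of the cell, so the crossing point is found by linear interpolation between the two corner values; the function names and the sample cell are hypothetical.

```python
def edge_crossing(p0, v0, p1, v1, level):
    """Point where the contour `level` crosses the edge p0-p1, or None."""
    # No crossing if both corner values lie on the same side of the level,
    # or if the edge is flat (which also avoids dividing by zero).
    if (v0 - level) * (v1 - level) > 0 or v0 == v1:
        return None
    t = (level - v0) / (v1 - v0)   # fraction of the way from p0 to p1
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))

def cell_crossings(corners, values, level):
    """Crossing points on the four sides of one cell (corners listed in order)."""
    points = []
    for i in range(4):
        j = (i + 1) % 4
        hit = edge_crossing(corners[i], values[i], corners[j], values[j], level)
        if hit is not None:
            points.append(hit)
    return points

# Hypothetical unit cell: the field rises from 0 on the left side to 10 on the right.
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
values = [0.0, 10.0, 10.0, 0.0]
crossings = cell_crossings(corners, values, 5.0)
```

Joining the crossing points of each cell gives one contour segment; this sketch does not resolve ambiguous saddle cells or corner values that equal the contour level exactly.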
For contours… • Effectiveness of color versus lines? • Plan view versus perspective? • Colors? • Important to browse your thematic map options before selecting one
Summary • Topics for GIS (for Science) • Including uncertainties • Working with entity types to enhance your map • For learning purposes remember: • Demonstrate proficiency in using geospatial applications and tools (commercial and open-source). • Present verbally relational analysis and interpretation of a variety of spatial data on maps. • Demonstrate skill in applying database concepts to build and manipulate a spatial database, SQL, spatial queries, and integration of graphic and tabular data. • Demonstrate intermediate knowledge of geospatial analysis methods and their applications.
Reading for this week • None… aren’t you lucky! • Watch out for next week though!
Next classes • Note March 30 – open lab (no assignment, work on your projects, get help from Max), attendance will be taken • Tuesday, April 3, Graphs, grouping, pie charts • Friday, April 6, Lab: more statistics and maps (no assignment)