SPATIAL MODELS FOR DATA REPORTED AS COUNTS OVER GEOGRAPHIC AREAS Gary Simon, 28 APRIL 2006
With special thanks to Frank LoPresti (Academic Computing Services, GIS Group) and Kevin Tun (Stern I.T. Group).
Here’s an interesting obscure formula. Consider a set of points:
Point 1: $(x_1, y_1)$
Point 2: $(x_2, y_2)$
…
Point n: $(x_n, y_n)$
Connect the points in order. Draw a line from point 1 to point 2, then from point 2 to point 3, …, from point n−1 to point n. Finally draw a line from point n back to point 1. Assume that none of the segments cross, so that this is a polygon.
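The area of the resulting polygon is given by
$$\text{Area} = \pm\frac{1}{2}\sum_{i=1}^{n}\left(x_i\,y_{i+1} - x_{i+1}\,y_i\right), \qquad (x_{n+1}, y_{n+1}) = (x_1, y_1).$$
The + occurs when the perimeter is drawn counter-clockwise, the − when drawn clockwise.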
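The polygon-area formula can be tried out numerically. The sketch below assumes the standard cross-product ("shoelace") form of the sum; the function name `signed_area` and the test square are illustrative and do not come from the slides.

```python
def signed_area(points):
    """Signed polygon area from its vertices, taken in order.

    Positive when the vertices run counter-clockwise, negative when clockwise.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        x_i, y_i = points[i]
        x_next, y_next = points[(i + 1) % n]  # the last vertex connects back to the first
        total += x_i * y_next - x_next * y_i
    return total / 2.0


square_ccw = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(signed_area(square_ccw))        # 4.0
print(signed_area(square_ccw[::-1]))  # -4.0
```

The sign of the result shows the direction in which the points were listed, and its absolute value is the area.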
The data:
K regions
Counts $z_1, z_2, \ldots, z_K$; total count $z_+$
Populations $P_1, P_2, \ldots, P_K$; total population $P_+$
Uniformity is often rejected. What should be the alternative to uniformity? Techniques like kriging assess covariance structure and not the structure of the expected counts.
There are also techniques that measure spatial association (Cliff and Ord, 1973, 1981) with Moran's I and with Geary's c, and these also relate to covariance notions.
Cliff, A.D. and Ord, J.K. (1973) Spatial Autocorrelation, London: Pion.
Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models and Applications, London: Pion.
Spatial association can also be given angular interpretations (Simon, 1997).
Simon, Gary (1997) An Angular Version of Spatial Correlations, with Exact Significance Tests, Geographical Analysis, vol. 29, no. 3, pp. 267-278.
Let’s form a model for the “spatial force” and give this model a central location or hot spot. Denote this location as $s = (s_x, s_y)$. Here $s_x$ and $s_y$ are parameters to be estimated.
Let $f(z)$ be the spatial force at location $z = (z_x, z_y)$. Then let $f(z)$ = [equation missing from the extracted slide]
Under this definition, $f(s) = c$. At any $z$ with $\lVert z - s \rVert = \alpha$, $f(z) = c/2$. Thus α is a “half-strength” distance.
This can be generalized to mix uniform and hot-spot features. $f(z)$ = [equation missing from the extracted slide] The parameter ω assesses the strength of the hot spot relative to uniformity. A negative ω indicates a protective effect.
The maximum likelihood expected counts $\{e_k\}$ will be used in the test statistic
$$G^2 = 2\sum_{k=1}^{K} z_k \log\frac{z_k}{e_k}$$
The value of $e_k$ will be computed as $P_k \times$ “average” force on county $k$, scaled so that $\sum_{k=1}^{K} e_k = z_+$.
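The likelihood-ratio statistic is easy to compute once the expected counts are in hand. The sketch below covers only the uniform case, in which every region shares the overall rate $z_+/P_+$; the three-region counts and populations are made-up illustration values, not the Florida data, and the function names are hypothetical.

```python
import math


def uniform_expected(counts, populations):
    """Expected counts under uniformity: each region gets the overall rate."""
    rate = sum(counts) / sum(populations)
    return [rate * p for p in populations]


def g_squared(counts, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum of z_k * log(z_k / e_k).

    Regions with a zero count contribute nothing to the sum.
    """
    return 2.0 * sum(z * math.log(z / e) for z, e in zip(counts, expected) if z > 0)


# Hypothetical toy data for three regions.
counts = [12, 30, 8]
populations = [4000, 5000, 6000]
expected = uniform_expected(counts, populations)
print(expected)
print(g_squared(counts, expected))
```

Under uniformity the statistic is referred to a chi-squared distribution with K − 1 degrees of freedom, which is where the 66 degrees of freedom for Florida's 67 counties come from.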
Consider cancer rates in Florida. “Age-Adjusted Death Rates for Florida, 1998 – 2002.” http://www.stateofflorida.com
Florida has 67 counties. There were 38,814 cases in a population of 15,982,378. The rate is 2.43 per 1,000. The $G^2$ statistic for uniformity is 2,816.27 on 66 degrees of freedom. The cancer rates are not uniform.
The maximum likelihood fit occurred at parameter values $s_x = 375.8877$, $s_y = 300.6793$, $\alpha = 13.4375$, $\omega = 2.325$.
This fit has $G^2$ = 2,246.93 on 67 − 1 − 4 = 62 degrees of freedom. This is still an inadequate fit, but the reduction in $G^2$ is 569.34 with four degrees of freedom.
The fitted values are these: [figure not recovered]. The hot spot is at (82.56° W longitude, 28.80° N latitude), in Citrus County.
Map information comes in (longitude, latitude) form that needs to be converted to (x, y) form in (say) miles.
Each degree of latitude has the same mile equivalent: one degree of latitude cuts off the same arc length at all latitudes. [Figure: cross-section of the globe showing the North Pole and the equatorial plane]
However, a degree of longitude represents a small distance near the poles and a large distance near the equator. [Figure: globe showing the 30° N latitude circle and the equator]
Problem: Find the length of one degree of longitude at latitude θ. Solution: Form a spherical triangle with one corner at the north pole, an angle of one degree at the north pole, and with the two sides meeting at the pole each of length 90° − θ.
In a spherical triangle, the sides also have angle measure. [Figure: globe showing the 30° N latitude circle and the equator]
We can use the law of sines for spherical triangles:
$$\frac{\sin A}{\sin a} = \frac{\sin B}{\sin b} = \frac{\sin C}{\sin c}$$
A, B, C are the angles and a, b, c are the opposite sides.
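As a rough numerical check of the spherical-triangle argument, the sketch below splits the isosceles triangle at the pole into two right triangles, so that the law of sines gives sin(c/2) = sin(0.5°) · sin(90° − θ) for the side c opposite the pole. The Earth radius of 3958.8 miles and the function names are assumptions made for illustration; the slides do not state which radius was used.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean radius of a spherical Earth


def degree_of_latitude_miles():
    """Arc length of one degree of latitude (the same at every latitude)."""
    return EARTH_RADIUS_MILES * math.radians(1.0)


def degree_of_longitude_miles(latitude_deg):
    """Great-circle length of one degree of longitude at the given latitude."""
    half_apex = math.radians(0.5)                       # half the 1-degree angle at the pole
    side_sine = math.cos(math.radians(latitude_deg))    # sin(90 deg - theta) = cos(theta)
    arc = 2.0 * math.asin(math.sin(half_apex) * side_sine)  # side opposite the pole, in radians
    return EARTH_RADIUS_MILES * arc


for lat in (0, 30, 60):
    print(lat, round(degree_of_longitude_miles(lat), 2))
```

For an angle as small as one degree the result is very nearly cos θ times the length of a degree of latitude, which is the usual working approximation when converting (longitude, latitude) to (x, y) in miles.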
The expected count $E(z_k) = e_k$ is computed as $P_k \times$ “average” force on county $k$. This average force could be $f(c_k)$, where $c_k$ is the center of the county.
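Instead we will use
$$\frac{1}{\text{Area}(\mathcal{R})}\iint_{\mathcal{R}} f(h)\,dh$$
where $\mathcal{R}$ denotes the county and $h$ is the two-dimensional variable of integration.
The value of $\text{Area}(\mathcal{R})$ can be obtained from outside sources. The challenge comes in finding $\iint_{\mathcal{R}} f(h)\,dh$. This can be difficult even for simple figures; $\mathcal{R}$ is not simple.
Finding $\iint_{\mathcal{R}} f(h)\,dh$ requires some organized description of $\partial\mathcal{R}$, the boundary of $\mathcal{R}$. Fortunately, such descriptions are available from mapping programs.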
The mapping program MapInfo will export an MIF file giving coordinates of (longitude, latitude) points on the boundary. The file has layout
    26
    -75 40.1288
    -75.0154 40.1378
    -75.1094 40.0454
    . . .
    -75 40.0294
    -74.9755 40.0485
    -74.9893 40.1259
    -75 40.1288
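A boundary listing of this shape can be read with a few lines of code. The sketch below assumes that the leading integer is the number of points and that each later pair is one boundary point with longitude first; it handles only a fragment laid out like the one shown, not a complete MIF file with its header and region records. The function name and the shortened four-point sample are illustrative.

```python
def parse_boundary(text):
    """Parse a point count followed by 'longitude latitude' pairs."""
    tokens = text.split()
    count = int(tokens[0])
    values = [float(tok) for tok in tokens[1:]]
    if len(values) != 2 * count:
        raise ValueError(f"expected {count} points, found {len(values) / 2:g}")
    # Pair up consecutive values as (longitude, latitude).
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


# Shortened, hypothetical sample: the first point is repeated to close the polygon.
sample = """4
-75 40.1288
-75.0154 40.1378
-75.1094 40.0454
-75 40.1288"""
boundary = parse_boundary(sample)
print(boundary[0], boundary[-1])
```

Because the listing repeats its first point at the end, the closing segment is already present; code that connects the last point back to the first simply adds a zero-length segment.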
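With the boundary so identified, county $\mathcal{R}$ is a polygon, so the task of finding $\iint_{\mathcal{R}} f(h)\,dh$ is equivalent to integrating over that polygon. The mathematics can be done with Green’s theorem.
Green’s theorem for connected region $\mathcal{R}$ and for scalar functions P and Q of two variables is
$$\oint_{\partial\mathcal{R}} \left(P\,dx + Q\,dy\right) = \iint_{\mathcal{R}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$$
with the boundary $\partial\mathcal{R}$ traversed counter-clockwise.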
The boundary needs to be parameterized as a function of a single variable, say t. This is possible when the boundary is made up of simple curves or, as in the MapInfo story, straight lines.
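The line connecting $(x_k, y_k)$ to $(x_{k+1}, y_{k+1})$ is parameterized as
$$x(t) = x_k + t\,(x_{k+1} - x_k), \qquad y(t) = y_k + t\,(y_{k+1} - y_k), \qquad 0 \le t \le 1.$$
Note that $dy$ means $\frac{dy}{dt}\,dt = (y_{k+1} - y_k)\,dt$.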
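In the statement of Green’s theorem, $\oint_{\partial\mathcal{R}} (P\,dx + Q\,dy) = \iint_{\mathcal{R}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$, let’s use $\frac{\partial P}{\partial y} = 0$ and $\frac{\partial Q}{\partial x} = 1$, so that $\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = 1$.
Green’s theorem is now $\oint_{\partial\mathcal{R}} (P\,dx + Q\,dy) = \iint_{\mathcal{R}} 1\,dx\,dy = \text{Area}(\mathcal{R})$.
This solves as P(x, y) = 0 and Q(x, y) = x, and then $\text{Area}(\mathcal{R}) = \oint_{\partial\mathcal{R}} x\,dy$.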
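With the boundary given as a polygon, the calculation is routine. The consequence is
$$\text{Area}(\mathcal{R}) = \pm\frac{1}{2}\sum_{k=1}^{m}\left(x_k + x_{k+1}\right)\left(y_{k+1} - y_k\right) = \pm\frac{1}{2}\sum_{k=1}^{m}\left(x_k\,y_{k+1} - x_{k+1}\,y_k\right)$$
where m is the number of boundary points of region $\mathcal{R}$ and $(x_{m+1}, y_{m+1}) = (x_1, y_1)$.
This calculation finds the area of region $\mathcal{R}$ and, as a side benefit, discovers whether the point ordering was clockwise (negative sum) or counter-clockwise (positive sum).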
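The step from the line integral of x dy to a sum over polygon segments can be written out explicitly. This is a sketch of one way to do the calculation, with $(x_k, y_k)$ as assumed notation for boundary point k and segment k parameterized by t in [0, 1]; the slides may have arranged the algebra differently.

```latex
\int_{\text{segment } k} x\,dy
  = \int_0^1 \bigl[x_k + t\,(x_{k+1} - x_k)\bigr]\,(y_{k+1} - y_k)\,dt
  = \frac{x_k + x_{k+1}}{2}\,(y_{k+1} - y_k)

\oint_{\partial\mathcal{R}} x\,dy
  = \frac{1}{2}\sum_{k=1}^{m} (x_k + x_{k+1})(y_{k+1} - y_k)
  = \frac{1}{2}\sum_{k=1}^{m} (x_k\,y_{k+1} - x_{k+1}\,y_k)
```

The last equality holds because the leftover terms $x_{k+1}y_{k+1} - x_k y_k$ cancel in pairs when summed around the closed boundary.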
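To find $\iint_{\mathcal{R}} f(h)\,dh$, match to Green’s theorem $\oint_{\partial\mathcal{R}} (P\,dx + Q\,dy) = \iint_{\mathcal{R}} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$ with P(x, y) ≡ 0 and $\frac{\partial Q}{\partial x} = f(x, y)$.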
This means that we need to be able to find $Q(x, y) = \int f(x, y)\,dx$. The solution is Q(x, y) = [equation missing from the extracted slide]
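Then
$$\iint_{\mathcal{R}} f(h)\,dh = \oint_{\partial\mathcal{R}} Q(x, y)\,dy = \sum_{k=1}^{m} \int_{\text{segment } k} Q(x, y)\,dy$$
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$ be the boundary points of $\mathcal{R}$. Then segment k connects point k to point k + 1. (The last segment goes back to point 1.)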
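The segment-by-segment line integral can be sketched numerically. The code below assumes the caller supplies an antiderivative Q with ∂Q/∂x = f and that the points are listed counter-clockwise; it uses Simpson's rule along each segment, which is a choice made here for illustration rather than the method of the slides. The two test integrands (f = 1 and f = x) are hypothetical stand-ins, not the spatial force function.

```python
def integrate_over_polygon(Q, points, panels=50):
    """Approximate the double integral of f = dQ/dx over a polygon.

    Uses the counter-clockwise line integral of Q(x, y) dy, one segment at a time.
    """
    if panels % 2:
        raise ValueError("panels must be even for Simpson's rule")
    m = len(points)
    h = 1.0 / panels
    total = 0.0
    for k in range(m):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % m]  # the last segment goes back to point 1
        dy = y1 - y0
        if dy == 0.0:
            continue  # horizontal segments contribute nothing to a dy integral
        # Simpson's rule in t on [0, 1] for Q(x(t), y(t)).
        acc = Q(x0, y0) + Q(x1, y1)
        for j in range(1, panels):
            t = j * h
            weight = 4.0 if j % 2 else 2.0
            acc += weight * Q(x0 + t * (x1 - x0), y0 + t * dy)
        total += dy * acc * h / 3.0
    return total


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(integrate_over_polygon(lambda x, y: x, unit_square))            # f = 1: the area, about 1.0
print(integrate_over_polygon(lambda x, y: 0.5 * x * x, unit_square))  # f = x: about 0.5
```

Listing the points clockwise flips the sign of the result, so the orientation found from the area calculation can be used to correct it.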