GY460 Techniques of Spatial Economic Analysis Lecture 2: Spatial smoothing and weighting Steve Gibbons
Objectives • Outline methods for ‘smoothing’ data (spatial and non-spatial) • Understand relevance of this method in visualisation and exploratory analysis • Understand relevance of this method to modelling and regression approaches (leads to next lecture)
Readings • No essential papers for this section, though we will see examples later • Haining (2003) Chapters 5, 7 • Various chapters in Fotheringham et al • Hardle, W. (1990) Applied Non Parametric Regression, Cambridge, is a useful but non-spatial reference
Smoothing and spatial surfaces (1) • We can (conceptually at least) decompose a random variable at places s into ‘trend’ and ‘residual’ components: $x(s) = m(s) + \varepsilon(s)$ • An estimate of m(s) can be useful for …
Smoothing and spatial surfaces (2) • Exploratory Spatial Data Analysis… • Visualisation of the underlying spatial trend • Looking for spatial ‘heterogeneity’, clusters and hotspots i.e. high and low values of m(s) • Prediction/interpolation: what can we expect at locations not in the sample? • Or at locations in the sample at times we haven’t sampled • Input into further analysis – e.g. effects of market potential, accessibility, spillovers • Question: how can we define ‘large’ and ‘small’ scale? • This is not well determined
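General structure • Goal is to estimate the smooth part $m(s)$ • When data are generated by a process of the form $x_i = m(s_i) + \varepsilon_i$ • And $m(\cdot)$ is an unknown and (probably) very non-linear function of s • Note: if $\varepsilon_i$ is uncorrelated with $s_i$ (more precisely, $E[\varepsilon_i \mid s_i] = 0$), then $m(s) = E[x \mid s]$ • i.e. this is essentially a non-linear regression problem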
General structure • The most common non/semi-parametric estimators take the general form $\hat{m}(s) = \sum_i w_i(s)\, x_i$ • Where $w_i(s)$ is a scalar weight assigned to data point i given its distance from location s • And $\sum_i w_i(s) = 1$ • So $\hat{m}(s)$ is basically a moving weighted average
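The moving weighted average can be sketched in a few lines. This is a minimal illustration rather than anything from the lecture: the function names, the fixed-radius window and the data are all assumptions.

```python
def weighted_average(x, raw_weights):
    """Normalise raw weights to sum to one and return the weighted mean of x."""
    total = sum(raw_weights)
    if total == 0:
        raise ValueError("no observation receives positive weight")
    return sum(w * xi for w, xi in zip(raw_weights, x)) / total


def moving_average(s0, s, x, radius):
    """Estimate m(s0) with equal weight on points within `radius` of s0."""
    raw = [1.0 if abs(si - s0) <= radius else 0.0 for si in s]
    return weighted_average(x, raw)


# Hypothetical data: locations on a line and the values observed there.
s = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.0, 2.0, 6.0, 4.0]
m_hat = [moving_average(s0, s, x, radius=1.0) for s0 in s]
```

Changing how `raw` is computed (a kernel, inverse distance, contiguity) changes the estimator but not the structure.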
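Example 2: kernel regression or smoothing • $\hat{m}(s) = \sum_i K\left(\frac{s_i - s}{h}\right) x_i \Big/ \sum_i K\left(\frac{s_i - s}{h}\right)$ • h: bandwidth – defines width of the kernel ‘windows’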
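Example kernels, with $u = (s_i - s)/h$ • Uniform/rectangular: $K(u) = \tfrac{1}{2}\,\mathbf{1}(|u| \le 1)$ • Normal/Gaussian: $K(u) = (2\pi)^{-1/2} \exp(-u^2/2)$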
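A minimal sketch of kernel smoothing with the two kernels named above, assuming the standard textbook forms (a rectangular window of half-width h, and the standard normal density). The function names and data are hypothetical.

```python
import math


def uniform_kernel(u):
    """Rectangular kernel: constant weight inside the window, zero outside."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_smooth(s0, s, x, h, kernel=gaussian_kernel):
    """Kernel-weighted average of x at location s0 with bandwidth h."""
    k = [kernel((si - s0) / h) for si in s]
    total = sum(k)
    if total == 0:
        raise ValueError("no data inside the kernel window at s0")
    return sum(ki * xi for ki, xi in zip(k, x)) / total


# Hypothetical data.
s = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.0, 2.0, 6.0, 4.0]
narrow = kernel_smooth(2.0, s, x, h=0.1)    # dominated by the point at s = 2
wide = kernel_smooth(2.0, s, x, h=100.0)    # close to the global mean of x
boxed = kernel_smooth(2.0, s, x, h=1.0, kernel=uniform_kernel)
```

The bandwidth matters far more than the kernel shape. A very small h reproduces the data, and a very large h flattens everything to the sample mean.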
Example 3: Locally weighted regression (1) • See Cleveland (1979), Journal of the American Statistical Association 74(368) • E.g. fit a local polynomial around each point s using kernel-weighted least squares regressions
Example 3: Locally weighted regression (2) • Straightforward to estimate gradients from estimated coefficients
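As an illustration of reading the gradient off the coefficients, here is a sketch of a local linear fit (a first-order local polynomial) with Gaussian weights. It is a simplified stand-in, not Cleveland's full procedure, which also uses a different weight function and robustness iterations. All names and data are hypothetical.

```python
import math


def local_linear(s0, s, x, h):
    """Gaussian-weighted least squares of x on (s - s0).

    Returns (level, slope): the fitted value and the gradient at s0.
    """
    z = [si - s0 for si in s]
    w = [math.exp(-0.5 * (zi / h) ** 2) for zi in z]
    sw = sum(w)
    sz = sum(wi * zi for wi, zi in zip(w, z))
    szz = sum(wi * zi * zi for wi, zi in zip(w, z))
    sx = sum(wi * xi for wi, xi in zip(w, x))
    szx = sum(wi * zi * xi for wi, zi, xi in zip(w, z, x))
    det = sw * szz - sz * sz
    if det == 0:
        raise ValueError("need at least two distinct weighted locations")
    level = (szz * sx - sz * szx) / det   # intercept = estimate of m(s0)
    slope = (sw * szx - sz * sx) / det    # coefficient on (s - s0) = gradient
    return level, slope


# Hypothetical check: data lying exactly on the line x = 1 + 2s.
s = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0 + 2.0 * si for si in s]
level, slope = local_linear(0.0, s, x, h=1.5)
```

On exactly linear data the fit recovers the line even at the edge point s0 = 0, where a plain kernel average would be pulled towards the interior values.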
Example 4: Splines • If piecewise polynomials are constrained to join smoothly where the pieces meet (the knots), you have a spline
All these methods try to estimate E[x|s] • Estimate at any data point or arbitrary grid point • But potential problems at edges (out-of-sample data)
Generalising to two dimensional space • We could use multivariate weighting functions, with different bandwidths in different directions • In the case of spatial kernels, it's usual to just combine the two dimensions (N-S and E-W) into one: distance! • To see how this relates to the previous discussion, consider multiplying two univariate Gaussian kernels for E-W and N-S coordinates • Q: What does this combination of two dimensions into one assume?
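The product of the two univariate Gaussian kernels can be written out explicitly. The notation is an assumption made for illustration: $\Delta_e$ and $\Delta_n$ are the E-W and N-S coordinate differences between a data point and the target location, and both kernels share one bandwidth $h$.

```latex
\begin{align*}
K\!\left(\tfrac{\Delta_e}{h}\right) K\!\left(\tfrac{\Delta_n}{h}\right)
  &\propto \exp\!\left(-\frac{\Delta_e^2}{2h^2}\right)
           \exp\!\left(-\frac{\Delta_n^2}{2h^2}\right) \\
  &= \exp\!\left(-\frac{\Delta_e^2 + \Delta_n^2}{2h^2}\right)
   = \exp\!\left(-\frac{d^2}{2h^2}\right),
  \qquad d = \sqrt{\Delta_e^2 + \Delta_n^2}.
\end{align*}
```

The product depends on the coordinates only through the distance d, but only because the bandwidth is the same in both directions. With separate bandwidths $h_e \neq h_n$ the exponent becomes $-\Delta_e^2/(2h_e^2) - \Delta_n^2/(2h_n^2)$, which is not a function of d alone. A distance-only kernel therefore treats all directions alike.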
Smoothing and interpolation (1) • Note: we can estimate m(.) at points in the data, and at points between locations represented in the data; this allows interpolation to other places… • [Figure: a target location (x_i, s_i1, s_i2) surrounded by data points y_k, y_l, y_m, y_n, y_p, y_q, y_r at distances d_k, d_l, d_m, d_n, d_p, d_q, d_r]
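Smoothing and interpolation • Assign average values (e.g. prices) from points to raster cells, e.g. by inverse distance weighting (IDW) • [Figure: worked IDW example on a raster grid showing point values and their distances to a cell; with b = 1 the cell value is 4.24]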
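A minimal sketch of IDW interpolation to a single raster cell. The data are hypothetical and do not reproduce the slide's figure. Treating `b` as the exponent on distance is an assumption.

```python
import math


def idw(target, points, values, b=1.0):
    """Inverse-distance-weighted mean of `values` at `target`.

    Weights are d ** -b, where d is the Euclidean distance to each point.
    """
    num = 0.0
    den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(px - target[0], py - target[1])
        if d == 0.0:
            return v  # target coincides with a data point: return its value
        w = d ** -b
        num += w * v
        den += w
    return num / den


# Hypothetical points at distances 1, 2 and 4 from the cell centre (0, 0).
points = [(0.0, 1.0), (2.0, 0.0), (0.0, -4.0)]
values = [10.0, 20.0, 40.0]
cell_b1 = idw((0.0, 0.0), points, values, b=1.0)
cell_b2 = idw((0.0, 0.0), points, values, b=2.0)
```

Raising `b` concentrates the weight on the nearest point, so `cell_b2` lies closer to 10 than `cell_b1` does.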
London log property prices (Gaussian kernel) • Source: Gibbons and Machin (2003), Journal of Urban Economics
Alternative, parametric methods • Alternatively you can model m(s) as a parametric function • Polynomial series • Or polar coordinates e.g. Cheshire and Sheppard (1995), Economica • Application to land value in Reading
Matrix representation of spatial weights • You can simplify the notation for weighted averages for one location: $\hat{m}(s_i) = w_i' x$ • Or the whole vector of means: $\hat{m} = Wx$ • Usual to normalise so that each row of W sums to one, so that Wx creates a mean and not a sum • Let's think about the case where Wx is projecting data on to the same set of locations • It is common then to exclude observation i from the mean for location i • You can apply any of the weights schemes we’ve discussed already
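Row normalisation and the exclusion of observation i can be sketched as follows. The helper names and the small raw matrix are hypothetical.

```python
def row_normalise(raw):
    """Scale each row of a square weights matrix to sum to one.

    The diagonal is zeroed first, so observation i is excluded from
    the mean computed for location i.
    """
    n = len(raw)
    w = []
    for i in range(n):
        row = [0.0 if i == j else float(raw[i][j]) for j in range(n)]
        total = sum(row)
        w.append([v / total if total > 0 else 0.0 for v in row])
    return w


def spatial_lag(w, x):
    """Compute Wx: one weighted mean of x per location."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Hypothetical raw weights for three locations.
raw = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 1]]
x = [3.0, 6.0, 9.0]
W = row_normalise(raw)
lag = spatial_lag(W, x)
```

Each element of `lag` is the mean of x over the other locations linked to i. Without the normalisation step the same product would return sums.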
Inverse distance weights • $w_{ij} = 1/d_{ij}$ • Or in general $w_{ij} = d_{ij}^{-b}$
1st order contiguity weights • A traditional weighting scheme in spatial econometrics and regional science
Spatial weights matrix for 1st order contiguity • For all n observations (regions), first-order contiguity gives an n × n matrix W with $w_{ij} = 1$ if regions i and j share a border and 0 otherwise
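A sketch of building a first-order contiguity matrix from neighbour lists, assuming the usual definition (1 if two regions share a border, 0 otherwise). The four-region layout is hypothetical.

```python
def contiguity_matrix(neighbours):
    """Binary n x n matrix: 1 where two regions share a border, else 0."""
    n = len(neighbours)
    w = [[0] * n for _ in range(n)]
    for i, adjacent in neighbours.items():
        for j in adjacent:
            w[i][j] = 1
            w[j][i] = 1  # contiguity is symmetric
    return w


# Hypothetical layout: four regions in a row, 0 - 1 - 2 - 3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
C = contiguity_matrix(neighbours)
```

In practice the neighbour lists would come from a GIS operation on region boundaries. The matrix is usually sparse, since most regions border only a few others.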
Other weight schemes • Neighbourhood/district blocks • Uniform weight on observations sharing the same ‘neighbourhood’; zero otherwise • Q: how would this look for 9 observations in 3 neighbourhoods? • Social/economic weights • E.g. absolute difference in incomes between two places • Weights derived from other analyses (e.g. trade flows – see Head and Mayer (2004) in later lectures) • Or commuting or migration flows (e.g. Figlio et al strategic interactions paper) • Network distances: along road or rail networks • Need GIS to do this easily
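One possible way to look at the neighbourhood-block question is to build the matrix directly. The sketch assumes the observations are sorted by neighbourhood, three per neighbourhood, and that an observation is excluded from its own mean. These are illustrative choices, not necessarily the answer intended in the lecture.

```python
def block_weights(labels, include_self=False):
    """Row-normalised weights: uniform within a neighbourhood, zero outside."""
    n = len(labels)
    w = []
    for i in range(n):
        row = [
            1.0 if labels[i] == labels[j] and (include_self or i != j) else 0.0
            for j in range(n)
        ]
        total = sum(row)
        w.append([v / total if total else 0.0 for v in row])
    return w


# Hypothetical labels: 9 observations, 3 in each of neighbourhoods A, B, C.
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
W = block_weights(labels)
for row in W:
    print(" ".join(f"{v:.1f}" for v in row))
```

The printed matrix is block diagonal: three 3 × 3 blocks with 0.5 off the diagonal inside each block and zeros everywhere else.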
Practical issues • It's rarely necessary (or feasible) to work with an NxN spatial weights matrix, although this is used in much spatial econometrics notation • Weights can more easily be dealt with as a column vector, and it's rarely necessary to assign all N weights to each of the N observations (e.g. nearest neighbours) • If all else fails: you can calculate the weighted averages one observation at a time (“do” loops) • Zero distances are a problem when using inverse distance weights
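The one-observation-at-a-time approach can be sketched as a loop that never forms the full N × N matrix. It keeps only the k nearest neighbours of each point and floors distances at a small positive value. The floor is just one possible treatment of zero distances and is an assumption here, as are the names and data.

```python
import math


def knn_idw_means(coords, x, k=2, min_dist=1e-6):
    """Leave-one-out inverse-distance mean over each point's k nearest neighbours."""
    means = []
    for i, (xi, yi) in enumerate(coords):
        # Distances from observation i to every other observation.
        dists = [
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        ]
        dists.sort()
        nearest = dists[:k]
        # Floor the distance so coincident points do not divide by zero.
        weights = [1.0 / max(d, min_dist) for d, _ in nearest]
        total = sum(weights)
        means.append(sum(w * x[j] for w, (_, j) in zip(weights, nearest)) / total)
    return means


# Hypothetical data; the last two observations share a location.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 0.0)]
x = [1.0, 2.0, 4.0, 8.0]
means = knn_idw_means(coords, x, k=2)
```

With the floor in place, a coincident neighbour receives a very large weight and dominates the mean. Whether that is desirable depends on the application.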
Applications • These spatial weights systems are a fundamental building block of quantitative spatial analysis • We will discuss them in greater detail in future lectures and seminars…
Market Potential • Construction of market potential measures – e.g. for trade and economic geography • X is income, wages, expenditure, or population • Many, many examples • Harris (1954), The Market as a Factor in the Localization of Industry in the United States, Annals of the Association of American Geographers • Hanson (2005), Market potential, increasing returns and geographic concentration, Journal of International Economics
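A sketch of a Harris-style market potential measure, taken here to be the inverse-distance-weighted sum of X over all other locations, $MP_i = \sum_{j \neq i} X_j / d_{ij}$. Leaving out the own-location term is an assumption. Applications often add one using an internal distance. The coordinates and incomes are hypothetical.

```python
import math


def market_potential(coords, x):
    """Sum of x_j / d_ij over all other locations j (not row normalised)."""
    mp = []
    for i, (xi, yi) in enumerate(coords):
        total = 0.0
        for j, (xj, yj) in enumerate(coords):
            if j != i:
                # Assumes no two locations coincide (d_ij > 0).
                total += x[j] / math.hypot(xj - xi, yj - yi)
        mp.append(total)
    return mp


# Hypothetical locations and incomes.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
income = [100.0, 50.0, 200.0]
mp = market_potential(coords, income)
```

Because the weights are not normalised, the result is a sum rather than a mean: the middle location scores highest because it is close to both of the others.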
Accessibility, agglomeration • Measurement of ‘accessibility’ • X is employment, GDP, population or other variable of interest • Distance weights often computed along road and rail networks • Weights are often not row normalised, i.e. Wx is a weighted sum rather than a mean • e.g. Vickerman, Spiekermann, Wegener (1999) Accessibility and Economic Development in Europe, Regional Studies • Work by Daniel Graham for the UK Department for Transport http://www.dft.gov.uk/pgr/economics/rdg/webia/webtheory/investigatingthelinkbetweenp1077
Spillovers, neighbourhood effects and interactions • We looked at papers on strategic interaction in the seminar • Brueckner (1999), Figlio et al (1999) • X was indicators of government policy • Big literature on ‘neighbourhood effects’ and peer groups • X can be anything you think affects adult outcomes or child development • Technological spillovers • X could be R&D expenditure • More about these in next lecture(s)
Conclusions • Most spatial analysis involves estimation of local means (or sums) by re-weighting the data • Close similarity between many methods, but diverse applications… • …signals the need for some caution in analytical interpretation!