Spatial Data Analysis of Areas: Regression

lindsay

Presentation Transcript


  1. Spatial Data Analysis of Areas: Regression

  2. Introduction • Basic Idea • Dependent variable (Y) determined by independent variables X1,X2 (e.g., Y = mX + b). • Uses of regression: • Description • Control • Prediction

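  3. Simple Linear Regression • Basic model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ • $Y_i$: value of the dependent variable on trial $i$ • $\beta_0, \beta_1$: unknown parameters • $X_i$: value of the independent variable on trial $i$ • $\varepsilon_i$: $i$th error term (unexplained variation), where $E[\varepsilon_i] = 0$ and $\sigma^2(\varepsilon_i) = \sigma^2$ • Error terms are $N(0, \sigma^2)$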
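
A minimal sketch of estimating the two parameters of the simple linear model by least squares, assuming NumPy is available. The function name `fit_simple_ols` and the synthetic data are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (b0, b1), the least-squares estimates of beta_0 and beta_1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of X and Y divided by the variation of X
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the fitted line through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Synthetic data (assumed for illustration): Y = 2 + 0.5 X + N(0, 1) noise
rng = np.random.default_rng(0)
x = np.linspace(0, 60, 200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)
b0, b1 = fit_simple_ols(x, y)
```

With noisy data the estimates land near, but not exactly on, the values used to generate the sample.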

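  4. Multiple Regression • Basic model: $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$ • $Y_i$ is the $i$th observation of the dependent variable • $\beta_0, \beta_1, \dots, \beta_p$ are parameters • $X_{i1}, \dots, X_{ip}$ are observations of the independent variables • $\varepsilon_i$ are independent and normal • Estimated model: $\hat{Y}_i = b_0 + b_1 X_{i1} + \dots + b_p X_{ip}$ • $i$th residual: $e_i = Y_i - \hat{Y}_i$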
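
The multiple regression fit and its residuals can be sketched as follows, assuming NumPy. The helper name `fit_multiple_ols` and the toy data are hypothetical; `np.linalg.lstsq` solves the least-squares problem on a design matrix with an added column of ones for the intercept.

```python
import numpy as np

def fit_multiple_ols(X, y):
    """Return (coefficients, fitted values, residuals) for an OLS fit.

    The first coefficient is the intercept; the rest follow the columns of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    residuals = y - fitted  # i-th residual: observed minus fitted
    return coef, fitted, residuals

# Toy data (assumed): y = 1 + 2*x1 + 3*x2 exactly, so residuals should vanish
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [0.0, 0.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]
coef, fitted, residuals = fit_multiple_ols(X, y)
```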

  5. Sometimes we need to transform the data • Scatter plots: (a) Y versus PORC3_NR (percentage of large farms, by number); (b) log10 Y versus log10(PORC3_NR). • Predicted versus observed plots: (a) model with untransformed variables: R2 = 0.61; (b) Model 7: R2 = 0.85.
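
A minimal sketch of the log10 transformation idea, assuming NumPy and strictly positive data. The function name `fit_log10_model` and the power-law data are illustrative assumptions; they are not the deforestation variables used in the study.

```python
import numpy as np

def fit_log10_model(x, y):
    """Fit log10(y) = a + b * log10(x) by least squares; requires x, y > 0."""
    log_x = np.log10(np.asarray(x, dtype=float))
    log_y = np.log10(np.asarray(y, dtype=float))
    b, a = np.polyfit(log_x, log_y, deg=1)  # polyfit returns slope, then intercept
    return a, b

# Hypothetical data following an exact power law y = 3 * x**2,
# which is a straight line after taking log10 of both variables
x = np.array([1.0, 2.0, 5.0, 10.0, 50.0])
y = 3.0 * x ** 2
a, b = fit_log10_model(x, y)
```

A relationship that is curved on the original scale can become linear after the transform, which is one reason a transformed model may fit better.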

  6. Analysis of variation • Sum of squares of Y = Sum of squares of estimate + Sum of squares of residuals (TSS = ESS + RSS) • Precision of estimates and fit • Dividing both sides by TSS (sum of squares of Y): 1 = ESS/TSS + RSS/TSS, where ESS/TSS = $r^2$ (coefficient of determination) • $r^2$ gives the proportion of total variation “explained” by the sample regression equation. • The closer $r^2$ is to 1.00, the better the fit.
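
The decomposition of the total sum of squares can be checked numerically. This is a sketch assuming NumPy; the function name `variance_decomposition` and the five data points are made up for illustration.

```python
import numpy as np

def variance_decomposition(x, y):
    """Return (TSS, ESS, RSS, r2) for a simple linear regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)  # slope first, then intercept
    y_hat = b0 + b1 * x
    tss = np.sum((y - y.mean()) ** 2)      # total variation of Y
    ess = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the fit
    rss = np.sum((y - y_hat) ** 2)         # variation left in the residuals
    return tss, ess, rss, ess / tss

tss, ess, rss, r2 = variance_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For these points TSS = 6, ESS = 3.6 and RSS = 2.4, so $r^2 = 0.6$.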

  7. Analysis of Residuals • It is a good idea to plot the residuals against the independent variables to see if they show a trend. • Possible behaviors: • Correlation (e.g., the higher the independent variable, the higher the residual) • Nonlinearity • Heteroskedasticity (i.e., the variance of the residuals increases or decreases with the independent variable). • Regression assumes that residuals have constant variance and are normally distributed.
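
One rough numerical companion to a residual plot is sketched below, assuming NumPy. Raw residuals from a least-squares fit with an intercept are uncorrelated with X by construction, so this sketch looks at the absolute residuals instead: if their size trends with X, the spread is not constant. The function name `abs_residual_trend` and the simulated data are assumptions, and this is not a formal test.

```python
import numpy as np

def abs_residual_trend(x, y):
    """Correlation between x and |OLS residual|.

    Values far from 0 suggest the residual spread changes with x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    return np.corrcoef(x, np.abs(residuals))[0, 1]

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 60.0, size=500)
# Constant error spread
y_constant = 1.0 + 0.1 * x + rng.normal(0.0, 1.0, size=x.size)
# Error spread proportional to x
y_growing = 1.0 + 0.1 * x + rng.normal(0.0, 1.0, size=x.size) * x / 30.0
trend_constant = abs_residual_trend(x, y_constant)
trend_growing = abs_residual_trend(x, y_growing)
```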

  8. Good Residual Plot (figure: Y versus X)

  9. Nonlinearity (figure: residual versus X)

  10. Heteroskedasticity (figure: residual versus X)

  11. Regression with Spatial Data: Understanding Deforestation in Amazonia

  12. The forest...

  13. The rains...

  14. The rivers...

  15. Deforestation...

  16. Fire...

  17. Fire...

  18. Amazon Deforestation 2003 • Map legend: deforestation 2002/2003; deforestation until 2002 • Source: INPE PRODES Digital, 2004.

  19. What Drives Tropical Deforestation? (figure: underlying factors driving proximate causes; causative interlinkages at proximate/underlying levels; internal drivers; legend: % of the cases at 5%, 10%, 50%; factors present in less than 5% of cases are not depicted) Source: Geist & Lambin

  20. 1973

  21. 1991 Courtesy: INPE/OBT

  22. 1999 Courtesy: INPE/OBT

  23. Deforestation in Amazonia • PRODES (Total 1997) = 532,086 km² • PRODES (Total 2001) = 607,957 km²

  24. Modelling Tropical Deforestation • Trend analysis • Economic models • Coarse: 100 km x 100 km grid • Fine: 25 km x 25 km grid

  25. Amazônia in 2015? Source: Aguiar et al., 2004

  26. Factors Affecting Deforestation

  27. Coarse resolution: candidate models

  28. Coarse resolution: Hot-spots map • Hot-spots map for Model 7 (lighter cells have regression residual < -0.4) • Labelled areas: Terra do Meio, Pará State; South of Amazonas State

  29. Modelling Deforestation in Amazonia • High coefficients of multiple determination were obtained for all models built (R2 from 0.80 to 0.86). • The main factors identified were: • Population density; • Connection to national markets; • Climatic conditions; • Indicators related to land distribution between large and small farmers. • The main current agricultural frontier areas, in Pará and Amazonas States, where intense deforestation is now taking place, were correctly identified as hot-spots of change.

  30. Spatial regression models

  31. Spatial regression • Specifying the Structure of Spatial dependence • which locations/observations interact • Testing for the Presence of Spatial Dependence • what type of dependence, what is the alternative • Estimating Models with Spatial Dependence • spatial lag, spatial error, higher order • Spatial Prediction • interpolation, missing values source: Luc Anselin

  32. Nonspatial regression • Objective • Predict the behaviour of a response variable, given a set of known factors (explanatory variables). • Multivariate nonspatial models: $y_k = \beta_0 + \beta_1 x_{1k} + \dots + \beta_i x_{ik} + \varepsilon_k$ • $y_k$ = estimate of response variable for object $k$ • $\beta_i$ = regression coefficient for factor $i$ • $x_{ik}$ = explanatory variable $i$ for region $k$ • $\varepsilon_k$ = random error • Goodness of fit: $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

  33. Nonspatial regression: hypotheses • $Y = X\beta + \varepsilon$ (model) • Explanatory variables are linearly independent • $Y$: vector of samples of the response variable ($n \times 1$) • $X$: matrix of explanatory variables ($n \times k$) • $\beta$: coefficient vector ($k \times 1$) • $\varepsilon$: error vector ($n \times 1$) • $E(\varepsilon_i) = 0$ (expected value) • $\varepsilon_i \sim N(0, \sigma_i^2)$ (normal distribution)

  34. Generalized linear models • $g(Y) = X\beta + U$ • Response is some function of the explanatory variables • $g(\cdot)$ is a link function • Ex: logarithm function • $U$ = error vector • $E(U) = 0$ (expected value) • $E(UU^T) = C$ (covariance matrix) • If $C = \sigma^2 I$, the error is homoskedastic

  35. Spatial regression • Spatial effects • What happens if the original data is spatially autocorrelated? • The results will be influenced, showing statistical association where there is none • How can we evaluate the spatial effects? • Measure the spatial autocorrelation (Moran’s I) of the regression residuals

  36. Regression using spatial data • Try a linear model first • Fit the model and calculate the residuals • Are the residuals spatially autocorrelated? • No: we’re OK • Yes: the nonspatial model may be biased or inefficient, and we should propose a spatial model
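
The residual check in this workflow can be sketched as a direct computation of Moran's I, assuming NumPy. The function names and the four-area "chain" layout are hypothetical, and the sketch omits the significance test (permutation or analytical) that would normally accompany the statistic.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I of `values` for an (n x n) spatial weights matrix."""
    z = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = z - z.mean()  # deviations from the mean
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)

def chain_weights(n):
    """Binary contiguity for n areas arranged in a row (hypothetical layout)."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0  # each area neighbours the next one...
    w[idx + 1, idx] = 1.0  # ...and the relation is symmetric
    return w

w = chain_weights(4)
smooth_residuals = np.array([1.0, 2.0, 3.0, 4.0])         # neighbours are similar
alternating_residuals = np.array([1.0, -1.0, 1.0, -1.0])  # neighbours are opposite
i_smooth = morans_i(smooth_residuals, w)
i_alternating = morans_i(alternating_residuals, w)
```

Here the smooth pattern gives I = 1/3 (positive autocorrelation) and the alternating pattern gives I = -1 (negative autocorrelation).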

  37. Spatial dependence • Estimating the Form/Extent of Spatial Interaction • substantive spatial dependence • spatial lag models • Correcting for the Effect of Spatial Spill-overs • spatial dependence as a nuisance • spatial error models source: Luc Anselin

  38. Spatial dependence • Substantive Spatial Dependence • lag dependence • include Wy as explanatory variable in regression • y = ρWy + Xβ + ε • Dependence as a Nuisance • error dependence • non-spherical error variance • E[εε’] = Ω • where Ω incorporates dependence structure
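
A sketch of the spatial lag term Wy and of the lag model y = ρWy + Xβ + ε, assuming NumPy and a row-standardised weights matrix (so that Wy is the average of each area's neighbours). All names, the four-area layout and the parameter values are illustrative assumptions. The simulation uses the reduced form y = (I - ρW)^{-1}(Xβ + ε), which follows from solving the model for y.

```python
import numpy as np

def row_standardize(w):
    """Scale each row to sum to 1; every area needs at least one neighbour."""
    w = np.asarray(w, dtype=float)
    return w / w.sum(axis=1, keepdims=True)

def simulate_spatial_lag(X, beta, rho, w, eps):
    """Solve (I - rho W) y = X beta + eps for y."""
    n = w.shape[0]
    return np.linalg.solve(np.eye(n) - rho * w, X @ beta + eps)

# Four areas in a row (hypothetical layout)
binary = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
W = row_standardize(binary)

# Spatial lag of a variable: the mean of the neighbouring values
lag_example = W @ np.array([1.0, 2.0, 3.0, 4.0])

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(4), rng.normal(size=4)])
beta = np.array([1.0, 0.5])  # assumed coefficients
rho = 0.4                    # assumed spatial autoregressive parameter
eps = rng.normal(scale=0.1, size=4)
y = simulate_spatial_lag(X, beta, rho, W, eps)
```

Because Wy depends on y itself, the lag term is correlated with the error, which is why estimating this model calls for methods other than plain OLS.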

  39. Interpretation of spatial lag • True Contagion • related to economic-behavioral process • only meaningful if areal units appropriate (ecological fallacy) • interesting economic interpretation (substantive) • Apparent Contagion • scale problem, spatial filtering source: Luc Anselin

  40. Interpretation of Spatial Error • Spill-Over in “Ignored” Variables • poor match between the process and the unit of observation or level of aggregation • apparent contagion: regional structural change • economic interpretation less interesting (nuisance parameter) • Common in Empirical Practice source: Luc Anselin

  41. Cost of ignoring spatial dependence • Ignoring Spatial Lag • omitted variable problem • OLS estimates biased and inconsistent • Ignoring Spatial Error • efficiency problem • OLS still unbiased, but inefficient • OLS standard errors and t-tests biased source: Luc Anselin

  42. Spatial regression models • Incorporate spatial dependency • Spatial lag model • Two explanatory terms • One is the dependent variable in the neighbourhood (the spatial lag) • The other is the set of remaining explanatory variables

  43. Spatial regimes • Extension of the non-spatial regression model • Considers “clusters” of areas • Groups each “cluster” in a different explanatory variable: $y_i = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i + \varepsilon_i$ • Gets different parameters for each “cluster”
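
One simple reading of the spatial regimes idea is to fit a separate regression within each cluster of areas. The sketch below assumes NumPy and a single explanatory variable; the function name `fit_by_regime`, the regime labels and the data are hypothetical, and the slides do not say how the regimes were actually estimated.

```python
import numpy as np

def fit_by_regime(x, y, regime):
    """Return {regime label: (intercept, slope)} from one OLS fit per regime."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    regime = np.asarray(regime)
    params = {}
    for label in np.unique(regime):
        mask = regime == label  # areas belonging to this cluster
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        params[str(label)] = (intercept, slope)
    return params

# Two hypothetical regimes with different relationships:
# regime "A": y = 1 + 2x, regime "B": y = 4 - x
regime = ["A", "A", "A", "B", "B", "B"]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0, 4.0, 3.0, 2.0]
params = fit_by_regime(x, y, regime)
```

A single pooled regression on these six points would average two opposite slopes, while the per-regime fits recover each relationship.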

  44. A study of the spatially varying relationship between homicide rates and socio-economic data of São Paulo using GWR Frederico Roman Ramos CEDEST/Brasil

  45. Geographically Weighted Regression • Extension of the traditional regression model where the parameters are estimated locally • (ui,vi) are the geographical coordinates of point i. • The betas vary in space (each location has a different coefficient) • We estimate an ordinary regression for each point where the neighbours have more weight
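
The "one regression per point, with nearer neighbours weighted more" idea can be sketched as a weighted least-squares fit at each location. This assumes NumPy, a Gaussian distance kernel and a fixed bandwidth; the kernel choice, the function name `gwr_coefficients` and the simulated data are assumptions, not the configuration used in the São Paulo study.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients, one row per point: [intercept, slopes...]."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    betas = np.empty((len(y), design.shape[1]))
    for i, (u, v) in enumerate(coords):
        dist = np.hypot(coords[:, 0] - u, coords[:, 1] - v)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel
        root_w = np.sqrt(weights)
        # Weighted least squares = OLS on rows scaled by sqrt(weight)
        coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
        betas[i] = coef
    return betas

rng = np.random.default_rng(3)
coords = rng.uniform(0.0, 10.0, size=(150, 2))
x = rng.normal(size=150)
# Simulated relationship whose slope increases from west (u = 0) to east (u = 10)
y_varying = 2.0 + (1.0 + 0.2 * coords[:, 0]) * x
betas = gwr_coefficients(coords, x, y_varying, bandwidth=2.0)
```

The bandwidth controls how local each fit is: a very large value gives every point nearly equal weight, so the local estimates approach the single global regression.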

  46. Introducing São Paulo (map scale labels: 70 km, 30 km) • Some numbers: • Metropolitan region: population 17,878,703 (IBGE, 200); 39 municipalities • Municipality of São Paulo: population 10,434,252; HDI_M 0.841 (PNUD, 2000); 96 districts • IEX: 74 out of 96 districts were classified as socially excluded (CEDEST, 2002) • 4,637 homicide victims in 2001

  47. Data • 4,637 homicide victims, geocoded by address of residence (2001) • 456 census sample tracts (2000)

  48. Density surface of victim-based homicides • Kernel density function, bandwidth = 3 km • Map labels: critical areas
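
A density surface of this kind can be sketched with a kernel density estimate evaluated on grid points. The sketch assumes NumPy, planar coordinates in km and a Gaussian kernel; the slide gives only the 3 km bandwidth, so the kernel shape, the function name `kernel_density` and the sample points are assumptions.

```python
import numpy as np

def kernel_density(events, grid_points, bandwidth_km=3.0):
    """Gaussian kernel density (events per km^2) at each grid point."""
    events = np.asarray(events, dtype=float)
    grid_points = np.asarray(grid_points, dtype=float)
    # Squared distance from every grid point to every event
    diff = grid_points[:, None, :] - events[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Each event spreads a unit mass as a 2-D Gaussian bump
    kernel = np.exp(-0.5 * sq_dist / bandwidth_km ** 2)
    kernel /= 2.0 * np.pi * bandwidth_km ** 2
    return kernel.sum(axis=1)

# One hypothetical event at the origin, evaluated at three locations
events = np.array([[0.0, 0.0]])
grid = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
density = kernel_density(events, grid)
```

A larger bandwidth smooths the surface and merges nearby peaks, while a smaller one isolates individual events.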

  49. Victim-based homicide rate (Tx_homic) • $\text{Tx\_homic} = \dfrac{\text{count of homicide events (2001)}}{\text{population (census, 2000)}} \times 100{,}000$
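
The rate is a count divided by a population, scaled to 100,000 inhabitants. The sketch below uses a hypothetical function name, `homicide_rate`, and applies it to the municipality-wide figures quoted on slide 46 purely as an illustration; the study computes the rate per areal unit.

```python
def homicide_rate(homicide_count, population, per=100_000):
    """Homicide events per `per` inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return homicide_count / population * per

# Municipality-wide illustration: 4,637 victims and 10,434,252 inhabitants
city_rate = homicide_rate(4637, 10_434_252)
```

For the whole municipality this gives roughly 44 homicides per 100,000 inhabitants.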
