
Regression in ArcGIS



Presentation Transcript


  1. Regression in ArcGIS

  2. Pattern analysis (without regression): Are there places where people persistently die young? Where are precipitation amounts consistently high? Where are 911 emergency call hot spots?
  3. Why regression? Regression allows us to… Understand key factors: What are the most important habitat characteristics for an endangered bird? Predict unknown values: How much rainfall will occur in a given location? Test hypotheses: “Broken Window” theory: Is there a positive relationship between vandalism and residential burglary?
  4. Possible applications. Education: Why are literacy rates so low in particular regions? Natural resource management: What are the key variables that explain high forest fire frequency? Ecology: Which environments should be protected to encourage reintroduction of an endangered species? Transportation: What demographic characteristics contribute to high rates of public transportation usage? Many more: business, crime prevention, epidemiology, finance, public safety, public health.
  5. Correlation vs. regression. Mathematically they are identical; conceptually they are very different. Correlation: co-variation, a relationship or association between variables (e.g., education and income); no direction or causation is implied. Regression: prediction of Y (the dependent variable) from X (the independent variable); the regression line predicts Y from X; implies, but does not prove, causation.
  6. Positive relationship: as values go up (↑) on one variable, they go up (↑) on the other. Examples from strongest to weakest: perfect positive (R² = 1, r = 1, Se = 0.0, b = 2), very strong (R² = 0.94, r = 0.97, Se = 0.3), moderate (R² = 0.50, r = 0.71, Se = 1.1, b = 1.1), weak (R² = 0.26, r = 0.51, Se = 1.3, b = 0.8), very weak (R² = 0.07, Se = 1.8, b = 0.1), no relationship (R² = 0, r = 0, Se = Sy = 2, b = 0). Negative relationship: as values go up (↑) on one variable, they go down (↓) on the other, e.g., perfect negative (R² = 1, r = −1) and strong negative (R² = 0.50, r = −0.71). As R² gets smaller, the slope of the regression line (b) gets closer to zero, the standard error (Se) gets larger, and Se approaches the standard deviation of the dependent variable (Sy = 2 in these examples).
  7. Regression analysis terms and concepts Dependent variable (Y): What you are trying to model or predict (e.g., residential burglary). Explanatory variables (X): Variables you believe cause or explain the dependent variable (e.g., income, vandalism, number of households). Coefficients (β): Values, computed by the regression tool, reflecting the relationship between explanatory variables and the dependent variable. Residuals (ε): The portion of the dependent variable that isn’t explained by the model; the model under- and over-predictions.
  8. Simple linear regression is concerned with “predicting” one variable, Y (the dependent variable), using another variable, X (the independent variable): Y = a + bX + ε, where a is the intercept (the value of Y when X = 0), b is the regression coefficient (the slope of the line, i.e., the change in Y for a one-unit change in X), and ε is the residual (error): ε = Yi − Ŷi = actual (Yi) − predicted (Ŷi).
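As a concrete illustration of the equation above, here is a minimal NumPy sketch (the toy x and y values are made up) that estimates the intercept a and slope b, then computes the predicted values Ŷ, the residuals, R², and the standard error of the estimate Se:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # independent variable X (toy data)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])   # dependent variable Y (toy data)

b, a = np.polyfit(x, y, deg=1)      # slope b and intercept a
y_hat = a + b * x                   # predicted values (Y-hat)
residuals = y - y_hat               # error = actual minus predicted

r2 = 1 - residuals.var() / y.var()  # coefficient of determination R^2
se = np.sqrt((residuals ** 2).sum() / (len(y) - 2))  # standard error of the estimate

print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.3f}, Se = {se:.3f}")
```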
  9. Y = a + bX + ε. If β = 0, then X has no effect on Y. Null hypothesis (H0): in the population, β = 0. Alternative hypothesis (H1): in the population, β ≠ 0. Thus, we test whether our sample regression coefficient, b, is sufficiently different from zero to reject the null hypothesis and conclude that X has a statistically significant effect on Y.
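A minimal sketch of this test using scipy.stats.linregress, which reports the two-sided p-value for the slope (toy data again; a real analysis would use your own variables):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

result = stats.linregress(x, y)   # slope, intercept, r value, p-value, standard error
print(f"b = {result.slope:.3f}, p = {result.pvalue:.4f}")
if result.pvalue < 0.05:
    print("Reject H0: X has a statistically significant effect on Y")
else:
    print("Fail to reject H0: no significant effect detected")
```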
  10. Regression does not prove direction or cause. Example: income and illiteracy. Income → illiteracy: provinces with higher incomes can afford to spend more on education, so illiteracy is lower (higher income >>> less illiteracy). Illiteracy → income: the higher the level of literacy (and thus the lower the level of illiteracy), the more high-income jobs (less illiteracy >>> higher income). Regression will not decide!
  11. Anscombe's quartet: summary statistics are the same for all four data sets (mean of y: 7.5; variance of y: 4.12; correlation: 0.816; regression line: y = 3 + 0.5x), yet the scatterplots are completely different. Always look at your data; don't rely on the statistics alone. Anscombe, Francis J. 1973. "Graphs in Statistical Analysis." The American Statistician 27: 17–21.
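A small sketch that reproduces Anscombe's point with seaborn's bundled “anscombe” example dataset (downloaded and cached on first use); the per-group summary statistics come out essentially identical even though the four scatterplots look nothing alike:

```python
import numpy as np
import seaborn as sns

df = sns.load_dataset("anscombe")            # columns: dataset, x, y
for name, g in df.groupby("dataset"):
    b, a = np.polyfit(g["x"], g["y"], deg=1)
    r = np.corrcoef(g["x"], g["y"])[0, 1]
    print(f"{name}: mean y = {g['y'].mean():.2f}, var y = {g['y'].var():.2f}, "
          f"r = {r:.3f}, regression line: y = {a:.2f} + {b:.2f}x")
# Plotting each group (e.g., with sns.lmplot) shows how different they really are.
```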
  12. Be aware of spurious relationships (omitted variables): both variables are related to a third variable not included in the analysis. Example: "eating ice cream inhibits swimming ability" (eat too much and you cannot swim). The omitted variable is summer temperature: in hot weather more people swim (and some drown), and more ice cream is sold.
  13. Regression analysis in ArcGIS: Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). [Scatterplot of observed vs. predicted values.]
  14. Building an OLS regression model. Example questions: Why are people dying young in South Dakota? Do economic factors explain this spatial pattern? Choose your dependent variable (Y), e.g., average age of death. Identify potential explanatory variables (X), e.g., poverty, hospital access, environmental pollution. Explore those explanatory variables. Run OLS regression with different combinations of explanatory variables until you find a properly specified model.
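The OLS tool in the Spatial Statistics toolbox can also be scripted. Below is a hedged sketch assuming arcpy on ArcGIS 10.x; the workspace path, feature class, and field names are hypothetical, and the exact tool signature should be checked for your ArcGIS version:

```python
import arcpy

# Hypothetical workspace and data; adjust to your own geodatabase.
arcpy.env.workspace = r"C:\data\south_dakota.gdb"

# Ordinary Least Squares tool (Spatial Statistics toolbox), positional arguments:
# input features, unique ID field, output features, dependent variable,
# explanatory variables (semicolon-separated). Field names are hypothetical.
arcpy.OrdinaryLeastSquares_stats(
    "counties",
    "OBJECTID",
    "counties_ols",
    "AVG_AGE_DEATH",
    "POVERTY_RATE;HOSPITAL_ACCESS;POLLUTION",
)
```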
  15. Use OLS to test hypotheses. Why are people dying young in South Dakota? Do economic factors explain this spatial pattern? Poverty rates explain 66% of the variation in the average age of death (the dependent variable): Adjusted R² = 0.659. However, significant spatial autocorrelation among the model residuals indicates that important explanatory variables are missing from the model.
  16. Multivariate models: explore variable relationships using the scatterplot matrix; consult theory and field experts; look for spatial variables; run OLS (this is an iterative, often tedious, trial-and-error process).
  17. Check OLS results: (1) Coefficients have the expected sign. (2) No redundancy among explanatory variables. (3) Coefficients are statistically significant. (4) Residuals are normally distributed. (5) Strong Adjusted R-squared value. (6) Residuals are not spatially autocorrelated.
  18. (1) Regression coefficients have the expected sign. The coefficient sign (+/−) and magnitude reflect each explanatory variable’s relationship to the dependent variable. Does it make sense for low income to be negatively related to average age of death? ↓ income, ↓ age of death.
  19. Note: partial regression coefficients are only directly comparable if the units are all the same (e.g., all US$). If X1 is income, then a one-unit change is $1; but if X2 is in euros (€) or even cents (¢), one unit is not the same; and if X3 is % population urban, one unit is very different.
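One common way to make coefficients comparable across units is to standardize (z-score) the variables before fitting, so each coefficient is expressed in standard deviations (“beta weights”). A minimal sketch with illustrative variable names:

```python
import numpy as np

def zscore(v):
    """Standardize a variable to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

# Illustrative (made-up) data in very different units.
income = np.array([21.0, 35.0, 28.0, 44.0, 39.0, 30.0])     # thousands of US$
pct_urban = np.array([40.0, 75.0, 55.0, 90.0, 80.0, 60.0])  # % population urban
age_death = np.array([68.0, 74.0, 71.0, 78.0, 76.0, 72.0])  # average age of death

# Fit on z-scored variables; the resulting coefficients are unit-free beta weights.
X = np.column_stack([np.ones(len(age_death)), zscore(income), zscore(pct_urban)])
coef, *_ = np.linalg.lstsq(X, zscore(age_death), rcond=None)
print("standardized coefficients (income, pct_urban):", coef[1:])
```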
  20. (2) No redundancy among explanatory variables (multicollinearity). A large variance inflation factor (VIF > 7.5, for example) indicates explanatory variable redundancy. Find a set of explanatory variables that have low VIF values; in a strong model, each explanatory variable gets at a different facet of the dependent variable. Remove redundant variables one by one until the VIFs are acceptable.
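A minimal sketch of computing VIFs outside ArcGIS with statsmodels (synthetic data in which x2 is nearly redundant with x1, so its VIF comes out large):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=100)  # nearly redundant with x1
x3 = rng.normal(size=100)                        # unrelated to the others

# Design matrix with an intercept column; compute VIF for each explanatory variable.
X = np.column_stack([np.ones(100), x1, x2, x3])
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, "VIF =", round(variance_inflation_factor(X, i), 1))
```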
  21. (3) Coefficients are statistically significant (* statistically significant at the 0.05 level). When the Koenker statistic is statistically significant, you can only trust the Robust Probability column (and GWR might improve results).
  22. (4) Residuals are normally distributed. When the Jarque-Bera test is statistically significant: the model is biased and results are not reliable; this often indicates that a key variable is missing from the model, and sometimes indicates that the relationship is not linear.
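A minimal sketch of running the Jarque-Bera test on residuals with SciPy (here the residuals are simulated; in practice you would use the residuals from your OLS output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(size=200)   # stand-in for the residuals from your OLS model

jb_stat, p_value = stats.jarque_bera(residuals)
print(f"Jarque-Bera = {jb_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Residuals deviate from normality: the model may be biased")
```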
  23. (5) Model performance: a high (Adjusted) R² value indicates a strong model.
  24. R² or Adjusted R²? R² will always increase each time another independent variable is included, so Adjusted R² is normally used instead of R² in multiple regression.
  25. Coefficient of determination (R²): the strength (robustness) of a model in predicting Y. Akaike's Information Criterion (AIC): lower is better; compare among models to find the most parsimonious. As long as the dependent variable is the same, the AIC values for different OLS/GWR models are comparable.
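For reference, a sketch of both adjustments using the standard textbook formulas; note that ArcGIS reports the corrected AICc, so its values will not match this simple AIC exactly:

```python
import numpy as np

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def ols_aic(residuals, k):
    """AIC for OLS up to an additive constant: n*ln(SSE/n) + 2*(k + 1)."""
    residuals = np.asarray(residuals)
    n = len(residuals)
    sse = np.sum(residuals ** 2)
    return n * np.log(sse / n) + 2 * (k + 1)

# Adding a variable only pays off if R^2 rises enough to offset the penalty.
print(adjusted_r2(0.66, n=66, k=1))   # about 0.655
print(adjusted_r2(0.67, n=66, k=5))   # about 0.642, despite the higher raw R^2
```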
  26. (6) Residuals are not spatially autocorrelated. Statistically significant spatial clustering of under- and over-predictions indicates a mis-specified model; what you want is a random spatial pattern of under- and over-predictions.
  27. Run the Moran’s I tool
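The ArcGIS tool computes Global Moran's I on the residuals. To show what is being measured, here is a minimal NumPy sketch of the statistic itself, using a made-up binary contiguity matrix; in practice you would run the tool (or a spatial statistics library) rather than hand-roll it:

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I: (n / sum(W)) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    z = values - values.mean()            # deviations from the mean
    num = np.sum(W * np.outer(z, z))      # weighted cross-products of deviations
    return (len(values) / W.sum()) * num / np.sum(z ** 2)

residuals = np.array([1.2, 0.8, 1.0, -0.9, -1.1, -0.7])   # toy residuals
W = np.array([[0, 1, 1, 0, 0, 0],                         # toy neighbour relationships
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
print("Moran's I =", round(morans_i(residuals, W), 3))
# Values near +1 indicate clustering of similar residuals (a problem);
# values near 0 indicate a random spatial pattern.
```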
  28. Example of results in ArcMap: species–area relationship. The map symbolizes the standard deviation of the residuals; red = island has more species than expected by the regression model.
  29. Tip: disable background processing (in ArcGIS 10 the default is background processing).
  30. Questions?
  31. ESRI Tutorial