
Rodent Complaints in Boston




  1. Rodent Complaints in Boston
     Question: Is the spatial pattern of rodent complaints in Boston related to other information in a) the Mayor’s Service Hotline data or b) the 2010 US census?
     Result: Statistically significant correlations are found, but the information at hand does not suggest a clear causal relationship.
     Data: Boston Mayor’s Service Hotline: https://data.cityofboston.gov/ ; 2010 US census: http://tinyurl.com/otsvzma (links to the mass.gov website)
     Figure: Number of rodent complaints in Boston per 2010 US census tract, 2011-2013
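
The response variable on this slide is a count of rodent complaints per census tract. The sketch below shows that aggregation step. It assumes each hotline record has already been reduced to a (case type, year, tract ID) tuple, for example by a spatial join of complaint coordinates onto 2010 tract boundaries (not shown). The following are all hypothetical and are not taken from the real hotline schema:
- the case-type label "Rodent Activity"
- the tract IDs
- the function name `rodent_counts_per_tract`

```python
from collections import Counter

# Hypothetical, pre-processed hotline records: (case_type, year, tract_id).
# The case-type labels and tract IDs are illustrative only; the real
# Mayor's Service Hotline schema may differ.
RECORDS = [
    ("Rodent Activity", 2011, "25025010100"),
    ("Rodent Activity", 2012, "25025010100"),
    ("Pothole Repair", 2012, "25025010100"),
    ("Rodent Activity", 2013, "25025020200"),
    ("Rodent Activity", 2014, "25025020200"),  # outside 2011-2013, ignored
]


def rodent_counts_per_tract(records, case_type="Rodent Activity",
                            years=range(2011, 2014)):
    """Count complaints of one case type per census tract over the given years."""
    return Counter(
        tract for kind, year, tract in records
        if kind == case_type and year in years
    )


counts = rodent_counts_per_tract(RECORDS)
print(counts["25025010100"])  # 2
print(counts["25025020200"])  # 1
print(counts["25025099999"])  # 0: a Counter returns 0 for tracts with no complaints
```

Because a `Counter` returns 0 for missing keys, tracts with no rodent complaints can still be given a response value of zero. This matters when every tract enters the regression.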

  2. Linear Model of Rodent Complaints – OLS
     Question: Can other information in the Mayor’s Service Hotline and 2010 census data explain the spatial variability in rodent complaints?
     Figure: Ordinary Least Squares (OLS) model of rodent complaints. Exogenous variables: 44 variables extracted from the Mayor’s Service Hotline and 2010 census data.
     The model captures some of the spatial pattern in rodent complaints, but the map of differences between observed and modeled counts reveals model deficiencies. Of particular importance are the large residuals in census tracts with high observed rodent counts.
     Note: Gray census tracts are those with fewer than 500 residents, plus one outlier tract located in Allston.
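
The sketch below shows an OLS fit and the residuals behind a difference map, in pure Python so it runs without third-party packages. It solves the normal equations X^T X b = X^T y with Gaussian elimination. A real analysis with 44 possibly collinear predictors would more likely use a QR- or SVD-based least-squares routine, as provided by NumPy or statsmodels. The function names are assumptions. So is the residual sign (observed minus modeled), because the slide does not say which sign its difference map uses.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def ols_fit(X, y):
    """Return [intercept, b1, ..., bp] minimizing the sum of squared residuals."""
    Xd = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict_linear(beta, X):
    """Linear predictor beta0 + sum_j beta_j * x_j for each row of X."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


def residuals(observed, predicted):
    """Observed minus modeled: the quantity a difference map would show."""
    return [o - p for o, p in zip(observed, predicted)]


# Toy data (not from the Boston data set): y = 2 + 3x exactly.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [2.0, 5.0, 8.0, 11.0]
beta = ols_fit(X, y)
print(beta)
print(residuals(y, predict_linear(beta, X)))
```

Nothing in `predict_linear` keeps predictions non-negative. This is why a linear model of counts can predict negative values in low-count tracts.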

  3. Linear Model of Rodent Complaints – Poisson
     Question: Can we make a better model in a generalized linear model (GLM) framework, assuming a Poisson distribution of rodent complaints?
     Figure: Generalized Linear Model (GLM), assuming a Poisson distribution of rodent complaints
     This choice is reasonable because rodent complaint counts in Boston follow something closer to a Poisson than a Gaussian distribution. Flipping between slides shows that the red/blue tones in the difference map are somewhat muted for the GLM relative to OLS. However, large residuals persist.
     Note: Gray census tracts are those with fewer than 500 residents, plus one outlier tract located in Allston.
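
The sketch below fits a Poisson GLM, assuming the usual log link, so that the expected count is exp(b0 + b1*x1 + ...). It uses iteratively reweighted least squares (IRLS), the standard fitting algorithm for GLMs, written in pure Python. The function names, starting values, and convergence tolerance are illustrative choices, not the settings used for the slide. Libraries such as statsmodels provide production implementations.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def poisson_glm(X, y, max_iter=50, tol=1e-10):
    """Fit log E[y] = beta0 + sum_j beta_j * x_j by IRLS; returns [beta0, ..., betap].

    Assumes mean(y) > 0 so the intercept can start at log(mean(y)).
    """
    Xd = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(Xd[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in Xd]
        mu = [math.exp(e) for e in eta]  # fitted means, always positive
        # For the log link, the IRLS weights are mu and the working
        # response is eta + (y - mu) / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(m * r[i] * r[j] for r, m in zip(Xd, mu)) for j in range(p)]
                for i in range(p)]
        XtWz = [sum(m * r[i] * zi for r, m, zi in zip(Xd, mu, z)) for i in range(p)]
        new_beta = solve(XtWX, XtWz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def predict_poisson(beta, X):
    """Expected counts exp(beta0 + sum_j beta_j * x_j); never negative."""
    return [math.exp(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]


# Toy data (not from the Boston data set): means follow exp(0.5 + 0.3x) exactly.
X = [[float(v)] for v in range(5)]
y = [math.exp(0.5 + 0.3 * v) for v in range(5)]
beta = poisson_glm(X, y)
print(beta)
```

The toy responses are non-integer so the fit can be checked exactly; the Poisson score equations do not require integer responses. Unlike the OLS predictor, `predict_poisson` cannot return a negative count.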

  4. Linear Model of Rodent Complaints – Poisson
     The improvement from the Poisson GLM may be difficult to see in the maps, so I plot true and modeled rodent complaints sorted in ascending order. The GLM outperforms OLS at small values of rodent complaints, where OLS often predicts negative values; a Poisson GLM with the usual log link always predicts positive values. The Poisson regression also performs better at large values of rodent complaints, though there is still room for improvement.
     Robust interpretation of a model with many exogenous variables, some of which may exhibit strong collinearity, is difficult. I therefore seek a simpler model.
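
The diagnostic described above can be sketched as follows: sort tracts by their observed count, then compare a model's predictions along that order. The helper names and the split into equal-size lower and upper segments are assumptions for illustration. The slide compares the curves visually rather than with a stated metric.

```python
def sorted_comparison(observed, predicted):
    """Reorder both series by ascending observed count, ready to plot as two curves."""
    order = sorted(range(len(observed)), key=lambda i: observed[i])
    return [observed[i] for i in order], [predicted[i] for i in order]


def count_negative(predicted):
    """Number of predictions below zero, which are impossible for a count."""
    return sum(1 for p in predicted if p < 0)


def mae_by_segment(observed, predicted, n_segments=2):
    """Mean absolute error within equal-size segments of the sorted tracts.

    Segment 0 holds the tracts with the fewest observed complaints.
    Assumes len(observed) >= n_segments so no segment is empty.
    """
    obs, pred = sorted_comparison(observed, predicted)
    n = len(obs)
    maes = []
    for s in range(n_segments):
        lo, hi = s * n // n_segments, (s + 1) * n // n_segments
        errors = [abs(o - p) for o, p in zip(obs[lo:hi], pred[lo:hi])]
        maes.append(sum(errors) / len(errors))
    return maes


# Hypothetical observed counts and OLS-style predictions for four tracts.
observed = [5, 0, 2, 10]
ols_pred = [4.0, -1.0, 2.5, 7.0]
print(sorted_comparison(observed, ols_pred))  # ([0, 2, 5, 10], [-1.0, 2.5, 4.0, 7.0])
print(count_negative(ols_pred))  # 1
print(mae_by_segment(observed, ols_pred))  # [0.75, 2.0]
```

Running the same helpers on the OLS and GLM predictions gives a numeric counterpart to the low-count and high-count comparison described on the slide.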

  5. Linear Model of Rodent Complaints – Sparsity
     I perform least-squares regression again, this time regularizing the vector of regression coefficients by its L1 norm (the lasso). The strength of the regularization is controlled by a parameter, α. L1 regularization promotes sparse solutions, meaning that many regression coefficients are set exactly to zero. The plot at right shows regression coefficients turning on as I relax the regularization constraint (moving from right to left on the x-axis). Perhaps I can make a simpler model using the first few coefficients to turn on.
     Figure: OLS coefficients at different strengths of regularization
     I select the variables associated with the first five regression coefficients to turn on under L1 regularization, and I build a linear model from this smaller set of variables.
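
The variable-selection step can be sketched with cyclic coordinate descent for the lasso. The fit is run along a decreasing sequence of α values with warm starts, recording the order in which coefficients first become nonzero. This sketch assumes the scikit-learn-style objective (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1, with standardized predictors and a centered response. The slide does not state its exact objective, scaling, or α grid, and the function names here are hypothetical. In practice, scikit-learn's `lasso_path` computes the same kind of path.

```python
def standardize(X):
    """Return the columns of X centered to mean 0 and scaled to unit variance.

    Assumes no column is constant (its standard deviation would be zero).
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / sd for v in col])
    return cols


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(cols, y, alpha, beta, n_sweeps=500, tol=1e-10):
    """Minimize (1/(2n))||y - X beta||^2 + alpha * ||beta||_1 by coordinate descent.

    `cols` are predictor columns, `y` is centered, and `beta` is the warm start.
    """
    n = len(y)
    beta = list(beta)
    resid = [y[i] - sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            old = beta[j]
            curv = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual that excludes it.
            rho = sum(col[i] * (resid[i] + col[i] * old) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / curv
            if new != old:
                for i in range(n):
                    resid[i] -= col[i] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def entry_order(X, y, n_alphas=50, eps=1e-3):
    """Indices of predictors in the order their lasso coefficients turn on
    as alpha decreases from alpha_max (all zero) to eps * alpha_max."""
    cols = standardize(X)
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Smallest alpha at which every coefficient is still zero.
    alpha_max = max(abs(sum(c[i] * yc[i] for i in range(n))) / n for c in cols)
    beta = [0.0] * len(cols)
    order = []
    for k in range(1, n_alphas + 1):
        alpha = alpha_max * eps ** (k / n_alphas)
        beta = lasso_cd(cols, yc, alpha, beta)
        new = [j for j, b in enumerate(beta) if b != 0.0 and j not in order]
        # Ties within one alpha step are broken by coefficient magnitude.
        order.extend(sorted(new, key=lambda j: -abs(beta[j])))
    return order


# Toy data (not from the Boston data set): y depends on columns 0 and 1 only.
x0 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [3, 1, 4, 1, 5, 9, 2, 6]
x2 = [2, 7, 1, 8, 2, 8, 1, 8]
X = [list(r) for r in zip(x0, x1, x2)]
y = [5 * a + b for a, b in zip(x0, x1)]
selected = entry_order(X, y)[:5]  # keep (up to) the first five to turn on
print(selected)
```

With 44 predictors, `entry_order(X, y)[:5]` would give five variables to refit, mirroring the selection described on the slide. The exact ordering depends on the α grid and on how the predictors are scaled.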

  6. Linear Model of Rodent Complaints – Conclusions
     A Poisson GLM using the five variables selected on the previous slide reveals nothing about rodent complaints in Boston, so I skip showing its results. Instead, I summarize my findings and move on to Part 2: unsupervised learning!
     • Conclusions:
       • The spatial distribution of rodent counts is not obviously causally related to most information in the data set.
       • The assumed distribution of the response y (Poisson versus Gaussian) can affect regression results.
       • L1 regularization can provide sparse estimates of regression coefficients, but sparsity does not necessarily make a regression easier to interpret.
       • Other data may be more useful for understanding the spatial distribution of rodents in Boston. I would prefer to have data on the age of buildings, zoning information (more rats around more food waste?), and the population density of outdoor cats!
       • Most importantly, if this were a serious investigation, I would first speak with an expert in rodent control. Someone has thought about this problem before, and that person could help guide this kind of analysis.
