Deer-Vehicle Crashes

Deer-Vehicle Crashes. Hui Anne Ben. Goal: create a model that will be useful in predicting the number of deer-vehicle crashes on a given section of roadway. The response variable: Y = number of deer-vehicle crashes per half-mile section of roadway over a 1-year period.

Presentation Transcript


  1. Deer-Vehicle Crashes Hui Anne Ben

  2. Goal • Create a model that will be useful in predicting the number of deer-vehicle crashes on a given section of roadway

  3. The Response Variable • Y = number of deer-vehicle crashes per half-mile section of roadway over a 1-year period • Location – Ashtabula County

  4. The Predictor Variables • X1 = no. of vertical curves • X2 = no. of horizontal curves • X3 = no. of ditches • X4 = no. of residences • X5 = no. of driveways • X6 = % of adjacent forest land

  5. Preview • 40 observations total • 6 candidate regressors • X’s are clearly known • Lurking variable(s) seem possible, however • Y is a count per unit time

  6. Lurking Variable • Residual plot from the full linear regression reveals two distinct groups • Data is divided in half

  7. Enter a New Variable • Create an unknown variable by grouping the data into two groups, based on the two groups seen in the residuals plot • Noticeable difference between Y values for the first 20 observations and the last 20
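A minimal sketch of how such a grouping variable could be coded, assuming the 40 observations are kept in their original order. Which half receives 0 and which receives 1 is an assumption made here for illustration; the slides do not say.

```python
# Hypothetical coding of the grouping variable X7 for 40 ordered observations.
N_OBS = 40
SPLIT = 20  # the first 20 observations form one group, the last 20 the other

# Assumption: the first half is coded 0 and the second half is coded 1.
x7 = [0 if i < SPLIT else 1 for i in range(N_OBS)]

print(x7[:SPLIT])   # first group
print(x7[SPLIT:])   # second group
```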

  8. Unknown Variable • New variable is X7 = unknown • Variable is unknown to us • Was not considered during the collection of data

  9. Variable selection • Best Subsets method in conjunction with several extra sum of squares tests • Four variables X1, X5, X6, X7 are chosen
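For reference, an extra sum of squares (partial F) test compares a reduced model against a full model that contains it. The notation below is the standard textbook form, not taken from the slides:

```latex
F^{*} = \frac{\left( SSE_{R} - SSE_{F} \right) / \left( df_{R} - df_{F} \right)}{SSE_{F} / df_{F}}
```

Here $SSE_{R}$ and $SSE_{F}$ are the error sums of squares of the reduced and full models, and $df_{R}$ and $df_{F}$ are their error degrees of freedom. A large $F^{*}$ relative to the $F(df_{R} - df_{F},\, df_{F})$ distribution suggests that the dropped regressors carry information.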

  10. Linear regression analysis • We run linear regression • Y vs. X1, X5, X6, X7 • Model : • R² = 88.2% • SSE = 51.17 • Decent
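The two summary figures quoted for the fit, SSE and R², can be computed from observed and fitted values as sketched below. The function names and the numbers are illustrative assumptions; the crash data itself is not reproduced in the slides.

```python
def sse(y, fitted):
    """Sum of squared errors between observed and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def r_squared(y, fitted):
    """R^2 = 1 - SSE / SST, where SST is the total sum of squares about the mean."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse(y, fitted) / sst


# Made-up illustrative values, not the crash data.
y = [0, 1, 3, 2, 6, 7]
fitted = [0.5, 1.0, 2.5, 2.5, 5.5, 7.0]

print(sse(y, fitted))
print(r_squared(y, fitted))
```

Because both functions only need observed and fitted values, the same calculation can be applied to fits from any model, which is what makes these figures comparable across models.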

  11. Residuals Plot for the Linear Regression

  12. Correlation Analysis • Noticeable Correlation between X7 and X6 • Unknown variable is associated with forested land
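A correlation between a 0/1 variable and a percentage can be checked with the ordinary Pearson coefficient. The sketch below uses made-up values, and `pearson` is a helper written for this illustration; the sign of the result shows whether group 1 tends to have more or less forest than group 0.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, not the crash data.
x7 = [0, 0, 0, 1, 1, 1]          # hypothetical group indicator
x6 = [80, 70, 90, 20, 30, 10]    # hypothetical % of adjacent forest land

r = pearson(x7, x6)
print(r)  # negative here: group 1 has less forest in this toy example
```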

  13. A Thought About the Unknown Variable • Unknown variable negatively correlated with % of forested land • possible values of X7=Unknown: 0 and 1 • Might correspond to section of county • 0 -> rural part of county • 1 -> urban part of county

  14. A Transformation • Many transformations were attempted • Best one: Y* = ln(Y + e²) • R² = 87.9% (on the untransformed scale) • SSE = 51.00 (on the untransformed scale) • Conclusion: not better than the original linear model
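A short sketch of the transformation and its inverse, assuming the constant inside the logarithm is e squared (the transcript's superscript appears to have been flattened). Reading "untransformed" as "fits mapped back to the original Y scale before computing SSE and R²" is also an assumption here.

```python
import math

SHIFT = math.exp(2)  # assumed constant e^2 added inside the logarithm


def transform(y):
    """Y* = ln(Y + e^2); defined for counts, including Y = 0."""
    return math.log(y + SHIFT)


def back_transform(y_star):
    """Inverse of the transformation: Y = exp(Y*) - e^2."""
    return math.exp(y_star) - SHIFT


# Made-up illustrative counts, not the crash data.
y = [0, 1, 4, 9]
y_star = [transform(v) for v in y]
recovered = [back_transform(v) for v in y_star]

print(y_star)
print(recovered)
```

Fitted values from a regression on Y* would be passed through `back_transform` before being compared with the observed counts, so that the resulting SSE is on the same scale as the untransformed model's.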

  15. Poisson Regression • Recall: Y is a count per unit of time • A Poisson model is now derived • Proc GENMOD • Link function: log, ln(μ) where μ = E[Y]
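The slides fit this model with SAS Proc GENMOD. As a stand-in, here is a minimal pure-Python sketch of Poisson regression with a log link, fitted by Newton-Raphson. It is not the authors' code: the helper names `solve` and `fit_poisson` and the toy data are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poisson(rows, y, n_iter=25):
    """Fit ln(mu) = b0 + b1*x1 + ... by Newton-Raphson on the Poisson log-likelihood."""
    X = [[1.0] + list(r) for r in rows]      # prepend an intercept column
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))      # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        # Score vector X^T (y - mu) and information matrix X^T diag(mu) X.
        score = [sum(xi[j] * (yi - mi) for xi, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(xi[j] * xi[k] * mi for xi, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Made-up toy data: one 0/1 predictor, group means 2 and 6.
rows = [[0], [0], [0], [1], [1], [1]]
y = [1, 2, 3, 5, 6, 7]

beta = fit_poisson(rows, y)
print(beta)  # intercept near ln(2), slope near ln(3)
```

With a single 0/1 predictor the maximum likelihood estimates have a closed form (the log of each group mean), which makes the toy example easy to check by hand.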

  16. Poisson Regression Analysis • Fits and residuals were collected from the work library in SAS • R² = 89.15% • SSE = 45.04 • Not bad

  17. Residuals Plot for Poisson Model

  18. Dominant Variable • Type I and Type III analysis in SAS • Suggests that the unknown variable is the only significant contributor • Decision: do not throw out the other regressors • Unknown variable is just a dominating variable
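Type I and Type III analyses for a Poisson model are typically built from likelihood ratio statistics, which equal the drop in deviance between nested models (Type I adds terms sequentially, Type III tests each term given all the others). The sketch below shows that calculation on made-up data with hypothetical fitted means; it illustrates the idea and is not the SAS output from the slides.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum( y*ln(y/mu) - (y - mu) ), with 0*ln(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


# Made-up toy counts and fitted means, not the crash data.
y = [1, 2, 3, 5, 6, 7]
mu_null = [4.0] * 6                            # intercept-only fit: overall mean
mu_group = [2.0, 2.0, 2.0, 6.0, 6.0, 6.0]      # fit with one 0/1 group variable

# Likelihood ratio statistic for adding the group variable (1 degree of freedom).
lr = poisson_deviance(y, mu_null) - poisson_deviance(y, mu_group)
print(lr)
```

The statistic is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters; for 1 degree of freedom the 5% critical value is about 3.84.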

  19. Type I and Type III Analysis

  20. The Winning Model • The Poisson Model gets our vote

  21. Thank You • Abdullah Alhomidan (civil engineering) gave permission for us to use his data. FIN
