
Advanced Transport Demand Modelling: Linear Regression to GZLM

Explore a range of techniques in transportation demand modelling, from Multiple Linear Regression to Generalized Linear Models. Understand relationships between variables and apply various models for analysis. Learn about linear, binary, and count data modeling. This comprehensive guide covers different regression methods and interpretations. Discover key concepts and examples in transportation data analysis.


Presentation Transcript


  1. Transport Demand Modelling Miguel Costa

  2. 10M People employed in transportation in EU (2014) 51 fatalities Per Million inhabitants in EU (2015) 1.1M Transportation related enterprises in EU (2014)

  3. +0.7% Fatalities EU 2014/2015 -64.5% Fatalities Portugal 2001/2015 -52.4% Fatalities EU 2001/2015

  4. Contents • Multiple Linear Regression • Generalized Linear Models • Panel Models • Spatial Models • Discrete Choice Models • Ordered Choice Models • Hazard-Based Duration Models

  5. Model a wide variety of linear relationships between variables 1. Multiple Linear Regression

  6. Multiple Linear Regression (1) • It is one of the most widely applied econometric techniques: • Suitable for modelling a wide variety of relationships between variables. • In many practical applications the assumptions of linear regression are suitably satisfied. • Its outputs are relatively easy to interpret and communicate. • Regression models are relatively easy to estimate, and estimation routines are available in a large number of software packages.

  7. Multiple Linear Regression (2) • Statistical technique used to analyse the relationship between a single dependent variable (aka criterion/regressand) and several independent variables (aka predictors/regressors) • Each IV is weighted by the regression procedure to ensure maximal prediction from the overall set of IV
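
As an illustration of how the regression procedure weights each independent variable, here is a minimal sketch of ordinary least squares with NumPy. The data are synthetic and the variable names (car_ownership, household_size, trips) are hypothetical; they are not the values from the slides' examples.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares solution
    residuals = y - X1 @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot

# Synthetic data (assumption): trips depend linearly on two predictors plus noise.
rng = np.random.default_rng(0)
n = 200
car_ownership = rng.uniform(0.2, 2.0, n)
household_size = rng.uniform(1.5, 5.0, n)
trips = 1.0 + 2.5 * car_ownership + 0.8 * household_size + rng.normal(0, 0.3, n)

beta, r2 = fit_ols(np.column_stack([car_ownership, household_size]), trips)
print("intercept, slopes:", beta)
print("R^2:", r2)
```

The estimated slopes should land close to the values used to generate the data (2.5 and 0.8), which is a quick sanity check before applying the same function to real zonal data.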

  8. MLR Example (1) • Estimate trip production for 57 Traffic Assignment Zones of Chicago in the 1960s • Data: • TODU: Motorized trips (private car or public transportation) per occupied dwelling unit • ACO: Average car ownership (cars per dwelling) • AHS: Average household size • Three zonal social indices: • SRI: Social Rank Index (based on the level of education) • UI: Urbanization Index (based on the fertility rate, % of single families, female labor force) • SI: Segregation Index (proportion of an area's residents who belong to certain minority groups)

  9. MLR Example (2)

  10. Assumptions • Dependent variable is continuous • Linear in parameters relation between DV and IV • DV should be normally distributed • Observations should be independently and randomly sampled • Uncertain relationship between variables • Error is independent of the variables and has expected value zero, i.e., it is independent across observations and the error variance is constant • Disturbance terms are not correlated • Regressors and disturbances are not correlated (exogenous regressors) • Disturbances approximately normally distributed, i.e. error terms are i.i.d.

  11. Estimation of number of trips in Lisbon per person 1.1. Example of Multiple Linear Regression

  12. Data • Average number of trips • Distance to CBD • Availability of car/motorcycle • Average total distance in km travelled per person • Average total time in minutes travelled per person • Percentage of the main trip purpose for each type of activity (Work, Personal, Shop, Family) • Average number of children in the household • Average number of elders in the household • Percentage of interviewees holding a Public Transport Pass • Distance in km from the residence to the nearest subway or suburban rail station • Parking pressure close to the residential area

  13. Results

  14. References • Washington, Simon P., Karlaftis, Mathew G. and Mannering (2003) “Statistical and Econometric Methods for Transportation Data Analysis”, CRC – Chapter 3 and Annex A • Hair, Joseph P. et al (1995) “Multivariate Data Analysis with Readings”, Fourth Edition, Prentice Hall – Chapter 3

  15. GZLM extend the linear modelling framework 2. Generalized Linear Models (GZLM)

  16. Generalized Linear Regression • Dependent variable may not be continuous • Effect of independent variables may not be linear • Expected value of error terms might not be 0 • Most commonly used to model binary or count data • Unify all non-linear models

  17. MLR GZLM because and or • Variation of probabilistic distribution of • So, we want to find a link function such that

  18. MLR GZLM Distribution of : Link Function: Distribution of : Link Function:

  19. Structure of GZLM Random Component Link Function Systematic Component

  20. Link Function • Used to maintain a linear relationship between the coefficients and predictors and the dependent variable • Choosing the Link Function depends on the type of data • It is a monotonic and differentiable function • The expected value of the dependent variable can be obtained by inverting the link function: $\mu = g^{-1}(X\beta)$
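
The idea of inverting a link function can be sketched in a few lines of Python. The dictionary below pairs some common links with their inverses; the names (LINKS, round_trip) are hypothetical, and the logit entry is an assumption added for binary data.

```python
import numpy as np

# Each entry maps a link name to (link g, inverse link g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (np.log, np.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_squared": (lambda mu: 1.0 / mu**2, lambda eta: 1.0 / np.sqrt(eta)),
    "logit": (lambda mu: np.log(mu / (1 - mu)), lambda eta: 1 / (1 + np.exp(-eta))),
}

def round_trip(name, mu):
    """Apply a link and then its inverse; should return mu unchanged."""
    link, inverse = LINKS[name]
    return inverse(link(mu))

mu = np.array([0.1, 0.5, 0.9])   # valid means for every link above
for name in LINKS:
    print(name, round_trip(name, mu))
```

Each link is monotonic on the range of means it is meant for, which is what makes the inverse well defined.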

  21. GZLM – Normal Distribution • Bell-shaped symmetrical curve centered in the mean • Dependent variable is continuous • Identity Link

  22. GZLM – Inverse Gaussian Distribution • Used for dependent variables that are positively skewed and its values are always greater than 0 • Used for diffusion processes, insurance claims, … • Inverse Squared

  23. GZLM – Gamma Distribution • Alternative for positively skewed dependent variables • Used in Survival analysis, duration-of-event data, … • Inverse Link

  24. GZLM – Poisson Distribution • Used for non-negative count data, typically when events are rare • Used when events can be counted but non-occurrence of events cannot be counted • Used in modelling accidents, wars, epidemics, … • Log Link

  25. GZLM – Negative Binomial Distribution • Similar to Poisson distribution but is used when the variance is larger than the mean (over-dispersion of data) • This usually occurs when there are “too many 0’s” • Log Link
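
A quick way to see the over-dispersion condition is to compare the sample variance with the sample mean. The sketch below uses synthetic counts only; the helper name dispersion_ratio is hypothetical. A ratio near 1 is consistent with a Poisson process, while a ratio well above 1 points towards a negative binomial model.

```python
import numpy as np

def dispersion_ratio(counts):
    """Sample variance divided by sample mean of a vector of counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(1)

# Pure Poisson counts: variance equals the mean in expectation.
poisson_counts = rng.poisson(lam=3.0, size=5000)

# Poisson counts whose rate is itself gamma-distributed (mean 3):
# this mixture is negative binomial, so variance exceeds the mean.
rates = rng.gamma(shape=1.5, scale=2.0, size=5000)
nb_counts = rng.poisson(rates)

ratio_poisson = dispersion_ratio(poisson_counts)
ratio_nb = dispersion_ratio(nb_counts)
print("Poisson ratio:", ratio_poisson)
print("Gamma-mixed ratio:", ratio_nb)
```

This is only a descriptive check on the raw counts; a formal test would examine dispersion conditional on the fitted model.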

  26. Poisson Regression Model (1) • Number of cars passing through an intersection • Number of calls for emergency ambulance service during a tour of duty • Number of fires arising in a neighborhood • Number of accidents in a road section or intersection

  27. Poisson Regression Model (2) where denotes the unit of exposure (e.g., vehicles per year) Elasticity (effect of a change in a variable):
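
A minimal sketch of how a Poisson regression with a log link can be estimated, assuming the standard form λ = exp(Xβ) and using Newton's method on the log-likelihood. The data are synthetic, the function names are hypothetical, and the exposure term mentioned on the slide is left out for brevity. For a log link, the elasticity of the expected count with respect to a continuous variable x_k is β_k · x_k.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Poisson regression with log link, fitted by Newton's method."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(y.mean())                   # start at the overall mean rate
    for _ in range(n_iter):
        mu = np.exp(X1 @ beta)                   # expected counts
        gradient = X1.T @ (y - mu)               # score vector
        hessian = X1.T @ (X1 * mu[:, None])      # Fisher information
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def elasticity(beta_k, x_k):
    """Elasticity of the expected count w.r.t. x_k under a log link."""
    return beta_k * x_k

# Synthetic data (assumption): counts generated with known coefficients.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0, 2000)
y = rng.poisson(np.exp(0.5 + 0.7 * x))

beta_hat = fit_poisson(x, y)
print("estimated coefficients:", beta_hat)
print("elasticity at x = 1:", elasticity(beta_hat[1], 1.0))
```

In practice a statistics package would also report standard errors and goodness-of-fit measures; the loop here only shows the mechanics of the estimation.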

  28. Negative Binomial Regression • When the Poisson condition is violated (i.e., ) and , we rewrite the link function as: , where is the dispersion term , where K is the overdispersion parameter
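
For reference, one common textbook formulation of the negative binomial model is the Poisson-gamma form below. The symbols are assumptions and may not match the slide's notation; in particular, depending on the parameterization, the slide's K may correspond to α here or to its reciprocal.

```latex
% Poisson condition (equidispersion), violated when the variance exceeds the mean:
\mathrm{E}[y_i] = \mathrm{VAR}[y_i]

% Negative binomial: add a gamma-distributed error to the Poisson rate
\lambda_i = \exp(\beta X_i + \varepsilon_i),
\qquad \exp(\varepsilon_i) \sim \text{Gamma with mean } 1 \text{ and variance } \alpha

% Resulting mean-variance relationship
\mathrm{VAR}[y_i] = \mathrm{E}[y_i]\,\bigl(1 + \alpha\,\mathrm{E}[y_i]\bigr)
                  = \mathrm{E}[y_i] + \alpha\,\mathrm{E}[y_i]^2
```

As α approaches 0 the variance collapses to the mean and the model reduces to Poisson regression.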

  29. Road accidents estimation 2.1. Example of GZLM

  30. Motivation • Estimating the number of accidents on a given road segment is important, as it can be used to identify and address dangerous road segments. • Different factors influence the number of accidents on a given road, such as the volume of vehicles, their speed, the pavement conditions and many others.

  31. Available data • IFI (International Friction Index) • Average Daily Traffic + Accumulated Annual Average Traffic • Average Speed • Percentage of urban segment • Presence of intersections • Presence of curvature • Type of curvature • Longitudinal inclination of road • Average annual precipitation • Percentage of traffic that is heavy

  32. Model • Number of accidents Poisson Regression Negative Binomial Regression

  33. Best Fit Over-dispersed Poisson Regression

  34. References • Washington, Simon P., Karlaftis, Mathew G. and Mannering (2003) Statistical and Econometric Methods for Transportation Data Analysis, CRC • McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and Hall/CRC. ISBN 0-412-31760-5. • Lord, D., Washington, S. P., & Ivan, J. N. (2005). Poisson, Poisson-Gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention, pp. 35-46.

  35. 3. Panel Data Models

  36. Panel Models (1) • A panel data model is a regression that has both a cross-sectional and a time series dimension, where the cross-section units are observed during the whole period. • Suited to studying the dynamics of adjustment, as it allows controlling for aggregate effects and individual heterogeneity. With

  37. Panel Models (2) Overcoming specification problems in panel data: • One-Way error components models Variable-intercept models across individuals or time • Two-Way error component models variable-intercept models across individuals and time Modeling specifications: • With Fixed Effects • With Random Effects

  38. Fixed Effects (FE) • Explore the relationship between the predictor and the outcome variables within an entity • Control something within the individual that may impact or bias the predictor • Time invariant characteristics of the individuals that are unique to the individual and are not correlated with other individual characteristics.
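
One way to see how fixed effects control for time-invariant characteristics is the within transformation: subtract each entity's own mean from its observations, then run least squares on the demeaned data. The sketch below assumes this standard estimator and uses a synthetic panel of 5 entities over 7 periods; the function name and all values are hypothetical, not the slides' data.

```python
import numpy as np

def within_estimator(entity_ids, X, y):
    """Fixed-effects slopes via entity demeaning (within transformation)."""
    entity_ids = np.asarray(entity_ids)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    X_dm, y_dm = X.copy(), y.copy()
    for entity in np.unique(entity_ids):
        rows = entity_ids == entity
        X_dm[rows] -= X[rows].mean(axis=0)   # remove entity mean of predictors
        y_dm[rows] -= y[rows].mean()         # remove entity mean of outcome
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Synthetic panel (assumption): entity effects are correlated with x,
# so pooled OLS is biased while the within estimator is not.
rng = np.random.default_rng(2)
n_entities, n_periods = 5, 7
ids = np.repeat(np.arange(n_entities), n_periods)
entity_effect = 5.0 * ids                         # time-invariant, differs by entity
x = 2.0 * ids + rng.normal(0, 1, ids.size)
y = entity_effect + 1.5 * x + rng.normal(0, 0.2, ids.size)

beta_fe = within_estimator(ids, x, y)
pooled = np.polyfit(x, y, 1)[0]                   # pooled OLS slope for comparison
print("within (FE) slope:", beta_fe[0])
print("pooled OLS slope:", pooled)
```

The demeaning step removes anything constant within an entity, which is why the fixed-effects slope recovers the value used to generate the data while the pooled slope absorbs the entity effect.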

  39. Random Effects (RE) • Characterizes individual effects that are random and inference pertains to the population from which it was drawn • Implies homoscedastic disturbance variances and serial correlation only for disturbances of the same cross-sectional unit.

  40. Road fatalities estimation 3.1. Example of Panel Models

  41. Motivation • Estimating or analysing the number of fatalities from traffic accidents provides a key metric for understanding what may cause these fatalities and what can be done to prevent them. • With this in mind, we use Panel Data Models to better explain the number of fatalities in 5 different European countries based on macroeconomic, human resources and traffic indicators.

  42. Available data (1) • 5 European countries (Portugal, Spain, Belgium, Sweden and Finland) over 7 years (2004-2010) • Number of fatalities • Number of police officers per 100k inhabitants • Number of light passenger cars per 1k inhabitants • Compensation of employees per capita in PPS • Total expenditure in Education as a % of GDP • % of population with Tertiary Education • Number of PhD per 100k inhabitants • Average daily working hours • Precipitation • Health expenditure as a % of GDP • Total expenditure in R&D in PPS

  43. Available data (2) Countries: 1-Portugal, 2-Spain, 3-Sweden, 4-Belgium, 5-Finland

  44. Model –Fixed Effects Representation Fixed Effects Regression Simple OLS Individual OLS

  45. Best Fit – Two-Way Error Component with Random Effects With and not being significantly different from 0

  46. Environment Perception 3.1. Example of Panel Data Models in Computer Vision

  47. Source • Zhang, F., Zhou, B., Liu, L., Liu, Y., Fung, H. H., Lin, H., & Ratti, C. (2018). Measuring human perceptions of a large-scale urban region using machine learning. Landscape and Urban Planning, 180(October 2017), 148–160.

  48. Problem • Measuring the human sense of place and quantifying the connections among the visual features of the built environment that impact the human sense of place have long been of interest to a wide variety of fields. • The model achieved a high accuracy rate in predicting six human perceptual indicators, namely, safe, lively, beautiful, wealthy, depressing, and boring.

  49. Data (1)

  50. Data (2)
