Transport Demand Modelling Miguel Costa
• 10M people employed in transportation in the EU (2014) • 51 fatalities per million inhabitants in the EU (2015) • 1.1M transportation-related enterprises in the EU (2014)
• +0.7% fatalities in the EU, 2014/2015 • -64.5% fatalities in Portugal, 2001/2015 • -52.4% fatalities in the EU, 2001/2015
Contents • Multiple Linear Regression • Generalized Linear Models • Panel Models • Spatial Models • Discrete Choice Models • Ordered Choice Models • Hazard-Based Duration Models
1. Multiple Linear Regression: modelling a wide variety of linear relationships between variables
Multiple Linear Regression (1) • It is one of the most widely applied econometric techniques: • Suitable for modelling a wide variety of relationships between variables. • In many practical applications the assumptions of linear regression are suitably satisfied. • Its outputs are relatively easy to interpret and communicate. • Regression models are relatively easy to estimate, and estimation routines are available in a large number of software packages.
Multiple Linear Regression (2) • Statistical technique used to analyse the relationship between a single dependent variable (DV, aka criterion/regressand) and several independent variables (IVs, aka predictors/regressors) • Each IV is weighted by the regression procedure to ensure maximal prediction from the overall set of IVs
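The weighting of the IVs described above can be sketched with ordinary least squares, which solves the normal equations (X'X)b = X'y. This is a minimal sketch in plain Python, not the method used in the slides: the helper names `solve` and `ols`, the zone values and the noise-free coefficients are all hypothetical and chosen only for illustration.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares with an intercept: solve (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical zones: (average car ownership, average household size)
zones = [(0.5, 2.0), (1.0, 3.0), (1.5, 2.5), (2.0, 4.0), (0.8, 3.5)]
# Hypothetical noise-free DV, so the fit should recover 1.0, 2.0 and 0.5
trips = [1.0 + 2.0 * aco + 0.5 * ahs for aco, ahs in zones]
intercept, b_aco, b_ahs = ols(zones, trips)
print(intercept, b_aco, b_ahs)
```

In practice a statistics package would be used instead, since it also reports standard errors and goodness-of-fit measures.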
MLR Example (1) • Estimate the trip production of 57 Traffic Assignment Zones in Chicago in the 1960s • Data: • TODU: Motorized trips (private car or public transportation) per occupied dwelling unit • ACO: Average car ownership (cars per dwelling) • AHS: Average household size • Three zonal social indices: • SRI: Social Rank Index (based on the level of education) • UI: Urbanization Index (based on the fertility rate, % of single families, female labor force) • SI: Segregation Index (proportion of an area's residents who belong to certain minority groups)
Assumptions • Dependent variable is continuous • Linear-in-parameters relation between DV and IVs • DV should be normally distributed • Observations should be independently and randomly sampled • Uncertain relationship between variables • The error term is independent of the variables and has expected value zero; errors are independent across observations and the error variance is constant • Disturbance terms are not correlated • Regressors and disturbances are not correlated (exogenous regressors) • Disturbances are approximately normally distributed, i.e. error terms are i.i.d.
1.1. Example of Multiple Linear Regression: estimating the number of trips per person in Lisbon
Data • Average number of trips • Distance to CBD • Availability of car/motorcycle • Average total distance in km travelled per person • Average total time in minutes travelled per person • Percentage of the main trip purpose for each type of activity (Work, Personal, Shop, Family) • Average number of children in the household • Average number of elders in the household • Percentage of interviewees holding a Public Transport Pass • Distance in km from the residence to the nearest subway or suburban rail station • Parking pressure close to the residential area
References • Washington, S. P., Karlaftis, M. G., & Mannering, F. L. (2003). Statistical and Econometric Methods for Transportation Data Analysis. CRC. Chapter 3 and Annex A. • Hair, J. F. et al. (1995). Multivariate Data Analysis with Readings (4th ed.). Prentice Hall. Chapter 3.
2. Generalized Linear Models (GZLM): extending the linear modelling framework
Generalized Linear Regression • Dependent variable may not be continuous • Effect of independent variables may not be linear • Expected value of error terms might not be 0 • GZLMs are most commonly used to model binary or count data • They unify a broad class of non-linear models in a single framework
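MLR vs GZLM • In MLR the mean of the dependent variable is modelled directly: $E(Y) = \mu = X\beta$ • In GZLM this no longer holds, because the probability distribution of $Y$ varies with the type of data • So, we want to find a link function $g$ such that $g(\mu) = X\beta$
MLR vs GZLM • MLR: distribution of $Y$: Normal; link function: identity, $g(\mu) = \mu$ • GZLM: distribution of $Y$: a member of the exponential family (e.g. Normal, Poisson, Gamma); link function: $g(\mu)$, chosen according to the distribution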
Structure of GZLM • Random Component • Link Function • Systematic Component
Link Function • Used to maintain a linear relationship between the coefficients and predictors and the dependent variable • The choice of link function depends on the type of data • It is a monotonic and differentiable function • $\mu$ can be obtained by inverting the link function: $\mu = g^{-1}(X\beta)$
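As a rough illustration of how a link function and its inverse connect the linear predictor to the mean, the sketch below pairs some common links with their inverses. The dictionary `LINKS` and the function `predict_mean` are hypothetical names introduced here, and the coefficient values in the usage lines are made up.

```python
import math

# (link g, inverse link g^-1) pairs commonly used in generalized linear models
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_squared": (lambda mu: 1.0 / mu**2, lambda eta: 1.0 / math.sqrt(eta)),
}


def predict_mean(link_name, coefficients, predictors):
    """Map the linear predictor eta = b0 + sum(b_k * x_k) back to the mean mu."""
    eta = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], predictors))
    _, inverse_link = LINKS[link_name]
    return inverse_link(eta)


# Same hypothetical coefficients, two different links
print(predict_mean("identity", [1.0, 2.0], [3.0]))  # mean equals the linear predictor
print(predict_mean("log", [1.0, 2.0], [3.0]))       # mean is exp(linear predictor)
```

The model stays linear in the coefficients on the link scale, and only the final mapping back to the mean is non-linear.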
GZLM – Normal Distribution • Bell-shaped symmetrical curve centered in the mean • Dependent variable is continuous • Identity Link
GZLM – Inverse Gaussian Distribution • Used for dependent variables that are positively skewed and whose values are always greater than 0 • Used for diffusion processes, insurance claims, … • Inverse squared link
GZLM – Gamma Distribution • Alternative for positively skewed dependent variables • Used in Survival analysis, duration-of-event data, … • Inverse Link
GZLM – Poisson Distribution • Used for count data, when events are rare and values are non-negative • Used when events can be counted but non-occurrences of events cannot be counted • Used in modelling accidents, wars, epidemics, … • Log Link
GZLM – Negative Binomial Distribution • Similar to Poisson distribution but is used when the variance is larger than the mean (over-dispersion of data) • This usually occurs when there are “too many 0’s” • Log Link
Poisson Regression Model (1) • Number of cars passing through an intersection • Number of calls for emergency ambulance service during a tour of duty • Number of fires arising in a neighborhood • Number of accidents in a road section or intersection
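Poisson Regression Model (2) • $P(y_i) = \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$, with $\ln(\lambda_i) = X_i\beta + \ln(E_i)$, where $E_i$ denotes the unit of exposure (e.g., vehicles per year) • Elasticity (effect of a change in a variable): $\dfrac{\partial \lambda_i}{\partial x_{ik}} \cdot \dfrac{x_{ik}}{\lambda_i} = \beta_k x_{ik}$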
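A Poisson regression with a log link can be estimated by maximum likelihood. The sketch below fits a single-covariate model ln(lambda_i) = b0 + b1 * x_i by Newton-Raphson, assuming no exposure offset. The function names `fit_poisson` and `elasticity` and the traffic and accident values are hypothetical; this is an illustration, not the estimation routine used in the slides.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit ln(lambda_i) = b0 + b1 * x_i by Newton-Raphson (one covariate)."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (y_i - lambda_i) * [1, x_i]
        g0 = sum(yi - li for yi, li in zip(y, lam))
        g1 = sum((yi - li) * xi for xi, yi, li in zip(x, y, lam))
        # Fisher information: sum of lambda_i * [1, x_i][1, x_i]'
        h00 = sum(lam)
        h01 = sum(li * xi for xi, li in zip(x, lam))
        h11 = sum(li * xi * xi for xi, li in zip(x, lam))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def elasticity(beta_k, x_ik):
    """Elasticity of the expected count with respect to a continuous covariate."""
    return beta_k * x_ik


# Hypothetical data: traffic (thousands of vehicles/day) and yearly accident counts
traffic = [1, 2, 3, 4, 5, 6]
accidents = [1, 2, 2, 4, 5, 8]
b0, b1 = fit_poisson(traffic, accidents)
print(b0, b1, elasticity(b1, 3))
```

Under these assumptions the elasticity grows with the covariate value, so it is usually reported at the sample mean or per observation.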
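Negative Binomial Regression • When the Poisson condition is violated (i.e., $E(y_i) \neq Var(y_i)$) and $Var(y_i) > E(y_i)$, we rewrite the link function as: $\ln(\lambda_i) = X_i\beta + \varepsilon_i$, where $\exp(\varepsilon_i)$ is the dispersion term • $Var(y_i) = E(y_i) + K \cdot E(y_i)^2$, where $K$ is the overdispersion parameter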
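A quick way to see whether the Poisson condition is violated is to compare the sample mean and variance of the counts. The sketch below also computes a rough method-of-moments value for the overdispersion parameter, assuming the variance form Var(y) = E(y) + K * E(y)^2. The function name `overdispersion` and the accident counts are hypothetical, and this check ignores covariates, so it is only a first diagnostic and not a substitute for fitting the model.

```python
from statistics import mean, variance


def overdispersion(counts):
    """Return (mean, sample variance, moment estimate of K) for count data.

    Assumes Var(y) = E(y) + K * E(y)**2, so K = (variance - mean) / mean**2.
    A K close to 0 is consistent with the Poisson condition.
    """
    m = mean(counts)
    if m <= 0:
        raise ValueError("mean count must be positive")
    v = variance(counts)
    return m, v, (v - m) / m**2


# Hypothetical accident counts per road segment, with many zeros
accidents_per_segment = [0, 0, 0, 1, 2, 9]
m, v, k = overdispersion(accidents_per_segment)
print(m, v, k)
```

A variance well above the mean, as in this made-up sample, points towards the negative binomial model rather than the plain Poisson.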
2.1. Example of GZLM: road accident estimation
Motivation • Estimating the number of accidents on a given road segment is important because it can help identify and address dangerous road segments. • Many factors influence the number of accidents on a given road, such as the number of vehicles, their speed, the pavement conditions and many others.
Available data • IFI (International Friction Index) • Average Daily Traffic + Accumulated Annual Average Traffic • Average Speed • Percentage of urban segment • Presence of intersections • Presence of curvature • Type of curvature • Longitudinal inclination of road • Average annual precipitation • Percentage of traffic that is heavy
Model • Dependent variable: number of accidents • Candidate models: Poisson Regression, Negative Binomial Regression
Best Fit • Over-dispersed Poisson Regression
References • Washington, S. P., Karlaftis, M. G., & Mannering, F. L. (2003). Statistical and Econometric Methods for Transportation Data Analysis. CRC. • McCullagh, P., & Nelder, J. (1989). Generalized Linear Models (2nd ed.). Boca Raton: Chapman and Hall/CRC. ISBN 0-412-31760-5. • Lord, D., Washington, S. P., & Ivan, J. N. (2005). Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention, pp. 35-46.
3. Panel Data Models
Panel Models (1) • A panel data model is a regression that has both a cross-sectional and a time series dimension, where the cross-section units are observed during the whole period. • Suited to studying the dynamics of adjustment, while controlling for aggregate effects and individual heterogeneity. • $y_{it} = \alpha + X_{it}'\beta + u_{it}$, with $i = 1, \dots, N$ and $t = 1, \dots, T$
Panel Models (2) Overcoming specification problems in panel data: • One-way error component models: variable-intercept models across individuals or time • Two-way error component models: variable-intercept models across individuals and time Modeling specifications: • With Fixed Effects • With Random Effects
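The one-way and two-way error component specifications can be written out as follows. The notation is the usual textbook one and is an assumption here, since the slides' own symbols did not survive: $\mu_i$ is the individual effect, $\lambda_t$ the time effect and $v_{it}$ the remaining disturbance.

```latex
% Panel regression for individual i = 1..N observed in period t = 1..T
y_{it} = \alpha + X_{it}'\beta + u_{it}

% One-way error component (variable intercept across individuals)
u_{it} = \mu_i + v_{it}

% Two-way error component (variable intercept across individuals and time)
u_{it} = \mu_i + \lambda_t + v_{it}
```

With fixed effects, $\mu_i$ and $\lambda_t$ are treated as parameters to be estimated. With random effects, they are treated as random draws that form part of the disturbance.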
Fixed Effects (FE) • Explores the relationship between the predictor and the outcome variables within an entity • Controls for characteristics within the individual that may impact or bias the predictor • Assumes that time-invariant characteristics are unique to the individual and are not correlated with other individual characteristics.
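One common way to estimate a fixed-effects model is the within transformation: subtract each entity's own mean from its observations, then run OLS on the demeaned data, which removes any time-invariant entity effect. The sketch below does this for a single regressor. The function name `within_slope`, the country labels and the numbers are hypothetical.

```python
from collections import defaultdict


def within_slope(entities, x, y):
    """Fixed-effects (within) slope for one regressor.

    Each observation is demeaned by its entity's mean, so any time-invariant
    entity-specific intercept drops out before the slope is computed.
    """
    groups = defaultdict(list)
    for entity, xi, yi in zip(entities, x, y):
        groups[entity].append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        x_bar = sum(xi for xi, _ in obs) / len(obs)
        y_bar = sum(yi for _, yi in obs) / len(obs)
        for xi, yi in obs:
            sxy += (xi - x_bar) * (yi - y_bar)
            sxx += (xi - x_bar) ** 2
    return sxy / sxx


# Hypothetical panel: two countries with different intercepts (10 and 50)
# but the same within-country slope of -2
entities = ["PT", "PT", "PT", "SE", "SE", "SE"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [8.0, 6.0, 4.0, 42.0, 40.0, 38.0]
slope = within_slope(entities, x, y)
print(slope)
```

A pooled OLS on the same made-up data would give a positive slope, because it would attribute the difference in country intercepts to the regressor.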
Random Effects (RE) • Characterizes individual effects as random; inference pertains to the population from which the sample was drawn • Implies homoscedastic disturbance variances and serial correlation only for disturbances of the same cross-sectional unit.
3.1. Example of Panel Models: road fatalities estimation
Motivation • Analysing the number of fatalities from traffic accidents helps in understanding what may cause these fatalities and what can be done to prevent them. • With this in mind, we use Panel Data Models to explain the number of fatalities in 5 European countries based on macroeconomic, human resources and traffic indicators.
Available data (1) • 5 European countries (Portugal, Spain, Belgium, Sweden and Finland) over 7 years (2004-2010) • Number of fatalities • Number of police officers per 100k inhabitants • Number of light passenger cars per 1k inhabitants • Compensation of employees per capita in PPS • Total expenditure in Education as a % of GDP • % of population with Tertiary Education • Number of PhD per 100k inhabitants • Average daily working hours • Precipitation • Health expenditure as a % of GDP • Total expenditure in R&D in PPS
Available data (2) Countries: 1-Portugal, 2-Spain, 3-Sweden, 4-Belgium, 5-Finland
Model – Fixed Effects Representation • [Figure comparing Fixed Effects Regression, Simple OLS and Individual OLS]
Best Fit – Two-Way Error Component with Random Effects With and not being significantly different from 0
3.1. Example of Panel Data Models in Computer Vision: environment perception
Source • Zhang, F., Zhou, B., Liu, L., Liu, Y., Fung, H. H., Lin, H., & Ratti, C. (2018). Measuring human perceptions of a large-scale urban region using machine learning. Landscape and Urban Planning, 180(October 2017), 148–160.
Problem • Measuring the human sense of place and quantifying the connections among the visual features of the built environment that impact the human sense of place have long been of interest to a wide variety of fields. • The model achieved a high accuracy rate in predicting six human perceptual indicators, namely, safe, lively, beautiful, wealthy, depressing, and boring.