GY460 Techniques of Spatial Analysis
Lecture 6: Probabilistic choice models
Steve Gibbons
Introduction
• It is sometimes useful to model the choices of individuals, firms or other agents over discrete alternatives
  • Choice of transport mode
  • Choice of firm location amongst regions
  • Choice of cities or country to migrate to
• Theoretical framework
  • Random utility model
• Empirical methods:
  • Micro: Probit, logit, multinomial logit
  • Aggregate: Poisson, OLS, gravity
Random Utility Model
• The random utility model (RUM) underlies the economic interpretation of discrete choice models. Developed by Daniel McFadden for econometric applications
  • see JoEL January 2001 for Nobel lecture; also Manski (2001) Daniel McFadden and the Econometric Analysis of Discrete Choice, Scandinavian Journal of Economics, 103(2), 217-229
• Preferences are functions of biological taste templates, experiences, other personal characteristics
  • Some of these are observed, others unobserved
  • Allows for taste heterogeneity
• Discussion below is in terms of individual utility (e.g. migration, transport mode choice) but similar reasoning applies to firm choices
Random Utility Model
• Individual i’s utility from a choice j can be decomposed into two components:
$$U_{ij} = V_{ij} + \varepsilon_{ij}$$
• $V_{ij}$ is deterministic: common to everyone, given the same characteristics and constraints
  • representative tastes of the population e.g. effects of time and cost on travel mode choice
• $\varepsilon_{ij}$ is random
  • reflects idiosyncratic tastes of i and unobserved attributes of choice j
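Random Utility Model
• $V_{ij}$ is a function of attributes of alternative j (e.g. price and time) and observed consumer and choice characteristics, for example with travel time $t_{ij}$, price $p_{ij}$ and other characteristics $z_{ij}$:
$$V_{ij} = \alpha t_{ij} + \beta p_{ij} + \gamma z_{ij}$$
• We are interested in finding $\alpha$, $\beta$, $\gamma$
• Let's forget about z now for simplicity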
RUM and binary choices
• Consider two choices e.g. bus or car
• We observe whether an individual uses one or the other
• Define
$$y_i = \begin{cases} 1 & \text{if individual } i \text{ travels by bus} \\ 0 & \text{if individual } i \text{ travels by car} \end{cases}$$
• What is the probability that we observe an individual choosing to travel by bus?
• Assume utility maximisation
• Individual chooses bus (y=1) rather than car (y=0) if utility of commuting by bus exceeds utility of commuting by car
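RUM and binary choices
• So choose bus if (writing bus as alternative 1 and car as alternative 0, with deterministic utility V and random component $\varepsilon$)
$$V_{i1} + \varepsilon_{i1} > V_{i0} + \varepsilon_{i0}$$
• So the probability that we observe an individual choosing bus travel is
$$\Pr(y_i = 1) = \Pr(\varepsilon_{i0} - \varepsilon_{i1} < V_{i1} - V_{i0}) = F(V_{i1} - V_{i0})$$
  where F is the cumulative distribution function of the difference in the random components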
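The linear probability model
• Assume probability depends linearly on observed characteristics (price and time), e.g. in bus-minus-car differences of travel time t and price p (subscript 1 is bus, 0 is car):
$$\Pr(y_i = 1) = c + \alpha (t_{i1} - t_{i0}) + \beta (p_{i1} - p_{i0})$$
• Then you can estimate by linear regression
$$y_i = c + \alpha (t_{i1} - t_{i0}) + \beta (p_{i1} - p_{i0}) + u_i$$
• Where $y_i$ is the “dummy variable” for mode choice (1 if bus, 0 if car)
• Other consumer and choice characteristics can be included (the zs in the first slide in this section)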
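The linear probability model
• Unfortunately this has some undesirable properties, e.g. the fitted line can predict probabilities below 0 or above 1
[Figure: linear regression line fitted through binary (0/1) outcomes; vertical axis runs from 0 to 1]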
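The out-of-range problem can be illustrated with a minimal sketch. The data are made up for illustration (they are not from the lecture): `x` is a hypothetical bus-minus-car cost difference and `y` is 1 if bus is chosen. The helper `ols_fit` is a hand-rolled one-regressor least-squares fit, not a library routine.

```python
# Hypothetical data: x is the bus-minus-car cost difference, y = 1 if bus chosen.
x = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 1, 1, 1, 0, 1, 0, 0, 0]


def ols_fit(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = ols_fit(x, y)


def lpm_predict(value):
    """Fitted 'probability' of choosing bus from the linear probability model."""
    return intercept + slope * value


# Predictions for cost differences outside the middle of the data
fitted_low_cost = lpm_predict(-6.0)   # bus much cheaper than car
fitted_high_cost = lpm_predict(6.0)   # bus much more expensive than car
print(fitted_low_cost, fitted_high_cost)
```

With these made-up numbers the fitted value exceeds 1 when bus is much cheaper and falls below 0 when it is much more expensive, which cannot be interpreted as a probability.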
Non-linear probability model
• Better for probability function to have a shape something like:
[Figure: S-shaped curve bounded between 0 and 1]
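Probits and logits
• Common assumptions for the probability function, $\Pr(y = 1) = F(v)$:
  • Cumulative normal distribution function: “Probit”
$$F(v) = \Phi(v) = \int_{-\infty}^{v} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds$$
  • Logistic function: “Logit”
$$F(v) = \frac{e^{v}}{1 + e^{v}}$$
• Estimation by maximum likelihood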
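As a rough sketch of the two link functions, the snippet below evaluates the logistic and cumulative normal functions with the Python standard library only. The function names `logit_cdf` and `probit_cdf` are chosen here for illustration.

```python
import math


def logit_cdf(v):
    """Logistic function: exp(v) / (1 + exp(v)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-v))


def probit_cdf(v):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


# Both functions map any index value into the interval (0, 1)
for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(v, round(logit_cdf(v), 4), round(probit_cdf(v), 4))
```

Both curves are S-shaped and pass through 0.5 at zero; the logistic has somewhat fatter tails, so logit and probit coefficients are on different scales even when fitted probabilities are similar.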
Example • McFadden, D. (1974) The Measurement of Urban Travel Demand, Journal of Public Economics, 3 • Methods of commuting in San Francisco Bay area
Example 1 McFadden (1974) car versus bus commute modes in SF Bay area
Multiple choices
• We often want to think about many more than two choices
  • Choice of regional location
  • Choice of transport mode with many alternatives
  • Choice amongst a sample of schools
• How can we extend the binary choice logit model?
• Random Utility model extends to many choices
• Choose choice k if utility higher than for all other choices
$$U_{ik} > U_{ij} \quad \text{for all } j \neq k$$
Multinomial logit (1)
• Again we need to assume some distribution for the unobserved factor
• One type of distribution (extreme value) gives a simple solution for the probability that choice k is made:
$$\Pr(y_i = k) = \frac{\exp(V_{ik})}{\sum_{j} \exp(V_{ij})}$$
• This is a generalisation of the logit model to many alternatives: “multinomial logit” or “conditional logit”
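The multinomial logit choice probabilities can be sketched in a few lines. The utilities passed in below are arbitrary illustrative numbers, and `mnl_probabilities` is a name invented for this example; subtracting the maximum utility before exponentiating is a standard numerical safeguard that leaves the probabilities unchanged.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities: exp(V_k) / sum_j exp(V_j)."""
    shift = max(utilities)  # guards against overflow; cancels in the ratio
    weights = [math.exp(v - shift) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Three alternatives with hypothetical deterministic utilities
probs = mnl_probabilities([0.0, 1.0, 2.0])
print(probs)

# With two alternatives the formula reduces to the binary logit
binary = mnl_probabilities([0.0, 0.7])
print(binary[1], 1.0 / (1.0 + math.exp(-0.7)))
```

Only differences in utility matter: adding the same constant to every alternative leaves all probabilities unchanged, which is why one alternative is normalised as the base case.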
Multinomial logit (2)
• Recall: $V_{ij}$ is a linear function of observed characteristics of the individuals and their choices, e.g. for travel mode choice, with travel time $t_{ij}$, price $p_{ij}$ and an individual characteristic $z_i$:
$$V_{ij} = \alpha t_{ij} + \beta p_{ij} + \gamma_j z_i$$
• Parameters estimated:
  • For an individual characteristic that is common across choices (e.g. income, gender): one parameter per choice ($\gamma_j$)
    • For at least one choice this is zero (base case)
  • For a characteristic which varies only across choices e.g. price of transport: one parameter common across choices ($\beta$)
Example: Value of time
• MNL models are used to estimate the “value of travel time” from observed commuter behaviour
• Three transport choices: bus (0), train (1), car (2)
• Choosing bus as the base case:
$$\ln\frac{\Pr(y_i = j)}{\Pr(y_i = 0)} = V_{ij} - V_{i0}, \qquad j = 1, 2$$
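When utility is linear in travel time and cost, the implied value of travel time is usually read off as the ratio of the time coefficient to the cost coefficient (the marginal rate of substitution between time and money). The sketch below assumes that setup; the coefficient values and units are hypothetical and are not taken from any study cited in these slides.

```python
def value_of_time(time_coef, cost_coef):
    """Money an individual would give up per unit of travel time saved."""
    if cost_coef == 0:
        raise ValueError("cost coefficient must be non-zero")
    return time_coef / cost_coef


# Hypothetical estimates: utility per minute of travel and per cent of fare
time_coef = -0.04   # assumed units: utility per minute
cost_coef = -0.008  # assumed units: utility per cent

vot_cents_per_minute = value_of_time(time_coef, cost_coef)
vot_dollars_per_hour = vot_cents_per_minute * 60 / 100
print(vot_cents_per_minute, vot_dollars_per_hour)
```

Because the result is a ratio of two coefficients, the arbitrary scale of utility cancels out, so the value of time is comparable across models even when the coefficients themselves are not.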
Example 1: Value of time • For example, from Truong and Hensher, Economic Journal, 95 (1985) p. 15 for bus/train/car choices in Sydney 1982
Example 2: immigration
• Scott, Coomes and Izyumov (2005) The Location Choice of Employment-Based Immigrants among U.S. Metro Areas. Journal of Regional Science 45(1) 113-145
• Estimate the impact of metropolitan area characteristics on destination choice for US migrants in 1995
• 298 destination MSAs
Example 2: immigration Source: Scott, Coomes et al (note: they also report models which include individual Xs)
The independence of irrelevant alternatives problem (IIA) and the nested logit model
Multinomial logit and “IIA”
• Many applications in economic and geographical journals (and other research areas)
• The multinomial logit model is the workhorse of multiple choice modelling in all disciplines. Easy to compute
• But it has a drawback
Independence of Irrelevant Alternatives
• Consider market shares
  • Red bus 20%
  • Blue bus 20%
  • Train 60%
• IIA assumes that if red bus company shuts down, the market shares become
  • Blue bus 20% + 5% = 25%
  • Train 60% + 15% = 75%
• Because the ratio of blue bus trips to train trips must stay at 1:3
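The arithmetic behind the red bus example can be checked with a short sketch: under IIA, removing an alternative simply rescales the remaining shares so that their ratios are unchanged. The function name `shares_after_removal` is illustrative only.

```python
def shares_after_removal(shares, removed):
    """Rescale remaining market shares proportionally, as IIA implies."""
    remaining = {name: s for name, s in shares.items() if name != removed}
    total = sum(remaining.values())
    return {name: s / total for name, s in remaining.items()}


before = {"red bus": 0.20, "blue bus": 0.20, "train": 0.60}
after = shares_after_removal(before, "red bus")
print(after)

# The blue bus to train ratio is the same before and after
ratio_before = before["blue bus"] / before["train"]
ratio_after = after["blue bus"] / after["train"]
print(ratio_before, ratio_after)
```

The red bus share of 20% is split 5:15 between blue bus and train, which keeps the 1:3 ratio but ignores the fact that the two buses are close substitutes.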
Independence of Irrelevant Alternatives
• Model assumes that ‘unobserved’ attributes of all alternatives are perceived as equally similar
• But will people unable to travel by red bus really switch to travelling by train?
• Most likely outcome is (assuming supply of bus seats is elastic)
  • Blue bus: 40%
  • Train: 60%
• This failure of multinomial/conditional logit models arises from the Independence of Irrelevant Alternatives (IIA) assumption
Independence of Irrelevant Alternatives
• It is easy to see why this is:
• Ratio of probabilities of choosing k (e.g. red bus) and another choice l (e.g. train) is just
$$\frac{\Pr(y_i = k)}{\Pr(y_i = l)} = \frac{\exp(V_{ik})}{\exp(V_{il})} = \exp(V_{ik} - V_{il})$$
• All other choices drop out of this odds ratio
• There are models that overcome this, e.g…
Nested Logit Model
• Multinomial logit model can be generalised to relax IIA assumption
• Nested Logit (Nested Multinomial Logit)
[Figure: two-level choice tree. Top level: Car (1) or Public transport (2). Within Public transport: Bus (3) or Train (4)]
• Characteristics of Bus and Train affect decision of whether to use Car or Public Transport
• Estimate by sequential logits…
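Nested Logit Model
• Value placed on choices available in second stage (3,4) enters into calculation of choice probabilities in first stage (2)…
• Logit for bus versus train to estimate $V_3$ and $V_4$
• Define the ‘Inclusive Value’ of public transport as
$$I_2 = \ln\left(\exp(V_3) + \exp(V_4)\right)$$
• Estimate logit model for Car (1) versus Public (2) using:
$$\Pr(\text{Public}) = \frac{\exp(V_2 + \lambda I_2)}{\exp(V_1) + \exp(V_2 + \lambda I_2)}$$
  where $\lambda$ is the estimated coefficient on the inclusive value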
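The two-stage calculation can be sketched as follows, under simplifying assumptions that are not from the slides: the public transport nest has no attributes of its own beyond the inclusive value, and the utility numbers and the inclusive value coefficient `iv_coef` are made up. The lower-level utilities are taken to be the indices estimated by the bus-versus-train logit.

```python
import math


def inclusive_value(lower_utilities):
    """Log of the summed exponentiated utilities within a nest."""
    return math.log(sum(math.exp(v) for v in lower_utilities))


def nested_logit_probabilities(v_car, v_bus, v_train, iv_coef):
    """Choice probabilities for car, bus and train in a two-level tree."""
    # Lower level: bus versus train, conditional on choosing public transport
    p_bus_given_public = math.exp(v_bus) / (math.exp(v_bus) + math.exp(v_train))
    # Upper level: car versus public transport, using the inclusive value
    public_index = iv_coef * inclusive_value([v_bus, v_train])
    p_public = math.exp(public_index) / (math.exp(v_car) + math.exp(public_index))
    return {
        "car": 1.0 - p_public,
        "bus": p_public * p_bus_given_public,
        "train": p_public * (1.0 - p_bus_given_public),
    }


# Hypothetical utilities; iv_coef = 1 reproduces the plain multinomial logit
mnl_like = nested_logit_probabilities(0.5, 0.0, 1.0, iv_coef=1.0)
nested = nested_logit_probabilities(0.5, 0.0, 1.0, iv_coef=0.5)
print(mnl_like)
print(nested)
```

When the inclusive value coefficient equals 1 the model collapses to the multinomial logit; values between 0 and 1 mean bus and train are closer substitutes for each other than either is for car, which is how the nested model relaxes IIA across nests.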
Example: Transport mode choice
• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
• Travel mode for suburban commuters
• Sample of 1381 commuters from a travel survey
• Records mode of transport and other individual characteristics
[Figure: choice tree. Top level: Public transport or Private car. Within Public transport: Bus or Train]
Example: Transport mode choice
• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
• Some selected coefficients
• We don’t know the units of measurement, but how much more valuable is time saved by car than time saved by public transport?
Other discrete choice applications
• Firm location choices, e.g. Head, K. and T. Mayer (2004), Market Potential and the Location of Japanese Investment in the European Union, Review of Economics and Statistics, 86(4) 959-972 (seminar reading)
• School choice, e.g. Barrow, L. (2002) School choice through relocation: evidence from the Washington, D.C. area, Journal of Public Economics, 86 p.155-189
• Migration destinations
• Residential choice
Micro and aggregated choice models
• Micro level logit choice models often have aggregated equivalents
• i.e. if you only have choice characteristics, you could use a choice-level regression of the (log) proportion of individuals making each choice on the choice characteristics
$$\ln\left(\frac{n_k}{N}\right) = a + \beta x_k + e_k$$
  where $n_k$ is the number of individuals making choice k, out of N in total
• Obviously log(n_k) would work too (why?)
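One way to see why log(n_k) works as the dependent variable in place of the log proportion: the total number of individuals N is the same for every choice, so the two differ only by a constant. A short derivation, writing a for the intercept, beta for the slope on the choice characteristic x_k, and e_k for the error (symbols assumed here for illustration):

```latex
\ln\left(\frac{n_k}{N}\right) = a + \beta x_k + e_k
\quad\Longleftrightarrow\quad
\ln n_k = \underbrace{(a + \ln N)}_{\text{new intercept}} + \beta x_k + e_k
```

Only the intercept changes; the slope on the choice characteristic, which is the parameter of interest, is identical in the two regressions.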
Micro and aggregated choice models
• In fact, a Poisson model on aggregated data gives exactly the same coefficient estimates as the conditional logit model
• Which is based on ML estimation of
$$E(n_k \mid x_k) = \exp(a + \beta x_k)$$
• See Guimaraes et al Restats (2003)
  • though this equivalence was known before this ‘discovery’
• Here’s an example…
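The equivalence can be checked numerically with a small sketch. The counts and characteristics below are invented for illustration (they are not the data behind the lecture's output), and both estimators are hand-coded Newton iterations rather than calls to a statistics package.

```python
import math

# Hypothetical aggregated data: characteristic x_k and count n_k for three choices
x = [1.0, 5.0, 12.0]
n = [40.0, 70.0, 185.0]


def conditional_logit_slope(x, n, iterations=50):
    """Newton iterations on the grouped conditional logit log-likelihood."""
    total = sum(n)
    beta = 0.0
    for _ in range(iterations):
        weights = [math.exp(beta * xk) for xk in x]
        denom = sum(weights)
        p = [w / denom for w in weights]
        mean_x = sum(pk * xk for pk, xk in zip(p, x))
        var_x = sum(pk * (xk - mean_x) ** 2 for pk, xk in zip(p, x))
        score = sum(nk * xk for nk, xk in zip(n, x)) - total * mean_x
        beta += score / (total * var_x)
    return beta


def poisson_fit(x, n, iterations=50):
    """Newton iterations for E[n_k] = exp(a + b * x_k); returns (a, b)."""
    a = math.log(sum(n) / len(n))
    b = 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xk) for xk in x]
        score_a = sum(nk - mk for nk, mk in zip(n, mu))
        score_b = sum((nk - mk) * xk for nk, mk, xk in zip(n, mu, x))
        h_aa = sum(mu)
        h_ab = sum(mk * xk for mk, xk in zip(mu, x))
        h_bb = sum(mk * xk * xk for mk, xk in zip(mu, x))
        det = h_aa * h_bb - h_ab * h_ab
        # Solve the 2x2 Newton system by the explicit matrix inverse
        a += (h_bb * score_a - h_ab * score_b) / det
        b += (h_aa * score_b - h_ab * score_a) / det
    return a, b


beta_clogit = conditional_logit_slope(x, n)
poisson_const, beta_poisson = poisson_fit(x, n)
print(beta_clogit, beta_poisson)
```

The intuition is that the Poisson constant absorbs the total number of individuals, so the fitted counts sum to N and the remaining first-order condition for the slope coincides with the conditional logit one.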
Conditional logit

Conditional (fixed-effects) logistic regression   Number of obs   =        885
                                                  LR chi2(1)      =     129.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -259.26785                       Pseudo R2       =     0.2000

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0999331   .0091997    10.86   0.000      .081902    .1179642
------------------------------------------------------------------------------
Poisson

Poisson regression                                Number of obs   =          3
                                                  LR chi2(1)      =     129.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -9.3973119                       Pseudo R2       =     0.8734

------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0999331   .0091997    10.86   0.000      .081902    .1179642
       _cons |   3.364614   .1450806    23.19   0.000     3.080262    3.648967
------------------------------------------------------------------------------
OLS

. reg lnp x

      Source |       SS       df       MS              Number of obs =       3
-------------+------------------------------           F(  1,     1) =  370.23
       Model |  1.32738687     1  1.32738687           Prob > F      =  0.0331
    Residual |  .003585331     1  .003585331           R-squared     =  0.9973
-------------+------------------------------           Adj R-squared =  0.9946
       Total |   1.3309722     2  .665486102           Root MSE      =  .05988

------------------------------------------------------------------------------
         lnp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .101293   .0052644    19.24   0.033      .034403     .168183
       _cons |  -2.339238     .06295   -37.16   0.017    -3.139094   -1.539383
------------------------------------------------------------------------------
Aggregate v micro choice models
• Hence, there’s little point in using conditional logit if you only have choice-characteristics
• Conditional/multinomial logit is good if you have individual and group-level characteristics
• The aggregated OLS version gives rise to “Spatial interaction” models of flows between origins and destinations
  • = Gravity models
• Widely applied (generally a-theoretically) in migration, trade and commuting applications
  • e.g. See Head (2003) Gravity for beginners
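Gravity/spatial interaction/migration/trade models
• Flow from place j to place k modelled as
$$\ln n_{jk} = a + \theta_1 x_j + \theta_2 x_k + \delta \ln d_{jk} + e_{jk}$$
  where $x_j$ and $x_k$ are characteristics of the source and destination and $d_{jk}$ is the distance between them
• Typically characteristics of destination and source include some measure of “attraction”, e.g.
  • population mass (or “market potential” in trade models)
  • wages (endogenous)
• And a measure of the cost of moving between place j and place k (e.g. log distance)
• Hence gravity, after Newton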
Gravity/spatial interaction/migration/trade models
• Strong distance decay effects
• Typical elasticities -0.5 to -2.0
  • Even for internet site visits!: see Blum and Goldfarb (2006) Journal of International Economics
• Trade literature has many examples
  • Disdier and Head (2003) The Puzzling Persistence Of The Distance Effect On Bilateral Trade, Review of Economics and Statistics
  • Finds mean distance elasticity of -0.9 from about 1500 studies
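To get a feel for what these elasticities mean, the sketch below computes the proportional change in a flow when distance changes, assuming the usual constant-elasticity (log-log) form so that the flow scales with distance raised to the elasticity. The function name `flow_ratio` is illustrative.

```python
def flow_ratio(distance_ratio, elasticity):
    """Ratio of new flow to old flow when distance is multiplied by distance_ratio."""
    return distance_ratio ** elasticity


# Effect of doubling distance at the mean elasticity of -0.9
doubling_mean = flow_ratio(2.0, -0.9)

# Effect of doubling distance at the ends of the typical range
doubling_weak = flow_ratio(2.0, -0.5)
doubling_strong = flow_ratio(2.0, -2.0)

print(doubling_mean, doubling_weak, doubling_strong)
```

With an elasticity of -0.9, doubling the distance between two places cuts the predicted flow to a little over half its original level; at -2.0 it falls to a quarter.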
Conclusion
• Generally possible to model ‘choices’ as discrete, or as flows
• Discrete choice models offer the advantage of
  • Including micro-level (individual/firm) characteristics
  • An underlying structural model (RUM)
• Aggregate flow models
  • Simpler to compute
  • No need for the distributional assumptions required by maximum likelihood (nonlinear) methods
  • But can’t separate individual from aggregate factors