1 / 27

Discrete Choice Analysis: A SemiParametric Approach Professor Baibing Li

Discrete Choice Analysis: A SemiParametric Approach Professor Baibing Li School of Business & Economics Loughborough University Loughborough LE11 3TU, UK. Overview. Introduction A semiparametric discrete choice model Model estimation An empirical study Conclusions.

haviland
Download Presentation

Discrete Choice Analysis: A SemiParametric Approach Professor Baibing Li

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Discrete Choice Analysis: A SemiParametric Approach Professor Baibing Li School of Business & Economics Loughborough University Loughborough LE11 3TU, UK

  2. Overview • Introduction • A semiparametric discrete choice model • Model estimation • An empirical study • Conclusions

  3. What is discrete choice analysis? An important research area in econometrics. McFadden won the 2000 Nobel Memorial Prize in Economic Sciences "for his development of theory and methods for analyzing discrete choice“ Problems involve choice between two or more discrete alternatives, such as entering or not entering the labor market, or choosing among several modes of transport (e.g. train, bus, cycling, walking, etc.) Discrete choice models statistically relate the choice made by each person to the attributes of the person and the attributes of the alternatives available to the person For example, the choice of which car a person buys is statistically related to the person’s income and age as well as to price, fuel efficiency, size, and other attributes of each available car Introduction

  4. What is the multinomial logit model? Let Yin denote the unobservable utility (or cost) person n obtains from choosing alternative i. The utility (or cost) of each alternative depends on the attributes of the alternatives interacted perhaps with the attributes of the person Yin = xinTβ + ein xin : observed attributes relating to alternative i for person n β: corresponding vector of coefficients ein: error term assumed to have a Gumbel (extreme- value) distribution Major difficulty: one can only observe the outcomes of decision-making, i.e. which alternative i was chosen by a person n The choice probability given by the multinomial logit model is Pn(i)=exp[xinTβ] / {Σj exp[xjnTβ]} Introduction to discrete choice analysis

  5. A toy example Suppose you wish to choose a transport mode among several alternatives for commuting from home to work: car, bus, train, bicycle, walking Two attributes: each alternative will come with a travel expense and time. We’ll use dimension 1 of our feature vectors for time (in minutes) and dimension 2 for money (in £) Data collection: Commuter 1: taking the bus takes 20 minutes and costs £2 (i.e. x=[20,2]T) Commuter 2: driving takes 10 minutes and costs £3 (i.e. x=[10, 3]T) Commuter 3: walking takes 30 minutes, and has no cost (i.e. x=[30,0]T) …… Commuter N: taking the bus takes 45 minutes and costs £5 (i.e. x=[45, 5]T) The model parameters are the two coefficients   β =[β1, β2]T Introduction to discrete choice analysis

  6. A toy example (continued) Given the sample consisting of n commuters with the choices of transport mode and the travel costs, the choice probabilities are Pn(i)=exp[xinTβ] / {Σj exp[xjnTβ]} n=1,…,N The vector of the model parameters β =[β1, β2]T can be estimated The model can be further extended to include commuters’ social economic factors (age, gender, income, etc.) : x = [journey time, cost, age, gender, income]T The impact of these factors on the choice can be assessed e.g. how likely a commuter will choose the alternative of taking a bus if she is 45 with a salary of £32,000 pa and each journey costs 32 minutes and £4.5? Introduction to discrete choice analysis

  7. Introduction to discrete choice analysis • Context Discrete choice analysis can be investigated in various contexts. Consider several travellers who wish to minimize their travel costs • Notation • Cn denotes the feasible choice set of each individual n • Yin denotes the random travel cost for traveller n when choosing alternative i • We assume the random costs are independent of each other • Theory of individual choice behaviour The probability that any alternative i in Cn is chosen by traveler n is Pn(i) = Pr{Yin < Yjn for all j in Cn } = Pr{Yin < min(Yjn) for ij}

  8. The underlying assumptions for the logit choice model We focus on the most widely used discrete choice model, multinomial logit choice model. In the derivation of the multinomial logit choice model, there are three underlying assumptions (McFadden, 1978; Ben-Akiva and Lerman, 1985; Train, 2003; Bhat et al., 2008; Koppelman, 2008), i.e. the random variables of interest are assumed to be independent of each other (assumption I) to have equal variability across cases (assumption II) to follow the Gumbel distribution (assumption III) Extensions of the multinomial logit model may be classified into two categories: open-form and closed-form. We mainly focus on the closed-form choice models Assumptions for the logit model

  9. Assumptions for the logit model • Existing researches for the closed-form logitmodel • Relaxation of assumption I to allow dependence or correlation • The nested logit model and generalized extreme value (GEV) family (McFadden, 1978) • More recent development: paired combinatorial logit (PCL), cross-nested logit (CNL), and generalized nested logit (GNL) • Relaxation of assumption II to allow unequality of the variance • HMNL: the heteroscedastic multinomial logit model allows the random error variances to be non-identical across individuals/cases (Swait and Adamowicz, 1996) • COVNL: the covariance heterogeneous nested logit model was developed on the basis of the nested logit model and it allows heterogeneity across cases in the covariance of nested alternatives (Bhat, 1997)

  10. Assumptions for the logit model • The research question • Relax assumption III on the underlying distribution, the Gumbel distribution • Practical motivations: Logit choice model is used in a variety of the problems in economics, business, politics, transport research, etc. It is hard to believe that a single statistical distribution (the Gumbel) can accommodate such a variety of applications • Theoretical motivations: • Castillo et al. (2008) have proposed using the Weibull distribution as an alternative to the Gumbel distribution • Fosgerau and Bierlaire (2009) show that the assumption of the Weibull distribution is associated with the discrete choice model having multiplicative error terms • Research question: Are there any other distributions?

  11. A new distribution class Ordinary logit choice model • Assumed distribution: Gumbel distribution Fin(t)=1 exp{exp[(t in )/]} • Equal variability assumption the variance retains constant across all i and n • Closed under the min-operation If Yjn are independent of each other and all follow the Gambel, then min{Yjn} also does New choice model • Assumed distribution: • Fin(t)=Pr{ Yin < t}= 1  [1F(t)]αin • where the base function F(t) can be any CDF • Unequal variability assumption • the variance varies across different cases • Closed under the min-operation • If Yjn are independent of each other and all follow a distribution from the above distribution class, then min{Yjn} also does

  12. A new distribution class • The new class of distributions Fin(t)=Pr{ Yin < t }= 1 [1F(t)]αin • This distribution class includes both the Gumbel and Weibull distributions as its special cases, as well as many others such as • Pareto • Gompertz • Expoenetial • Rayleigh • generalised logistic • The total number of distributions in this class is infinite

  13. A new distribution class • The parametric approach Fin(t)=Pr{ Yin < t }= 1 [1F(t)]αin • Have knowledge of the random variables a priori • Specify a base function F(t) in the stage of modelling • The statistical inference focuses on several unknown parameters • A semiparametric approach • Have little knowledge of the distribution of the random variables • Do not specify a base function F(t) in the stage of modelling • The statistical inference includes both the unknown parameters AND the unknown base function • From a practical perspective, the assumption that the random travel costs Fin(t) follow any distribution from the distribution family with an unspecified base function F(t) allows researchers great flexibility to accommodate different problems

  14. Semiparametric discrete model • Choice probability • We suppose that the expectations EYin =Vin are linked to a linear function of a q-vector of attributes that influences specific discrete outcomes: Vin = xinTβ (termed combined attribute) • Note that min{Yjn} follows the same distribution as Yin • It can be shown that the choice probability is Pn(i) = Pr{Yin < Yjn for all j in Cn } = Pr{Yin < min(Yjn) for ij} = exp[S(xinTβ)] / {Σj exp[S(xjnTβ)]}

  15. Semiparametric discrete model • Sensitivity function S(.) • S(.) reflects how sensitive a traveller is to the one unit change in the combined attribute (including travel time, travel expense, etc.) • Mathematically, it can be shown that S(.) is determined by the base function F(t) • When S(t)=θt, the model reduces to the logit model An illustration of sensitivity function S(.)

  16. Model estimation • The parametric model • If the base function is specified in the stage of modelling, it is required to estimate the coefficients of the attributes, β • The estimation can be done similar to the logit model • The semiparametric model • Since the base function is not specified in the stage of modelling, it is required to estimate the coefficients of the attributes β and the sensitivity function S(.)

  17. Model estimation • Use P-splines to approximate S(.) S(t) ΣjwjBj(t), where Bj(t) (j=1,…,m) are known basis functions (cubic splines) and w=[w1, …,wm ]T is a vector of weights to be estimated • The accuracy of the approximation is guaranteed as m is large • Choose roughness penalty (see, e.g. Eilers and Marx, 1996): wTKw/(22) where K is a pre-specified matrix An illustration of basis functions

  18. Model estimation An illustration of P-splines

  19. Model estimation • Bayesian analysis • Data: • Let zin be 1 if traveller n chose alternative i and 0 otherwise • Let xjn be the vector of observed attributes of traveller n and alternative i • Let X and Z denote the data matrices comprising xjn and zin • Likelihood: L(Z; β, w, X) = ΠnΠi [Pn(i)]zin • Choose prior distributions (non-informative): • p(β)  1 • p(w | )  exp{wTKw/(22) } • prior of 2 , p(), is taken as an inverse gamma distribution • Posterior distribution: p(β, w, | Z, X)  L(Z; β, w, X) p(β) p(w | ) p()

  20. An empirical study • Data • Stated preference data taken from a large-scale Danish value-of-time study (Fosgerau et al., 2006) including: (a) 100 different travellers; (b) two train-related alternatives • The attributes included travel expense, number of interchanges, and travel time • Travel time was broken down into four components: (a) access/egress time (other modes than public transport, including walking, cycling, etc.); (b) in-vehicle time; (c) headway of the first used mode; and (d) interchange waiting time • In total six attributes were used in the analysis and the travellers’ time values were inferred from binary alternative routes characterised by these attributes

  21. An empirical study • Models used in the analyses Let x1, …, x6 represent the six attributes: access-egress time, headway, in-vehicle-time, waiting time, number of interchanges, and travel expenses. Following Fosgerau and Bierlaire (2009), the coefficient of travel expense was normalized to unit so that other coefficients can be interpreted as willingness-to-pay indicators • the ordinary multinomial logit model: linear sensitivity function S(xTβ) = θ (β1x1+ …+ β6x6 ) • the multiplicative choice model: log sensitivity function S(xTβ) = θ log(β1x1+ …+ β6x6 ) • the semiparametric model S(xTβ) = S(u+v((β1x1+ …+ β6x6 )) where u and v has two scaling parameters so that S(.) is on [0, 1]

  22. An empirical study • Settings in the computation • The splines used in the following analyses included seven cubic basis functions (j=1,…,7) on the support [0, 1] • The total number of iterations in the MCMC simulation was set as 10,000. The first 5,000 iterations were considered as burnt-in period and the corresponding draws were discarded. The results are reported below using the remaining 5,000 draws

  23. Case I: the train data

  24. case I: the train data • The middle part of obtained sensitivity function is not sensitive to the change of the combined attribute • Towards to the both extreme ends of the support, it increases (or decreases) rapidly • Each unit increment in the combined attribute does not impact on the train users equally

  25. Case II: the bus data

  26. Case II: the bus data • The obtained sensitivity function is quite close to a linear function. • The semiparametric model produced similar estimates to that of the ordinary multinomial logit model • Due to its simplicity, it seems that the ordinary multinomial logit model is a sensible choice

  27. Discussion and conclusions • Relaxation of assumption III • The assumption of underlying distributions is extended from the Gumbel to a much wider distribution class • It also retains a crucial property in discrete decision analysis, i.e., it is closed under the minimum operation • It allows unequal variances across cases • Semiparametric choice model and sensitivity function • In the modeling stage the distribution needs not to be specified • A semiparametric choice model is derived that links travel-related attributes to the choice probabilities via a sensitivity function • When the sensitivity function is nonlinear, travellers’ response to the combined attribute does not change in a proportionate manner. This has important practical implications for policy makers

More Related