
Econometric Analysis

Econometric Analysis. Week 9: Limited dependent variable models. Binary choices and limited dependent variable models; the linear probability model; logit and probit models; examples and results using PcGive.


Presentation Transcript


  1. Econometric Analysis. Week 9: Limited dependent variable models

  2. Lecture outline: binary choices and limited dependent variable models
     • The linear probability model
     • Logit and probit models
     • Examples and results using PcGive
     • Further comments on: censored and truncated regression models (Tobit analysis), sample selection bias, and multinomial choice models

  3. References and recommended reading
     • Wooldridge, J M (2006) Introductory Econometrics: A Modern Approach (Third Edition), Chapter 17, pp 582-595
     • Greene, W H (2000) Econometric Analysis (Fourth Edition), Chapter 19, pp 811-837
     • Kennedy, P (2003) A Guide to Econometrics (Fifth Edition), Chapters 15 & 16
     • Dougherty, C (2007) Introduction to Econometrics (Third Edition), Chapter 10
     • Spector, L C and Mazzeo, M (1980) Probit Analysis and Economic Education. Journal of Economic Education, Spring, pp 37-44
     • Doornik, J A and Hendry, D F (2006) Empirical Econometric Modelling: PcGive Vol III, Chapters 5 & 6

  4. Basics
     • On occasion the variable that we are trying to explain is discrete rather than continuous.
     • In the most basic case it is a binary (dichotomous, dummy or qualitative) variable; in other words it can take only one of two values, 0 or 1.
     • Examples: (1) in employment/out of employment; (2) university educated/not university educated; (3) pass test/fail test; (4) owns home/does not own home.
     • We might wish to explain how observations fall into each category, for example in the labour market case by linking the dependent variable to explanatory variables such as age, education and marital status.
     • Simple OLS regression is not really appropriate here, although an early approach, the linear probability model, is based on OLS regression.
     • Today you are more likely to use either the logit (sometimes called logistic) or the probit model. These use the cumulative distribution function of the logistic and of the normal distribution respectively to provide an S-shaped curve linking the two sets of points.
     • More advanced work extends the number of values that the limited dependent variable can take beyond two, for example the five categories on a Likert scale. These are the so-called multinomial choice variables; we won't cover them on this unit.

  5. The Linear Probability Model (LPM)
     • Consider the simple case with one explanatory variable X.
     • In this model the predicted Y value denotes the probability that the dependent variable takes a value of 1, so the probability of "success" (Y=1) is linearly related to the explanatory variable X.
     • The Y values can only be 0 or 1, so a straight line fitted through the points, as shown in figure 1, can give predicted Y values outside the range 0-1.
     • The residuals will also be heteroskedastic, so if we do use OLS we should use robust standard errors to calculate t values.
     • R squared is not a meaningful measure of fit here: whereas in the continuous OLS case it is possible for all points to lie on the regression line, here they cannot, as they must lie along one of the horizontal lines at 0 and 1.
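The out-of-range problem is easy to see numerically. The sketch below is a minimal illustration in plain Python, not part of the original slides: the ten (X, Y) pairs are made up for the purpose, and the OLS slope and intercept are computed by hand for the one-regressor case.

```python
# Illustrative data only (invented for this sketch, not from the lecture).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS with a single regressor: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values are interpreted as P(Y = 1 | X) in the LPM.
fitted = [intercept + slope * xi for xi in x]

for xi, fi in zip(x, fitted):
    flag = "" if 0.0 <= fi <= 1.0 else "  <- outside [0, 1]"
    print(f"X = {xi:2d}  fitted P(Y=1) = {fi:6.3f}{flag}")
```

With these data the fitted "probabilities" at the smallest and largest X values fall below 0 and above 1, which is exactly the difficulty the straight-line fit creates.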

  6. Binary Response Models: logit and probit models
     These models make use of a "squash function" G to ensure that the fitted values lie strictly between 0 and 1.
     • Logit model: G is the cumulative distribution function of the logistic distribution.
     • Probit model: G is the cumulative distribution function of the standard normal distribution.
     (See Wooldridge or Greene for the full algebraic details.)
     The models are intrinsically non-linear, so they are estimated using maximum likelihood procedures.
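As a rough illustration of the two squash functions, the sketch below evaluates both on a grid of index values using only the Python standard library. The function names `logit_cdf` and `probit_cdf` and the grid itself are choices made for this example, not anything taken from PcGive or the slides.

```python
import math

def logit_cdf(z):
    """Logistic cumulative distribution function: G(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal cumulative distribution function, written via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Index values z = b0 + b1*X from -5 to 5 in steps of 0.5 (arbitrary grid).
grid = [k / 2.0 for k in range(-10, 11)]
logit_vals = [logit_cdf(z) for z in grid]
probit_vals = [probit_cdf(z) for z in grid]

for z, lv, pv in zip(grid, logit_vals, probit_vals):
    print(f"z = {z:5.1f}   logit G(z) = {lv:.4f}   probit G(z) = {pv:.4f}")
```

Both columns rise monotonically from near 0 to near 1 and pass through 0.5 at z = 0, which is the S shape the models rely on. On the same scale the logistic curve approaches its limits more slowly than the normal one.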

  7. The shape of the logit and probit curves

  8. Partial responses in binary response models
     • Whereas in the LPM the marginal or partial effect of a change in one of the Xs on Y (= ∂Y/∂Xj) is constant, for binary response models it varies over the curve. I will give a detailed derivation for the logistic function later.
     • It is sometimes given in results tables as the "slope", calculated at the mean values of the X variables.
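To make "varies over the curve" concrete, the following sketch evaluates the partial effect at a few values of the index. It uses the standard result that the effect equals the coefficient times the density g = G′ evaluated at the index. The coefficient value 0.8 and the index values are hypothetical numbers chosen for illustration only.

```python
import math

def logistic_pdf(z):
    """Derivative of the logistic CDF: g(z) = G(z) * (1 - G(z))."""
    g = 1.0 / (1.0 + math.exp(-z))
    return g * (1.0 - g)

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logit_partial_effect(index, beta_j):
    """Partial effect of X_j in a logit model, evaluated at a given index value."""
    return beta_j * logistic_pdf(index)

def probit_partial_effect(index, beta_j):
    """Partial effect of X_j in a probit model, evaluated at a given index value."""
    return beta_j * normal_pdf(index)

beta_j = 0.8  # hypothetical coefficient, for illustration only
for index in (-2.0, 0.0, 2.0):
    print(f"index = {index:4.1f}   "
          f"logit effect = {logit_partial_effect(index, beta_j):.4f}   "
          f"probit effect = {probit_partial_effect(index, beta_j):.4f}")
```

A "slope" reported at the means corresponds to calling these functions with the index evaluated at the mean values of the X variables. The effect is largest where the fitted probability is near 0.5 and shrinks in the tails.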

  9. Goodness of fit in binary response models
     • You sometimes see the count R2, the proportion of cases correctly predicted. This is not very helpful, particularly if the split between 0 and 1 values for Y in the sample is very uneven, where even a naïve model that predicts a success for every case would come out well.
     • An alternative measure, the pseudo R-squared, is often given. This is calculated as $1 - L_{ur}/L_0$, where $L_{ur}$ = log-likelihood for the estimated model and $L_0$ = log-likelihood for a model with an intercept only (see Wooldridge pp 589-590 and Kennedy p 267).
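Both measures can be computed directly from the observed outcomes and the fitted probabilities. The sketch below is a plain-Python illustration under stated assumptions: the function names, the 0.5 cutoff for classifying a prediction as a success, and the ten-observation sample are all choices made for this example. It relies on the fact that an intercept-only binary model fits the sample proportion of successes to every case.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood for outcomes y (0/1) and fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def pseudo_r2(y, p):
    """1 - L_ur / L_0, with L_0 from the intercept-only model.

    Assumes the sample contains both 0s and 1s, so the sample proportion
    lies strictly between 0 and 1.
    """
    p_bar = sum(y) / len(y)
    l_ur = log_likelihood(y, p)
    l_0 = log_likelihood(y, [p_bar] * len(y))
    return 1.0 - l_ur / l_0

def count_r2(y, p, cutoff=0.5):
    """Proportion of cases correctly predicted, using an assumed cutoff of 0.5."""
    predictions = [1 if pi >= cutoff else 0 for pi in p]
    return sum(pr == yi for pr, yi in zip(predictions, y)) / len(y)

# Made-up uneven sample: nine successes, one failure.
y_uneven = [1] * 9 + [0]
# Naive "model": the same fitted probability (the sample proportion) for everyone.
naive_p = [0.9] * 10

print("count R2 :", count_r2(y_uneven, naive_p))
print("pseudo R2:", pseudo_r2(y_uneven, naive_p))
```

The naive model scores well on the count R2 simply because successes dominate the sample, while its pseudo R-squared is zero because it does no better than the intercept-only model.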

  10. More detail on the logit model
     • Let's look at a simple case with just one explanatory variable, $X_1$.
     • The fitted values are kept between the limits of 0 and 1.
     • If we write $Y = \dfrac{1}{1 + e^{-Z}}$ with $Z = \beta_0 + \beta_1 X_1$, then $Y \to 1$ as $Z \to \infty$ and $Y \to 0$ as $Z \to -\infty$.
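For the one-variable logit, with fitted probability 1/(1 + exp(-(b0 + b1·X))), maximum likelihood estimation can be sketched in a few lines. The code below uses Newton-Raphson on the log-likelihood, which is one common way of doing this. It is an illustration only and not a description of how PcGive computes its estimates. The data are invented, and the names `fit_logit`, `b0` and `b1` are assumptions of this example.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood for P(Y=1) = logistic(b0 + b1*x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Score vector (first derivatives of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix (minus the Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Illustrative data only (invented; the 0s and 1s overlap, so the MLE is finite).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logit(x, y)
fitted = [logistic(b0 + b1 * xi) for xi in x]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print("fitted probabilities:", [round(f, 3) for f in fitted])
```

Unlike a straight-line fit, every fitted value here lies strictly between 0 and 1. If the 0s and 1s were perfectly separated by X, the likelihood would have no finite maximum and this simple loop would not converge.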

  11. Yet more on the logit model
     • For this function the partial response of Y to a change in $X_1$ turns out to be $\beta_1 Y(1-Y)$ (see proof on separate sheet).
     • The model is sometimes reformulated as the log-odds model, with $\ln\left(\dfrac{Y}{1-Y}\right) = \beta_0 + \beta_1 X_1$ (see derivation on separate sheet).
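The separate sheet is not reproduced in this transcript. The following is the standard textbook derivation of both results, offered as a sketch and assuming the one-variable logistic form $Y = 1/(1+e^{-Z})$ with $Z = \beta_0 + \beta_1 X_1$.

```latex
% Partial response: differentiate Y with respect to Z, then apply the chain rule.
\begin{align*}
Y &= \frac{1}{1 + e^{-Z}}, \qquad Z = \beta_0 + \beta_1 X_1, \\
1 - Y &= \frac{e^{-Z}}{1 + e^{-Z}}, \\
\frac{dY}{dZ} &= \frac{e^{-Z}}{\left(1 + e^{-Z}\right)^{2}}
             = \frac{1}{1 + e^{-Z}} \cdot \frac{e^{-Z}}{1 + e^{-Z}}
             = Y(1 - Y), \\
\frac{\partial Y}{\partial X_1} &= \frac{dY}{dZ}\,\frac{\partial Z}{\partial X_1}
             = \beta_1\, Y(1 - Y).
\end{align*}

% Log-odds form: take the ratio Y / (1 - Y) and then logs.
\begin{align*}
\frac{Y}{1 - Y} &= \frac{1/(1 + e^{-Z})}{e^{-Z}/(1 + e^{-Z})} = e^{Z}, \\
\ln\!\left(\frac{Y}{1 - Y}\right) &= Z = \beta_0 + \beta_1 X_1.
\end{align*}
```

Since $Y(1-Y)$ is largest at $Y = 0.5$, where it equals 0.25, the partial response is never larger in absolute value than $\beta_1/4$.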

  12. Example
     Greene (2000) illustrates the use of logit and probit models with a data set from Spector and Mazzeo (1980), which concerns the effectiveness of a new method of teaching economics. The Spector and Mazzeo data set has information on the performance of 32 students in principles of macroeconomics courses at Iowa University in the spring semesters of 1974 and 1975.
     • The dependent variable "GRADE" is an indicator of whether or not students passed a test in principles of macroeconomics.
     • The independent variables are:
       • GPA: the student's Grade Point Average prior to taking the course
       • TUCE: the result on a pre-entry Test of Understanding in College Economics
       • PSI: an indicator of whether or not the student was taught using the new Personalised System of Instruction rather than just in lectures

  13. Model formulation in PcGive (1)
     • Category: Models for discrete data
     • Model class: Binary Discrete Choice using PcGive

  14. Model formulation in PcGive (2)

  15. Model formulation in PcGive (3): now choose the model, logit or probit
