Maximum Likelihood Estimation



  1. Maximum Likelihood Estimation Methods of Economic Investigation Lecture 17

  2. Last Time • IV estimation Issues • Heterogeneous Treatment Effects • The assumptions • LATE interpretation • Weak Instruments • Bias in Finite Samples • F-statistics test

  3. Today's Class • Maximum Likelihood Estimators • You've seen this in the context of OLS • Can make other assumptions about the form of the likelihood function • This is how we estimate discrete choice models like probit and logit • This is a very useful form of estimation • Has nice properties • Can be very robust to mis-specification

  4. Our Standard OLS • Standard OLS: Yi = Xi'β + εi • Focus on minimizing mean squared error with the assumption that εi | Xi ~ N(0, σ²)

  5. Another way to motivate linear models • "Extremum estimators": maximize or minimize some objective function • OLS: minimize mean squared error • Could also imagine minimizing other types of functions • We often use a "likelihood function" • This approach is more general, allowing us to deal with more complex nonlinear models • Useful properties in terms of consistency and asymptotic convergence

  6. What is a likelihood function • Suppose we have independent and identically distributed random variables {Z1, . . . , ZN} drawn from a density function f(z; θ). Then the likelihood function given this sample is the product of the individual densities (reconstructed below) • Because it is sometimes convenient, we often use this in logarithmic form
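A hedged reconstruction of the likelihood and log likelihood the slide refers to; these are the standard definitions, though the exact notation on the original slide may differ:

  L(θ; Z1, . . . , ZN) = ∏i f(Zi; θ)

  log L(θ; Z1, . . . , ZN) = Σi log f(Zi; θ)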

  7. Consistency - 1 • Consider the population likelihood function L0(θ), an expectation taken under the "true" parameter θ0 • Think of L0 as a population average and (1/N) log L as its sample analogue, so that in the usual way the sample average converges to the population average as N grows (the statement is written out below)
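A hedged reconstruction of the missing expressions, using the standard definitions (the original slide's notation may differ):

  L0(θ) = E[ log f(Z; θ) ]   (expectation taken under the true parameter θ0)

  (1/N) log L(θ; Z1, . . . , ZN) = (1/N) Σi log f(Zi; θ) →p L0(θ)   by the law of large numbers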

  8. Consistency - 2 • The population likelihood function L0(θ) is maximized at the true value, θ0. Why? • Think of the sample likelihood function as telling us how likely it is that one would observe the sample if the parameter value θ really is the true parameter value • Similarly, the population likelihood function L0(θ) will be largest at the value of θ that makes it most likely to "observe the population" • That value is the true parameter value, i.e. θ0 = argmax L0(θ)

  9. Consistency - 3 • We now know that the population likelihood L0(θ) is maximized at θ0 • Jensen's inequality is what lets us carry this over to the log likelihood (the step is spelled out below) • The scaled sample log likelihood (1/N) log L(θ; z) gets closer to L0(θ) as N increases • i.e. log L will start having the same shape as L0 • For large N, the sample likelihood will therefore be maximized at (approximately) θ0
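The Jensen's inequality step the slide alludes to, written out under the standard i.i.d. setup (an added reasoning sketch, not taken from the original slide):

  L0(θ) − L0(θ0) = E[ log( f(Z; θ) / f(Z; θ0) ) ] ≤ log E[ f(Z; θ) / f(Z; θ0) ] = log ∫ f(z; θ) dz = log 1 = 0

so L0(θ) ≤ L0(θ0) for every θ, i.e. the population log likelihood is maximized at the true value.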

  10. Information Matrix Equality • An additional useful property of the MLE comes from the following objects (written out below) • Define the score function as the vector of first derivatives of the log likelihood function • Define the Hessian as the matrix of second derivatives of the log likelihood function
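A hedged reconstruction of the definitions and of the information matrix equality named in the slide title (standard results; the original slide's notation may differ):

  Score:    s(θ) = ∂ log L(θ; z) / ∂θ

  Hessian:  H(θ) = ∂² log L(θ; z) / ∂θ ∂θ'

  Information matrix equality (at θ0):  E[ s(θ0) s(θ0)' ] = −E[ H(θ0) ] = I(θ0)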

  11. Asymptotic Distribution • Define the information matrix I(θ) as above • Then the MLE converges in distribution to a normal limit (reconstructed below) • The inverse information matrix I(θ0)⁻¹ is the smallest attainable asymptotic variance, i.e. there does not exist a consistent estimator of θ with a smaller asymptotic variance
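A hedged reconstruction of the limiting distribution the slide refers to (the standard asymptotic normality result for the MLE):

  √N ( θ̂MLE − θ0 ) →d N( 0, I(θ0)⁻¹ )

where I(θ0) = −E[ ∂² log f(Zi; θ0) / ∂θ ∂θ' ] is the (per-observation) information matrix. Its inverse is the Cramér–Rao lower bound, which is the sense in which no consistent (regular) estimator of θ has a smaller asymptotic variance.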

  12. Computation • Can be quite complex because we need to maximize numerically • General procedure (a code sketch follows this slide) • Re-scale variables so they have roughly similar variances • Choose some starting value and estimate the maximum in that area • Do this over and over across different grids • Get an approximation of the underlying objective function • If this converges to a single maximum, you're done
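A minimal sketch (not from the original slides) of what numerical maximization of a log likelihood can look like in practice, here for the probit model introduced later in the lecture. The simulated data, variable names, and the choice of scipy's BFGS optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data for illustration: an intercept plus one regressor
rng = np.random.default_rng(0)
N = 1000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=N) > 0).astype(float)

def neg_log_likelihood(beta):
    # Probit log likelihood: sum of y*log(Phi(X'b)) + (1-y)*log(1-Phi(X'b))
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Start from a rough value (zeros) and let the optimizer search numerically
result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)        # estimated beta
print(result.success)  # whether the optimizer reports convergence
```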

  13. Test Statistics • Define our likelihood function L(z; θ0, θ1) • Suppose we want to test H0: θ0 = 0 against the alternative HA: θ0 ≠ 0 • We could estimate a restricted and an unrestricted likelihood function and compare them (the likelihood ratio statistic is reconstructed below)
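A hedged reconstruction of the likelihood ratio statistic this comparison leads to (the standard form; the original slide may have used different notation):

  LR = 2 [ log L(θ̂0, θ̂1) − log L(0, θ̃1) ] →d χ²(q)

where (θ̂0, θ̂1) are the unrestricted estimates, θ̃1 is the restricted estimate with θ0 = 0 imposed, and q is the number of restrictions.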

  14. Test Statistics - 1 • We can test how "close" our restricted and unrestricted models might be • We could test whether the log likelihood function is (nearly) maximized at θ0 = 0: if the null hypothesis is correct, the derivative of the log likelihood with respect to θ0, evaluated at the restricted estimates, should be close to zero (the score statistic reconstructed below makes this precise).
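A hedged reconstruction of the Lagrange multiplier (score) statistic that this idea leads to, using the score and information matrix defined earlier (the original slide's exact expression may differ):

  LM = s(θ̃)' I(θ̃)⁻¹ s(θ̃) →d χ²(q)

where s(·) and I(·) are evaluated at the restricted estimates θ̃ (those obtained imposing θ0 = 0), and q is the number of restrictions.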

  15. Test Statistics - 2 • The restricted and unrestricted estimates of θ should be close together if the null hypothesis is correct • Partition the information matrix conformably with (θ0, θ1) • Define the Wald test as below
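A hedged reconstruction of the partition and of the Wald statistic (standard forms; the original slide's notation may differ):

  I(θ) = [ I00  I01 ; I10  I11 ]   (partitioned conformably with θ = (θ0, θ1))

  W = θ̂0' V(θ̂0)⁻¹ θ̂0 →d χ²(q)

where V(θ̂0) is the θ0-block of the inverse information matrix evaluated at the unrestricted estimates, equal to [ I00 − I01 I11⁻¹ I10 ]⁻¹ by the partitioned-inverse formula, and q is the number of restrictions.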

  16. Comparing test statistics • In large samples, the three test statistics are asymptotically equivalent and converge to the same χ² limit • In finite samples, they will tend to generate somewhat different values, but will generally come to the same conclusion • The difference between the tests is how they go about answering the same question • The LR test requires estimates of both models • The W and LM tests approximate the LR test but require that only one model be estimated (the unrestricted model for W, the restricted model for LM) • When the model is linear, the three test statistics satisfy W ≥ LR ≥ LM

  17. OLS in the MLE context • Linear model log likelihood function (reconstructed below) • Choose the parameter values which maximize it
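A hedged reconstruction of the normal linear-model log likelihood the slide refers to, under the earlier assumption εi | Xi ~ N(0, σ²):

  log L(β, σ²) = −(N/2) log(2π) − (N/2) log(σ²) − (1/(2σ²)) Σi (Yi − Xi'β)²

Maximizing over β is the same as minimizing the sum of squared residuals, so the MLE of β coincides with the OLS estimator; the MLE of σ² is (1/N) Σi ε̂i².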

  18. Example 1: Discrete choice • Latent Variable Model • True variable of interest is: Y* = X'β + ε • We don't observe Y* but we can observe Y = 1[Y* > 0] • Pr[Y=1] = Pr[Y* > 0] = Pr[ε > −X'β] = Pr[ε < X'β] when ε is symmetric around zero • What to assume about ε? • Linear Probability Model: Pr[Y=1] = X'β • Probit Model: Pr[Y=1] = Φ(X'β) • Logit Model: Pr[Y=1] = exp(X'β) / [1 + exp(X'β)]

  19. Likelihood Functions (reconstructed below) • Probit • Logit
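A hedged reconstruction of the two log likelihoods for a sample of N observations (standard forms; the original slide's notation may differ):

  Probit:  log L(β) = Σi { Yi log Φ(Xi'β) + (1 − Yi) log [1 − Φ(Xi'β)] }

  Logit:   log L(β) = Σi { Yi log Λ(Xi'β) + (1 − Yi) log [1 − Λ(Xi'β)] },  where Λ(u) = exp(u) / [1 + exp(u)]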

  20. Marginal Effects • In the linear model we can interpret our coefficients as the change in Pr[Y=1] with respect to the relevant variable, i.e. ∂Pr[Y=1]/∂Xk = βk • In non-linear models, things are a bit trickier: estimation gives us the parameter estimate of β, but that is not the change in Pr[Y=1] we want • The quantities we want are the "marginal effects" (reconstructed below), typically evaluated at the mean values of X
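A hedged reconstruction of the marginal effects for the two models above (standard expressions; the original slide's layout may differ):

  Probit:  ∂Pr[Y=1]/∂Xk = φ(X'β) βk   where φ is the standard normal density

  Logit:   ∂Pr[Y=1]/∂Xk = Λ(X'β) [1 − Λ(X'β)] βk

Both depend on X, which is why they are typically reported at the mean values of X (or averaged over the sample).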

  21. Next Time • Time Series Processes • AR • MA • ARMA • Model Selection • Return to MLE • Various Criteria for Model Choice
