
The Generalised Method of Moments



  1. The Generalised Method of Moments Ibrahim Stevens Joint HKIMR/CCBS Workshop Advanced Modelling for Monetary Policy in the Asia-Pacific Region May 2004

  2. GMM • Why use GMM? • Nonlinear estimation • Structural estimation • ‘Robust’ estimation • Models estimated using GMM • Many…. • Rational expectations models • Euler Equations • Non-Gaussian distributed models

  3. The Method of Moments • Simple moment conditions • Population: $E[g(x_t, \theta_0)] = 0$ • Sample counterpart: $\frac{1}{T}\sum_{t=1}^{T} g(x_t, \hat{\theta}) = 0$

  4. The Method of Moments • OLS as a MM estimator • Moment conditions: $E[x_t(y_t - x_t'\beta)] = 0$ • MM estimator: $\hat{\beta} = (X'X)^{-1}X'y$

  5. Slightly more Generalised MM • IV is a MM estimator • Moment condition: $E[z_t(y_t - x_t'\beta)] = 0$ • MM estimator: $\hat{\beta} = (Z'X)^{-1}Z'y$
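A minimal numerical sketch of the two just-identified MM estimators above. The simulated data and all variable names are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500

# Simulated just-identified setting: x is endogenous, z is a valid instrument
z = rng.normal(size=(T, 1))
v = rng.normal(size=(T, 1))
u = 0.5 * v + rng.normal(size=(T, 1))         # error correlated with x via v
x = z + v                                     # regressor correlated with z and u
y = 2.0 * x + u                               # true beta = 2

# OLS as MM: solve (1/T) X'(y - Xb) = 0  =>  b = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(x.T @ x, x.T @ y)  # biased here (endogeneity)

# IV as MM: solve (1/T) Z'(y - Xb) = 0  =>  b = (Z'X)^{-1} Z'y
beta_iv = np.linalg.solve(z.T @ x, z.T @ y)   # consistent
```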

  6. Slightly more Generalised MM • In the previous IV estimator we considered the case where the number of instruments equals the number of coefficients we want to estimate • The instrument matrix Z has the same number of columns as the regressor matrix X • What happens if the number of instruments is greater than the number of coefficients? • Essentially, the number of equations is greater than the number of coefficients you want to estimate: the model is over-identified

  7. IV with more constraints than unknowns • Maintain the moment condition as before: $E[z_t(y_t - x_t'\beta)] = 0$, with sample counterpart $g_T(\beta) = \frac{1}{T}Z'(y - X\beta)$ • Variance of the moment condition: $W = \text{Var}[g_T(\beta)]$ • Minimise the 'weighted' distance: $J(\beta) = g_T(\beta)'\,W^{-1}\,g_T(\beta)$

  8. IV with more constraints than unknowns • Why do we do a minimisation exercise? • Because we have more equations than unknowns, so in general no parameter value sets every sample moment exactly to zero • How do we determine the true values of the coefficients? • The solution is to minimise the previous expression so that the coefficients approximate the moment condition as closely as possible, that is, pick coefficients such that the orthogonality condition is (approximately) satisfied

  9. IV with more constraints than unknowns • First order conditions: $X'Z\,W^{-1}\,Z'(y - X\hat{\beta}) = 0$ • MM estimator (looks like an IV estimator with more instruments than parameters to estimate): $\hat{\beta} = (X'Z\,W^{-1}\,Z'X)^{-1}\,X'Z\,W^{-1}\,Z'y$
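The over-identified estimator has a closed form once a weighting matrix is fixed. A sketch, with function and variable names of my own choosing:

```python
import numpy as np

def gmm_linear(y, X, Z, W_inv):
    """Over-identified linear GMM/IV:
    b = (X'Z W^{-1} Z'X)^{-1} X'Z W^{-1} Z'y,
    where W_inv is the inverse of the moment-covariance (weighting) matrix."""
    XZ = X.T @ Z
    A = XZ @ W_inv @ XZ.T        # X'Z W^{-1} Z'X
    b = XZ @ W_inv @ Z.T @ y     # X'Z W^{-1} Z'y
    return np.linalg.solve(A, b)
```

When Z has exactly as many columns as X this collapses to the simple IV estimator, whatever W is.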

  10. Moment conditions in estimation • Model may be nonlinear • Euler equations often imply models in levels, not logs (consumption, output, other first order conditions) • Both ad hoc and structural models may be nonlinear in the parameters of interest (systems) • Models may have an unknown disturbance structure • Rational expectations • May not be interested in the remaining (nuisance) parameters

  11. A generalised problem • Let any (nonlinear) moment condition be: $E[f(x_t, \theta)] = 0$ • Sample counterpart: $g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} f(x_t, \theta)$ • Minimise: $J(\theta) = g_T(\theta)'\,W^{-1}\,g_T(\theta)$

  12. A generalised problem • If we have more instruments (n) than coefficients (p) we choose to minimise: $J(\theta) = g_T(\theta)'\,W^{-1}\,g_T(\theta)$ • What should the matrix W look like?

  13. A generalised problem • It turns out that any symmetric positive definite choice of W yields consistent estimates of the parameters • However, it does not in general yield efficient ones • Hansen (1982) derives the necessary (not sufficient) condition to obtain asymptotically efficient estimates of the coefficients

  14. Choice of W (efficiency) • Appropriate weight matrix (Hansen, 1982): the efficient choice sets W equal to the covariance matrix of the sample moments, $W = \lim_{T\to\infty} T\,E\big[g_T(\theta_0)\,g_T(\theta_0)'\big]$ • Intuition: $W^{-1}$ denotes the inverse of the covariance matrix of the sample moments. This matrix is chosen because it means that less weight is placed on the more imprecise moments

  15. Implementation • Implementation is generally undertaken as a 'two-step procedure': • Any symmetric positive definite matrix yields consistent estimates of the parameters, so exploit this: using 'any' symmetric positive definite matrix, back out first-round estimates of the model's parameters • An arbitrary matrix such as the identity matrix is normally used to obtain the first consistent estimator • Using these parameters, construct the weighting matrix W and then undertake the minimisation problem again • This process can be iterated • Some computational cost
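A compact sketch of the two-step procedure for the linear case, reusing `gmm_linear` from above and assuming serially uncorrelated moments (so a White-style second-step weighting matrix suffices):

```python
import numpy as np

def two_step_gmm(y, X, Z):
    """Two-step linear GMM with a heteroskedasticity-robust weighting matrix."""
    T, n = Z.shape

    # Step 1: arbitrary SPD weighting matrix (identity) -> consistent estimate
    b1 = gmm_linear(y, X, Z, np.eye(n))

    # Step 2: optimal weighting matrix from first-round residuals,
    # W = (1/T) sum_t u_t^2 z_t z_t', then re-estimate
    u = (y - X @ b1).ravel()
    W = (Z * (u**2)[:, None]).T @ Z / T
    b2 = gmm_linear(y, X, Z, np.linalg.inv(W))
    return b2, W
```

The procedure can be iterated by rebuilding W from the step-2 residuals until the estimates settle.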

  16. Instrument validity and W • The minimised criterion can be used to test the validity of the instruments • EViews gives you the 'wrong' Hansen J-statistic as a test of overidentification: it reports J/T • Multiply by the number of observations T to get the correct J • This is a chi-squared statistic with n − p degrees of freedom • If a sub-optimal weighting matrix is used, Hansen's J-test does not apply; see Cochrane (1996) • We can also test a sub-set of the orthogonality conditions
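A sketch of the corrected J-test on top of the two-step output; names are my own:

```python
import numpy as np
from scipy import stats

def j_test(y, X, Z, b, W):
    """Hansen J-test of overidentifying restrictions: J = T g' W^{-1} g,
    asymptotically chi-squared with n - p degrees of freedom."""
    T, n = Z.shape
    p = X.shape[1]
    g = Z.T @ (y - X @ b) / T                    # sample moments at the estimate
    J = T * (g.T @ np.linalg.solve(W, g)).item()
    p_value = stats.chi2.sf(J, df=n - p)         # survival function of chi2(n-p)
    return J, p_value
```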

  17. Covariance estimators • Choosing the right weighting matrix is important for GMM estimation • There have been many econometric papers written on this subject • Estimation results can be sensitive to the choice of weighting matrix

  18. Covariance estimators • So far we have not considered the possibility that heteroskedasticity and autocorrelation may be part of your model • How can we account for this? • We need to modify the covariance matrix

  19. Covariance estimators • Write our covariance matrix of empirical moments as: $W = \frac{1}{T}\,E\!\left[\Big(\sum_{q=1}^{T} M_q\Big)'\Big(\sum_{q=1}^{T} M_q\Big)\right]$ • where $M_q$ is the qth row of the $T \times n$ matrix of sample moments

  20. Covariance estimators • Define the autocovariances: $\hat{\Gamma}_j = \frac{1}{T}\sum_{q=j+1}^{T} M_q' M_{q-j}$ • Express W in terms of the above expressions: $\hat{W} = \hat{\Gamma}_0 + \sum_{j=1}^{T-1}\big(\hat{\Gamma}_j + \hat{\Gamma}_j'\big)$
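The autocovariances above translate directly into code; a sketch, with `M` the T×n matrix of sample moments and all names my own:

```python
import numpy as np

def gamma_hat(M, j):
    """j-th sample autocovariance of the moment series:
    Gamma_j = (1/T) sum_{q=j+1}^{T} M_q' M_{q-j}."""
    T = M.shape[0]
    return M[j:].T @ M[:T - j] / T
```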

  21. Covariance estimators • If there is no serial correlation, the $\hat{\Gamma}_j$ for $j \neq 0$ are all equal to zero (since the autocovariances will be zero): $\hat{W} = \hat{\Gamma}_0 = \frac{1}{T}\sum_{q=1}^{T} M_q' M_q$ • Note that this 'looks like' a White (1980) heteroskedasticity-consistent estimator…

  22. Covariance estimators • If this looks like a White (1980) heteroskedasticity-consistent estimator… …implementation should be straightforward! • Example (remembering White): take the standard heteroskedastic version of the linear model, $y_t = x_t'\beta + u_t$ with $E[u_t^2] = \sigma_t^2$

  23. Covariance estimators • The appropriate problem and weighting matrix are: minimise $g_T(\beta)'\,W^{-1}\,g_T(\beta)$ with $g_T(\beta) = \frac{1}{T}Z'(y - X\beta)$ and $W = \frac{1}{T}\sum_t E[u_t^2]\,z_t z_t'$ • The weighting matrix can be consistently estimated by using any consistent estimator of the model's parameters and substituting the expected value of the squared residuals with the actual squared residuals (NB the only difference here is that we are generalising the problem by allowing instruments, ie the $z_t$'s)

  24. Covariance estimators • The problem is that with autocorrelation it is not enough to replace the expected values of the squared residuals with the actual values from the first estimation • That would lead to an inconsistent estimate of the autocovariance matrix of order j • The problem with this approach is that, asymptotically, the number of estimated autocovariances grows at the same rate as the sample size • Thus, whilst asymptotically unbiased, $\hat{W}$ is not consistent in the mean-squared-error sense

  25. Covariance estimators • Thus we require a class of estimators that circumvents these problems • A class of estimators that prevents the autocovariances from growing with the sample size is: $\hat{W} = \hat{\Gamma}_0 + \sum_{j=1}^{T-1} w_j\,\big(\hat{\Gamma}_j + \hat{\Gamma}_j'\big)$ • Parzen termed the $w_j$'s the 'lag window' • These estimators correspond to a class of kernel (spectral density) estimators, evaluated at frequency zero

  26. Covariance estimators • The key is to choose the sequence of $w_j$'s such that the weights approach unity rapidly enough to obtain asymptotic unbiasedness, but slowly enough to ensure that the variance converges to zero • The type of weights you will find in EViews corresponds to a particular class of lag windows termed scale-parameter windows • The lag window is expressed as $w_j = k(j/b_T)$

  27. Covariance estimators • HAC matrix estimation: $\hat{W} = \hat{\Gamma}_0 + \sum_{j=1}^{T-1} k(j/b_T)\,\big(\hat{\Gamma}_j + \hat{\Gamma}_j'\big)$ • $k(j/b_T)$ is a kernel, $b_T$ is the bandwidth • Intuition: $b_T$ stretches or contracts the distribution of weights; it acts as a scaling parameter • $k(z)$ is referred to as the 'lag window generator'

  28. Covariance estimators • HAC matrix estimation: • When the value of the kernel is zero for $|z| > 1$, $b_T$ is called a 'lag truncation parameter' (autocovariances corresponding to lags greater than $b_T$ are given zero weight) • The scalar $b_T$ is often referred to as the 'bandwidth parameter'

  29. Covariance estimators • EViews provides two kernels: • Quadratic spectral • Bartlett • It provides 3 options for the bandwidth parameter $b_T$ (see the manual for specific functional forms and a good discussion!)

  30. Covariance estimators • For instance, Newey and West (1987) suggest using a Bartlett lag window: $w_j = 1 - \frac{j}{b_T + 1}$ • This guarantees positive (semi-)definiteness (which is something we desire, since we would like a positive variance)
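A minimal sketch of the Newey-West estimator with Bartlett weights, reusing `gamma_hat` from above:

```python
def newey_west(M, bT):
    """HAC estimate of W with Bartlett lag window w_j = 1 - j/(bT + 1)."""
    W = gamma_hat(M, 0)
    for j in range(1, bT + 1):
        w = 1 - j / (bT + 1)                 # Bartlett weight, decays to zero
        G = gamma_hat(M, j)
        W += w * (G + G.T)                   # add weighted autocovariances
    return W
```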

  31. Alternative covariance estimators • Andrews (1991) • Quadratic spectral estimator: $\hat{W} = \hat{\Gamma}_0 + \sum_{j=1}^{T-1} k(j/b_T)\,\big(\hat{\Gamma}_j + \hat{\Gamma}_j'\big)$ where: $k(z) = \frac{25}{12\pi^2 z^2}\left[\frac{\sin(6\pi z/5)}{6\pi z/5} - \cos(6\pi z/5)\right]$
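The quadratic spectral kernel itself is easy to code; a sketch (the z = 0 case is handled separately since k(0) = 1):

```python
import numpy as np

def qs_kernel(z):
    """Quadratic spectral kernel (Andrews, 1991)."""
    if z == 0:
        return 1.0
    a = 6 * np.pi * z / 5
    return 25 / (12 * np.pi**2 * z**2) * (np.sin(a) / a - np.cos(a))
```

Unlike the Bartlett window, the QS kernel gives non-zero weight to every lag, so in practice the sum runs over all available autocovariances.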

  32. Pre-whitening • Andrews and Monahan (1992) • Fit a VAR(1) to the moment residuals: $M_t = A M_{t-1} + e_t$, estimate $\hat{W}_e$ from the whitened residuals $e_t$, then 'recolour': $\hat{W} = (I - \hat{A})^{-1}\,\hat{W}_e\,(I - \hat{A}')^{-1}$ • This is known as a pre-whitened estimate • Can be applied to any kernel
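A sketch of pre-whitening with a VAR(1), reusing `newey_west` from above; the recolouring step for higher-order VARs would differ:

```python
import numpy as np

def prewhitened_hac(M, bT):
    """Andrews-Monahan-style pre-whitened HAC estimate (VAR(1) filter)."""
    # Fit M_t = A' M_{t-1} + e_t by least squares (rows of M are observations)
    X, Y = M[:-1], M[1:]
    A = np.linalg.lstsq(X, Y, rcond=None)[0]     # n x n coefficient matrix
    e = Y - X @ A                                # whitened residuals

    We = newey_west(e, bT)                       # kernel estimate on residuals
    B = np.linalg.inv(np.eye(M.shape[1]) - A.T)  # recolouring matrix (I - A)^{-1}
    return B @ We @ B.T                          # 'recoloured' long-run variance
```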

  33. Linear models • Estimate by IV (consistent but inefficient), eg 2SLS: $\hat{\beta}_{IV} = \big(X'Z(Z'Z)^{-1}Z'X\big)^{-1}X'Z(Z'Z)^{-1}Z'y$ • Use the residuals from these estimates to construct an estimate of W: $\hat{W} = \frac{1}{T}\sum_t \hat{u}_t^2\,z_t z_t'$ (or a HAC version under autocorrelation) • Can iterate on estimates of W

  34. Nonlinear models • Estimate by nonlinear IV • May solve by standard 'iterative' nonlinear IV • Estimate the covariance matrix • Minimise J using nonlinear optimisation • Iterate on the covariance matrix (optional) • EViews uses the Berndt-Hall-Hall-Hausman or Marquardt algorithms (see the manual for pros and cons)

  35. Useful facts • Covariance matrix estimators must be positive definite; asymptotically, the quadratic spectral window has been shown to be best • But in small samples Newey and West (1994) show little difference between the quadratic spectral estimator and their own (based on the Bartlett kernel)

  36. Useful facts • The choice of bandwidth parameter is more important than the choice of kernel • Automatic (data-based) bandwidth selection, as in Newey and West and in Andrews, is state of the art • HAC estimators suffer from poor small-sample performance, so test statistics (eg t-tests) may not be reliable – t-stats appear to reject a true null far more often than their nominal size • Adjustments to the matrix W may be made, but these depend on whether there is autocorrelation and/or heteroskedasticity

  37. Useful facts • Numerical optimisation – a common problem is not having a global maximum/minimum • Eg problems of local maxima/minima or flat functions • Without a global minimum, GMM estimation does not yield consistent and efficient estimates • Convexity of the criterion function is important – it guarantees a global minimum

  38. Useful facts • For non-convex problems you must use 'different methods' • A multi-start algorithm is popular: run a local optimisation algorithm from initial parameter values until it converges to a local minimum, then repeat the process a number of times with different starting values. The estimator is taken to be the parameter values corresponding to the smallest value of the criterion function • However, this does not guarantee finding the global minimum • Andrews (1997) proposes a stopping-rule procedure to overcome this problem
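A sketch of the multi-start procedure just described; the criterion function and parameter bounds are whatever your GMM problem supplies:

```python
import numpy as np
from scipy.optimize import minimize

def multi_start(criterion, bounds, n_starts=20, seed=0):
    """Run a local optimiser from many random starts; keep the best minimum."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        theta0 = [rng.uniform(lo, hi) for lo, hi in bounds]  # random start
        res = minimize(criterion, theta0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res                                       # smallest J so far
    return best
```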

  39. Useful facts • Weak instrument literature • Nelson and Startz (1990): instrumental variables estimators have poor sample properties when the instruments are weakly correlated with the explanatory variables • Chi-squared tests tend to reject the null too frequently compared with their asymptotic distribution • t-ratios are too large • Hansen (1985) characterises an efficiency bound for the asymptotic covariance matrices of alternative GMM estimators, and optimal instruments that attain the bound

  40. Useful facts • Weak instrument literature – Stock, Wright and Yogo (2002) provide an excellent summary of some of the issues related to weak instruments • Recently, some authors advocate the use of limited-information maximum likelihood techniques to compare results with GMM estimation, since the two are asymptotically equivalent • Neely, Roy and Whiteman (2001) show that results can be very different for CAPM models • Fuhrer and Rudebusch (2003) show this to be the case in Euler equations for output • Mavroeidis (2003) finds similar results for the New Keynesian Phillips curve (NKPC)

  41. Useful facts • Finite-sample properties of GMM estimators – similar to the weak instrument literature • Tauchen (1986) and Kocherlakota (1990) examine artificial data generated from a nonlinear CAPM • Using the two-step GMM estimator, Tauchen concluded that GMM estimators and test stats had reasonable properties in small samples • He also investigated optimal instruments, finding that estimators based on optimal selection of instruments often do not perform as well in small samples as GMM estimators using an arbitrary selection of instruments

  42. Useful facts • Kocherlakota (1990) allows for multiple assets and different sets of instruments • Using iterated GMM estimators, Kocherlakota finds that GMM performs worse with larger instrument sets, leading to downward biases in coefficient estimates and narrow confidence intervals. Also, the J-test tends to reject too often • Hansen, Heaton and Yaron (1996) consider the same methods as Tauchen, together with alternative choices of W. Both the two-step and the iterative methods have small-sample distributions that can be greatly distorted

  43. Useful facts • Fuhrer, Moore and Schuh (1995) compare GMM and maximum likelihood estimators in a class of nonlinear models using Monte Carlo simulations • They find that GMM estimates tend to reject their model whilst ML supports it • Why? They find GMM estimates are often biased, statistically insignificant, economically implausible and dynamically unstable • They attribute the result to weak instruments

  44. Useful facts • Nonstationarity – the data must be stationary to use GMM • Thus nonstationary data are differenced, or cointegrating relationships are used • In the case of cointegration, Cooley and Ogaki (1996) suggest estimating the cointegrating relationship using OLS and using those parameters in constructing the covariance matrix W

  45. Practical GMM • Moment conditions • Theoretical moment conditions are best • Empirical moment conditions – try different informational assumptions • Try 'straight' IV • If you know the form of the autocorrelation, try IV-MA • EViews reports J/T

  46. Practical GMM • Use Newey-West first • Try setting the lag truncation to something close to the expected autocorrelation • Then try T/3 • Pre-whitening: don't do it unless nothing else works for NW • QS-PW (quadratic spectral with pre-whitening): 'state of the art' – if it works, use it

  47. Euler equations and consumption • Problem of intertemporal utility maximisation: $\max_{\{c_{t+s}\}}\; E_t \sum_{s=0}^{\infty} \beta^s u(c_{t+s})$ subject to: $A_{t+s+1} = (1+r)(A_{t+s} + y_{t+s} - c_{t+s})$ • Constrained problem: $\mathcal{L} = E_t \sum_{s=0}^{\infty} \beta^s \Big[u(c_{t+s}) + \lambda_{t+s}\big((1+r)(A_{t+s} + y_{t+s} - c_{t+s}) - A_{t+s+1}\big)\Big]$

  48. Euler equations and consumption • First order conditions: $u'(c_t) = \beta(1+r)\,E_t\big[u'(c_{t+1})\big]$ • giving an Euler equation of: $E_t\!\left[\beta(1+r)\,\frac{u'(c_{t+1})}{u'(c_t)} - 1\right] = 0$

  49. Consumption reduced form – a dummy's guide to Lucas' critique • Income process: $y_t = \rho y_{t-1} + \varepsilon_t$ • Consumption function: $c_t = \alpha y_t$, where the reduced-form coefficient $\alpha$ depends on the deep parameters ($\beta$, $r$) and on the income-process parameter $\rho$ – so the reduced form is not invariant to changes in the income process

  50. Consumption moment conditions • The orthogonality (zero mean) conditions: $E\!\left[\left(\beta(1+r)\,\frac{u'(c_{t+1})}{u'(c_t)} - 1\right) z_t\right] = 0$ for any instrument $z_t$ in the period-$t$ information set
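A sketch of these orthogonality conditions under a CRRA utility assumption, $u(c) = c^{1-\gamma}/(1-\gamma)$ so that $u'(c) = c^{-\gamma}$; the utility specification and all names are my own illustration, not from the slides:

```python
import numpy as np

def euler_moments(theta, c, r, Z):
    """Sample moments for the consumption Euler equation under CRRA utility.

    theta = (beta, gamma); c: consumption series of length T+1;
    r: net interest rate; Z: T x n matrix of instruments dated t."""
    beta, gamma = theta
    ratio = (c[1:] / c[:-1]) ** (-gamma)    # u'(c_{t+1}) / u'(c_t)
    e = beta * (1 + r) * ratio - 1          # Euler-equation residual
    return Z.T @ e / len(e)                 # g_T(theta) = (1/T) sum z_t e_t

def gmm_objective(theta, c, r, Z, W_inv):
    """GMM criterion J(theta) = g' W^{-1} g."""
    g = euler_moments(theta, c, r, Z)
    return float(g @ W_inv @ g)
```

`gmm_objective` can then be handed to the multi-start optimiser sketched earlier.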
