The Generalised Method of Moments Ibrahim Stevens Joint HKIMR/CCBS Workshop Advanced Modelling for Monetary Policy in the Asia-Pacific Region May 2004
GMM • Why use GMM? • Nonlinear estimation • Structural estimation • ‘Robust’ estimation • Models estimated using GMM • Many…. • Rational expectations models • Euler Equations • Non-Gaussian distributed models
The Method of Moments • Simple moment conditions • Population: $E[x_t] = \mu$, $E[x_t^2] = \mu^2 + \sigma^2$ • Sample counterparts: $\frac{1}{T}\sum_t x_t$, $\frac{1}{T}\sum_t x_t^2$ • Matching sample moments to their population counterparts gives the estimator (see the sketch below)
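A minimal sketch (not from the original slides) of the population/sample correspondence in Python: the first two sample moments are matched to $E[x]=\mu$ and $E[x^2]=\mu^2+\sigma^2$ to recover the mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=5000)  # illustrative data

m1 = x.mean()           # sample counterpart of E[x]
m2 = (x ** 2).mean()    # sample counterpart of E[x^2]

mu_hat = m1             # solve the two moment equations exactly
sigma2_hat = m2 - m1 ** 2
print(mu_hat, sigma2_hat)
```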
The Method of Moments • OLS as a MM estimator • Moment conditions: $E[x_t(y_t - x_t'\beta)] = 0$, with sample counterpart $\frac{1}{T}\sum_t x_t(y_t - x_t'\hat\beta) = 0$ • MM estimator: $\hat\beta = (X'X)^{-1}X'y$ (see the sketch below)
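A minimal sketch, assuming `y` and `X` are NumPy arrays; solving the sample moment condition exactly reproduces the OLS normal equations.

```python
import numpy as np

def ols_mm(y, X):
    # Solve the sample moment condition (1/T) X'(y - X b) = 0 for b,
    # which is exactly the OLS formula b = (X'X)^{-1} X'y.
    return np.linalg.solve(X.T @ X, X.T @ y)
```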
Slightly more Generalised MM • IV is a MM estimator • Moment condition: $E[z_t(y_t - x_t'\beta)] = 0$ • MM estimator: $\hat\beta = (Z'X)^{-1}Z'y$ (exactly identified case)
Slightly more Generalised MM • In the previous IV estimator we considered the case where the number of instruments equals the number of coefficients we want to estimate • The instrument matrix Z has the same number of columns as the regressor matrix X • What happens if the number of instruments is greater than the number of coefficients? • Then the number of moment equations exceeds the number of coefficients to be estimated: the model is over-identified
IV with more instruments than parameters • Maintain the moment condition as before: $E[z_t(y_t - x_t'\beta)] = 0$ • Variance of the sample moment condition: $W = \mathrm{Var}\!\left(\frac{1}{\sqrt T}\sum_t z_t u_t\right)$ • Minimise the 'weighted' distance: $\min_\beta \left(\frac{1}{T}\sum_t z_t(y_t - x_t'\beta)\right)' W^{-1} \left(\frac{1}{T}\sum_t z_t(y_t - x_t'\beta)\right)$
IV with more instruments than parameters • Why do we do a minimisation exercise? • Because we have more equations than unknowns, so no choice of coefficients satisfies all the moment conditions exactly • How do we determine the true values of the coefficients? • The solution is to minimise the previous expression so that the coefficients approximate the moment conditions as closely as possible, that is, pick coefficients such that the orthogonality conditions come as close as possible to holding
IV with more instruments than parameters • First order conditions: $X'Z W^{-1} Z'(y - X\hat\beta) = 0$ • MM estimator (looks like an IV estimator with more instruments than parameters to estimate): $\hat\beta = (X'Z W^{-1} Z'X)^{-1} X'Z W^{-1} Z'y$ (see the sketch below)
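A sketch of the resulting estimator for the linear case; here `A` stands in for the weighting matrix (the $W^{-1}$ above), and the default $(Z'Z)^{-1}$ reproduces two-stage least squares. The function name and signature are illustrative.

```python
import numpy as np

def iv_gmm(y, X, Z, A=None):
    # Overidentified linear IV/GMM: b = (X'Z A Z'X)^{-1} X'Z A Z'y,
    # where A is the weighting matrix. A = (Z'Z)^{-1} gives 2SLS.
    if A is None:
        A = np.linalg.inv(Z.T @ Z)
    XZA = X.T @ Z @ A
    return np.linalg.solve(XZA @ Z.T @ X, XZA @ Z.T @ y)
```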
Moment conditions in estimation • Model may be nonlinear • Euler equations often imply models in levels not logs (consumption, output, other first order conditions) • Both ad hoc and structural models may be nonlinear in parameters of interest (systems) • Models may have unknown disturbance structure • Rational expectations • May not be interested in related parameters
A generalised problem • Let any (nonlinear) moment condition be: $E[g(\theta_0, w_t)] = 0$ • Sample counterpart: $\bar g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} g(\theta, w_t)$ • Minimise: $J_T(\theta) = \bar g_T(\theta)'\, W^{-1}\, \bar g_T(\theta)$
A generalised problem • If we have more instruments (n) than coefficients (p) we choose to minimise: $J_T(\theta) = \bar g_T(\theta)'\, W^{-1}\, \bar g_T(\theta)$ • What should the matrix W look like?
A generalised problem • It turns out that any symmetric positive definite weighting matrix yields consistent estimates of the parameters • However, it does not in general yield efficient ones • Hansen (1982) derives the condition (necessary, not sufficient) to obtain asymptotically efficient estimates of the coefficients
Choice of W (efficiency) • Appropriate weight matrix (Hansen, 1982): $W^{-1}$, where $W = \lim_{T\to\infty} \mathrm{Var}\!\left(\sqrt{T}\,\bar g_T(\theta_0)\right)$ • Intuition: $W^{-1}$ is the inverse of the covariance matrix of the sample moments. This matrix is chosen because it means that less weight is placed on the more imprecise moments
Implementation • Implementation is generally undertaken in a 'two-step procedure' (sketched below): • Any symmetric positive definite matrix yields consistent estimates of the parameters, so exploit this: using 'any' such matrix, obtain first-round estimates of the parameters in the model • An arbitrary matrix such as the identity matrix is normally used to obtain this first consistent estimator • Using these parameters, construct the weighting matrix W, and with it undertake the minimisation problem • This process can be iterated • Some computational cost
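A minimal sketch of the two-step procedure for the linear IV model, reusing `iv_gmm` from the sketch above and assuming serially uncorrelated moments, so a White-type second-step matrix suffices.

```python
import numpy as np

def two_step_gmm(y, X, Z):
    T = len(y)
    # Step 1: any consistent estimate; the 2SLS weighting is a common choice
    b1 = iv_gmm(y, X, Z)
    # Step 2: build the optimal weighting matrix from step-1 residuals
    u = y - X @ b1
    S = (Z * (u ** 2)[:, None]).T @ Z / T   # (1/T) sum u_t^2 z_t z_t'
    return iv_gmm(y, X, Z, np.linalg.inv(S))
```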
Instrument validity and W • The minimised criterion can be used to test the validity of the instruments • EViews gives you the 'wrong' Hansen J-statistic as a test of overidentification • Multiply by the number of observations to get the correct J (see the sketch below) • This is $\chi^2$ with $n-p$ degrees of freedom • If a sub-optimal weighting matrix is used, Hansen's J-test does not apply; see Cochrane (1996) • We can also test a subset of the orthogonality conditions
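A sketch of the corrected J computation, assuming `u` holds the final-step residuals, `Z` the instruments, and `A` the optimal weighting matrix used in estimation; the names are illustrative.

```python
import numpy as np
from scipy import stats

def j_test(u, Z, A, n_params):
    T, n_inst = Z.shape
    gbar = Z.T @ u / T           # sample moment vector
    J = T * gbar @ A @ gbar      # T times the minimised criterion
    p = stats.chi2.sf(J, df=n_inst - n_params)
    return J, p
```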
Covariance estimators • Choosing the right weighting matrix is important for GMM estimation • There have been many econometric papers written on this subject • Estimation results can be sensitive to the choice of weighting matrix
Covariance estimators • So far we have not considered the possibility that heteroskedasticity and autocorrelation are present in your model • How can we account for this? • We need to modify the covariance matrix
Covariance estimators • Write our covariance matrix of empirical moments as: $W = \lim_{T\to\infty} \mathrm{Var}\!\left(\frac{1}{\sqrt T}\sum_{q=1}^{T} M_q'\right)$ • where $M_q$ is the $q$th row of the $T \times n$ matrix of sample moments
Covariance estimators • Define the autocovariances: $\Gamma_j = \frac{1}{T}\sum_{t=j+1}^{T} M_t' M_{t-j}$ • Express W in terms of the above expressions: $W = \Gamma_0 + \sum_{j=1}^{\infty}\left(\Gamma_j + \Gamma_j'\right)$ (see the sketch below)
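A sketch of the sample autocovariances $\hat\Gamma_j$, assuming `M` is the $T \times n$ matrix whose $t$-th row is the moment vector $m_t$.

```python
import numpy as np

def moment_autocov(M, j):
    # Gamma_j = (1/T) sum_{t=j+1}^{T} m_t m_{t-j}'
    T = M.shape[0]
    return M[j:].T @ M[:T - j] / T
```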
Covariance estimators • If there is no serial correlation, the autocovariances $\Gamma_j$ for $j \neq 0$ are all equal to zero, so: $W = \Gamma_0 = \frac{1}{T}\sum_t M_t' M_t$ • Note that this 'looks like' a White (1980) heteroskedasticity-consistent estimator…
Covariance estimators • If this looks like a White (1980) heteroskedasticity-consistent estimator… …implementation should be straightforward! • Example (remembering White): take the standard heteroskedastic version of the linear model, $y_t = x_t'\beta + u_t$ with $E[u_t^2 \mid x_t] = \sigma_t^2$
Covariance estimators • The appropriate problem and weighting matrix are the GMM minimisation above with $W = \frac{1}{T}\sum_t E[u_t^2]\, z_t z_t'$ • The weighting matrix can be consistently estimated by using any consistent estimator of the model's parameters and substituting the actual squared residuals for their expected values (NB the only difference here is that we are generalising the problem by allowing instruments, ie the $z_t$s)
Covariance estimators • The problem is that with autocorrelation it is not possible simply to replace the expected values of the squared residuals by the actual values from the first estimation • Doing so would lead to an inconsistent estimate of the autocovariance matrix of order j • The problem with this approach is that, asymptotically, the number of estimated autocovariances grows at the same rate as the sample size • Thus, whilst unbiased, W is not consistent in the mean-squared-error sense
Covariance estimators • Thus we require a class of estimators that circumvents these problems • A class of estimators that prevents the autocovariances from growing with the sample size is: $\hat W = \hat\Gamma_0 + \sum_{j=1}^{T-1} w_j\left(\hat\Gamma_j + \hat\Gamma_j'\right)$ • Parzen termed the $w_j$ the lag window • These estimators correspond to a class of kernel (spectral density) estimators, evaluated at frequency zero
Covariance estimators • The key is to choose the sequence of $w_j$ such that the weights approach unity rapidly enough to obtain asymptotic unbiasedness, but slowly enough to ensure that the variance converges to zero • The weights you will find in EViews correspond to a particular class of lag windows termed scale parameter windows • The lag window is expressed as $w_j = k(j/b_T)$
Covariance estimators • HAC matrix estimation: $\hat W = \sum_{j=-(T-1)}^{T-1} k(j/b_T)\,\hat\Gamma_j$ • $k(j/b_T)$ is a kernel, $b_T$ is the bandwidth • Intuition: $b_T$ stretches or contracts the distribution of weights; it acts as a scaling parameter • $k(z)$ is referred to as the 'lag window generator'
Covariance estimators • HAC matrix estimation: • When the value of the kernel is zero for $|z| > 1$, $b_T$ is called a 'lag truncation parameter' (autocovariances corresponding to lags greater than $b_T$ are given zero weight) • The scalar $b_T$ is often referred to as the 'bandwidth parameter'
Covariance estimators • EViews provides two kernels: • Quadratic spectral • Bartlett • It provides 3 options for the bandwidth parameter $b_T$ (see the manual for the specific functional forms and a good discussion!)
Covariance estimators • For instance, Newey and West (1987) suggest using a Bartlett kernel: $k(z) = 1 - |z|$ for $|z| \le 1$, and $0$ otherwise (see the sketch below) • This guarantees positive semi-definiteness, which is something we desire since we would like a positive variance
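A sketch of the Newey-West estimate built from `moment_autocov` above; `L` is the lag truncation parameter, and the Bartlett weights $1 - j/(L+1)$ decline linearly with the lag.

```python
import numpy as np

def newey_west(M, L):
    S = moment_autocov(M, 0)            # from the earlier sketch
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)         # Bartlett weight
        G = moment_autocov(M, j)
        S = S + w * (G + G.T)
    return S
```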
Alternative covariance estimators • Andrews (1991) • Quadratic spectral estimator: $k(z) = \frac{25}{12\pi^2 z^2}\left(\frac{\sin(6\pi z/5)}{6\pi z/5} - \cos(6\pi z/5)\right)$, where $z = j/b_T$ (sketched below)
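A sketch of the quadratic spectral kernel itself; note that it is nonzero for $|z| > 1$, so no autocovariance is truncated outright.

```python
import numpy as np

def qs_kernel(z):
    # Andrews (1991) quadratic spectral kernel, with k(0) = 1 by continuity
    z = np.asarray(z, dtype=float)
    d = 6.0 * np.pi * z / 5.0
    with np.errstate(invalid="ignore", divide="ignore"):
        k = 25.0 / (12.0 * np.pi ** 2 * z ** 2) * (np.sin(d) / d - np.cos(d))
    return np.where(z == 0.0, 1.0, k)
```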
Pre-whitening • Andrews and Monahan (1992) • Fit a VAR to the moment residuals: $M_t = A M_{t-1} + e_t$ • Apply the HAC estimator to the whitened residuals $e_t$, then 'recolour': $\hat W = (I - \hat A)^{-1}\, \hat W^{*}\, (I - \hat A')^{-1}$, where $\hat W^{*}$ is the HAC estimate from the $e_t$ • This is known as a pre-whitened estimate (see the sketch below) • Can be applied to any kernel
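A sketch of the pre-whitened estimate, fitting a VAR(1) to the moment series (assumed stacked as rows of the $T \times n$ matrix `M`) and reusing `newey_west` from above.

```python
import numpy as np

def prewhitened_hac(M, L):
    # VAR(1) fit to the moment series: M_t = A M_{t-1} + e_t
    Y, X = M[1:], M[:-1]
    A = np.linalg.lstsq(X, Y, rcond=None)[0].T
    E = Y - X @ A.T                    # whitened residuals
    S_star = newey_west(E, L)          # HAC estimate on the residuals
    B = np.linalg.inv(np.eye(M.shape[1]) - A)
    return B @ S_star @ B.T            # recoloured long-run covariance
```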
Linear models • Estimate by IV (consistent but inefficient): $\hat\beta_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y$ • Use the residuals from these estimates to construct an estimate of W • Can iterate on estimates of W
Nonlinear models • Estimate by nonlinear IV • May solve by standard 'iterative' nonlinear IV • Estimate the covariance matrix • Minimise J using nonlinear optimisation • Iterate on the covariance matrix (optional) • EViews uses the Berndt-Hall-Hall-Hausman or Marquardt algorithms (see the manual for pros and cons)
Useful facts • Covariance matrix estimators must be positive definite; asymptotically, the quadratic spectral window has been shown to be best • But in small samples Newey and West (1994) show little difference between the quadratic spectral estimator and their own (Bartlett-based) estimator
Useful facts • The choice of bandwidth parameter is more important than the choice of kernel • Data-dependent (variable) bandwidth selection, as in Newey and West (1994) and Andrews (1991), is state of the art • HAC estimators suffer from poor small-sample performance, so test statistics (eg t-tests) may not be reliable: t-stats appear to reject a true null far more often than their nominal size • Adjustments to the matrix W may be made, but these depend on whether there is autocorrelation and/or heteroskedasticity
Useful facts • Numerical optimisation: a common problem is failing to find the global maximum/minimum • Eg problems of local maxima/minima or flat criterion functions • If the global minimum is not found, GMM estimation does not yield consistent and efficient estimates • Convexity of the criterion function is important: it guarantees that any local minimum is the global minimum
Useful facts • For non-convex problems you must use 'different methods' • A multi-start algorithm is popular: run a local optimisation algorithm from some initial parameter values until it converges to a local minimum, then repeat the process a number of times with different starting values; the estimate is taken to be the parameter vector with the smallest value of the criterion function (sketched below) • However, this is not guaranteed to find the global minimum • Andrews (1997) proposes a stopping-rule procedure to overcome this problem
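A minimal sketch of the multi-start strategy using SciPy's local optimiser; `criterion` is the GMM objective and `bounds` is an illustrative box from which starting values are drawn (all names are assumptions, not part of the original slides).

```python
import numpy as np
from scipy.optimize import minimize

def multi_start(criterion, bounds, n_starts=20, seed=0):
    # Run a local optimiser from several random starts and keep the
    # parameters at the smallest criterion value found.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        res = minimize(criterion, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best
```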
Useful facts • Weak instrument literature • Nelson and Startz (1990): instrumental variables estimators have poor sample properties when the instruments are weakly correlated with the explanatory variables • Chi-squared tests tend to reject the null too frequently compared with their asymptotic distribution • t-ratios are too large • Hansen (1985) characterises an efficiency bound for the asymptotic covariance matrices of alternative GMM estimators, and the optimal instruments that attain the bound
Useful facts • Weak instrument literature: Stock, Wright and Yogo (2002) provide an excellent summary of some of the issues related to weak instruments • Recently, some authors advocate the use of limited-information maximum likelihood techniques to compare results with GMM estimation, since the two are asymptotically equivalent • Neely, Roy and Whiteman (2001) show that results can be very different for CAPM models • Fuhrer and Rudebusch (2003) show this to be the case in Euler equations for output • Mavroeidis (2003) finds similar results for the New Keynesian Phillips curve
Useful facts • Finite sample properties of GMM estimators: the findings are similar to the weak instrument literature • Tauchen (1986) and Kocherlakota (1990) examine artificial data generated from a nonlinear CAPM • Using the two-step GMM estimator, Tauchen concluded that GMM estimators and test statistics have reasonable properties in small samples • He also investigated optimal instruments, finding that estimators based on an optimal selection of instruments often do not perform as well in small samples as GMM estimators using an arbitrary selection of instruments
Useful facts • Kocherlakota (1990) allows for multiple assets and different sets of instruments • Using iterated GMM estimators, Kocherlakota finds that GMM performs worse with larger instrument sets, leading to downward biases in coefficient estimates and overly narrow confidence intervals; the J-test also tends to reject too often • Hansen, Heaton and Yaron (1996) consider the same methods as Tauchen, together with alternative choices for W: both the two-step and the iterated methods have small-sample distributions that can be greatly distorted
Useful facts • Fuhrer, Moore and Schuh (1995) compare GMM and maximum likelihood estimators in a class of nonlinear models using Monte Carlo simulations • They find that GMM estimates tend to reject their model whilst ML estimates support it • Why? They find GMM estimates are often biased, statistically insignificant, economically implausible and dynamically unstable • They attribute the result to weak instruments
Useful facts • Nonstationarity: the data must all be stationary to use GMM • Thus nonstationary data are differenced, or cointegrating relationships are used • In the case of cointegration, Cooley and Ogaki (1996) suggest estimating the cointegrating relationship by OLS and using these parameters in the covariance matrix W
Practical GMM • Moment conditions • Theoretical moment conditions are best • Empirical moment conditions: try different informational assumptions • Try 'straight' IV • If you know the form of the autocorrelation, try IV-MA • EViews reports J/T
Practical GMM • Use Newey-West first • Try setting the lag truncation close to the expected order of autocorrelation • Then try T/3 • Pre-whitening: don't do it unless nothing else works for Newey-West • QS-PW: 'state of the art', if it works use it
Euler equations and consumption • Problem of intertemporal utility maximisation: $\max_{\{c_{t+j}\}} E_t \sum_{j=0}^{\infty} \beta^j u(c_{t+j})$ • subject to: $A_{t+j+1} = (1+r)(A_{t+j} + y_{t+j} - c_{t+j})$ • Constrained problem: $\mathcal{L} = E_t \sum_{j=0}^{\infty} \beta^j \left[ u(c_{t+j}) + \lambda_{t+j}\left((1+r)(A_{t+j} + y_{t+j} - c_{t+j}) - A_{t+j+1}\right)\right]$
Euler equations and consumption • First order conditions: $u'(c_t) = \lambda_t$ and $\lambda_t = \beta(1+r)E_t \lambda_{t+1}$ • Euler equation: $u'(c_t) = \beta(1+r)\,E_t\, u'(c_{t+1})$
Consumption reduced form: a dummy's guide to Lucas' critique • Income process: $y_t = \rho y_{t-1} + \varepsilon_t$ • Consumption function: $c_t = \alpha y_t$, where $\alpha$ depends on the income-process parameter $\rho$ (for example, $\alpha = r/(1+r-\rho)$ under the permanent income hypothesis) • So the reduced-form consumption coefficient is not invariant to changes in the income process: the Lucas critique
Consumption moment conditions • The orthogonality (zero mean) conditions: $E\left[\left(\beta(1+r)\frac{u'(c_{t+1})}{u'(c_t)} - 1\right) z_t\right] = 0$ for any instrument $z_t$ in the time-$t$ information set (see the sketch below)
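A sketch of these moment conditions for GMM, assuming CRRA utility $u(c) = c^{1-\gamma}/(1-\gamma)$ and a gross return series `R` in place of the constant $1+r$; the consumption series `c`, returns `R`, and instrument matrix `Z` are illustrative inputs, not from the original slides.

```python
import numpy as np

def euler_moments(params, c, R, Z):
    # Euler residual: beta * R_{t+1} * (c_{t+1}/c_t)^{-gamma} - 1,
    # orthogonal to instruments z_t dated t or earlier.
    beta, gamma = params
    resid = beta * R[1:] * (c[1:] / c[:-1]) ** (-gamma) - 1.0
    return resid[:, None] * Z[:-1]   # (T-1) x k matrix of sample moments
```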