This article reviews static asset pricing models, including the CAPM, the APT, and the Fama-French three-factor model, and lays out their testable implications. It covers the econometric methodology used to test these models: maximum likelihood estimation and the Wald, likelihood ratio, F, and LM tests, including the concept of the concentrated log-likelihood.
Tests of Static Asset Pricing Models • In general, asset pricing models quantify the tradeoff between risk and expected return. • We need to both measure risk and relate it to the expected return on a risky asset. • The most commonly used models are: • CAPM • APT • FF three-factor model
Testable Implications • These models have testable implications. For the CAPM, for example: • The expected excess return of a risky asset is proportional to the covariance of its return with the return on the market portfolio. • Note, this tells us the measure of risk used and its relation to expected return. • There are other restrictions that depend upon whether there exists a riskless asset.
Testable Implications • For the APT, • The expected excess return on a risky asset is linearly related to the covariance of its return with various risk factors. • These risk factors are left unspecified by the theory and have been: • Derived from the data (CR (1983), CK) • Exogenously imposed (CRR (1985))
Plan • Review the basic econometric methodology we will use to test these models. • Review the CAPM. • Test the CAPM. • Traditional tests (FM (1972), BJS (1972), Ferson and Harvey) • ML tests (Gibbons (1982), GRS (1989)) • GMM tests • Factor models: APT and FF • Curve fitting vs. ad-hoc theorizing
Econometric Methodology Review • Maximum Likelihood Estimation • The Wald Test • The F Test • The LM Test • A specialization to linear models and linear restrictions • A comparison of test statistics
Review of Maximum Likelihood Estimation • Let {x1, …, xT} be a sample of T i.i.d. random variables. • Call that vector x. • Let x be continuously distributed with density f(x|θ). • Where θ is the unknown parameter vector that determines the distribution.
The Likelihood Function • The joint density for the independent random variables is given by: f(x1|θ) f(x2|θ) f(x3|θ)… f(xT|θ) • This joint density is known as the likelihood function, L(x|θ): L(x|θ) = f(x1|θ) f(x2|θ) f(x3|θ)… f(xT|θ) • Can you write the joint density and L(x|θ) this way when dealing with time-dependent observations?
Independence • You can’t. • The reason you can write the product f(x1|θ) f(x2|θ) f(x3|θ)… f(xT|θ) is because of the independence. • If you have dependence, writing the joint density can be extremely complicated. • See, e.g., Hamilton (1994) for a good discussion of switching regression models and the EM algorithm.
Idea Behind Maximum Likelihood Estimation • Pick the parameter vector estimate, θ̂, that maximizes the likelihood, L(x|θ), of observing the particular vector of realizations, x.
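A minimal sketch of this idea (not from the slides; the sample and parameter values below are invented for illustration): maximize the log-likelihood of an i.i.d. normal sample numerically and compare the result with the closed-form ML estimates, the sample mean and the variance computed with a 1/T divisor.

```python
# Minimal illustration of maximum likelihood estimation (not from the slides):
# maximize the log-likelihood of an i.i.d. normal sample numerically and
# compare with the closed-form ML estimates.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=500)   # hypothetical sample

def neg_loglik(params):
    mu, log_sigma = params                     # parameterize sigma in logs to keep it positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

print("numeric MLE: mu =", mu_hat, " sigma =", sigma_hat)
print("closed form: mu =", x.mean(), " sigma =", x.std(ddof=0))  # ML uses the 1/T divisor
```

The 1/T divisor in the ML variance estimate also previews the bias question on the next slide.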
MLE Plusses and Minuses • Plusses: Efficient estimation in terms of picking the estimator with the smallest covariance matrix. • Question: are ML estimators necessarily unbiased? • Minuses: Strong distributional assumptions make robustness a problem.
MLE Example: Normal Distribution where OLS Assumptions Are Satisfied • Sample y of size T is normally distributed with mean Xβ, where • X is a T x K matrix of explanatory variables • β is a K x 1 vector of parameters • The variance-covariance matrix of the errors from the true regression is σ²I, where • I is a T x T identity matrix
The Likelihood Function • The likelihood function for the linear model with independent normally distributed errors is: L(β, σ² | y, X) = (2πσ²)^(−T/2) exp[ −(y − Xβ)′(y − Xβ) / (2σ²) ]
The Log-Likelihood Function • With independent draws, it is easier to maximize the log-likelihood function, because products are replaced by sums. The log-likelihood is given by: ln L(β, σ²) = −(T/2) ln(2π) − (T/2) ln(σ²) − (y − Xβ)′(y − Xβ) / (2σ²)
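As a sketch under the same assumptions (simulated data; the design matrix, coefficients, and noise level below are made up), the code maximizes this log-likelihood numerically and checks that the ML coefficient estimates coincide with OLS, with σ̂² = e′e/T rather than e′e/(T − K).

```python
# Sketch: for the linear model with normal errors, the ML estimate of beta equals OLS,
# and the ML estimate of sigma^2 is e'e / T (not e'e / (T - K)).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T, K = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])   # includes a constant
beta_true = np.array([0.5, 1.0, -2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=T)

def neg_loglik(params):
    beta, log_s2 = params[:K], params[K]
    s2 = np.exp(log_s2)
    resid = y - X @ beta
    return 0.5 * (T * np.log(2 * np.pi) + T * np.log(s2) + resid @ resid / s2)

res = minimize(neg_loglik, x0=np.zeros(K + 1), method="BFGS")
beta_ml, s2_ml = res.x[:K], np.exp(res.x[K])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_ols
print("beta_ml ~ beta_ols:", beta_ml, beta_ols)
print("s2_ml   ~ e'e / T :", s2_ml, e @ e / T)
```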
The Information Matrix • If θ is our parameter vector, • I(θ) is the information matrix, • which is minus the expectation of the matrix of second partial derivatives of the log-likelihood with respect to the parameters: I(θ) = −E[ ∂² ln L / ∂θ ∂θ′ ]
The Information Matrix – Cont… • The MLE achieves the Cramér-Rao lower bound, which means that the variance of the estimator equals the inverse of the information matrix: Var(θ̂) = [I(θ)]⁻¹ • Now, for the normal linear model the second partial derivatives of the log-likelihood are: ∂² ln L/∂β∂β′ = −X′X/σ², ∂² ln L/∂β∂σ² = −X′ε/σ⁴, ∂² ln L/∂(σ²)² = T/(2σ⁴) − ε′ε/σ⁶ • Note, in expectation the off-diagonal elements are zero.
The Information Matrix – Cont… • The negative of the expectation is: −E[ ∂² ln L/∂θ∂θ′ ] = [[ X′X/σ², 0 ], [ 0, T/(2σ⁴) ]] • The inverse of this is: [[ σ²(X′X)⁻¹, 0 ], [ 0, 2σ⁴/T ]]
Another Way of Writing I(β, σ²) • For a vector, θ, of parameters, I(θ), the information matrix, can be written in a second way, as the expected outer product of the score: I(θ) = E[ (∂ ln L/∂θ)(∂ ln L/∂θ)′ ] • This second form is more convenient for estimation, because it does not require estimating second derivatives.
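A small simulation sketch (data generated here purely for illustration) comparing the two forms for the normal linear model: the negative expected Hessian, which is block diagonal with blocks X′X/σ² and T/(2σ⁴), and the sum of outer products of the per-observation scores, both evaluated at the true parameters.

```python
# Sketch: two estimates of the information matrix for the normal linear model,
# evaluated at the true parameters of a simulated data set.
import numpy as np

rng = np.random.default_rng(2)
T, K, s2 = 5000, 2, 1.5
X = np.column_stack([np.ones(T), rng.normal(size=T)])
eps = rng.normal(scale=np.sqrt(s2), size=T)

# Form 1: minus the expected Hessian (block diagonal in beta and sigma^2).
info_hessian = np.zeros((K + 1, K + 1))
info_hessian[:K, :K] = X.T @ X / s2
info_hessian[K, K] = T / (2 * s2 ** 2)

# Form 2: sum of outer products of the per-observation scores.
score_beta = X * (eps / s2)[:, None]                    # d l_t / d beta
score_s2 = -1 / (2 * s2) + eps ** 2 / (2 * s2 ** 2)     # d l_t / d sigma^2
scores = np.column_stack([score_beta, score_s2])
info_opg = scores.T @ scores

print(np.round(info_hessian, 1))
print(np.round(info_opg, 1))   # close to form 1 in large samples
```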
Estimation • The Likelihood Ratio Test • Let θ be a vector of parameters to be estimated. • Let H0 be a set of restrictions on these parameters. • These restrictions could be linear or non-linear. • Let θ̂ be the MLE of θ estimated without regard to the constraints (the unrestricted model). • Let θ̃ be the constrained MLE.
The Likelihood Ratio Test Statistic • If L(θ̂) and L(θ̃) are the likelihood functions evaluated at these two estimates, the likelihood ratio is given by: λ = L(θ̃) / L(θ̂) • Then, −2 ln(λ) = −2[ln L(θ̃) − ln L(θ̂)] ~ χ², with degrees of freedom equal to the number of restrictions imposed.
Another Look at the LR Test • Concentrated Log-Likelihood: Many problems can be formulated in terms of partitioning a parameter vector θ into {θ1, θ2} such that the solution to the optimization problem for θ2 can be written as a function of θ1, e.g.: θ̂2 = t(θ1). • Then, we can concentrate the log-likelihood function as: F*(θ1, θ2) = F(θ1, t(θ1)) ≡ Fc(θ1).
Why Do This? • The unrestricted solution to max over θ1 of Fc(θ1) then provides the full solution to the optimization problem, since t(θ1) is known. • We now use this technique to find estimates for the classical linear regression model.
Example • The log-likelihood function (from CLM) with normal disturbances is given by: ln L(β, σ²) = −(T/2) ln(2π) − (T/2) ln(σ²) − (y − Xβ)′(y − Xβ)/(2σ²) • The solution to the likelihood equation for σ² implies that however we estimate β, the estimator for σ² will be: σ̂² = (y − Xβ)′(y − Xβ)/T
Ex: Concentrating the Likelihood Function • Inserting this back into the log-likelihood yields: ln Lc(β) = −(T/2)[ln(2π) + 1] − (T/2) ln[(y − Xβ)′(y − Xβ)/T] • Because (y − Xβ)′(y − Xβ) is just the sum of squared residuals from the regression (e′e), we can rewrite ln(Lc) as: ln Lc = −(T/2)[ln(2π) + 1] − (T/2) ln(e′e/T)
Ex: Concentrating the Likelihood Function • For the restricted model we obtain the restricted concentrated log-likelihood: ln Lc,r = −(T/2)[ln(2π) + 1] − (T/2) ln(e_r′e_r/T) • So, plugging these concentrated log-likelihoods into our definition of the LR test, we obtain: LR = −2[ln Lc,r − ln Lc] = T ln(e_r′e_r / e′e) • Or, T times the log of the ratio of the restricted SSR to the unrestricted SSR, a nice intuition.
Ex: OLS with Normal Errors • True regression model: y_t = α + β x_t + ε_t • The ε_t are i.i.d. normal. • Sample size is T. • Restriction: β = 1.
Example – Cont… • The first-order conditions for the estimates α̂ and β̂ simply reduce to the OLS normal equations: Σt (y_t − α̂ − β̂ x_t) = 0 and Σt x_t (y_t − α̂ − β̂ x_t) = 0
Example – Cont… • Solving the first equation gives α̂ = ȳ − β̂ x̄. • Substituting into the FOC for β̂ yields: β̂ = Σt (x_t − x̄)(y_t − ȳ) / Σt (x_t − x̄)²
Example – Cont… • Solve for σ̂² as before: σ̂² = (1/T) Σt (y_t − α̂ − β̂ x_t)²
Example – Cont… • The restricted model is exactly the same, except that β is constrained to be one, so that the normal equation reduces to: Σt (y_t − α̃ − x_t) = 0, and α̃ = ȳ − x̄. • One can then plug in to obtain σ̃² = (1/T) Σt (y_t − α̃ − x_t)² and form the likelihood ratio, LR = T ln(σ̃²/σ̂²), which is distributed χ²(1).
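A sketch of this example on simulated data (the true intercept, slope, and noise level are invented): it computes the LR statistic both as −2 times the difference in concentrated log-likelihoods and as T ln(SSR_r/SSR_u), and compares it with the χ²(1) critical value.

```python
# Sketch: LR test of beta = 1 in y_t = alpha + beta * x_t + eps_t, with normal errors.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
T = 300
x = rng.normal(size=T)
y = 0.2 + 1.1 * x + rng.normal(scale=0.5, size=T)    # true beta = 1.1

# Unrestricted OLS / ML estimates.
X = np.column_stack([np.ones(T), x])
b_u = np.linalg.solve(X.T @ X, X.T @ y)
ssr_u = np.sum((y - X @ b_u) ** 2)

# Restricted model: beta = 1, so regress (y - x) on a constant.
a_r = np.mean(y - x)
ssr_r = np.sum((y - a_r - x) ** 2)

# Two equivalent forms of the LR statistic.
def conc_loglik(ssr):
    return -0.5 * T * (np.log(2 * np.pi) + 1 + np.log(ssr / T))

lr_direct = -2 * (conc_loglik(ssr_r) - conc_loglik(ssr_u))
lr_ssr = T * np.log(ssr_r / ssr_u)
print(lr_direct, lr_ssr)                             # identical up to rounding
print("5% chi2(1) critical value:", chi2.ppf(0.95, df=1))
```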
The Wald Test • The problem with the LR test: we need both the restricted and unrestricted model estimates. • One or the other could be hard to compute. • The Wald test is an alternative that requires estimating the unrestricted model only. • Suppose y ~ N(Xβ, Σ), with a sample size of T; then: (y − Xβ)′Σ⁻¹(y − Xβ) ~ χ²(T)
The Wald Test – Cont… • Under the null hypothesis that E(y) = Xβ, the quadratic form above has a χ² distribution. If the hypothesis is false, the quadratic form will be larger, on average, than it would be if the null were true. • In particular, it will be a non-central χ² with the same degrees of freedom, which looks like a central χ² but lies to the right. • This is the basis for the test.
The Restricted Model • Now, step back from the normal and let θ̂ be the parameter estimates from the unrestricted model. • Let the restrictions be given by H0: f(θ) = 0. • If the restrictions are valid, then θ̂ should satisfy them. • If not, f(θ̂) should be farther from zero than would be explained by sampling error alone.
Formalism • The Wald statistic is: W = f(θ̂)′ {Var[f(θ̂)]}⁻¹ f(θ̂) • Under H0, in large samples, W ~ χ² with d.f. equal to the number of restrictions. See Greene ch. 9 for details. • Lastly, to use the Wald test, we need to compute the variance term Var[f(θ̂)].
Restrictions on Slope Coefficients • If the restrictions are on the slope coefficients of a linear regression, then: f(β̂) = Rβ̂ − q and Var[f(β̂)] = R Var(β̂) R′ = σ² R(X′X)⁻¹R′, where Var(β̂) = σ²(X′X)⁻¹ and K is the number of regressors. • Then, we can write the Wald statistic: W = (Rβ̂ − q)′ [σ² R(X′X)⁻¹R′]⁻¹ (Rβ̂ − q), where J is the number of restrictions.
Linear Restrictions H0: Rβ − q = 0 • For example, suppose there were three betas, β1, β2, and β3. Let’s look at three tests: (1) β1 = 0, (2) β1 = β2, (3) β1 = 0 and β2 = 2. • Each row of R is a single linear restriction on the coefficient vector.
Writing R • Case 1: R = [1 0 0], q = 0 • Case 2: R = [1 −1 0], q = 0 • Case 3: R = [1 0 0; 0 1 0], q = (0, 2)′
The Wald Statistic • In general, the Wald statistic with J linear restrictions reduces to: W = (Rβ̂ − q)′ [R s²(X′X)⁻¹ R′]⁻¹ (Rβ̂ − q) ~ χ² with J d.f., where s² is the estimated error variance. • We will use these tests extensively in our discussion of Chapters 5 and 6 of CLM.
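A sketch of the Wald test for the joint restriction in Case 3 above (β1 = 0 and β2 = 2), on a simulated three-regressor model with invented coefficients, using s² = e′e/(T − K) as the estimate of the error variance.

```python
# Sketch: Wald test of the joint linear restrictions R beta = q
# (Case 3 above: beta1 = 0 and beta2 = 2) in a simulated three-regressor model.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
T, K, J = 400, 3, 2
X = np.column_stack([rng.normal(size=(T, 2)), np.ones(T)])   # beta3 multiplies a constant
beta_true = np.array([0.0, 2.0, 0.5])                        # restrictions hold in truth
y = X @ beta_true + rng.normal(scale=1.0, size=T)

R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
q = np.array([0.0, 2.0])

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (T - K)                                         # estimated error variance
V_b = s2 * np.linalg.inv(X.T @ X)                            # Var(b)

d = R @ b - q
W = d @ np.linalg.solve(R @ V_b @ R.T, d)
print("Wald statistic:", W)
print("p-value (chi2 with J d.f.):", 1 - chi2.cdf(W, df=J))
```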
The F Test • A related way to test the validity of the J restrictions Rβ − q = 0. • Recall that the F test can be written in terms of a comparison of the sums of squared residuals for the restricted and unrestricted models: F[J, T − K] = [(e_r′e_r − e′e)/J] / [e′e/(T − K)] • or, equivalently, in terms of the R² from the two regressions: F[J, T − K] = [(R²_u − R²_r)/J] / [(1 − R²_u)/(T − K)]
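A sketch on simulated data (regressors and coefficients invented) of the two forms of the F statistic, here for the exclusion restrictions β1 = β2 = 0 in a model with a constant, showing that the SSR form and the R² form coincide.

```python
# Sketch: the F test of J linear restrictions written two equivalent ways,
# here for the exclusion restrictions beta1 = beta2 = 0 in a model with a constant.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)
T, K, J = 250, 3, 2
x1, x2 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.4 * x1 + 0.0 * x2 + rng.normal(size=T)

def ols_ssr(X):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

X_u = np.column_stack([np.ones(T), x1, x2])      # unrestricted
X_r = np.ones((T, 1))                            # restricted: constant only
ssr_u, ssr_r = ols_ssr(X_u), ols_ssr(X_r)

tss = np.sum((y - y.mean()) ** 2)
r2_u, r2_r = 1 - ssr_u / tss, 1 - ssr_r / tss

F_ssr = ((ssr_r - ssr_u) / J) / (ssr_u / (T - K))
F_r2 = ((r2_u - r2_r) / J) / ((1 - r2_u) / (T - K))
print(F_ssr, F_r2)                               # the two forms coincide
print("5% critical value:", f.ppf(0.95, J, T - K))
```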
Why Do We Care? • We care because, in a linear model with normally distributed disturbances, the F statistic above has an exact finite-sample distribution under the null. • This will be important later because, under normality, some of our cross-sectional CAPM tests will be of this form, and • A sufficient condition for the (static) CAPM to be “correct” is for asset returns to be normally distributed.
The LM Test • This is a test that involves computing only the restricted estimator. • If the hypothesis is valid, then at the value of the restricted estimator, the derivative of the log-likelihood function should be close to zero. • We will next form the LM test with the J restrictions f(θ) = 0.
The LM Test – Cont… • Form the Lagrangian: ln L*(θ, λ) = ln L(θ) + λ′f(θ) • This is maximized by choice of θ and λ.
First-order Conditions • ∂ ln L*/∂θ = ∂ ln L/∂θ + [∂f(θ)/∂θ]′λ = 0 • and ∂ ln L*/∂λ = f(θ) = 0
The LM Test – Cont… • The test, then, is whether the Lagrange multipliers equal zero. When the restrictions are linear, the test statistic becomes (see Greene, chapter 7): LM = g(θ̃)′ [I(θ̃)]⁻¹ g(θ̃) ~ χ²(J), where g(θ̃) = ∂ ln L/∂θ evaluated at the restricted estimate θ̃ and J is the number of restrictions.
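A sketch (same β = 1 example as before, on simulated data) computing the LM statistic from the restricted estimates only, once in score form with the β-block of the information matrix and once via the SSR shortcut T(e_r′e_r − e′e)/e_r′e_r, and checking that the two agree.

```python
# Sketch: LM test of beta = 1 in y_t = alpha + beta * x_t + eps_t, computed from
# the restricted estimates only, in score form and via the SSR shortcut.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
T = 300
x = rng.normal(size=T)
y = 0.2 + 1.1 * x + rng.normal(scale=0.5, size=T)
X = np.column_stack([np.ones(T), x])

# Restricted estimates (beta = 1): only the intercept and sigma^2 are estimated.
a_r = np.mean(y - x)
e_r = y - a_r - x
s2_r = e_r @ e_r / T

# Score form: g = X'e_r / s2_r, information block for beta = X'X / s2_r.
g = X.T @ e_r / s2_r
LM_score = g @ np.linalg.solve(X.T @ X / s2_r, g)

# SSR shortcut using the unrestricted residuals.
b_u = np.linalg.solve(X.T @ X, X.T @ y)
e_u = y - X @ b_u
LM_ssr = T * (e_r @ e_r - e_u @ e_u) / (e_r @ e_r)

print(LM_score, LM_ssr)                      # the two agree
print("5% chi2(1) critical value:", chi2.ppf(0.95, df=1))
```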
W, LR, LM, and F • We compare them for J linear restrictions in the linear model with K regressors. It can be shown that: W = T(e_r′e_r − e′e)/e′e, LR = T ln(e_r′e_r/e′e), LM = T(e_r′e_r − e′e)/e_r′e_r, and F = [(e_r′e_r − e′e)/J]/[e′e/(T − K)], • and that W ≥ LR ≥ LM.
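A sketch that puts the three statistics side by side on one simulated data set, using the SSR-based forms above (which rely on the ML, divide-by-T, variance estimates) for the same β = 1 restriction, so that the ordering W ≥ LR ≥ LM can be seen directly.

```python
# Sketch: W, LR, and LM for the same restriction (beta = 1), all written in terms of
# the restricted and unrestricted SSRs, to illustrate W >= LR >= LM.
import numpy as np

rng = np.random.default_rng(7)
T = 300
x = rng.normal(size=T)
y = 0.2 + 1.3 * x + rng.normal(scale=0.5, size=T)   # restriction is false here
X = np.column_stack([np.ones(T), x])

b_u = np.linalg.solve(X.T @ X, X.T @ y)
ssr_u = np.sum((y - X @ b_u) ** 2)

a_r = np.mean(y - x)                                 # restricted: beta = 1
ssr_r = np.sum((y - a_r - x) ** 2)

W = T * (ssr_r - ssr_u) / ssr_u
LR = T * np.log(ssr_r / ssr_u)
LM = T * (ssr_r - ssr_u) / ssr_r
print("W  =", W)
print("LR =", LR)
print("LM =", LM)                                    # W >= LR >= LM
```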