Tests of Static Asset Pricing Models • In general, asset pricing models quantify the tradeoff between risk and expected return. • Need to both measure risk and relate it to the expected return on a risky asset. • The most commonly used models are: • CAPM • APT • FF three-factor model
Testable Implications • These models have testable implications. For the CAPM, for example: • The expected excess return of a risky asset is proportional to the covariance of its return with that of the market portfolio. • Note, this tells us both the measure of risk used and its relation to expected return. • There are other restrictions that depend upon whether there exists a riskless asset.
Testable Implications • For the APT: • The expected excess return on a risky asset is linearly related to the covariances of its return with various risk factors. • These risk factors are left unspecified by the theory and have been: • Derived from the data (CR (1983), CK) • Exogenously imposed (CRR (1985))
Plan • Review the basic econometric methodology we will use to test these models. • Review the CAPM. • Test the CAPM. • Traditional tests (FM (1973), BJS (1972), Ferson and Harvey) • ML tests (Gibbons (1982), GRS (1989)) • GMM tests • Factor models: APT and FF • Curve fitting vs. ad hoc theorizing
Econometric Methodology Review • Maximum Likelihood Estimation • The Wald Test • The F Test • The LM Test • A specialization to linear models and linear restrictions • A comparison of test statistics
Review of Maximum Likelihood Estimation • Let {x1, …, xT} be a sample of T i.i.d. random variables. • Call that vector x. • Let x be continuously distributed with density f(x|θ), • where θ is the unknown parameter vector that determines the distribution.
The Likelihood Function • The joint density for the independent random variables is given by: f(x1|θ) f(x2|θ) f(x3|θ) … f(xT|θ) • This joint density is known as the likelihood function, L(x|θ): L(x|θ) = f(x1|θ) f(x2|θ) f(x3|θ) … f(xT|θ) • Can you write the joint density and L(x|θ) this way when dealing with time-dependent observations?
Independence • You can’t. • The reason you can write the product f(x1|θ) f(x2|θ) f(x3|θ) … f(xT|θ) is the independence of the draws. • If you have dependence, writing the joint density can be extremely complicated. • See, e.g., Hamilton (1994) for a good discussion of switching regression models and the EM algorithm.
Idea Behind Maximum Likelihood Estimation • Pick the parameter vector estimate, θ̂, that maximizes the likelihood, L(x|θ), of observing the particular vector of realizations, x.
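As a concrete illustration (not from the slides; the data are simulated and the parameter values are made up), here is a minimal Python sketch: for an i.i.d. normal sample, numerically maximizing the log-likelihood over (μ, σ²) recovers the sample mean and the divide-by-T sample variance, the closed-form ML estimators.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # simulated i.i.d. sample (illustrative)

def neg_log_likelihood(params, data):
    """Negative normal log-likelihood; we minimize this to maximize L(x|theta)."""
    mu, log_sigma2 = params            # parameterize sigma^2 via its log to keep it positive
    sigma2 = np.exp(log_sigma2)
    T = data.size
    return 0.5 * T * np.log(2 * np.pi * sigma2) + np.sum((data - mu) ** 2) / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma2_hat = result.x[0], np.exp(result.x[1])

# The numerical MLE matches the closed-form answers: sample mean and (1/T) * sum of squared deviations.
print(mu_hat, x.mean())
print(sigma2_hat, np.mean((x - x.mean()) ** 2))
```

Note that the ML estimator of σ² divides by T rather than T − 1, which is one sense in which ML estimators need not be unbiased (the question on the next slide).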
MLE Plusses and Minuses • Plusses: Efficient estimation in terms of picking the estimator with the smallest covariance matrix. • Question: are ML estimators necessarily unbiased? • Minuses: Strong distributional assumptions make robustness a problem.
MLE Example: Normal Distributions where OLS assumptions are satisfied • The sample y of size T is normally distributed with mean Xβ, where • X is a T × K matrix of explanatory variables • β is a K × 1 vector of parameters • The variance-covariance matrix of the errors from the true regression is σ²I, where • I is a T × T identity matrix
The Likelihood Function • The likelihood function for the linear model with independent normally distributed errors is: L(β, σ² | y, X) = (2πσ²)^(−T/2) exp[ −(y − Xβ)′(y − Xβ) / (2σ²) ]
The Log-Likelihood Function • With independent draws, it is easier to maximize the log-likelihood function, because products are replaced by sums. The log-likelihood is given by: ln L(β, σ²) = −(T/2) ln(2π) − (T/2) ln(σ²) − (y − Xβ)′(y − Xβ) / (2σ²)
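A small sketch on simulated data (illustrative only): for the linear model, maximizing this log-likelihood over β is equivalent to minimizing the sum of squared residuals, so the ML estimate of β coincides with OLS, and the ML estimate of σ² is the SSR divided by T.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])  # include an intercept
beta_true = np.array([1.0, 0.5, -0.3])
y = X @ beta_true + rng.normal(scale=0.8, size=T)

def log_likelihood(beta, sigma2, y, X):
    """Log-likelihood of the normal linear model, as on the slide."""
    resid = y - X @ beta
    T = y.size
    return -0.5 * T * np.log(2 * np.pi) - 0.5 * T * np.log(sigma2) - resid @ resid / (2 * sigma2)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)          # OLS = ML estimate of beta
sigma2_ml = np.sum((y - X @ beta_ols) ** 2) / T       # ML estimate of sigma^2 (divides by T)

# Any perturbation of beta away from the OLS solution lowers the log-likelihood.
print(log_likelihood(beta_ols, sigma2_ml, y, X))
print(log_likelihood(beta_ols + 0.1, sigma2_ml, y, X))
```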
The Information Matrix • If θ is our parameter vector, • I(θ) is the information matrix, • which is minus the expectation of the matrix of second partial derivatives of the log-likelihood with respect to the parameters: • I(θ) = −E[ ∂² ln L / ∂θ ∂θ′ ]
The Information Matrix – Cont… • The MLE achieves the Cramér-Rao lower bound, which means that the (asymptotic) variance of the estimators equals the inverse of the information matrix: Var(θ̂) = I(θ)⁻¹. • Now, for the linear model the second derivatives of the log-likelihood are: • ∂² ln L/∂β ∂β′ = −X′X/σ² • ∂² ln L/∂β ∂σ² = −X′(y − Xβ)/σ⁴ • ∂² ln L/∂(σ²)² = T/(2σ⁴) − (y − Xβ)′(y − Xβ)/σ⁶ • Note, the off-diagonal elements are zero in expectation, since E[X′ε] = 0.
The Information Matrix – Cont… • The negative of the expectation is: I(β, σ²) = [ X′X/σ², 0; 0, T/(2σ⁴) ] • The inverse of this is: I(β, σ²)⁻¹ = [ σ²(X′X)⁻¹, 0; 0, 2σ⁴/T ]
Another Way of Writing I(β, σ²) • For a vector, θ, of parameters, I(θ), the information matrix, can be written in a second way, as the expected outer product of the score: • I(θ) = E[ (∂ ln L/∂θ)(∂ ln L/∂θ)′ ] • This second form is more convenient for estimation, because it does not require estimating second derivatives.
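A quick numerical check (simulated data, not from the slides): for the β block of the normal linear model, the per-observation score is x_t ε_t/σ², and summing the outer products of the scores gives approximately the same matrix as the negative expected Hessian, X′X/σ².

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 5000, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 0.5, -0.3])
sigma2 = 0.64
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=T)

eps = y - X @ beta                       # true errors (known here because the data are simulated)
scores = X * (eps / sigma2)[:, None]     # per-observation scores: x_t * eps_t / sigma^2

info_opg = scores.T @ scores             # outer-product (BHHH) estimate of I(beta)
info_hessian = X.T @ X / sigma2          # negative expected Hessian form

# Relative difference between the two estimates is small for a large sample.
print(np.linalg.norm(info_opg - info_hessian) / np.linalg.norm(info_hessian))
```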
Estimation • The Likelihood Ratio Test • Let θ be a vector of parameters to be estimated. • Let H0 be a set of restrictions on these parameters. • These restrictions could be linear or non-linear. • Let θ̂ be the MLE of θ estimated without regard to the constraints (the unrestricted model). • Let θ̃ be the constrained MLE.
The Likelihood Ratio Test Statistic • If L(θ̂) and L(θ̃) are the likelihood functions evaluated at these two estimates, the likelihood ratio is given by: λ = L(θ̃)/L(θ̂) • Then, −2 ln(λ) = −2(ln L(θ̃) − ln L(θ̂)) ~ χ², with degrees of freedom equal to the number of restrictions imposed.
Another Look at the LR Test • Concentrated Log-Likelihood: Many problems can be formulated in terms of partitioning a parameter vector, θ, into {θ1, θ2} such that the solution to the optimization problem for θ2 can be written as a function of θ1, e.g.: θ̂2 = t(θ1). • Then, we can concentrate the log-likelihood function as: F*(θ1, θ2) = F(θ1, t(θ1)) ≡ Fc(θ1).
Why Do This? • The unrestricted solution to the concentrated problem, max over θ1 of Fc(θ1), • then provides the full solution to the optimization problem, since t(·) is known. • We now use this technique to find estimates for the classical linear regression model.
Example • The log-likelihood function (from CLM) with normal disturbances is given by: ln L(β, σ²) = −(T/2) ln(2π) − (T/2) ln(σ²) − (y − Xβ)′(y − Xβ)/(2σ²) • The solution to the likelihood equation for σ² implies that however we estimate β, the estimator for σ² will be: σ̂² = (y − Xβ)′(y − Xβ)/T
Ex: Concentrating the Likelihood Function • Inserting this back into the log-likelihood yields: ln Lc(β) = −(T/2)[ln(2π) + 1] − (T/2) ln[ (y − Xβ)′(y − Xβ)/T ] • Because (y − Xβ)′(y − Xβ) is just the sum of squared residuals from the regression (e′e), we can rewrite ln(Lc) as: ln Lc = −(T/2)[ln(2π) + 1] − (T/2) ln(e′e/T)
Ex: Concentrating the Likelihood Function • For the restricted model we obtain the restricted concentrated log-likelihood: ln Lc,R = −(T/2)[ln(2π) + 1] − (T/2) ln(e_R′e_R/T) • So, plugging these concentrated log-likelihoods into our definition of the LR test, we obtain: LR = −2(ln Lc,R − ln Lc) = T ln(e_R′e_R / e′e) • Or, T times the log of the ratio of the restricted SSR to the unrestricted SSR, a nice intuition.
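A minimal sketch on simulated data (the particular restriction tested, that the last slope is zero, is just for illustration): the LR statistic computed from the two concentrated log-likelihoods matches T ln(SSR_R/SSR_U) exactly.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def concentrated_loglik(ssr_value, T):
    """Concentrated log-likelihood: -(T/2)[ln(2*pi) + 1] - (T/2) ln(SSR/T)."""
    return -0.5 * T * (np.log(2 * np.pi) + 1) - 0.5 * T * np.log(ssr_value / T)

ssr_u = ssr(y, X)           # unrestricted: all three regressors
ssr_r = ssr(y, X[:, :2])    # restricted: drop the last regressor (imposes its coefficient = 0)

lr_from_logliks = -2 * (concentrated_loglik(ssr_r, T) - concentrated_loglik(ssr_u, T))
lr_from_ssrs = T * np.log(ssr_r / ssr_u)

print(lr_from_logliks, lr_from_ssrs)          # identical
print(chi2.sf(lr_from_ssrs, df=1))            # p-value with 1 restriction
```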
Ex: OLS with Normal Errors • True regression model: y_t = α + βx_t + ε_t • The ε_t are i.i.d. normal. • Sample size is T. • Restriction: β = 1.
Example – Cont… • The first-order conditions for the estimates α̂ and β̂ simply reduce to the OLS normal equations: Σt (y_t − α̂ − β̂x_t) = 0 and Σt x_t(y_t − α̂ − β̂x_t) = 0
Example – Cont… • Solving the first normal equation gives α̂ = ȳ − β̂x̄. • Substituting into the FOC for β̂ yields: β̂ = Σt (x_t − x̄)(y_t − ȳ) / Σt (x_t − x̄)²
Example – Cont… • Solve for σ̂² as before: σ̂² = (1/T) Σt (y_t − α̂ − β̂x_t)²
Example – Cont… • The restricted model is exactly the same, except that β is constrained to be one, so that the normal equation reduces to: Σt (y_t − α̃ − x_t) = 0, giving α̃ = ȳ − x̄, and σ̃² = (1/T) Σt (y_t − α̃ − x_t)². One can then plug in to obtain the restricted log-likelihood and form the likelihood ratio, which is distributed χ²(1).
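A sketch of this example on simulated data (the numbers are made up): impose β = 1, set α̃ = ȳ − x̄, and compare the restricted and unrestricted sums of squared residuals to form the LR statistic.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
T = 250
x = rng.normal(size=T)
y = 0.3 + 1.0 * x + rng.normal(scale=0.5, size=T)   # true slope is 1, so H0 should not be rejected

# Unrestricted OLS (equivalently ML) estimates of alpha and beta.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
ssr_u = np.sum((y - alpha_hat - beta_hat * x) ** 2)

# Restricted estimates with beta constrained to 1: alpha_tilde = ybar - xbar.
alpha_tilde = y.mean() - x.mean()
ssr_r = np.sum((y - alpha_tilde - 1.0 * x) ** 2)

lr = T * np.log(ssr_r / ssr_u)       # concentrated-likelihood form of the LR statistic
print(lr, chi2.sf(lr, df=1))         # statistic and p-value, 1 restriction
```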
The Wald Test • The problem with the LR test: we need both the restricted and unrestricted model estimates. • One or the other could be hard to compute. • The Wald test is an alternative that requires estimating the unrestricted model only. • Suppose y ~ N(Xβ, Σ), with a sample size of T; then: (y − Xβ)′Σ⁻¹(y − Xβ) ~ χ²(T)
The Wald Test – Cont… • Under the null hypothesis that E(y) = Xβ, the quadratic form above has a χ² distribution. If the hypothesis is false, the quadratic form will be larger, on average, than it would be if the null were true. • In particular, it will be a non-central χ² with the same degrees of freedom, which looks like a central χ² but lies to the right. • This is the basis for the test.
The Restricted Model • Now, step back from the normal case and let θ̂ be the parameter estimates from the unrestricted model. • Let the restrictions be given by H0: f(θ) = 0. • If the restrictions are valid, then θ̂ should satisfy them. • If not, f(θ̂) should be farther from zero than would be explained by sampling error alone.
Formalism • The Wald statistic is: W = f(θ̂)′ [Var(f(θ̂))]⁻¹ f(θ̂) • Under H0, in large samples, W ~ χ² with d.f. equal to the number of restrictions. See Greene, ch. 9, for details. • Lastly, to use the Wald test, we need to compute the variance term: Var(f(θ̂)) = (∂f/∂θ′) Var(θ̂) (∂f/∂θ′)′
Restrictions on Slope Coefficients • If the restrictions are on the slope coefficients of a linear regression, then: Var(β̂) = σ²(X′X)⁻¹, estimated by s²(X′X)⁻¹, where s² = e′e/(T − K) and K is the number of regressors. • Then, we can write the Wald statistic: W = (Rβ̂ − q)′ [R s²(X′X)⁻¹ R′]⁻¹ (Rβ̂ − q), where J is the number of restrictions.
Linear Restrictions H0: Rβ − q = 0 • For example, suppose there were three betas: β1, β2, and β3. Let’s look at three tests: (1) β1 = 0, (2) β1 = β2, (3) β1 = 0 and β2 = 2. • Each row of R is a single linear restriction on the coefficient vector.
Writing R • Case 1: R = [1 0 0], q = 0 • Case 2: R = [1 −1 0], q = 0 • Case 3: R = [1 0 0; 0 1 0], q = (0, 2)′
The Wald Statistic • In general, the Wald statistic with J linear restrictions reduces to: W = (Rβ̂ − q)′ [R Var(β̂) R′]⁻¹ (Rβ̂ − q) ~ χ², with J d.f. • We will use these tests extensively in our discussion of Chapters 5 and 6 of CLM.
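A sketch of the Wald computation (simulated data; the R and q shown correspond to Case 3 above, under my reading of that case): estimate the unrestricted model by OLS and form the quadratic form in Rβ̂ − q.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
T = 400
X = rng.normal(size=(T, 3))                      # three regressors -> beta1, beta2, beta3
beta_true = np.array([0.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # unrestricted OLS
e = y - X @ beta_hat
s2 = e @ e / (T - X.shape[1])                    # estimate of sigma^2
var_beta = s2 * np.linalg.inv(X.T @ X)           # Var(beta_hat) = s^2 (X'X)^{-1}

R = np.array([[1.0, 0.0, 0.0],                   # Case 3: beta1 = 0 and beta2 = 2
              [0.0, 1.0, 0.0]])
q = np.array([0.0, 2.0])

d = R @ beta_hat - q
W = d @ np.linalg.solve(R @ var_beta @ R.T, d)   # Wald statistic, chi^2 with J = 2 d.f. under H0
print(W, chi2.sf(W, df=len(q)))
```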
The F Test • A related way to test the validity of the J restrictions Rβ − q = 0. • Recall that the F test can be written in terms of a comparison of the sums of squared residuals for the restricted and unrestricted models: F(J, T − K) = [(e_R′e_R − e′e)/J] / [e′e/(T − K)] • or, equivalently, in terms of the R²s of the two regressions: F(J, T − K) = [(R²_U − R²_R)/J] / [(1 − R²_U)/(T − K)]
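The same restrictions can be tested with the SSR form of the F statistic; a self-contained sketch on simulated data (again imposing β1 = 0 and β2 = 2):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(6)
T, K, J = 400, 3, 2
X = rng.normal(size=(T, K))
y = X @ np.array([0.0, 2.0, -1.0]) + rng.normal(size=T)

# Unrestricted SSR.
b_u = np.linalg.lstsq(X, y, rcond=None)[0]
ssr_u = np.sum((y - X @ b_u) ** 2)

# Restricted SSR: impose beta1 = 0 and beta2 = 2, then estimate only beta3.
y_star = y - 2.0 * X[:, 1]
b3 = np.linalg.lstsq(X[:, [2]], y_star, rcond=None)[0]
ssr_r = np.sum((y_star - X[:, [2]] @ b3) ** 2)

F = ((ssr_r - ssr_u) / J) / (ssr_u / (T - K))
print(F, f_dist.sf(F, J, T - K))   # F statistic and its p-value under H0
```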
Why Do We Care? • We care because in a linear model with normally distributed disturbances, the F statistic derived above has an exact finite-sample distribution under the null, not just an asymptotic one. • This will be important later because, under normality, some of our cross-sectional CAPM tests will be of this form, and • A sufficient condition for the (static) CAPM to be “correct” is for asset returns to be normally distributed.
The LM Test • This is a test that involves computing only the restricted estimator. • If the hypothesis is valid, the derivative of the log-likelihood function, evaluated at the restricted estimator, should be close to zero. • We will next form the LM test for the J restrictions f(θ) = 0.
The LM Test – Cont… • Form the Lagrangian: ln L*(θ, λ) = ln L(θ) + λ′f(θ) • This is maximized by choice of θ and λ.
First-order Conditions • ∂ ln L*/∂θ = ∂ ln L/∂θ + (∂f/∂θ′)′λ = 0 • and • ∂ ln L*/∂λ = f(θ) = 0
The LM Test – Cont… • The test, then, is whether the Lagrange multipliers equal zero; equivalently, whether the score ∂ ln L/∂θ, evaluated at the restricted estimates, is close to zero. When the restrictions are linear, the test statistic becomes (see Greene, chapter 7): LM = e_R′X(X′X)⁻¹X′e_R / (e_R′e_R/T) = T·R² from a regression of the restricted residuals e_R on X, distributed χ²(J), where J is the number of restrictions.
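A sketch of the linear-restriction case (simulated data; the restriction sets the last slope to zero): regress the restricted residuals on all of the regressors and compute T times the R² of that auxiliary regression.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
T = 300
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=T)

# Restricted model: drop the last regressor (imposes its coefficient = 0).
b_r = np.linalg.lstsq(X[:, :2], y, rcond=None)[0]
e_r = y - X[:, :2] @ b_r

# Auxiliary regression of the restricted residuals on ALL regressors; LM = T * R^2.
b_aux = np.linalg.lstsq(X, e_r, rcond=None)[0]
fitted = X @ b_aux
r2_aux = np.sum(fitted ** 2) / np.sum(e_r ** 2)   # e_r has exactly mean zero: the restricted model has a constant
LM = T * r2_aux

print(LM, chi2.sf(LM, df=1))   # one restriction
```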
W, LR, LM, and F • We compare them for J linear restrictions in the linear model with K regressors. It can be shown that: • W = T(e_R′e_R − e′e)/e′e = J·F·T/(T − K) • LR = T ln(e_R′e_R/e′e) • LM = T(e_R′e_R − e′e)/e_R′e_R • and that W > LR > LM.
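A quick numerical check of these relations on simulated data, using the ML (divide-by-T) residual variance throughout:

```python
import numpy as np

rng = np.random.default_rng(8)
T, K, J = 200, 3, 1
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=T)   # last coefficient is nonzero, so H0 is false

def ssr(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ b) ** 2)

ssr_u = ssr(y, X)            # unrestricted
ssr_r = ssr(y, X[:, :2])     # restricted: last coefficient = 0

W  = T * (ssr_r - ssr_u) / ssr_u
LR = T * np.log(ssr_r / ssr_u)
LM = T * (ssr_r - ssr_u) / ssr_r
F  = ((ssr_r - ssr_u) / J) / (ssr_u / (T - K))

print(W, LR, LM)                      # always ordered W >= LR >= LM
print(W, J * F * T / (T - K))         # W equals J*F*T/(T-K)
```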