ML Estimation of the General Error Variance and Nonlinear Models
ML Estimation of Linear Models with General Covariance Matrix • Let's return to our general model where: • y = Xβ + e • y is a (T x 1) vector of obs. on the dependent variable • X is a (T x K) matrix of obs. on the K exogenous variables • β is a (K x 1) vector of parameters • e is a (T x 1) error vector where E(e) = 0 and E(ee′) = Φ = σ²Ψ, with σ² a scalar • Ψ is a (T x T) matrix assumed to depend on parameters Θ, Ψ = Ψ(Θ) • The dimension of Θ and how Ψ depends on Θ are functions of the assumptions made about the data generating process (e.g., heteroscedasticity vs. autocorrelation)
ML Estimation of Linear Models with General Covariance Matrix • Previously, we developed a number of 2-step estimators depending on the structure of Ψ: • AR(1) • Multiplicative heteroscedasticity • In these 2-step approaches: • Θ is estimated from information provided by the CRM residuals • Ψ(Θ) is constructed using the above estimates • FGLS procedures are then used • As an alternative, let's see how we can use ML techniques to estimate the parameters of the general linear model directly
ML Estimation of Linear Models with General Covariance Matrix • If we assume that the error term, and therefore y, are both normally distributed, the total sample log-likelihood function for (β, Θ*) can be written as: L(β, Θ*|y, X) = −(T/2)ln(2π) − (1/2)ln|Φ(Θ*)| − (1/2)(y − Xβ)′Φ(Θ*)⁻¹(y − Xβ), where X contains the exogenous variables, Φ(Θ*) = σ²Ψ(Θ), and Θ* = (σ², Θ)
ML Estimation of Linear Models with General Covariance Matrix • Similar to the CRM, we saw earlier that maximization of L with respect to β and σ² conditional on Θ results in: βG(Θ) = [X′Ψ(Θ)⁻¹X]⁻¹X′Ψ(Θ)⁻¹y and σ²G(Θ) = [y − XβG(Θ)]′Ψ(Θ)⁻¹[y − XβG(Θ)]/T • Note that βG does not depend on σ² • If we substitute the above definition of βG into the original likelihood function, we obtain the concentrated log-likelihood function • It is only a function of the exogenous data and Θ, the parameters of the error covariance matrix
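These conditional ML estimators are just the GLS formulas evaluated at a candidate Ψ(Θ). Below is a minimal R sketch of that step; the function names and the supplied Psi matrix are illustrative assumptions rather than code from the original slides.

```r
# Conditional ML (GLS-type) estimates of beta and sigma^2 given a candidate Psi(Theta)
# y: (T x 1) response, X: (T x K) regressors, Psi: (T x T) assumed covariance structure
beta_G <- function(y, X, Psi) {
  Psi_inv <- solve(Psi)
  solve(t(X) %*% Psi_inv %*% X, t(X) %*% Psi_inv %*% y)
}
sigma2_G <- function(y, X, Psi) {
  e <- y - X %*% beta_G(y, X, Psi)
  drop(t(e) %*% solve(Psi) %*% e) / length(y)   # ML divides by T, not T - K
}
```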
ML Estimation of Linear Models with General Covariance Matrix • This implies the concentrated log-likelihood, apart from constants, is: L*(Θ) = −(T/2)ln[eG′Ψ(Θ)⁻¹eG] − (1/2)ln|Ψ(Θ)|, where eG = y − XβG(Θ)
ML Estimation of Linear Models with General Covariance Matrix • The maximum likelihood estimator of Θ, ΘML, is the value of Θ for which L*(Θ) is a maximum • Given the concentrated log-likelihood, let's define a modified weighted sum of squared errors: S*(Θ) = eG′Ψ(Θ)⁻¹eG |Ψ(Θ)|^(1/T)
ML Estimation of Linear Models with General Covariance Matrix • Given that these are positive monotonic transformations, the value of Θ that maximizes the concentrated log-likelihood function is also the value of Θ that minimizes S*(Θ) • The presence of the term |Ψ(Θ)|^(1/T) differs from our previous definition of the weighted sum of squared errors under the general variance model: S(β,Θ) = eG′Ψ(Θ)⁻¹eG versus S*(β,Θ) = eG′Ψ(Θ)⁻¹eG |Ψ(Θ)|^(1/T) • If |Ψ(Θ)| does not depend on T, then as T→∞, |Ψ(Θ)|^(1/T) → 1, which implies S*(β,Θ) → S(β,Θ)
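Putting the pieces together, ML estimation of Θ amounts to a low-dimensional numerical search over the concentrated objective S*(Θ). A minimal R sketch follows; build_Psi() is a hypothetical user-supplied function mapping Θ to Ψ(Θ), so the details are assumptions.

```r
# Concentrated ML objective: S*(Theta) = e_G' Psi(Theta)^{-1} e_G * |Psi(Theta)|^(1/T)
S_star <- function(theta, y, X, build_Psi) {
  T_obs <- length(y)
  Psi   <- build_Psi(theta, T_obs)
  P_inv <- solve(Psi)
  b     <- solve(t(X) %*% P_inv %*% X, t(X) %*% P_inv %*% y)   # beta_G(Theta)
  e     <- y - X %*% b                                         # e_G
  drop(t(e) %*% P_inv %*% e) * det(Psi)^(1 / T_obs)
}
# theta_ml <- optim(theta_start, S_star, y = y, X = X, build_Psi = build_Psi)$par
```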
Maximum Likelihood and Mult. Heteroscedasticity • Let's return to the multiplicative heteroscedasticity example we set up before, where: • Yt = Xtβ + et (linear w.r.t. β) • E(et²) = σt² = exp(Wtα) = σ²exp(W*tα*) (nonlinear w.r.t. α), with Wt = (1, W*t), t = 1,…,T, and α′ = (ln σ², α*′) • Φ is a diagonal matrix whose t-th diagonal element is σt² = exp(Wtα) • JHGLL, p. 538-541, 548-551; Greene, p. 522-527
Maximum Likelihood and Mult. Heteroscedasticity • If we assume the error terms follow a multivariate normal distribution, then the sample log-likelihood is: L(β,α) = −(T/2)ln(2π) − (1/2)Σt ln σt² − (1/2)Σt (Yt − Xtβ)²/σt², where σt² = exp(Wtα) and σ² = exp(α1)
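The log-likelihood above is straightforward to code directly. A minimal R sketch, assuming y, X, and W (with a leading column of ones) are supplied and theta stacks β and α:

```r
# Sample log-likelihood for the multiplicative heteroscedasticity model
# theta = c(beta, alpha); sigma_t^2 = exp(W_t alpha)
loglik_mult_het <- function(theta, y, X, W) {
  K     <- ncol(X)
  beta  <- theta[1:K]
  alpha <- theta[-(1:K)]
  sig2  <- exp(drop(W %*% alpha))
  e     <- drop(y - X %*% beta)
  -0.5 * sum(log(2 * pi) + log(sig2) + e^2 / sig2)
}
```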
Maximum Likelihood and Mult. Heteroscedasticity • Under this framework (assumed normal distribution), the information matrix can be derived; it is block diagonal between β and α • For the β's: X′Φ⁻¹X • For the α's: (1/2)W′W, where W is the (T x S) matrix of variance exogenous variables (S = number of variance exogenous variables) • The implied asymptotic covariance of αML is 2(W′W)⁻¹, vs. 4.9348(W′W)⁻¹ under the 2-step method
Maximum Likelihood and Mult. Heteroscedasticity • ML model of multiplicative heteroscedasticity (code overview): dependent & exogenous variables; proc defining the likelihood function; numerical gradients; maximum likelihood procedure with starting values; parameter estimates; proc for calculating variances
Maximum Likelihood and Mult. Heteroscedasticity • ML estimation of the multiplicative heteroscedasticity model • JHGLL, pg. 538-541, Table 9.1 • The code uses the BHHH algorithm: Θn+1 = Θn + λ[Σt(∂Lt/∂Θ)(∂Lt/∂Θ)′]⁻¹Σt(∂Lt/∂Θ), where Lt is the t-th contribution to the log-likelihood, ∂Lt/∂Θ is (K x 1), the outer-product term is (K x K), and λ is the step size • I use the 2-step results as starting values for the ML estimation
Maximum Likelihood and Mult. Heteroscedasticity • Estimation code (R commands): • Z holds the observation-level gradients ∂Lt/∂θ of the LLF function (numerical or analytical gradients) • H = Z′Z • Pn = -solve(H) • db = solve(H) %*% colSums(Z) (full step) • Overview of results
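A compact R sketch of the BHHH iteration just described, using forward-difference numerical gradients of the per-observation log-likelihood contributions; the variable names and convergence rule are illustrative assumptions.

```r
# BHHH iterations for the multiplicative heteroscedasticity model
# theta = c(beta, alpha); y, X, W as defined earlier
loglik_t <- function(theta, y, X, W) {
  K    <- ncol(X)
  sig2 <- exp(drop(W %*% theta[-(1:K)]))
  e    <- drop(y - X %*% theta[1:K])
  -0.5 * (log(2 * pi) + log(sig2) + e^2 / sig2)      # vector of L_t, t = 1,...,T
}
bhhh <- function(theta, y, X, W, tol = 1e-8, maxit = 200, h = 1e-6) {
  for (it in 1:maxit) {
    L0 <- loglik_t(theta, y, X, W)
    Z  <- sapply(seq_along(theta), function(j) {     # numerical gradients, T x (K + S)
      tp <- theta; tp[j] <- tp[j] + h
      (loglik_t(tp, y, X, W) - L0) / h
    })
    H  <- crossprod(Z)                               # H = Z'Z
    db <- solve(H, colSums(Z))                       # full step
    theta <- theta + db
    if (sqrt(sum(db^2)) < tol) break
  }
  list(theta = theta, vcov = solve(H))               # -Pn = (Z'Z)^{-1} as covariance
}
```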
Maximum Likelihood and Mult. Heteroscedasticity • I use two methods to obtain the parameter covariance matrix: • BHHH method (numerical): -Pn = (Z′Z)⁻¹ • The previous analytical expressions, using the analytical gradient function
Maximum Likelihood and Mult. Heteroscedasticity • As an alternative to the BHHH algorithm used in the above code: • I(Θ) is block diagonal (given the functional form of the heteroscedasticity) • Judge et al., p. 540, recommend using the method of scoring (GN) to undertake the iterations separately for β and α
Maximum Likelihood and Mult. Heteroscedasticity • That is, for β under the method of scoring we have: βn+1 = βn + [Σt Xt′Xt/σt²]⁻¹ Σt Xt′(Yt − Xtβn)/σt², with σt² = exp(Wtαn) • Given the above structure of the Hessian matrix, one can show that this reduces to the GLS estimator evaluated at the current α: βn+1 = [Σt Xt′Xt/σt²]⁻¹ Σt Xt′Yt/σt²
Maximum Likelihood and Mult. Heteroscedasticity • where Wt is a (1 x D) vector of variables on which the variance of et depends and Xt is (1 x K) • Note that the dimensions used in the JHGLL discussion differ from those presented here
Maximum Likelihood and Mult. Heteroscedasticity • Similarly, for α under the method of scoring we have: αn+1 = αn + [(1/2)W′W]⁻¹(1/2)Σt Wt′(et²/σt² − 1) = αn + (W′W)⁻¹Σt Wt′(et²/σt² − 1), with et and σt² evaluated at the current estimates
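A minimal R sketch of these alternating scoring updates, exploiting the block-diagonal information matrix; the starting values and convergence criterion are assumptions.

```r
# Method-of-scoring iterations: GLS step for beta given alpha, scoring step for alpha
scoring_mult_het <- function(y, X, W, beta0, alpha0, maxit = 100, tol = 1e-8) {
  beta <- beta0; alpha <- alpha0
  for (it in 1:maxit) {
    sig2 <- exp(drop(W %*% alpha))
    # beta step: GLS with weights 1 / sigma_t^2
    beta_new <- solve(t(X) %*% (X / sig2), t(X) %*% (y / sig2))
    e <- drop(y - X %*% beta_new)
    # alpha step: alpha + (W'W)^{-1} sum_t W_t' (e_t^2 / sigma_t^2 - 1)
    alpha_new <- alpha + solve(crossprod(W), t(W) %*% (e^2 / sig2 - 1))
    if (max(abs(c(beta_new - beta, alpha_new - alpha))) < tol) {
      beta <- beta_new; alpha <- alpha_new; break
    }
    beta <- beta_new; alpha <- alpha_new
  }
  list(beta = drop(beta), alpha = drop(alpha))
}
```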
Maximum Likelihood and Mult. Heteroscedasticity • Testing for multiplicative heteroscedasticity using 3 asymptotic tests • H0: homoscedasticity vs. H1: multiplicative heteroscedasticity • H0: α2 = α3 = … = αS = 0 vs. H1: at least one of the above ≠ 0 • Continuing with our example (Table 9.1, JHGLL)
Maximum Likelihood and Mult. Heteroscedasticity • Wald test • There is 1 element in α* (estimated coefficient α̂2 = 0.21732) • Remember Σα,ML = 2(W′W)⁻¹, so Var(α̂2) = 2 × 0.00379774 = 0.007596, where 0.00379774 is the relevant diagonal element of (W′W)⁻¹ • λW = (0.21732)²[(0.00379774)⁻¹]/2 = 6.218 • χ²1,.05 = 3.84 → Reject H0
Maximum Likelihood and Mult. Heteroscedasticity • Lagrange multiplier test: λLM = S(0)′I(0)⁻¹S(0), with the score S and information I evaluated under H0 • Information matrix = (1/2)(W′W), so Σα = 2(W′W)⁻¹ • Using the theoretical results shown in the previous handout, λLM = 4.028 > 3.84 → reject H0
Maximum Likelihood and Mult. Heteroscedasticity • Likelihood ratio test • Using the full likelihood function: LU = −61.748, LR = −64.315 • λLR = 2[−61.748 − (−64.315)] = 5.134 (equivalently, twice the difference in the concentrated LLFs) • χ²1,.05 = 3.84 → reject H0
Maximum Likelihood and Mult. Heteroscedasticity • Remember that in general: λW > λLR > λLM • In our empirical example: λW = 6.218, λLR = 5.204, λLM = 4.028
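For completeness, the three statistics can be computed as below. This is a minimal R sketch following the formulas on the preceding slides; the argument names (unrestricted ML estimates of α, restricted OLS residuals, and the two maximized log-likelihood values) are assumptions about what is available from the earlier estimation steps.

```r
# Wald, LM, and LR tests of H0: alpha* = 0 (homoscedasticity)
# W: (T x S) variance regressors incl. constant; alpha_hat: unrestricted ML estimate of alpha
# llf_u, llf_r: unrestricted / restricted maximized log-likelihoods; e_r: restricted OLS residuals
het_tests <- function(W, alpha_hat, llf_u, llf_r, e_r) {
  V_alpha <- 2 * solve(crossprod(W))                 # Sigma_alpha,ML = 2 (W'W)^{-1}
  idx     <- 2:ncol(W)                               # positions of alpha* (exclude constant)
  lambda_W <- drop(t(alpha_hat[idx]) %*% solve(V_alpha[idx, idx]) %*% alpha_hat[idx])
  # LM: S(0)' I(0)^{-1} S(0), score and information evaluated under H0
  sig2_r <- mean(e_r^2)
  s      <- 0.5 * t(W) %*% (e_r^2 / sig2_r - 1)
  lambda_LM <- drop(t(s) %*% V_alpha %*% s)
  lambda_LR <- 2 * (llf_u - llf_r)                   # twice the difference in maximized LLFs
  c(Wald = lambda_W, LM = lambda_LM, LR = lambda_LR) # each ~ chi-square(S - 1) under H0
}
```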
Maximum Likelihood and the AR(1) Model • Let's examine another example of the general linear model • Let's assume that we have a linear model where the error terms are autocorrelated with an AR(1) error structure: • y = Xβ + e • y is a (T x 1) vector of observations on the dependent variable • X is a (T x K) matrix of observations on the K exogenous variables • β is a (K x 1) vector of parameters • e is a (T x 1) error vector where E(e) = 0 and E(ee′) = σ²Ψ • et = ρet−1 + νt, where νt has mean 0 and constant variance σν² • Var(et) = Var(et+i) = Var(et−i) = σe²
Maximum Likelihood and the AR(1) Model • As we derived earlier, the above implies a constant error variance: σe² = σν²/(1 − ρ²)
Maximum Likelihood and the AR(1) Model • The above implies: E(ee′) = σν²Ψ(ρ), where Ψ(ρ) is the (T x T) matrix with (t, s) element ρ^|t−s|/(1 − ρ²)
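As a quick illustration, the Ψ(ρ) matrix just described can be built in a single line of R (a sketch; the function name is an assumption):

```r
# AR(1) error covariance structure: (t, s) element is rho^|t - s| / (1 - rho^2)
build_Psi_ar1 <- function(rho, T_obs) {
  rho^abs(outer(1:T_obs, 1:T_obs, "-")) / (1 - rho^2)
}
```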
Maximum Likelihood and the AR(1) Model • Previously, we noted that under the general model (and assuming normally distributed error terms), the sample log-likelihood function can be represented as shown earlier for the general covariance case • Under the AR(1) structure, Ψ(Θ) = Ψ(ρ) • In order to obtain ML estimates of β and ρ, all we have to do is build the Ψ matrix • Equivalently, we can work with the transformed errors e* = Pe, where P′P = Ψ⁻¹
Maximum Likelihood and the AR(1) Model • As an alternative, using the previous results with respect to the use of ML techniques for the general linear model: • The ML estimators for β and ρ are those values that minimize S*(β,ρ) = e′Ψ(ρ)⁻¹e |Ψ(ρ)|^(1/T) = e*′e* |Ψ(ρ)|^(1/T), where e = y − Xβ, e* = Pe, and P′P = Ψ(ρ)⁻¹ • This expression is similar to what we derived under the general linear model, except only σν² (not β and σν²) has been concentrated out
Maximum Likelihood and the AR(1) Model • We can evaluate the determinant: using |AB| = |A||B| and P′P = Ψ(ρ)⁻¹, we have |Ψ(ρ)⁻¹| = |P′P| = |P|² = 1 − ρ², so |Ψ(ρ)| = 1/(1 − ρ²)
Maximum Likelihood and the AR(1) Model • Summarizing the above: S*(β,ρ) = (1 − ρ²)^(−1/T) e*′e* • This implies that the concentrated log-likelihood, apart from constants, is: L*(β,ρ) = (1/2)ln(1 − ρ²) − (T/2)ln(e*′e*) • If T is large and |ρ| is not too close to 1, the effect of the first term in the above will be small
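Numerically, the search over ρ only requires the Prais–Winsten transformation and an OLS fit at each candidate ρ. A minimal R sketch under those assumptions:

```r
# Concentrated ML objective for the AR(1) error model: (1 - rho^2)^(-1/T) * e*'e*
ar1_objective <- function(rho, y, X) {
  T_obs <- length(y)
  # Prais-Winsten transformation: e* = P e
  ys <- c(sqrt(1 - rho^2) * y[1], y[-1] - rho * y[-T_obs])
  Xs <- rbind(sqrt(1 - rho^2) * X[1, ], X[-1, ] - rho * X[-T_obs, ])
  b  <- qr.solve(Xs, ys)                    # OLS on transformed data = GLS given rho
  es <- ys - Xs %*% b
  (1 - rho^2)^(-1 / T_obs) * sum(es^2)
}
# rho_ml <- optimize(ar1_objective, interval = c(-0.99, 0.99), y = y, X = X)$minimum
```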
Maximum Likelihood and the AR(1) Model • Given the above, we can use these results to find maximum likelihood estimates of β and ρ • As Judge et al., p. 534, note, it can be shown that the information matrix for (β, ρ, σν²) is (K+2) x (K+2) and block diagonal between β and (ρ, σν²) (previous result)
Maximum Likelihood and the AR(1) Model • Following the results we obtained using the FGLS procedures, the asymptotic covariance and variance estimators are: cov(βML) = σν²[X′Ψ(ρ)⁻¹X]⁻¹, var(ρML) = (1 − ρ²)/T, and var(σν²,ML) = 2σν⁴/T
Maximum Likelihood Under the Nonlinear Model • y = f(X,β) + e, where f is a nonlinear function and e ~ N(0, σ²IT) (error term assumption) • Using general notation, we can represent the sample likelihood function as: ℓ(β, σ²|y, X) = (2πσ²)^(−T/2) exp{−[y − f(X,β)]′[y − f(X,β)]/(2σ²)}
Maximum Likelihood Under the Nonlinear Model • The sample log-likelihood function is: L(β, σ²) = −(T/2)ln(2π) − (T/2)ln σ² − S(β)/(2σ²), where S(β) = [y − f(X,β)]′[y − f(X,β)] • It is not, in general, possible to find an analytical expression for the maximum likelihood estimate βML such that ∂L/∂β = 0 • But ∂L/∂(σ²) = 0 → σ²ML = S(β)/T
Maximum Likelihood Under the Nonlinear Model • Given the above, we can generate the concentrated log-likelihood function w.r.t. only β by replacing σ² with σ²ML: L*(β|y, X) = −(T/2)ln(2π) − (T/2)ln[S(β)/T] − T/2 • For a given T, maximizing L*(β|y, X) is equivalent to minimizing S(β)
Maximum Likelihood Under the Nonlinear Model • The βML that maximizes L*(β|y, X) is identical to the nonlinear least squares estimator that minimizes S(β), since S(β) > 0 and the log transformation is monotonic • This equivalence only holds for nonlinear models of the form: y = f(X,β) + e, e ~ N(0, σ²IT)
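This equivalence means ML estimation of β here is just nonlinear least squares. A minimal R sketch, where f() and the starting values are assumed supplied:

```r
# Sum of squared errors for the nonlinear model y = f(X, beta) + e
S_beta <- function(beta, y, X, f) sum((y - f(X, beta))^2)
# beta_ml   <- optim(beta_start, S_beta, y = y, X = X, f = f)$par   # NLS = ML here
# sigma2_ml <- S_beta(beta_ml, y, X, f) / length(y)                 # ML divides by T
```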
Maximum Likelihood Under the Nonlinear Model • y = f(X,β) + e, e ~ N(0, σ²Ψ), where Ψ = Ψ(θ) and θ is a vector of parameters • We can represent the log-likelihood function as: L(β, θ, σ²) = −(T/2)ln(2π) − (T/2)ln σ² − (1/2)ln|Ψ(θ)| − S(β,θ)/(2σ²), where S(β,θ) = [y − f(X,β)]′Ψ(θ)⁻¹[y − f(X,β)] is the weighted SSE (and Z = ∂f/∂β denotes the matrix of first derivatives of f)
Maximum Likelihood Under the Nonlinear Model • Given that an estimate of σ² is σ²ML = S(β,θ)/T: • We substitute this into the above log-likelihood function to generate a concentrated log-likelihood function with respect to σ² (the σ² terms cancel out): L*(β,θ) = −(T/2)ln(2π) − (T/2)ln[S(β,θ)/T] − (1/2)ln|Ψ(θ)| − T/2
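As in the linear case, maximizing this concentrated log-likelihood is equivalent to minimizing S(β,θ)|Ψ(θ)|^(1/T). A minimal R sketch, with f() and build_Psi() as hypothetical user-supplied functions:

```r
# Concentrated ML objective for the nonlinear model with general error covariance
nl_gls_objective <- function(par, y, X, K, f, build_Psi) {
  T_obs <- length(y)
  beta  <- par[1:K]
  theta <- par[-(1:K)]
  Psi   <- build_Psi(theta, T_obs)
  e     <- y - f(X, beta)
  drop(t(e) %*% solve(Psi) %*% e) * det(Psi)^(1 / T_obs)   # S(beta, theta) |Psi|^(1/T)
}
# est <- optim(c(beta_start, theta_start), nl_gls_objective,
#              y = y, X = X, K = length(beta_start), f = f, build_Psi = build_Psi)
```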