
10. Generalized linear models



  1. 10. Generalized linear models • 10.1 Homogeneous models • Exponential families of distributions, link functions, likelihood estimation • 10.2 Example: Tort filings • 10.3 Marginal models and GEE • 10.4 Random effects models • 10.5 Fixed effects models • Maximum likelihood, conditional likelihood, Poisson data • 10.6 Bayesian Inference • Appendix 10A Exponential families of distributions

  2. 10.1 Homogeneous models • Section Outline • 10.1.1 Exponential families of distributions • 10.1.2 Link functions • 10.1.3 Likelihood estimation • In this section, we consider only independent responses. • No serial correlation. • No random effects that would induce serial correlation.

  3. Exponential families of distributions • The basic one-parameter exponential family has probability function p(y; θ, φ) = exp{ [y θ − b(θ)] / φ + S(y, φ) }. • Here, y is a response and θ is the parameter of interest. • The parameter φ is a scale parameter that we often will assume is known. • The term b(θ) depends only on the parameter θ, not the responses. • S(y, φ) depends only on the responses and the scale parameter, not the parameter θ. • The response y may be discrete or continuous. • Some straightforward calculations show that E y = b′(θ) and Var y = b″(θ) φ.

  4. Special cases of the basic exponential family • Normal • The probability density function is f(y) = (2πσ²)^(−1/2) exp{ −(y − μ)² / (2σ²) }. • Take μ = θ, σ² = φ, b(θ) = θ²/2 and S(y, φ) = −y²/(2φ) − ln(2πφ)/2. • Note that E y = b′(θ) = θ = μ and Var y = b″(θ) φ = σ². • Binomial, n trials and probability p of success • The probability mass function is (n choose y) p^y (1 − p)^(n−y). • Take ln(p/(1−p)) = logit(p) = θ, φ = 1, b(θ) = n ln(1 + e^θ) and S(y, φ) = ln(n choose y). • Note that E y = b′(θ) = n e^θ/(1 + e^θ) = np and Var y = b″(θ)(1) = n e^θ/(1 + e^θ)² = np(1 − p), as anticipated.

  5. Another special case of the basic exponential family • Poisson • The probability mass function is λ^y e^(−λ) / y!. • Take ln(λ) = θ, φ = 1, b(θ) = e^θ and S(y, φ) = −ln(y!). • Note that E y = b′(θ) = e^θ = λ and • Var y = b″(θ)(1) = e^θ = λ, as anticipated.
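
As a concrete check of the exponential-family decomposition, the sketch below (Python; the Poisson mean λ = 3.5 is an arbitrary illustration) verifies numerically that the Poisson pmf equals exp{[yθ − b(θ)]/φ + S(y, φ)} with θ = ln λ, b(θ) = e^θ, φ = 1 and S(y, φ) = −ln(y!), and that the mean and variance match b′(θ) and b″(θ)φ.

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

lam = 3.5                      # illustrative Poisson mean
theta = np.log(lam)            # canonical parameter theta = ln(lambda)
phi = 1.0                      # scale parameter is 1 for the Poisson

y = np.arange(0, 20)
# Exponential-family form: exp{ (y*theta - b(theta))/phi + S(y, phi) }
b_theta = np.exp(theta)        # b(theta) = e^theta
S_y = -gammaln(y + 1)          # S(y, phi) = -ln(y!)
pmf_expfam = np.exp((y * theta - b_theta) / phi + S_y)

assert np.allclose(pmf_expfam, poisson.pmf(y, lam))

# E y = b'(theta) = e^theta and Var y = b''(theta)*phi = e^theta
print(np.exp(theta), poisson.mean(lam))        # both equal lambda
print(np.exp(theta) * phi, poisson.var(lam))   # both equal lambda
```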

  6. 10.1.2 Link functions • To link up the univariate exponential family with regression problems, we define the systematic component of yit to be ηit = xit′β. • The idea is to now choose a “link” between the systematic component and the mean of yit, say μit, of the form ηit = g(μit). • g(·) is the link function. • Linear combinations of explanatory variables, ηit = xit′β, may vary between negative and positive infinity. • However, means may be restricted to a smaller range. For example, Poisson means vary between zero and infinity. • The link function serves to map the domain of the mean function onto the whole real line.

  7. Bernoulli illustration of links • Bernoulli means vary between 0 and 1, although linear combinations of explanatory variables may vary between negative and positive infinity. • Here are three important examples of link functions for the Bernoulli distribution: • Logit: η = g(μ) = logit(μ) = ln(μ/(1 − μ)). • Probit: η = g(μ) = Φ⁻¹(μ), where Φ⁻¹ is the inverse of the standard normal distribution function. • Complementary log-log: η = g(μ) = ln(−ln(1 − μ)). • Each function maps the unit interval (0, 1) onto the whole real line.
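
A minimal sketch of the three Bernoulli links (Python with scipy assumed; the grid of means is illustrative), showing how each maps (0, 1) onto the real line:

```python
import numpy as np
from scipy.stats import norm

mu = np.linspace(0.01, 0.99, 9)           # illustrative means in (0, 1)

logit = np.log(mu / (1 - mu))             # logit link
probit = norm.ppf(mu)                     # probit link: inverse standard normal cdf
cloglog = np.log(-np.log(1 - mu))         # complementary log-log link

# Each transformed column spans the real line as mu moves from 0 to 1
for m, a, b, c in zip(mu, logit, probit, cloglog):
    print(f"mu={m:.2f}  logit={a:+.2f}  probit={b:+.2f}  cloglog={c:+.2f}")
```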

  8. Canonical links • As we have seen with the Bernoulli, there are several link functions that may be suitable for a particular distribution. • When the systematic component equals the parameter of interest (η = θ), this is an intuitively appealing case. • That is, the parameter of interest, θ, equals a linear combination of explanatory variables, η. • Recall that η = g(μ) and μ = b′(θ). • Thus, if g⁻¹ = b′, then η = g(b′(θ)) = θ. • The choice of g such that g⁻¹ = b′ is called a canonical link. • Examples: Normal: g(μ) = μ; Binomial: g(μ) = logit(μ); Poisson: g(μ) = ln μ.

  9. 10.1.3 Estimation • Begin with likelihood estimation for canonical links. • Consider responses yit, with mean μit, systematic component ηit = g(μit) = xit′β and canonical link so that ηit = θit. • Assume the responses are independent. • Then, the log-likelihood is ln L(β) = Σit { [yit θit − b(θit)] / φ + S(yit, φ) }, with θit = xit′β.

  10. MLEs - Canonical links • The log-likelihood is ln L(β) = Σit { [yit xit′β − b(xit′β)] / φ + S(yit, φ) }. • Taking the partial derivative with respect to β yields the score equations ∂ ln L / ∂β = Σit xit (yit − μit) / φ, • because μit = b′(θit) = b′(xit′β). • Thus, we can solve for the mle’s of β through 0 = Σit xit (yit − μit). • This is a special case of the method of moments.
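
To illustrate solving 0 = Σit xit (yit − μit), here is a sketch of Newton-Raphson (equivalently, Fisher scoring) for an ordinary Poisson regression with the canonical log link; the simulated data, sample size, and starting values are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(k)
for _ in range(25):
    mu = np.exp(X @ beta)            # canonical link: mu = exp(x'beta)
    score = X.T @ (y - mu)           # score: sum_i x_i (y_i - mu_i)
    info = X.T @ (mu[:, None] * X)   # Fisher information: sum_i mu_i x_i x_i'
    step = np.linalg.solve(info, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)    # should be close to beta_true
```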

  11. MLEs - general links • For general links, we no longer assume the relation θit = xit′β. • We assume that β is related to θit through μit = b′(θit) and ηit = xit′β = g(μit). • Recall that the log-likelihood is ln L = Σit { [yit θit − b(θit)] / φ + S(yit, φ) }. • Further, E yit = μit and Var yit = b″(θit) φ. • The jth element of the score function is ∂ ln L / ∂βj = Σit (∂θit/∂βj)(yit − μit) / φ, • because b′(θit) = μit.

  12. MLEs - more on general links • To eliminate θit, we use the chain rule to get ∂θit/∂βj = (∂θit/∂μit)(∂μit/∂ηit)(∂ηit/∂βj) = [b″(θit)]⁻¹ (∂μit/∂ηit) xitj. • Thus, ∂ ln L / ∂βj = Σit xitj (∂μit/∂ηit)(yit − μit) / [φ b″(θit)]. • This yields the score equations 0 = Σit xitj (∂μit/∂ηit)(yit − μit) / Var yit. • This is called the generalized estimating equations form.
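
As an illustration of this estimating-equation form with a non-canonical link, the sketch below runs iteratively reweighted least squares for a Poisson model with a square-root link (μ = η², ∂μ/∂η = 2η, Var y = μ). The data, link choice, and starting value are assumptions for the example, and no numerical safeguards are included.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(0, 1, size=n)])
beta_true = np.array([1.0, 0.8])          # coefficients on the sqrt-mean scale
y = rng.poisson((X @ beta_true) ** 2)

beta = np.array([1.0, 0.0])               # assumed starting value
for _ in range(50):
    eta = X @ beta                        # linear predictor eta = x'beta
    mu = eta ** 2                         # inverse link: mu = eta^2
    dmu_deta = 2 * eta                    # d mu / d eta
    var = mu                              # Poisson variance: phi * b''(theta) = mu
    W = dmu_deta ** 2 / var               # IRLS working weights
    z = eta + (y - mu) / dmu_deta         # IRLS working response
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta)    # close to beta_true
```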

  13. Overdispersion • When fitting models to data with binary or count dependent variables, it is common to observe that the variance exceeds that anticipated by the fit of the mean parameters. • This phenomenon is known as overdispersion. • A probabilistic model may be available to explain this phenomenon. • In many situations, analysts are content to postulate an approximate model through the relation Var yit = σ² φ b″(xit′β) / wit, where the wit are known weights. • The scale parameter φ is specified through the choice of the distribution. • The additional parameter σ² allows for extra variability. • When the additional parameter σ² is included, it is customary to estimate it by Pearson’s chi-square statistic divided by the error degrees of freedom. That is, σ̂² = [1/(N − K)] Σit wit (yit − b′(xit′β̂))² / [φ b″(xit′β̂)], where N is the total number of observations and K the number of regression coefficients.
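
A sketch of the Pearson chi-square estimate of σ² for the Poisson case (so the variance function is v(μ) = μ, with unit weights wit = 1 and φ = 1); the fitted means and the number of fitted coefficients below are placeholders, not results from the tort data.

```python
import numpy as np

def pearson_dispersion(y, mu_hat, n_params):
    """Pearson chi-square estimate of sigma^2: sum of (y - mu)^2 / v(mu)
    over observations, divided by the error degrees of freedom N - K
    (Poisson variance function v(mu) = mu)."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    chi2 = np.sum((y - mu_hat) ** 2 / mu_hat)
    return chi2 / (y.size - n_params)

# Illustrative use with placeholder fitted values
y = np.array([2, 0, 5, 3, 1, 7, 4, 2])
mu_hat = np.array([2.5, 1.0, 4.0, 2.8, 1.5, 5.5, 3.6, 2.1])
print(pearson_dispersion(y, mu_hat, n_params=3))
```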

  14. 10.2 Example: Tort filings

  15. Offsets • We assume that yit is Poisson distributed with parameter POPit exp(xit′β), • where POPit is the population of the ith state at time t. • In GLM terminology, a variable with a known coefficient equal to 1 is known as an offset. • Using logarithmic population, the Poisson parameter for yit is exp(ln POPit + xit′β), so that ln POPit enters as an offset. • An alternative approach is to use the average number of tort filings per capita, yit / POPit, as the response and assume approximate normality. • Note that in the Poisson model above the expectation of the average response is E(yit / POPit) = exp(xit′β), • whereas the variance is Var(yit / POPit) = exp(xit′β) / POPit.
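
A sketch of fitting a Poisson model with a logarithmic-population offset using statsmodels (assuming its GLM interface); the populations and covariates are simulated placeholders, not the tort filing data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
pop = rng.uniform(1e5, 5e6, size=n)            # hypothetical state populations
X = sm.add_constant(rng.normal(size=(n, 2)))   # hypothetical covariates
beta_true = np.array([-10.0, 0.3, -0.2])
y = rng.poisson(pop * np.exp(X @ beta_true))   # counts proportional to population

# ln(population) enters with a known coefficient of 1: an offset
fit = sm.GLM(y, X, family=sm.families.Poisson(),
             offset=np.log(pop)).fit()
print(fit.params)   # should be near beta_true
```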

  16. Tort filings • Purpose: to understand ways in which state legal, economic and demographic characteristics affect the number of filings. • Table 10.3 suggests more filings under JSLIAB and PUNITIVE but fewer under CAPS. • Table 10.5 • All variables under the homogeneous model are statistically significant. • However, the estimated scale parameter seems important. • Here, only JSLIAB is (positively) statistically significant. • The time (categorical) variable seems important.

  17. 10.3 Marginal models • This approach reduces the reliance on distributional assumptions by focusing on the first two moments. • We first assume that the variance is a known function of the mean up to a scale parameter, that is, Var yit = v(μit) φ. • This is a consequence of the exponential family, although here it is taken as a basic assumption. • That is, in the GLM setting, we have Var yit = b″(θit) φ and μit = b′(θit). • Because b(·) and φ are assumed known, Var yit is a known function of μit. • We also assume that the correlation between two observations within the same subject is a known function of their means, up to a vector of parameters τ. • That is, corr(yir, yis) = ρ(μir, μis, τ), for ρ(·) known.

  18. Marginal model • This framework incorporates the linear model nicely; we simply use a GLM with a normal distribution. However, for nonlinear situations, a correlation is not always the best way to capture dependencies among observations. • Here is some notation to help see the estimation procedures. • Define μi = (μi1, μi2, ..., μiTi)′ to be the vector of means for the ith subject. • To express the variance-covariance matrix, we define a diagonal matrix of variances Vi = diag(v(μi1), ..., v(μiTi)) and the matrix of correlations Ri(τ) to be a matrix with ρ(μir, μis, τ) in the rth row and sth column. • Thus, Var yi = Vi^(1/2) Ri(τ) Vi^(1/2).

  19. Generalized estimating equations • These assumptions are suitable for a method of moments estimation procedure called “generalized estimating equations” (GEE) in biostatistics, also known as the generalized method of moments (GMM) in econometrics. • GEE with known correlation parameter • Assuming τ is known, the GEE for β is 0 = Σi Gμ(β)′ Vi⁻¹ (yi − μi). • Here, the matrix of derivatives Gμ(β) = ∂μi/∂β is Ti × K. • For linear models with μit = zit′αi + xit′β, this is the GLS estimator introduced in Section 3.3.
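
A sketch of GEE estimation with statsmodels (assuming its GEE interface); the grouped Poisson data are simulated, and the exchangeable working correlation is an illustrative choice; an independence or AR(1) structure could be swapped in via the cov_struct argument.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_subjects, T = 50, 5
group = np.repeat(np.arange(n_subjects), T)
x = rng.normal(size=n_subjects * T)
# a subject-level effect induces within-subject correlation in the simulated data
alpha = np.repeat(rng.normal(scale=0.5, size=n_subjects), T)
y = rng.poisson(np.exp(0.2 + 0.4 * x + alpha))
X = sm.add_constant(x)

gee = sm.GEE(y, X, groups=group, family=sm.families.Poisson(),
             cov_struct=sm.cov_struct.Exchangeable())
res = gee.fit()
print(res.summary())   # robust (empirical) standard errors are reported by default
```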

  20. Consistency of GEEs • The solution, bEE, is asymptotically normal with covariance matrix (Σi Gμ(β)′ Vi⁻¹ Gμ(β))⁻¹. • Because this is a function of the means, μi, it can be consistently estimated.

  21. Robust estimation of standard errors • Empirical standard errors may be calculated using the following “sandwich” estimator of the asymptotic variance of bEE: (Σi Gμ′ Vi⁻¹ Gμ)⁻¹ [Σi Gμ′ Vi⁻¹ (yi − μi)(yi − μi)′ Vi⁻¹ Gμ] (Σi Gμ′ Vi⁻¹ Gμ)⁻¹.

  22. GEE - correlation parameter estimation • For GEEs with unknown correlation parameters, Prentice (1988) suggests using a second estimating equation, based on products of standardized residuals, to estimate τ. • Diggle, Liang and Zeger (1994) suggest using the identity matrix as the weight in this second equation for most discrete data. • However, for binary responses, they note that the last Ti observations are redundant because yit = yit², and should be ignored; they recommend an alternative weight matrix in this case.

  23. Tort filings • Assume an independence working correlation. • This yields the same parameter estimates as in Table 10.5, under the homogeneous Poisson model with an estimated scale parameter. • JSLIAB is (positively) statistically significant, using both model-based and robust standard errors. • To test the robustness of this model fit, we fit the same model with an AR(1) working correlation. • Again, JSLIAB is (positively) statistically significant. • Interestingly, CAPS is now borderline significant, but in the opposite direction to that suggested by Table 10.3.

  24. 10.4 Random effects models • The motivation and sampling issues regarding random effects were introduced in Chapter 3. • The model is easiest to introduce and interpret in the following hierarchical fashion: • 1. Subject effects {αi} are a random sample from a distribution that is known up to a vector of parameters τ. • 2. Conditional on {αi}, the responses {yi1, yi2, ..., yiTi} are a random sample from a GLM with systematic component ηit = zit′αi + xit′β.

  25. Random effects models • This model is a generalization of: • 1. The linear random effects model in Chapter 3 - use a normal distribution. • 2. The binary dependent variables random effects model of Section 9.2 - use a Bernoulli distribution. (In Section 9.2, we focused on the case zit = 1.) • Because we are sampling from a known distribution with a small, finite number of parameters, the maximum likelihood method of estimation is readily available. • We will use this method, assuming normally distributed random effects. • Also available in the literature is the EM (expectation-maximization) algorithm for estimation - see Diggle, Liang and Zeger (1994).

  26. Random effects likelihood • Conditional on αi, the likelihood for the ith subject at the tth observation is exp{ [yit θit − b(θit)] / φ + S(yit, φ) }, • where b′(θit) = E(yit | αi) and ηit = zit′αi + xit′β = g(E(yit | αi)). • Conditional on αi, the likelihood for the ith subject is the product of these terms over t = 1, ..., Ti. • We take expectations over αi to get the (unconditional) likelihood. • To see this explicitly, let’s use the canonical link so that θit = ηit. The (unconditional) likelihood for the ith subject is li = E_αi [ Πt exp{ [yit (zit′αi + xit′β) − b(zit′αi + xit′β)] / φ + S(yit, φ) } ]. • Hence, the total log-likelihood is Σi ln li. • The constant Σit S(yit, φ) is unimportant for determining mle’s. • Although evaluating, and maximizing, the likelihood requires numerical integration, it is easy to do on the computer.
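
Evaluating li requires integrating over αi. Below is a sketch of Gauss-Hermite quadrature for a Poisson model with a normal random intercept (zit = 1, φ = 1); the quadrature order, parameter values, and the single subject's data are illustrative assumptions, and a production version would work on the log scale for stability.

```python
import numpy as np
from scipy.special import gammaln

def subject_loglik(y_i, X_i, beta, sigma, n_quad=30):
    """Log of the unconditional likelihood l_i for one subject: the integral
    over a N(0, sigma^2) random intercept a_i, approximated by Gauss-Hermite
    quadrature (Poisson responses, canonical log link)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    a = np.sqrt(2.0) * sigma * nodes          # transformed quadrature points
    eta = X_i @ beta                          # fixed-effect part of the linear predictor
    # conditional log-likelihood at each quadrature point:
    # sum_t [ y_it*(a + eta_t) - exp(a + eta_t) - ln(y_it!) ]
    cond = a[:, None] + eta                   # shape (n_quad, T_i)
    loglik_a = np.sum(y_i * cond - np.exp(cond) - gammaln(y_i + 1), axis=1)
    return np.log(np.sum(weights * np.exp(loglik_a)) / np.sqrt(np.pi))

# Illustrative call with made-up data for one subject
rng = np.random.default_rng(4)
X_i = np.column_stack([np.ones(4), rng.normal(size=4)])
y_i = rng.poisson(np.exp(X_i @ np.array([0.5, 0.3])))
print(subject_loglik(y_i, X_i, beta=np.array([0.5, 0.3]), sigma=0.7))
```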

  27. Random effects and serial correlation • We saw in Chapter 3 that permitting subject-specific effects, αi, to be random induces serial correlation in the responses yit. • This is because the variance-covariance matrix of yi is no longer diagonal. • This is also true for the nonlinear GLM models. To see this, • let’s use a canonical link and • recall that E(yit | αi) = b′(θit) = b′(ηit) = b′(αi + xit′β).

  28. Covariance calculations • The covariance between two responses, yi1 and yi2, is Cov(yi1, yi2) = E yi1 yi2 − E yi1 E yi2 = E { b′(αi + xi1′β) b′(αi + xi2′β) } − E b′(αi + xi1′β) E b′(αi + xi2′β). • To see this, use the law of iterated expectations: E yi1 yi2 = E E(yi1 yi2 | αi) = E { E(yi1 | αi) E(yi2 | αi) } = E { b′(αi + xi1′β) b′(αi + xi2′β) }.

  29. More covariance calculations • Normality • For the normal distribution we have b′(a) = a. • Thus, Cov(yi1, yi2) = E { (αi + xi1′β)(αi + xi2′β) } − E(αi + xi1′β) E(αi + xi2′β) = E αi² + (xi1′β)(xi2′β) − (xi1′β)(xi2′β) = Var αi. • For the Poisson, we have b′(a) = e^a. Thus, E yit = E b′(αi + xit′β) = E exp(αi + xit′β) = exp(xit′β) E exp(αi), and • Cov(yi1, yi2) = E { exp(αi + xi1′β) exp(αi + xi2′β) } − exp((xi1 + xi2)′β) {E exp(αi)}² = exp((xi1 + xi2)′β) { E exp(2αi) − (E exp(αi))² } = exp((xi1 + xi2)′β) Var exp(αi).
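
A Monte Carlo sketch checking the Poisson covariance formula above for a normal random intercept, using the fact that for αi ~ N(0, σ²), Var exp(αi) = (exp(σ²) − 1) exp(σ²); the parameter values and covariates are arbitrary choices for the simulation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000                   # many subjects for a Monte Carlo check
sigma = 0.6
beta = 0.4
x1, x2 = 0.5, 1.0             # two within-subject covariate values (illustrative)

alpha = rng.normal(0.0, sigma, size=n)           # random intercepts
y1 = rng.poisson(np.exp(alpha + x1 * beta))
y2 = rng.poisson(np.exp(alpha + x2 * beta))

empirical = np.cov(y1, y2)[0, 1]
# Cov(y1, y2) = exp((x1 + x2) beta) * Var(exp(alpha)),
# with Var(exp(alpha)) = (exp(sigma^2) - 1) exp(sigma^2) for a normal intercept
theoretical = np.exp((x1 + x2) * beta) * (np.exp(sigma**2) - 1) * np.exp(sigma**2)
print(empirical, theoretical)   # should be close
```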

  30. Random effects likelihood • Recall, from Section 10.2, that the (unconditional) likelihood for the ith subject is li = ∫ Πt exp{ [yit θit − b(θit)] / φ + S(yit, φ) } g(a) da, evaluated at θit = a + xit′β. • Here, we use zit = 1, φ = 1, and g(a) is the density of αi. • For the Poisson, we have b(a) = e^a and S(y, φ) = −ln(y!), so the likelihood is li = ∫ Πt [ exp(−e^(a + xit′β)) e^(yit (a + xit′β)) / yit! ] g(a) da. • As before, evaluating and maximizing the likelihood requires numerical integration, yet it is easy to do on the computer.

  31. 10.5 Fixed effects models • Consider responses yit, with mean μit, systematic component ηit = g(μit) = zit′αi + xit′β and canonical link so that ηit = θit. • Assume the responses are independent. • Then, the log-likelihood is ln L = Σit { [yit (zit′αi + xit′β) − b(θit)] / φ + S(yit, φ) }. • Thus, the responses yit depend on the parameters only through summary statistics. • That is, the statistics Σt yit zit are sufficient for αi. • The statistics Σit yit xit are sufficient for β. • This is a convenient property of the canonical links. It is not available for other choices of links.

  32. MLEs - Canonical links • The log-likelihood is ln L = Σit { [yit (zit′αi + xit′β) − b(zit′αi + xit′β)] / φ + S(yit, φ) }. • Taking the partial derivative with respect to αi yields 0 = Σt zit (yit − μit), • because μit = b′(θit) = b′(zit′αi + xit′β). • Taking the partial derivative with respect to β yields 0 = Σit xit (yit − μit). • Thus, we can solve for the mle’s of αi and β through: 0 = Σt zit (yit − μit) and 0 = Σit xit (yit − μit). • This is a special case of the method of moments. • This may produce inconsistent estimates of β, as we have seen in Chapter 9.

  33. Conditional likelihood estimation • Assume the canonical link, so that θit = ηit = zit′αi + xit′β. • Define the likelihood for a single observation to be exp{ [yit θit − b(θit)] / φ + S(yit, φ) }. • Let Si be the random vector representing Σt zit yit and let sumi be the realization of Σt zit yit. • Recall that Σt zit yit are sufficient for αi. • The conditional likelihood of the data set is the product over subjects i of the likelihood of subject i’s responses given Si = sumi. • This likelihood does not depend on {αi}, only on β. • Maximizing it with respect to β yields root-n consistent estimates. • The distribution of Si is messy and is difficult to compute.

  34. Poisson distribution • The Poisson is the most widely used distribution for counted responses. • Examples include the number of migrants from state to state and the number of tort filings within a state. • A feature of the fixed effects version of the model is that the mean equals the variance. • To illustrate the application of Poisson panel data models, let’s use the canonical link and zit = 1, so that ln E(yit | αi) = g(E(yit | αi)) = θit = ηit = αi + xit′β. • Through the log function, it links the mean to a linear combination of explanatory variables. It is the basis of the so-called “log-linear” model.

  35. Conditional likelihood estimation • We first examine the fixed effects model and thus assume that {αi} are fixed parameters. • Thus, E yit = exp(αi + xit′β). • The distribution is Pr(yit = y) = exp(−exp(αi + xit′β)) exp(αi + xit′β)^y / y!. • From Section 10.1, Σt yit is a sufficient statistic for αi. • The distribution of Σt yit turns out to be Poisson, with mean exp(αi) Σt exp(xit′β). • Note that the ratio of means, E yit / E(Σs yis) = exp(xit′β) / Σs exp(xis′β), does not depend on αi.

  36. Conditional likelihood details • Thus, as in Section 10.1, the conditional likelihood for the ith subject is Pr(yi1, ..., yiTi | Σt yit) = [ (Σt yit)! / (Πt yit!) ] Πt πit^yit,

  37. Conditional likelihood details • where πit = exp(xit′β) / Σs exp(xis′β). • This is a multinomial distribution.

  38. Multinomial distribution • Thus, the joint distribution of yi1, ..., yiTi given Σt yit has a multinomial distribution. • The conditional likelihood is Πi [ (Σt yit)! / (Πt yit!) ] Πt πit^yit. • Taking partial derivatives of the log of this likelihood with respect to β yields Σit xit (yit − (Σs yis) πit), • where πit = exp(xit′β) / Σs exp(xis′β). • Thus, the conditional MLE, b, is the solution of 0 = Σit xit (yit − (Σs yis) πit).
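
A sketch of conditional maximum likelihood for the Poisson fixed effects model: maximizing Σit yit ln πit with πit = exp(xit′β)/Σs exp(xis′β), which is equivalent to solving the estimating equation above. The simulated panel and optimizer choice are assumptions for the example; note that no αi is estimated.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n_subjects, T, k = 100, 6, 2
X = rng.normal(size=(n_subjects, T, k))
beta_true = np.array([0.5, -0.3])
alpha = rng.normal(scale=1.0, size=n_subjects)    # fixed effects (nuisance parameters)
y = rng.poisson(np.exp(alpha[:, None] + X @ beta_true))

def neg_cond_loglik(beta):
    """Negative conditional log-likelihood: -sum_it y_it * ln pi_it, with
    pi_it = exp(x_it'beta) / sum_s exp(x_is'beta); the alpha_i cancel out."""
    eta = X @ beta                                # shape (n_subjects, T)
    log_pi = eta - np.log(np.sum(np.exp(eta), axis=1, keepdims=True))
    return -np.sum(y * log_pi)

res = minimize(neg_cond_loglik, x0=np.zeros(k), method="BFGS")
print(res.x)   # close to beta_true, without estimating any alpha_i
```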
