Introduction to Generalized Linear Model (GLM)

Man Li, Research Fellow, International Food Policy Research Institute
Technical Training for Modeling Scenarios for Low Emission Development Strategies, September 9th–20th, 2013

What is GLM?
• In statistics, the GLM is a flexible generalization of ordinary linear (OL) regression that allows the response variable (Y) to follow a distribution other than the normal.
• The GLM generalizes linear regression by relating the linear model to Y through a LINK FUNCTION: E(Y) = μ = g⁻¹(Xβ), where g is the link function such that g(μ) = Xβ.
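The relation E(Y) = g⁻¹(Xβ) can be checked numerically. The following is a minimal sketch in R using simulated (not real) Poisson data with a log link; the variable names and the coefficients 0.5 and 0.8 are assumptions chosen only for illustration.

```r
# Simulated count data (illustrative only): Poisson response, log link
set.seed(1)
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 0.8 * x))  # true mean is exp(X beta)

# family = poisson(link = "log") sets g(mu) = log(mu)
fit <- glm(y ~ x, family = poisson(link = "log"))

eta <- predict(fit, type = "link")      # linear predictor, X times beta-hat
mu  <- predict(fit, type = "response")  # fitted mean, E(Y)

# The fitted mean is the inverse link applied to the linear predictor
all.equal(unname(mu), unname(exp(eta)))
```

Here the inverse of the log link is exp(), so the fitted means coincide with exp() of the linear predictor.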

Common distributions with typical uses and canonical link functions
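As a hedged illustration, the default link of each family object in base R can be listed directly; for the families below the default is the canonical link. This sketch only reports what R's stats package uses and is not a reproduction of the slide's table.

```r
# Family objects available in base R (stats package)
families <- list(gaussian(), binomial(), poisson(), Gamma(), inverse.gaussian())

# Extract the default link of each family and label it with the family name
links <- sapply(families, function(f) f$link)
names(links) <- sapply(families, function(f) f$family)
links
```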

Logit Regression for Binary Responses
• Example: Survival and gender in the Donner party, an observational study
In 1846 the Donner families left Springfield, Illinois, for California by covered wagon. When they reached Fort Bridger, Wyoming, in July, the Donner party decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed in the difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when hit by heavy snows in late October. By the time the last survivor was rescued on 21 April 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

Example: Donner Party Deaths
Ages and sexes of the adults (over 15 years) in the party
• These data were used to study the theory that females are better able than males to withstand harsh conditions.

Example: Donner Party Deaths
• Question: For a given age, were women more likely to survive than men?
• If a linear model is used:
  – Y_i = X_i β + ε_i, with ε_i i.i.d.
  – Y = 1 if survived, = 0 if died
  – X = (age, sex)

Ordinary Linear Regression
• Fitted model: Y = 0.747 – 0.013*age + 0.319*I[sex=female]
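One known limitation of a linear model for a 0/1 response is that its fitted values are not constrained to lie between 0 and 1. The sketch below evaluates the fitted equation above at a few illustrative ages (the ages are assumptions, not values taken from the data); the helper name `lpm` is hypothetical.

```r
# Coefficients copied from the fitted linear model without interaction
lpm <- function(age, female) 0.747 - 0.013 * age + 0.319 * female

ages <- c(20, 40, 60)   # illustrative ages only
pred <- data.frame(age    = ages,
                   male   = lpm(ages, female = 0),
                   female = lpm(ages, female = 1))
pred

# A fitted "probability" below 0 cannot be interpreted as a probability
any(pred$male < 0)
```

For a 60-year-old man the fitted value is negative, which motivates modeling the log odds instead.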

Ordinary Linear Regression with Interaction Term
• Fitted model: Y = 0.535 – 0.006*age + 1.091*I[sex=female] – 0.025*age*I[sex=female]

Logit Regression
• Model:
  – Y_i | X_i ~ Bin(1, π_i) (independent)
  – g(π_i) = log(π_i / (1 – π_i)) = X_i β
  – Y = 1 if survived, = 0 if died
  – X = (age, sex)
• Null model: log odds of survival = β_0 + β_1*age + β_2*I[sex=female]
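The logit link and its inverse can be sketched in a few lines of R. The probabilities below are arbitrary illustrative values; `qlogis()` and `plogis()` are the base R functions for the logit and the inverse logit.

```r
p <- c(0.1, 0.5, 0.9)               # illustrative probabilities

log_odds <- log(p / (1 - p))        # g(pi) = log(pi / (1 - pi))
all.equal(log_odds, qlogis(p))      # qlogis() computes the same logit

back <- 1 / (1 + exp(-log_odds))    # inverse logit maps log odds back to (0, 1)
all.equal(back, plogis(log_odds))   # plogis() computes the same inverse
all.equal(back, p)                  # round trip recovers the probabilities
```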

Possible problems
• The logit may not be a straight-line function of age
  – Test a quadratic age term separately for males and females (Wald test), with X = (age, agesq)
• The slopes may not be the same for males and females
  – Test the significance of the interaction term (Wald test), with X = (age, sex, age*sex)
• Alternative to the Wald test: likelihood ratio test
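The two tests of the interaction term can be sketched as follows. This example uses simulated data, not the Donner data; the data frame `sim`, its column names, and the coefficients used to generate it are all assumptions made for illustration.

```r
# Simulated data (illustrative only): binary outcome, age, and a 0/1 sex indicator
set.seed(42)
n <- 200
sim <- data.frame(age    = round(runif(n, 15, 65)),
                  female = rbinom(n, 1, 0.4))
sim$survived <- rbinom(n, 1, plogis(1.5 - 0.06 * sim$age + 1.0 * sim$female))

fit0 <- glm(survived ~ age + female, family = binomial, data = sim)  # no interaction
fit1 <- glm(survived ~ age * female, family = binomial, data = sim)  # adds age:female

# Wald test: estimate, standard error, z value and p-value of the interaction
coef(summary(fit1))["age:female", ]

# Likelihood ratio test: drop in deviance between the nested models
lrt <- anova(fit0, fit1, test = "Chisq")
lrt
```

The same pattern applies to the quadratic age term: fit the model with and without `I(age^2)` within each sex group and compare.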

Exercise
• Open the R program code located at ftp://ftp.cgiar.org/ifpri/leds2013sep/GLM/GLM_code.R
• Load the data set named “donner”
• Define the indicator variables “survival” and “sex”
• Draw a scatterplot: survival vs. age, by gender

Exercise
• Estimate the null model; examine the signs and p-values of the age and sex variables
• Test for the quadratic term of age by gender group
• Test for the interaction of sex and age
• Draw two plots of fitted values: the null model and the model with the interaction term

What Do the Results Look Like?
• H0 model: log odds of survival = 1.633 – 0.078*age + 1.597*I[sex=female]
• H1 model: log odds of survival = 0.318 – 0.032*age + 6.928*I[sex=female] – 0.025*age*I[sex=female]
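The H0 coefficients can be turned into an odds ratio and into survival probabilities. The sketch below uses the H0 estimates reported above; the age of 30 is an arbitrary illustrative value.

```r
# H0 coefficients as reported on the slide
b0 <- 1.633
b_age <- -0.078
b_female <- 1.597

# Odds ratio of survival, women vs. men of the same age
odds_ratio <- exp(b_female)
odds_ratio                      # about 4.9

# Estimated survival probabilities at an illustrative age of 30
age <- 30
p_male   <- plogis(b0 + b_age * age)
p_female <- plogis(b0 + b_age * age + b_female)
c(male = p_male, female = p_female)
```

Under the H0 model the odds ratio is the same at every age, because the model has no interaction between age and sex.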

Logit Regression for Multiple Responses
• Y_i | X_i ~ Mult(m_i, π_{1i}, π_{2i}, …, π_{Ki}), with ∑_k π_{ki} = 1 and Y = 1, 2, …, K (K-category response)
• There are K–1 logit models:
    log(π_{1i} / π_{Ki}) = X_i β_1
    log(π_{2i} / π_{Ki}) = X_i β_2
    …
    log(π_{K-1,i} / π_{Ki}) = X_i β_{K-1}
  Note: β_K is normalized to be 0
• Rewrite the probabilities:
    Pr(Y_i = 1) = exp(X_i β_1) / ∑_j exp(X_i β_j)
    Pr(Y_i = 2) = exp(X_i β_2) / ∑_j exp(X_i β_j)
    …
    Pr(Y_i = K-1) = exp(X_i β_{K-1}) / ∑_j exp(X_i β_j)
    Pr(Y_i = K) = exp(X_i β_K) / ∑_j exp(X_i β_j)
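The step from the K–1 logit equations to the probabilities can be verified numerically. The sketch below assumes K = 3, a single covariate, and hypothetical coefficient values chosen only for illustration.

```r
# Hypothetical coefficients for K = 3 categories and X = (1, x).
# Columns are beta_1, beta_2, beta_K, with beta_K normalized to 0.
beta <- cbind(c(0.5, -1.0), c(-0.2, 0.4), c(0, 0))
X <- cbind(1, c(-1, 0, 2))            # three illustrative observations

eta  <- X %*% beta                    # n x K matrix of X_i beta_k
prob <- exp(eta) / rowSums(exp(eta))  # Pr(Y_i = k) for each row i

rowSums(prob)                         # each row of probabilities sums to 1

# The log ratio against the baseline category K recovers X_i beta_1
all.equal(log(prob[, 1] / prob[, 3]), as.vector(X %*% beta[, 1]))
```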

Logit Regression for Multiple Responses
• Maximum likelihood estimation: the log-likelihood is
    LL(β) = ∑_i ∑_k I[Y_i = k] * log(Pr(Y_i = k))
          = ∑_i ∑_k I[Y_i = k] * log[exp(X_i β_k) / ∑_j exp(X_i β_j)]
• Goodness of fit: likelihood ratio index ρ = 1 – LL(β̂)/LL(0), where β̂ is the maximum likelihood estimate
• The coefficients β_k are difficult to interpret; marginal effects are generally used to give an economic interpretation
• Marginal effects: given a one-unit change in X_i, by how much does the probability of each outcome of Y_i change?
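For a continuous covariate x, the marginal effect on Pr(Y = k) in the multinomial logit is π_k (β_k – ∑_j π_j β_j), a standard result not stated on the slide. The sketch below checks that formula against a finite-difference derivative; the coefficients, the evaluation point `x0`, and the helper `probs` are hypothetical.

```r
# Hypothetical coefficients: K = 3, X = (1, x), last category is the baseline
beta <- cbind(c(0.5, -1.0), c(-0.2, 0.4), c(0, 0))

# Category probabilities at a given value of x
probs <- function(x) {
  eta <- as.vector(c(1, x) %*% beta)
  exp(eta) / sum(exp(eta))
}

x0 <- 0.3
p <- probs(x0)
slope <- beta[2, ]                       # coefficient on x for each category

# Analytic marginal effect: pi_k * (beta_k - sum_j pi_j * beta_j)
me_analytic <- p * (slope - sum(p * slope))

# Numerical check by central finite differences
h <- 1e-6
me_numeric <- (probs(x0 + h) - probs(x0 - h)) / (2 * h)

sum(me_analytic)                         # effects across categories sum to zero
all.equal(me_analytic, me_numeric, tolerance = 1e-6)
```

Note that the sign of a marginal effect can differ from the sign of the corresponding coefficient, which is one reason the coefficients alone are hard to interpret.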

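R Code
• multinom() function (… marks placeholders to fill in)
    library(nnet)
    count.matrix <- cbind(Y1, Y2, …, YK)   # one column of counts per response category
    fit <- multinom(count.matrix ~ X1 + X2 + …, data = …, Hess = TRUE)   # Hess = TRUE keeps the Hessian for standard errors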
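A runnable version of the call, under stated assumptions, might look like the sketch below. It uses simulated data with a factor response rather than a count matrix; the data frame `sim`, the covariate `x1`, and the generating coefficients are hypothetical. Note that `multinom()` takes the first level as the reference category, whereas the slides normalize the last category.

```r
library(nnet)

# Simulated 3-category response (illustrative only)
set.seed(7)
n  <- 300
x1 <- rnorm(n)
eta  <- cbind(0, 0.5 + 1.0 * x1, -0.5 - 0.8 * x1)   # category 1 is the baseline
prob <- exp(eta) / rowSums(exp(eta))
y <- factor(apply(prob, 1, function(p) sample(1:3, 1, prob = p)))
sim <- data.frame(y = y, x1 = x1)

# Hess = TRUE stores the Hessian; trace = FALSE silences the iteration log
fit <- multinom(y ~ x1, data = sim, Hess = TRUE, trace = FALSE)
coef(fit)                       # one row per non-reference category

# Likelihood ratio index: 1 - LL(beta-hat) / LL(0).
# With all coefficients equal to 0, every category has probability 1/3.
ll_hat  <- as.numeric(logLik(fit))
ll_zero <- n * log(1 / 3)
rho <- 1 - ll_hat / ll_zero
rho
```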

Some Extensions
• Conditional logit
  – X_ik is specific to each alternative k, but β does not vary across alternatives, i.e., the index is X_ik β
• Nested logit
  – Can be decomposed into two standard logit models
• Mixed logit
  – Integrals of standard logit probabilities over a density of the parameters β
• See Train (2003), Discrete Choice Methods with Simulation, for further discussion
