Chapter 3: Generalized Linear Models
Objectives • Review the linear model. • Generalize the linear model. • Describe several common generalized linear models.
Review Linear Models • A model is linear in the parameters when each term contains only one parameter and that parameter is a multiplicative constant. • Linearity refers to the parameters, not to a straight-line relationship between the response and the predictors. • The response is modeled as a linear combination of terms. • This model is linear: • This model is not linear:
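To make the distinction concrete, here are a few illustrative models of our own choosing (not necessarily the examples used on the slide). The first two are linear in the parameters even though the first is curved in x; the third is not, because β1 sits inside the exponential.

```latex
% Linear in the parameters: each term is one parameter times a known function of x
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i
y_i = \beta_0 + \beta_1 \log(x_i) + \varepsilon_i
% Not linear in the parameters: beta_1 is not a multiplicative constant
y_i = \beta_0 \exp(\beta_1 x_i) + \varepsilon_i
```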
Linear Model Error • The linear model is for the expected value, or mean, of the response. • The linear model treats the response errors as normally distributed deviations with a mean of 0 and a constant variance. • The variance does not depend on any explanatory variable. • The errors are added to the expectation: $y_i = \mu_i + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma^2)$.
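A small simulation can make these assumptions tangible. The sketch below generates data from a hypothetical linear model (intercept 2, slope 3, error standard deviation 1, all arbitrary choices) in which normal errors with constant variance are added to the mean, then recovers the parameters with closed-form least squares. It uses only the Python standard library.

```python
import random

# Hypothetical model for illustration: y = 2 + 3x + e, e ~ Normal(0, 1)
BETA0, BETA1, SIGMA = 2.0, 3.0, 1.0

def simulate(n, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # The error is added to the expectation; its variance does not depend on x
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys

def ols(xs, ys):
    """Closed-form least-squares estimates for a one-predictor linear model."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

xs, ys = simulate(2000)
b0, b1 = ols(xs, ys)
print(f"intercept ~ {b0:.2f}, slope ~ {b1:.2f}")
```

With 2,000 simulated points the estimates land close to the values used to generate the data.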
Generalized Linear Model • The linear model can be generalized to nonnormal responses and to models in which a function of the mean, rather than the mean itself, is linear in the predictors. • A random component uses any distribution in the natural exponential family. • A systematic component combines the predictors into a linear predictor. • A link function relates the mean response to the systematic component.
Random Component • The random component uses any distribution in the natural exponential family. The PMF or PDF has this form: $f(y_i; \theta_i) = a(\theta_i)\, b(y_i) \exp[y_i Q(\theta_i)]$ • a(θi) is a function of the distribution parameter. • b(yi) is a function of the response. • Q(θi) is the natural parameter.
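As a quick check of this form, the Poisson distribution with mean μ factors as a(μ) = exp(−μ), b(y) = 1/y!, and Q(μ) = log(μ). The sketch below (function names are our own) compares the usual Poisson PMF with the exponential-family factorization.

```python
import math

# Poisson(mu) as a(mu) * b(y) * exp(y * Q(mu)):
#   a(mu) = exp(-mu), b(y) = 1 / y!, Q(mu) = log(mu) (the natural parameter)
def a(mu):
    return math.exp(-mu)

def b(y):
    return 1.0 / math.factorial(y)

def Q(mu):
    return math.log(mu)

def poisson_pmf_direct(y, mu):
    return mu ** y * math.exp(-mu) / math.factorial(y)

def poisson_pmf_expfamily(y, mu):
    return a(mu) * b(y) * math.exp(y * Q(mu))

for y in range(6):
    print(y, poisson_pmf_direct(y, 2.5), poisson_pmf_expfamily(y, 2.5))
```

Because Q(μ) = log(μ), the log is the natural parameter transformation for the Poisson, which is why it is the canonical link.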
Random Component in JMP • The following distributions are available to serve as the random component of a GLM in JMP: • Normal • Binomial • Poisson • Exponential
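Systematic Component • The systematic component uses a linear model: the linear predictor $\eta_i = \sum_j \beta_j x_{ij}$ combines the explanatory variables $x_{ij}$ with the parameters $\beta_j$.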
Systematic Component in JMP • Use Fit Model to specify the systematic component, as you would for ordinary least squares regression. • Create linear combinations of effects by adding terms made from data columns.
Link Component • The link function g relates the random component and the systematic component: $g(\mu_i) = \eta_i$, where $\mu_i = E(Y_i)$ and $\eta_i$ is the systematic component. • The link is a monotonic and differentiable function. • It is the canonical link function if it transforms the mean to the natural parameter Q(θ), that is, $g(\mu_i) = Q(\theta_i)$.
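To see where a canonical link comes from, rewrite the Bernoulli PMF in the exponential-family form: $\pi^y(1-\pi)^{1-y} = (1-\pi)\exp[y \log(\pi/(1-\pi))]$, so a(π) = 1 − π, b(y) = 1, and Q(π) = logit(π). The minimal sketch below (names are illustrative) checks that factorization numerically and records the resulting canonical links for the Bernoulli and Poisson cases.

```python
import math

# Bernoulli(pi): pi**y * (1 - pi)**(1 - y) = (1 - pi) * exp(y * log(pi / (1 - pi)))
# so a(pi) = 1 - pi, b(y) = 1, and the natural parameter is Q(pi) = logit(pi).
def bernoulli_pmf(y, pi):
    return pi ** y * (1 - pi) ** (1 - y)

def natural_parameter(pi):
    return math.log(pi / (1 - pi))

def bernoulli_expfamily(y, pi):
    return (1 - pi) * math.exp(y * natural_parameter(pi))

# The canonical link maps the mean to Q(theta):
canonical_link_bernoulli = natural_parameter  # logit
canonical_link_poisson = math.log             # since Q(mu) = log(mu) for the Poisson
```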
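Link Component in JMP • The following functions are available to serve as the link component in JMP: • Identity • Log • Logit • Reciprocal • Probit • Power: $g(\mu) = \mu^{\lambda}$ for $\lambda \ne 0$, and $g(\mu) = \log(\mu)$ for $\lambda = 0$ • Complementary log-log: $g(\mu) = \log[-\log(1 - \mu)]$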
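The sketch below writes each listed link as a pair (g, g⁻¹) in plain Python, so you can check that every inverse undoes its link. This is our own illustration, not JMP's implementation; the power link uses the convention g(μ) = μ^λ with λ = 0 meaning the log link, and λ = 0.5 is an arbitrary example value.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

def make_power_link(lam):
    """Power link g(mu) = mu**lam; lam = 0 is taken to mean the log link."""
    if lam == 0:
        return math.log, math.exp
    return (lambda mu: mu ** lam), (lambda eta: eta ** (1.0 / lam))

# Each entry maps a link name to (g, g_inverse); g maps the mean mu to eta.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
    "power(0.5)": make_power_link(0.5),
}

mu = 0.3
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:>10}: g({mu}) = {eta:.4f}, g^-1(eta) = {g_inv(eta):.4f}")
```

Logit, probit, and complementary log-log all map probabilities in (0, 1) onto the whole real line; they differ mainly in how quickly they approach 0 and 1.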
3.01 Quiz • Match the component of a GLM on the top with its representation or an example on the bottom. • Random component • Systematic component • Link component
3.01 Quiz – Correct Answer • Match the component of a GLM on the top with its representation or an example on the bottom. • Random component • Systematic component • Link component The correct answer is A-3, B-2, and C-1.
Binary Logistic Regression • A binary response can also be modeled with a GLM. • The canonical link function is the logit.
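Written out for a single predictor x (a notational sketch with generic coefficients β0 and β1), the binary logistic GLM and its inverse are:

```latex
\operatorname{logit}(\pi_i) = \log\frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 x_i,
\qquad
\pi_i = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_i)]}
% A one-unit increase in x multiplies the odds pi_i / (1 - pi_i) by exp(beta_1).
```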
Poisson Regression • A simple model of counts is the Poisson distribution. • The canonical link function is the log.
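The following sketch fits a one-predictor Poisson regression with the log link by Newton-Raphson, using only the standard library. With the canonical link, Newton-Raphson coincides with Fisher scoring (iteratively reweighted least squares). The data and the function name `fit_poisson` are hypothetical, and the routine has no safeguards such as step halving.

```python
import math

def fit_poisson(xs, ys, iters=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson."""
    b0 = math.log(sum(ys) / len(ys))  # intercept-only estimate as a starting value
    b1 = 0.0
    for _ in range(iters):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector: derivatives of the log-likelihood with respect to b0 and b1
        u0 = sum(y - m for y, m in zip(ys, mus))
        u1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Information matrix X'WX with weights W = diag(mu_i)
        i00 = sum(mus)
        i01 = sum(x * m for x, m in zip(xs, mus))
        i11 = sum(x * x * m for x, m in zip(xs, mus))
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system for the Newton step
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical counts that grow with x
xs = [0, 1, 2, 3, 4]
ys = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, rate ratio per unit x = {math.exp(b1):.3f}")
```

At convergence the score equations hold, so the fitted means sum to the observed total. This is a general property of a canonical link with an intercept. exp(b1) is the multiplicative change in the expected count per unit of x.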
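Poisson Loglinear Model with Offset • The opportunity for the counts might not be constant for all observations. • The opportunity N might be a period of time, a length, an area, or a volume. • For one predictor x, the model is $\log(\mu_i) = \log(N_i) + \beta_0 + \beta_1 x_i$, or equivalently $\log(\mu_i / N_i) = \beta_0 + \beta_1 x_i$, so it describes the rate per unit of opportunity. • Log(Ni) is the offset, a term whose coefficient is fixed at 1 rather than estimated.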
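The simplest case shows what the offset does. In an intercept-only model with offset, log(μi) = log(Ni) + β0, setting the score Σ(yi − μi) to zero gives exp(β0) = Σyi / ΣNi, the pooled rate per unit of opportunity. The counts and exposures below are hypothetical.

```python
import math

# Hypothetical data: counts y_i observed over exposures N_i (e.g. lot sizes)
counts = [3, 7, 2, 10]
exposures = [100.0, 250.0, 80.0, 400.0]

# Intercept-only Poisson model with offset: log(mu_i) = log(N_i) + b0,
# i.e. mu_i = N_i * exp(b0), so exp(b0) is a rate per unit of opportunity.
# Solving sum(y_i - mu_i) = 0 gives a closed form for b0:
b0_hat = math.log(sum(counts) / sum(exposures))
rate_hat = math.exp(b0_hat)

# Fitted counts scale with each observation's exposure
fitted = [n * rate_hat for n in exposures]
print(f"estimated rate per unit: {rate_hat:.5f}")
```

Without the offset, a lot of 400 items and a lot of 80 items would be assumed to have the same expected count, which is rarely sensible.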
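Deviance • The deviance is a measure of goodness of fit. • It assesses the discrepancy between the observed and the predicted responses. • If the model fits, the deviance is approximately chi-square distributed (for data with reasonably large counts), so a deviance much larger than its degrees of freedom signals lack of fit. • Differences in deviance between nested models assess the value of explanatory variables in the model. • Deviance therefore aids model selection. • The deviance is twice the difference in log-likelihood between the saturated model and the fitted (full) model: $D = 2[\ell(\text{saturated}) - \ell(\text{fitted})]$.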
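For a Poisson model, the saturated model sets μi = yi, and the definition reduces to D = 2 Σ[yi log(yi/μ̂i) − (yi − μ̂i)], with the convention that the log term is 0 when yi = 0. The sketch below (function names are our own) computes D both ways so you can confirm that they agree.

```python
import math

def poisson_loglik(ys, mus):
    """Poisson log-likelihood sum(y*log(mu) - mu - log(y!))."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(ys, mus))

def poisson_deviance(ys, mus):
    """D = 2 * [loglik(saturated) - loglik(fitted)] for Poisson counts.
    The term y * log(y / mu) is taken as 0 when y = 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total

ys, mus = [2, 5, 1], [2.5, 4.0, 1.5]  # hypothetical observed and fitted values
print(poisson_deviance(ys, mus), 2 * (poisson_loglik(ys, ys) - poisson_loglik(ys, mus)))
```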
Over-Dispersion • Lack of fit can result from more variance than the model distribution implies. • For a binomial or Poisson distribution, an over-dispersion parameter φ can account for the excess; for example, a Poisson model then uses $\operatorname{Var}(Y_i) = \phi\,\mu_i$ instead of $\mu_i$. • The parameter equals 1 when there is no over-dispersion.
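A common moment estimate of the dispersion parameter divides the Pearson chi-square by its degrees of freedom, n − p. The sketch below applies it to a Poisson fit. This is one standard estimator, not necessarily the exact method JMP uses, and the function names are hypothetical.

```python
def pearson_chi_square(ys, mus):
    # Poisson variance function V(mu) = mu
    return sum((y - m) ** 2 / m for y, m in zip(ys, mus))

def dispersion_estimate(ys, mus, n_params):
    """Moment estimate phi = X^2 / (n - p); values well above 1 suggest over-dispersion."""
    return pearson_chi_square(ys, mus) / (len(ys) - n_params)

print(dispersion_estimate([0, 4], [2, 2], n_params=1))
```

Under the model's own variance assumption, the estimate should be near 1. Standard errors are typically inflated by √φ when over-dispersion is present.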
3.02 Multiple Answer Poll • Which of the following statements are true of the deviance associated with a GLM? • The deviance is the difference between the predicted response and the observed response. • The deviance is twice the difference in log-likelihood between the saturated model and the full model. • The deviance is a measure of goodness of fit. • The deviance measures the variance of the response.
3.02 Multiple Answer Poll – Correct Answer • Which of the following statements are true of the deviance associated with a GLM? • The deviance is the difference between the predicted response and the observed response. • The deviance is twice the difference in log-likelihood between the saturated model and the full model. • The deviance is a measure of goodness of fit. • The deviance measures the variance of the response. The correct answers are the second and third statements.
Objectives • Review binary logistic regression models. • Model binary responses with a GLM.
Advantages of Using Logistic Regression • JMP provides the following when you use logistic regression: • Likelihood ratio test for lack of fit • Many measures of goodness of fit • Profiles of probability for all levels of the predictor • Odds ratios • ROC curve • Lift curve • Confusion matrix
Advantages of Using a Binary GLM • JMP provides the following when you use a GLM for a binary response: • Deviance for lack of fit • Over-dispersion model parameter • Likelihood ratio test for over-dispersion • Four residual plots • Prediction profiler for probability of target level
GLM for a Binary Response • A binary response can also be modeled with a GLM. • The canonical link function is the logit.
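A binary GLM with the logit link can be fitted by Newton-Raphson, which is the same as IRLS because the logit is canonical. The sketch below uses only the standard library, hypothetical 0/1 data that overlap (so the estimates exist), and a hypothetical function name `fit_logistic`. It is an illustration, not JMP's algorithm.

```python
import math

def fit_logistic(xs, ys, iters=50, tol=1e-10):
    """Newton-Raphson for logit(pi_i) = b0 + b1 * x_i with 0/1 responses ys."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        pis = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        ws = [p * (1.0 - p) for p in pis]          # binomial variance weights
        u0 = sum(y - p for y, p in zip(ys, pis))   # score for b0
        u1 = sum(x * (y - p) for x, y, p in zip(xs, ys, pis))  # score for b1
        i00 = sum(ws)
        i01 = sum(x * w for x, w in zip(xs, ws))
        i11 = sum(x * x * w for x, w in zip(xs, ws))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical, overlapping (not separated) data
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, odds ratio per unit x = {math.exp(b1):.3f}")
```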
Separation Problem • In a given sample, the binary outcomes might be completely separated by the explanatory variable, so that a cutoff on the predictor perfectly predicts the outcome. • With separation, the maximum likelihood estimates of the logistic regression or GLM parameters do not exist: the likelihood keeps increasing as a coefficient grows without bound. • Firth’s penalized maximum likelihood estimation method avoids this problem and also reduces bias in the parameter estimates when outcomes are rare.
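The sketch below shows the problem directly on hypothetical, completely separated data (y = 1 exactly when x > 0): the logistic log-likelihood keeps rising toward 0 as the slope grows, so no finite maximum exists. Firth's method, which adds a penalty of ½ log det I(β) to the log-likelihood, is not implemented here.

```python
import math

# Hypothetical, completely separated data: y = 1 exactly when x > 0
xs = [-3, -2, -1, 1, 2, 3]
ys = [0, 0, 0, 1, 1, 1]

def loglik(b1, b0=0.0):
    """Bernoulli log-likelihood of logit(pi) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log(pi) = -log(1 + e^-eta) and log(1 - pi) = -log(1 + e^eta)
        total += -math.log1p(math.exp(-eta)) if y == 1 else -math.log1p(math.exp(eta))
    return total

for b1 in [1, 2, 5, 10, 20]:
    print(b1, loglik(b1))
```

An iterative fitter on these data would keep increasing b1 until it hit an iteration limit, typically reporting a huge estimate and standard error.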
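Pearson Residuals • The Pearson residual divides the raw residual by the estimated standard deviation of the response: $e_i = (y_i - \hat{\mu}_i)/\sqrt{V(\hat{\mu}_i)}$, where V is the variance function of the distribution. • Pearson chi-square for goodness of fit is the sum of the squared Pearson residuals.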
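For binomial counts yi out of ni trials with fitted probability π̂i, the variance is niπ̂i(1 − π̂i). The sketch below (hypothetical data and function name) computes the Pearson residuals and their sum of squares, the Pearson chi-square.

```python
import math

def pearson_residuals(ys, ns, pis):
    """(y_i - n_i*pi_i) / sqrt(n_i*pi_i*(1 - pi_i)) for binomial counts."""
    return [(y - n * p) / math.sqrt(n * p * (1 - p)) for y, n, p in zip(ys, ns, pis)]

# Hypothetical: 3 and 8 successes out of 10 trials, both fitted at 0.5
ys, ns, pis = [3, 8], [10, 10], [0.5, 0.5]
res = pearson_residuals(ys, ns, pis)
pearson_x2 = sum(r * r for r in res)
print(res, pearson_x2)
```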
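Deviance Residuals • The deviance residual is the signed square root of an observation's contribution to the deviance: $r_{D,i} = \operatorname{sign}(y_i - \hat{\mu}_i)\sqrt{\delta_i}$, where $\delta_i$ is observation i's contribution. • The deviance chi-square for goodness of fit is the sum of the squared deviance residuals. • Studentized residuals divide each residual by an estimate of its standard error, which accounts for leverage, and so provide a common scale for inspection.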
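For 0/1 responses the saturated log-likelihood is 0, so observation i contributes −2[yi log π̂i + (1 − yi) log(1 − π̂i)] to the deviance. The sketch below (hypothetical data and function name) builds the signed residuals and confirms that their squares sum to the deviance.

```python
import math

def deviance_residuals_binary(ys, pis):
    """sign(y_i - pi_i) * sqrt(-2 * [y_i*log(pi_i) + (1 - y_i)*log(1 - pi_i)])."""
    out = []
    for y, p in zip(ys, pis):
        ll_i = math.log(p) if y == 1 else math.log(1 - p)
        out.append(math.copysign(math.sqrt(-2 * ll_i), y - p))
    return out

ys = [1, 0, 1]
pis = [0.8, 0.3, 0.4]  # hypothetical fitted probabilities
d = deviance_residuals_binary(ys, pis)
deviance = sum(r * r for r in d)
print(d, deviance)
```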
3.03 Quiz • What are the three GLM components for a binary response?
3.03 Quiz – Correct Answer • What are the three GLM components for a binary response? • Random component is the binomial distribution. • Systematic component is a linear predictor, a linear combination of the explanatory variables. • Link component is the logit function.
GLM for Binary Response Example • Use GLM with the Titanic Passengers data set to relate Survived to Siblings and Spouses, Parents and Children, and Fare.
GLM for a Binary Response This demonstration illustrates the concepts discussed previously.
Exercise This exercise reinforces the concepts discussed previously.
Objectives • Identify responses that are counts. • Use a GLM that is also known as Poisson loglinear regression.
Response Is Counts • In many cases, the response is simply the count of a particular event. • Occurrence of a disease • Road accidents • Mold colonies • Number of non-conforming items
Response Is Counts, Constant Opportunity • The response can be the count of a particular event in the same span of time, linear dimension, area, or volume. • Occurrence of a disease per annum • Road accidents each month on the same highway • Mold colonies in a standard Petri dish • Number of non-conforming items in a standard lot size
Response Is Counts, Opportunity Varies • The response can be the count of a particular event over spans of time, linear dimensions, areas, or volumes that differ among observations. • Occurrence of a disease in different hospitals • Road accidents on different highways • Mold colonies in nonstandard field cases • Number of non-conforming items in lots of different sizes • This case requires an offset term in the model. • The offset acts like an intercept that varies from observation to observation, with its coefficient fixed at 1 rather than estimated.
3.04 Multiple Answer Poll • How is the logarithm always used in Poisson regression with GLM? • Transform the response variable • Transform the explanatory variable • Transform the offset variable • Link the systematic and random components • Increase the over-dispersion