
Generalized Linear Models



  1. Generalized Linear Models • All the regression models treated so far share a common structure. This structure can be split into two parts: • The random part: the probability distribution assumed for the response (here, a member of the exponential family). • The systematic part: the way the covariates enter the model, through a linear predictor and a link function. • These two elements are the basic building blocks of generalized linear models.

  2. The systematic part • Generalized linear model, systematic part: • The covariates influence the distribution of the response only through the linear predictor η_i = β0 + β1·x_i1 + … + βp·x_ip. • There is a link function g that links the expectation μ_i = E(Y_i) to the linear predictor: g(μ_i) = η_i.

  3. The generalization from linear models to GLM • GLMs are a generalization of linear normal models in two directions: the response distribution may be any member of the exponential family (not only the normal), and the mean is related to the linear predictor through a link function rather than being equal to it.

  4. Example: binomial distribution • Definition: the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
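In the usual notation, with Y counting the number of successes, the probability function and moments are:

  P(Y = y) = C(n, y) · p^y · (1 − p)^(n − y),   y = 0, 1, …, n
  E(Y) = n·p,   Var(Y) = n·p·(1 − p)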

  5. Example • For the binomial distribution, the variance is a function of the mean: for the observed proportion Y/n with μ = p, Var(Y/n) = μ(1 − μ)/n. • The linear model for the logit, logit(p) = log[p/(1 − p)] = β0 + β1·x, is a non-linear model for the probability p = exp(β0 + β1·x)/(1 + exp(β0 + β1·x)).

  6. The exponential family • Many distributions encountered in practice (e.g. the normal, binomial, Poisson and Gamma distributions) share a common structure:
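In the standard exponential (dispersion) family parametrisation, this common structure is the density

  f(y; θ, φ) = exp{ [y·θ − b(θ)] / a(φ) + c(y, φ) },

where θ is the canonical parameter, φ is the dispersion parameter, and a(·), b(·), c(·) are known functions specific to each distribution. The worked examples below use this parametrisation.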

  7. Example of the exponential family: Normal distribution
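A worked version of this example, using the exponential-family form above: for Y ~ N(μ, σ²),

  f(y; μ, σ²) = exp{ (y·μ − μ²/2) / σ² − y²/(2σ²) − ½·log(2πσ²) },

so θ = μ, b(θ) = θ²/2, a(φ) = φ = σ², and c(y, φ) = −y²/(2φ) − ½·log(2πφ). The canonical parameter is the mean itself.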

  8. Example of the exponential family: Binomial
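A worked version, again in the exponential-family form above (n is treated as known and φ = 1): for Y ~ Bin(n, p),

  f(y; p) = exp{ y·log[p/(1 − p)] + n·log(1 − p) + log C(n, y) },

so θ = log[p/(1 − p)] (the logit of p), b(θ) = n·log(1 + e^θ), a(φ) = 1, and c(y, φ) = log C(n, y).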

  9. Example of the exponential family • The Poisson distribution: a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time, when these events occur with a known average rate and independently of the time since the last event. • Ex: • The number of phone calls received by a telephone operator in a 10-minute period. • The number of typos per page made by a secretary.

  10. Poisson distribution • The Poisson distribution belongs to the exponential family:
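A worked version of this claim, in the exponential-family form above: for Y ~ Poisson(λ),

  f(y; λ) = exp{ y·log λ − λ − log y! },

so θ = log λ, b(θ) = e^θ, a(φ) = 1, and c(y, φ) = −log y!.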

  11. Mean and variance in the exponential family • It can be shown that the mean and variance in the exponential family are E(Y) = b′(θ) and Var(Y) = a(φ)·b″(θ).
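A short sketch of the argument, using the density f(y; θ, φ) = exp{ [y·θ − b(θ)]/a(φ) + c(y, φ) }: differentiating the identity ∫ f(y; θ, φ) dy = 1 once with respect to θ gives E[(Y − b′(θ))/a(φ)] = 0, hence E(Y) = b′(θ); differentiating twice gives E[((Y − b′(θ))/a(φ))²] = b″(θ)/a(φ), hence Var(Y) = a(φ)·b″(θ).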

  12. Mean and variance example: Poisson • For the Poisson model, b(θ) = e^θ, so mean and variance are E(Y) = Var(Y) = λ and the variance function is V(μ) = μ. • To summarize, for any given distribution we obtain a specific form of b, which in turn determines the variance function. • The converse is also true: the variance function determines b and hence the distribution within the exponential family. • Hence specifying a distribution and specifying a variance function are two sides of the same coin, as long as we work with exponential families.

  13. Various variance functions
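The variance functions for the distributions used in these notes are the standard ones:

  Normal: V(μ) = 1
  Poisson: V(μ) = μ
  Binomial (proportion y/n): V(μ) = μ(1 − μ)
  Gamma: V(μ) = μ²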

  14. The link function • The link function is a function which relates the mean to the linear predictor: g(μ) = η. • Various link functions have been illustrated so far, e.g. the identity link for the normal linear model, the logit link for binary/binomial data and the log link for Poisson counts.

  15. Canonical link • For each distribution there is a specific link function which yields “nice” mathematical and numerical properties in connection with the estimation process. This link function is called the canonical link:
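For the distributions mentioned in these notes, the canonical links are:

  Normal: identity, g(μ) = μ
  Binomial: logit, g(μ) = log[μ/(1 − μ)]
  Poisson: log, g(μ) = log μ
  Gamma: inverse, g(μ) = 1/μ (in practice the log link is often used instead, as in the R example below)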

  16. Specification of GLM • In practice, a GLM is specified by three steps: choose the distribution of the response (equivalently, the variance function), specify the linear predictor, and choose the link function. • In this connection it is important to be aware of the following: most statistical packages will by default use the canonical link function unless another one is explicitly provided.

  17. R code • The glm function in R is used for fitting generalized linear models. • Specification of the linear predictor: through a model formula, e.g. y ~ x1 + x2. • Specification of the distribution and the link function: through the family argument, e.g. family = Gamma(link = log).
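A minimal sketch of such a call on simulated data (the data and variable names are illustrative, not from the slides):

  set.seed(1)
  x <- runif(100)
  mu <- exp(1 + 2 * x)                        # true mean on the log scale
  y <- rgamma(100, shape = 5, rate = 5 / mu)  # Gamma responses with mean mu
  fit <- glm(y ~ x, family = Gamma(link = log))  # formula = linear predictor, family = distribution + link
  summary(fit)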

  18. Remember that the specification of a distribution yields a specific variance function. Not all possible combinations of a distribution and a link function are allowed in R.

  19. Special aspects for binomial data • Simulate artificial Bernoulli observations with different event probabilities for two groups (the number of trials N is equal to 1):
  R code:
  group <- rep(c("A", "B"), c(30, 45))
  logit.pi <- ifelse(group == "B", 0.7, 0.7 + 0.5)
  group <- factor(group)
  pi <- plogis(logit.pi)
  N <- rep(1, length(group))
  events <- rbinom(length(group), size = N, prob = pi)
  dat <- data.frame(group, N, events)

  20. Analysis of simulated data • Model: logit(π_i) = β0 + β1·I(group_i = "B"), with events_i ~ Binomial(N_i, π_i). The same model can be fitted in three equivalent ways (compared in the check below): • The response is a two-column matrix containing events and non-events: f1 <- glm(cbind(events, N - events) ~ group, family = binomial, data = dat) • Define proportions, dat$prop <- with(dat, events/N), and use these as the response with the number of trials N as weights in the fit: f2 <- glm(prop ~ group, family = binomial, weights = N, data = dat) • Use the number of events directly as the response (possible here because N = 1): f3 <- glm(events ~ group, family = binomial, data = dat)
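A quick check, continuing the data simulated on slide 19, that the three calls give the same parameter estimates:

  dat$prop <- with(dat, events / N)
  f1 <- glm(cbind(events, N - events) ~ group, family = binomial, data = dat)
  f2 <- glm(prop ~ group, family = binomial, weights = N, data = dat)
  f3 <- glm(events ~ group, family = binomial, data = dat)
  cbind(coef(f1), coef(f2), coef(f3))   # three identical columns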

  21. Fitting GLMs – logistic regression • Consider a data set where the response variable takes only the values 0 or 1 and the single covariate is a continuous numerical variable. • Examples: • If we apply a simple linear regression model E(Y) = β0 + β1·x to fit the data, there are some problems: the fitted values are not restricted to the interval [0, 1], the variance of the response is not constant, and the residuals cannot be normally distributed. • Conclusion: it is not appropriate to use simple linear regression to model data with binary responses.

  22. Logistic regression • The solution is to use the logistic function F(z) = exp(z)/(1 + exp(z)), which maps the whole real line into (0, 1). • The formal definition of the logistic model for a binary response with p covariates: Y_i ~ Bernoulli(π_i) with logit(π_i) = log[π_i/(1 − π_i)] = β0 + β1·x_i1 + … + βp·x_ip.

  23. Logistic regression • How to interpret the model? • In the logistic model, the odds of “success” are π/(1 − π) = exp(β0 + β1·x_1 + … + βp·x_p), so exp(βj) is the multiplicative change in the odds when x_j increases by one unit. • The logistic model for binary data can be slightly modified:

  24. Modified to cover binomial data

  25. Bernoulli and Poisson distribution • Likelihood: • MLE estimates:
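The formulas involved are the standard ones; for an i.i.d. sample y_1, …, y_n:

  Bernoulli(π): L(π) = ∏ π^{y_i}·(1 − π)^{1 − y_i},  log L(π) = Σ [y_i·log π + (1 − y_i)·log(1 − π)],  and solving d log L/dπ = 0 gives the MLE π̂ = ȳ (the sample proportion).
  Poisson(λ): L(λ) = ∏ e^{−λ}·λ^{y_i}/y_i!,  log L(λ) = −n·λ + (Σ y_i)·log λ − Σ log y_i!,  and solving d log L/dλ = 0 gives the MLE λ̂ = ȳ (the sample mean).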

  26. Parameter estimation in GLMs

  27. IWLS Algorithm • Iteratively (re)weighted least squares algorithm: the maximum-likelihood estimates in a GLM are computed by repeatedly solving a weighted least-squares problem for a working response, updating the weights and the working response at each iteration until the estimates converge.
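A minimal R sketch of the algorithm for logistic regression with the canonical logit link (the function name iwls_logit and all implementation details are illustrative, not taken from the slides); it reproduces the glm() estimates on the data simulated on slide 19:

  iwls_logit <- function(X, y, tol = 1e-8, maxit = 25) {
    beta <- rep(0, ncol(X))                    # starting value
    for (it in seq_len(maxit)) {
      eta <- as.vector(X %*% beta)             # linear predictor
      mu  <- plogis(eta)                       # inverse link: exp(eta)/(1 + exp(eta))
      w   <- mu * (1 - mu)                     # IWLS weights (variance function, canonical link)
      z   <- eta + (y - mu) / w                # working response
      beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * z))  # weighted least-squares step
      if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
      beta <- beta_new
    }
    drop(beta)
  }

  X <- model.matrix(~ group, data = dat)
  cbind(iwls = iwls_logit(X, dat$events),
        glm  = coef(glm(events ~ group, family = binomial, data = dat)))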
