一般化線形モデル（ GLM ） generalized linear Models

一般化線形モデル（GLM）generalizedlinearModels データ解析のための統計モデリング入門久保拓也（２０１２）　岩波書店

GeneralizedLinearModels • LinearModel response variable ～　intercept + slope * explanatory variable • lm(y~ x + f ・・・)，lm(y~x + f -1) (no intercept) require(graphics) ## Annette Dobson (1990) "An Introduction to Generalized Linear Models". ## Page 9: Plant Weight Data. ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2,10,20, labels=c("Ctl","Trt")) weight <- c(ctl, trt) lm.D9 <- lm(weight ~ group) lm.D90 <- lm(weight ~ group - 1) # omitting intercept anova(lm.D9) summary(lm.D90) opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0)) plot(lm.D9, las = 1) # Residuals, Fitted, ... Par(opar) ### less simple examples in "See Also" above

GeneralizedLinearModels • LinearModel response variable ～　intercept + slope * explanatory variable • lm(y~ x + f ・・・)，lm(y~x + f -1) (no intercept) • Generalized Linear Model Model &Link function ～　intercept + slope * explanatory variable • glm(y ~ x, data = d, family = poisson)

PoissonModel（counting data of occurrence) • PoissonModel • λ：meanoccurrence in unit time • Identity link • Log link(canonical) • 正準リンク関数：最も自然なリンク関数：乗法効果） Link function ～　intercept + slope * explanatory variable • glm(y ~ x, data = d, family = poisson(link=“log”)) • Canonical link function is set as default

PoissonModel (p49)（counting data of occurrence) • Poisson Model for number of seeds of a plant, regressed on plant size and nutrification (p49) • Maximize log-likelihood glm(y ~ x + f, data = d, family = poisson) # model p58 fit.all <- glm(y ~ x + f, data=d, family=poisson) print(fit.all) logLik(fit.all) plot(d$x, d$y, pch =c(21, 19)[d$f]) xx <- seq(min(d$x), max(d$x), length =100) lines(xx,exp(1.263 + 0.0801 * xx), lwd=2) #page 42 plant data d <- read.csv("data3a.csv") d$y # number of seeds d$x # plant size (hight) d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f])

PoissonModel (p49)（counting data of occurrence) • Poisson Model for number of seeds of a plant, regressed on plant size and nutrification (p49) • Maximize log-likelihood # model p58 fit.all <- glm(y ~ x + f, data=d, family=poisson) print(fit.all) logLik(fit.all) plot(d$x, d$y, pch =c(21, 19)[d$f]) xx <- seq(min(d$x), max(d$x), length =100) lines(xx,exp(1.263 + 0.0801 * xx), lwd=2) #page 42 plant data d <- read.csv("data3a.csv") d$y # number of seeds d$x # plant size (hight) d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f])

Other Generalized Linear Models(chap6 p114)

GeneralizedLinearModels • Generalized Linear Model glm(y ~ x, data = d, family = poisson) • Family （Modelled Probability Distribution) • binomial(link = “logit“) 2項分布（規定試行中の発生数） • gaussian(link = “identity”)　正規分布 • Gamma(link = “inverse”)　　ガンマ分布（正のみ） • inverse.gaussian(link = “1/mu^2”)　逆ガウス分布 • poisson(link = “log”)　ポアソン分布（一定時間中の発生回数） • quasi(link = “identity”, variance = “constant”)正規分布（不均一） • quasibinomial(link = “logit”)2項分布（分散不均一） • quasipoisson(link = “log”) 　　ポアソン分布（分散不均一）

Binomial Logistic Model (p118)（occurrence number in given trials) • Binomial Model for the number of survived plant in 8 obserbations, regressed on plant size and nutrification (p118) • Maximize log-likelihood glm(cbind(y,N-y) ~ x + f, data = d, family = binomial) #page 117 plant data d <- read.csv("data4a.csv") d$N # number of trials d$y # number of survived plant d$x # plant size d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f]) # model p122 fit.all <- glm(cbind(y, N-y) ~ x + f, data=d, family=binomial) print(fit.all) logLik(fit.all)

Offset Term(p131)（avoid a division calculation) • Count data for several zones having different area, or different population • One way is define a density (occurrence in unit area) and apply Poisson model glm(y ~ x, offset =log(A), data = d, family = poisson) #page 133 plant data d <- read.csv("data4b.csv") d$y # number of plants in lot i d$x # brightness at lot I d$A # area of lot i plot(d$A, d$y) # model p131 fit<- glm(y ~ x, offset = log(A) , data=d, family=poisson) print(fit) logLik(fit)

Gamma Distribution Model (p138) • Gamma Distribution (continuous positive data) s: shape parameter, r: rate parameter, theta=1/r: scale parameter • time length before s times occurrence of random events with occurrence rate of r. (average occurrence interval is ) • Average : Variance: • dgamma(y, shape, rate) • Weight of flower of a plant y (continuous, positive) • average weight • Loglink function of linear estimator • glm(y ~ log(x), data = d, family = gamma(link="log"))

Gamma Distribution Model (p138) • Gamma Distribution (continuous positive data) • glm(y ~ log(x), data = d, family = gamma(link="log") # A Gamma example, from McCullagh & Nelder (1989, pp. 300-2) clotting <- data.frame( u = c(5,10,15,20,30,40,60,80,100), lot1 = c(118,58,42,35,27,25,21,19,18), lot2 = c(69,35,26,21,18,16,13,12,12)) summary(glm(lot1 ~ log(u), data=clotting, family=Gamma)) summary(glm(lot2 ~ log(u), data=clotting, family=Gamma)) Call:glm(formula = lot1 ~ log(u), family = Gamma, data = clotting) Deviance Residuals: Min 1Q Median 3Q Max -0.04008 -0.03756 -0.02637 0.02905 0.08641 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.0165544 0.0009275 -17.85 4.28e-07 *** log(u) 0.0153431 0.0004150 36.98 2.75e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for Gamma family taken to be 0.002446059) Null deviance: 3.51283 on 8 degrees of freedom Residual deviance: 0.01673 on 7 degrees of freedom AIC: 37.99

一般化線形モデル（ GLM ） generalized linear Models

一般化線形モデル（ GLM ） generalized linear Models

Presentation Transcript

Chapter 13

LINEAR PROGRAMMING

Introduction to Generalized Linear Models

limma: Linear Models for Microarray Data

Linear Algebra for Communications : A gentle introduction

Managerial Decision Modeling with Spreadsheets

Chapter 3: Systems of Linear Equations

Minimizing Communication in Linear Algebra

CHAPTER 2: Kinematics of Linear Motion (5 hours)

Linear Mixed Models: An Introduction

The Cartesian Coordinate System Straight Lines Linear Functions and Mathematical Models

5.1 Linear Functions

Statistical Models in S

Equations: Linear and Systems

Systems of Linear Differential Equations

XI. Estimation of linear model parameters

Higher-Order Differential Equations

The General Linear Model

Applications

Electronic Business Models