120 likes | 302 Views
一般化線形モデル( GLM ) generalized linear Models. データ解析のための統計モデリング入門 久保拓也(2012) 岩波書店. Generalized Linear Models. Linear Model response variable ~ intercept + slope * explanatory variable lm(y~ x + f ・・・ ) , lm(y~x + f -1) (no intercept). require(graphics)
E N D
一般化線形モデル(GLM)generalizedlinearModels データ解析のための統計モデリング入門 久保拓也(2012) 岩波書店
GeneralizedLinearModels • LinearModel response variable ~ intercept + slope * explanatory variable • lm(y~ x + f ・・・),lm(y~x + f -1) (no intercept) require(graphics) ## Annette Dobson (1990) "An Introduction to Generalized Linear Models". ## Page 9: Plant Weight Data. ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2,10,20, labels=c("Ctl","Trt")) weight <- c(ctl, trt) lm.D9 <- lm(weight ~ group) lm.D90 <- lm(weight ~ group - 1) # omitting intercept anova(lm.D9) summary(lm.D90) opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0)) plot(lm.D9, las = 1) # Residuals, Fitted, ... Par(opar) ### less simple examples in "See Also" above
GeneralizedLinearModels • LinearModel response variable ~ intercept + slope * explanatory variable • lm(y~ x + f ・・・),lm(y~x + f -1) (no intercept) • Generalized Linear Model Model &Link function ~ intercept + slope * explanatory variable • glm(y ~ x, data = d, family = poisson)
PoissonModel(counting data of occurrence) • PoissonModel • λ:meanoccurrence in unit time • Identity link • Log link(canonical) • 正準リンク関数:最も自然なリンク関数:乗法効果) Link function ~ intercept + slope * explanatory variable • glm(y ~ x, data = d, family = poisson(link=“log”)) • Canonical link function is set as default
PoissonModel (p49)(counting data of occurrence) • Poisson Model for number of seeds of a plant, regressed on plant size and nutrification (p49) • Maximize log-likelihood glm(y ~ x + f, data = d, family = poisson) # model p58 fit.all <- glm(y ~ x + f, data=d, family=poisson) print(fit.all) logLik(fit.all) plot(d$x, d$y, pch =c(21, 19)[d$f]) xx <- seq(min(d$x), max(d$x), length =100) lines(xx,exp(1.263 + 0.0801 * xx), lwd=2) #page 42 plant data d <- read.csv("data3a.csv") d$y # number of seeds d$x # plant size (hight) d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f])
PoissonModel (p49)(counting data of occurrence) • Poisson Model for number of seeds of a plant, regressed on plant size and nutrification (p49) • Maximize log-likelihood # model p58 fit.all <- glm(y ~ x + f, data=d, family=poisson) print(fit.all) logLik(fit.all) plot(d$x, d$y, pch =c(21, 19)[d$f]) xx <- seq(min(d$x), max(d$x), length =100) lines(xx,exp(1.263 + 0.0801 * xx), lwd=2) #page 42 plant data d <- read.csv("data3a.csv") d$y # number of seeds d$x # plant size (hight) d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f])
GeneralizedLinearModels • Generalized Linear Model glm(y ~ x, data = d, family = poisson) • Family (Modelled Probability Distribution) • binomial(link = “logit“) 2項分布(規定試行中の発生数) • gaussian(link = “identity”) 正規分布 • Gamma(link = “inverse”) ガンマ分布(正のみ) • inverse.gaussian(link = “1/mu^2”) 逆ガウス分布 • poisson(link = “log”) ポアソン分布(一定時間中の発生回数) • quasi(link = “identity”, variance = “constant”)正規分布(不均一) • quasibinomial(link = “logit”)2項分布(分散不均一) • quasipoisson(link = “log”) ポアソン分布(分散不均一)
Binomial Logistic Model (p118)(occurrence number in given trials) • Binomial Model for the number of survived plant in 8 obserbations, regressed on plant size and nutrification (p118) • Maximize log-likelihood glm(cbind(y,N-y) ~ x + f, data = d, family = binomial) #page 117 plant data d <- read.csv("data4a.csv") d$N # number of trials d$y # number of survived plant d$x # plant size d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f]) # model p122 fit.all <- glm(cbind(y, N-y) ~ x + f, data=d, family=binomial) print(fit.all) logLik(fit.all)
Offset Term(p131)(avoid a division calculation) • Count data for several zones having different area, or different population • One way is define a density (occurrence in unit area) and apply Poisson model glm(y ~ x, offset =log(A), data = d, family = poisson) #page 133 plant data d <- read.csv("data4b.csv") d$y # number of plants in lot i d$x # brightness at lot I d$A # area of lot i plot(d$A, d$y) # model p131 fit<- glm(y ~ x, offset = log(A) , data=d, family=poisson) print(fit) logLik(fit)
Gamma Distribution Model (p138) • Gamma Distribution (continuous positive data) s: shape parameter, r: rate parameter, theta=1/r: scale parameter • time length before s times occurrence of random events with occurrence rate of r. (average occurrence interval is ) • Average : Variance: • dgamma(y, shape, rate) • Weight of flower of a plant y (continuous, positive) • average weight • Loglink function of linear estimator • glm(y ~ log(x), data = d, family = gamma(link="log"))
Gamma Distribution Model (p138) • Gamma Distribution (continuous positive data) • glm(y ~ log(x), data = d, family = gamma(link="log") # A Gamma example, from McCullagh & Nelder (1989, pp. 300-2) clotting <- data.frame( u = c(5,10,15,20,30,40,60,80,100), lot1 = c(118,58,42,35,27,25,21,19,18), lot2 = c(69,35,26,21,18,16,13,12,12)) summary(glm(lot1 ~ log(u), data=clotting, family=Gamma)) summary(glm(lot2 ~ log(u), data=clotting, family=Gamma)) Call:glm(formula = lot1 ~ log(u), family = Gamma, data = clotting) Deviance Residuals: Min 1Q Median 3Q Max -0.04008 -0.03756 -0.02637 0.02905 0.08641 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.0165544 0.0009275 -17.85 4.28e-07 *** log(u) 0.0153431 0.0004150 36.98 2.75e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for Gamma family taken to be 0.002446059) Null deviance: 3.51283 on 8 degrees of freedom Residual deviance: 0.01673 on 7 degrees of freedom AIC: 37.99