Two Distribution Families for Modelling Over- and Underdispersed Binomial Frequencies. Feirer V., Hirn U., Friedl H., Bauer W. Institute for Paper, Pulp and Fiber Technology & Institute for Statistics, Graz University of Technology
Agenda • Motivation • Generalized Linear Models • Multiplicative Binomial Distribution • Double Binomial Distribution • Application of the Two Distributions • Summary
Motivation • consider the problem of successful ink transfer onto paper (number of data points per sample: roughly 9106; sample size: 3 6 mm²) • explain the occurrence of unprinted regions …part of a larger, industry-funded project at the IPZ.
Predictor Variables • Topography • Formation …the way fibres are arranged
Response true colour image
Basics generalized linear models
Distribution of the Response • response: y successes out of n trials, y ~ Binomial(n, π) …part of the Exponential Family • here with π the probability for successful ink transmission • model for π
The Generalized Linear Model* • model for π: the linear predictor η = x'β is linked to the mean by the logit link, log(π / (1 - π)) = η • advantages over a linear model: • distribution of the relative frequencies … member of the Exponential Family • mean lies between 0 and 1 * Nelder & Wedderburn (1972). Generalized Linear Models. Journal of the Royal Statistical Society, Series A, 135, 370-384
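The logit link can be sketched in a few lines of Python. This is only an illustration: the coefficient values, the covariate values and the names `logit`, `inv_logit`, `beta` and `x` are assumptions made for the example, not quantities from the printability study.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariates, for illustration only.
beta = [0.5, -1.2, 0.8]   # intercept, topography, formation
x = [1.0, 0.3, -0.4]      # the leading 1.0 multiplies the intercept

eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
pi = inv_logit(eta)                          # mean on the probability scale
print(eta, pi)
```

Whatever value the linear predictor takes, the inverse link returns a mean strictly between 0 and 1, which an ordinary linear model cannot guarantee.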
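Model Deviance …a test for goodness-of-fit • Deviance = -2 × (maximized log-likelihood of considered model - maximized log-likelihood of saturated model) • under certain regularity conditions, the Deviance is approximately χ²-distributed with N - p degrees of freedom (N observations, p model parameters) • if the Deviance is well below N - p: Underdispersion, variance of data smaller than assumed by the model • if the Deviance is well above N - p: Overdispersion, variance of data larger than assumed by the model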
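A minimal sketch of the deviance check for a binomial model follows. The counts are invented for illustration, the model is an intercept-only fit, and the helper names (`xlogy`, `binomial_deviance`) are assumptions of this example. For binomial counts, the definition above reduces to a sum of y·log(y/μ) + (n - y)·log((n - y)/(n - μ)) terms, with μ the fitted count.

```python
import math

def xlogy(a, b):
    """Return a * log(b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)

def binomial_deviance(y, n, pi_hat):
    """-2 * (log-lik. of fitted model - log-lik. of saturated model)."""
    d = 0.0
    for yi, ni, pi in zip(y, n, pi_hat):
        mu = ni * pi  # fitted count
        d += xlogy(yi, yi / mu) + xlogy(ni - yi, (ni - yi) / (ni - mu))
    return 2.0 * d

# Hypothetical counts out of n = 36, for illustration only.
y = [36, 20, 5, 33, 12, 30]
n = [36] * len(y)

# Intercept-only binomial fit: one common probability for all observations.
pi_common = sum(y) / sum(n)
deviance = binomial_deviance(y, n, [pi_common] * len(y))

df = len(y) - 1  # N observations minus p = 1 fitted parameter
ratio = deviance / df
print(deviance, df, ratio)
```

A ratio far above 1 points to overdispersion and a ratio far below 1 to underdispersion, relative to the binomial assumption.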
Deviances of the Printability Datasets …values from 11 different data sets • distinct deviations from a binomial variance! [Figure: deviances per data set; axis from many to few unprinted areas]
A Generalization of the Binomial Distribution Multiplicative binomial distribution
Definition • introduced by Altham* as a "multiplicative generalization of the binomial distribution" • considers litters of rabbits: animals within one litter are treated with the same dose of a certain drug • n … litter size, y … number of surviving animals • outcomes from animals within one litter are not mutually independent • Altham introduces an interaction parameter ω *Altham (1978). Two Generalizations of the Binomial Distribution. Journal of the Royal Statistical Society, Series C, 27, 162-167
Properties • Member of the 2-parameter Exponential Family • For ω=1, it corresponds to the Binomial Distribution • For n=1, it reduces to the Bernoulli distribution
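As a minimal sketch of these properties, assume the multiplicative binomial gives each count y a weight proportional to C(n, y) · π^y · (1 - π)^(n - y) · ω^(y(n - y)), the form usually attributed to Altham. The symbol π for the probability-like parameter, the function names and the numerical normalization are assumptions of this example, not taken from the slides.

```python
import math

def multiplicative_binomial_pmf(n, pi, omega):
    """Probabilities for y = 0..n, normalized numerically (assumed form)."""
    logw = [
        math.log(math.comb(n, y))
        + y * math.log(pi) + (n - y) * math.log(1.0 - pi)
        + y * (n - y) * math.log(omega)   # interaction term
        for y in range(n + 1)
    ]
    m = max(logw)                          # subtract the max for stability
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def mean_and_variance(pmf):
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

n = 36
# omega = 1 reproduces the classic binomial probabilities.
classic = [math.comb(n, y) * 0.8**y * 0.2**(n - y) for y in range(n + 1)]
same = multiplicative_binomial_pmf(n, 0.8, 1.0)

# pi = 0.5 keeps the mean at n/2, which isolates the effect on the variance.
_, var_wide = mean_and_variance(multiplicative_binomial_pmf(n, 0.5, 0.9))
_, var_narrow = mean_and_variance(multiplicative_binomial_pmf(n, 0.5, 1.1))
print(var_wide, n * 0.25, var_narrow)
```

The choice π = 0.5 is deliberate: for other values, ω also shifts the mean, so a comparison at fixed π mixes both effects.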
Comparison With Classic Binomial pdf • n = 36, π = 0.8 • ω = 1 gives the classic binomial distribution
Comparison of the Variances n = 36 ω=1 gives the classic binomial distribution
Integration into GLM Context • log-likelihood function of distribution • ω > 0: log-linear link • 0 < π < 1: logit link
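The two links can be combined into a log-likelihood as sketched below. The sketch assumes the multiplicative binomial weight C(n, y) · π^y · (1 - π)^(n - y) · ω^(y(n - y)) with a numerically computed normalizing constant. The design matrices `X` and `Z`, the coefficient vectors `beta` and `gamma`, and the data are hypothetical.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log_kernel(y, n, pi, omega):
    """Unnormalized log-probability of y (assumed multiplicative binomial form)."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(pi) + (n - y) * math.log(1.0 - pi)
            + y * (n - y) * math.log(omega))

def log_likelihood(beta, gamma, X, Z, y, n):
    total = 0.0
    for xi, zi, yi, ni in zip(X, Z, y, n):
        pi = inv_logit(dot(xi, beta))     # logit link keeps 0 < pi < 1
        omega = math.exp(dot(zi, gamma))  # log-linear link keeps omega > 0
        terms = [log_kernel(k, ni, pi, omega) for k in range(ni + 1)]
        m = max(terms)
        log_norm = m + math.log(sum(math.exp(t - m) for t in terms))
        total += log_kernel(yi, ni, pi, omega) - log_norm
    return total

# Hypothetical data: intercept plus one covariate for pi, intercept only for omega.
X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0]]
Z = [[1.0], [1.0], [1.0]]
y = [30, 12, 35]
n = [36, 36, 36]
print(log_likelihood([0.3, 1.0], [-0.02], X, Z, y, n))
```

Maximizing this function over `beta` and `gamma` with a general-purpose optimizer would give the fitted model; with `gamma = [0.0]` it reduces to the ordinary binomial log-likelihood.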
A Second Generalization of the Binomial Distribution Double binomial Distribution
Definition • introduced by Efron* as part of the Double Exponential Family • a second parameter allows variation of the variance: the variance is smaller than binomial if this parameter lies between 0 and 1 and larger than binomial if it exceeds 1 • a value of 1 gives the classic binomial distribution *Efron (1986). Double Exponential Families and Their Use in Generalized Linear Regression. Journal of the American Statistical Association, 81, 709-721
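A minimal sketch of the double binomial probabilities follows. Efron's construction weights each count y by C(n, y) · [π^y (1 - π)^(n - y)]^θ · [(y/n)^y (1 - y/n)^(n - y)]^(1 - θ), which gives a variance of roughly nπ(1 - π)/θ. To match the statement that values above 1 enlarge the variance, the sketch assumes the second parameter, called `phi` here, is the reciprocal of Efron's θ. That mapping, the symbol names and the exact numerical normalization (instead of Efron's approximate constant) are assumptions of this example.

```python
import math

def xlogy(a, b):
    """Return a * log(b), using the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)

def double_binomial_pmf(n, pi, phi):
    """Probabilities for y = 0..n, normalized numerically."""
    theta = 1.0 / phi  # ASSUMPTION: phi is the reciprocal of Efron's theta
    logw = []
    for y in range(n + 1):
        fitted = xlogy(y, pi) + xlogy(n - y, 1.0 - pi)           # kernel at pi
        saturated = xlogy(y, y / n) + xlogy(n - y, (n - y) / n)  # kernel at y/n
        logw.append(math.log(math.comb(n, y))
                    + theta * fitted + (1.0 - theta) * saturated)
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [v / total for v in w]

def variance(pmf):
    mean = sum(y * p for y, p in enumerate(pmf))
    return sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

n = 36
binomial_var = n * 0.5 * 0.5                              # classic value at pi = 0.5
var_over = variance(double_binomial_pmf(n, 0.5, 2.0))     # phi > 1
var_under = variance(double_binomial_pmf(n, 0.5, 0.5))    # 0 < phi < 1
print(var_under, binomial_var, var_over)
```

With `phi = 1` the saturated term drops out and the weights are exactly the classic binomial probabilities.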
Comparison With Classic Binomial pdf • n = 36, π = 0.8 • setting the second parameter to 1 gives the classic binomial distribution
Comparison of the Variances • n = 36 • setting the second parameter to 1 gives the classic binomial distribution
Integration into GLM Context • member of the 2-parameter exponential family • log-likelihood function of distribution • second parameter > 0: log-linear link • 0 < π < 1: logit link
The Printability Dataset An application
Response and Explanatory Variables • occurrence of unprinted areas ~ topography + formation (with "~" read as "explained by…")
Comparison of the Means The second parameter influences the mean, too.
Comparison of the Variances • binomial Std. Dev. at n=36: cannot be larger than 3 • empirical Std. Deviations: up to 11 • Multiplicative and Double Binomial Standard Deviations fit the empirical results much better
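The upper limit of 3 for the binomial standard deviation follows from the binomial variance formula, since π(1 - π) is largest at π = 1/2. The symbol π for the success probability is an assumption of this note.

```latex
\operatorname{sd}(y) = \sqrt{n\,\pi(1-\pi)}
  \;\le\; \sqrt{n \cdot \tfrac{1}{4}}
  \;=\; \sqrt{\tfrac{36}{4}} \;=\; 3,
\qquad \text{with equality at } \pi = \tfrac{1}{2}.
```

An empirical standard deviation of up to 11 is therefore out of reach for any binomial model with n = 36.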
Summary • Two generalizations of the binomial distribution might compensate for over- or underdispersion relative to the classic binomial distribution. • Multiplicative Binomial Distribution (Altham, 1978): second parameter ω • in GLM context: model π with the logistic link and ω with the log-linear link function
Summary 2 • Double Binomial Distribution (Efron, 1986): a second parameter controls the variance • in GLM context: model π with the logistic link and the second parameter with the log-linear link function