PROC GLIMMIX: AN OVERVIEW By William E. Jackman
PROC GLIMMIX: AN OVERVIEW • A new SAS/STAT Product • Experimental in SAS 9.1 • Production in SAS 9.2. • %GLIMMIX macro • Combines and extends statistical features found in other SAS procedures • Part of a succession of SAS procedures which have extended the General Linear Model (GLM)
PROC GLIMMIX: AN OVERVIEW • Regression Analysis Basics • Y = B0 + B1 X1 + B2 X2 + ... + Bn Xn + e • y = Xβ + ε (matrix notation) • ε ~ N(0, σ²I) • Estimation by ordinary least squares (OLS). • Essence of the General Linear Model (GLM) • Y's and the X's go by several names • Covariates
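For reference, the estimator behind "estimation by ordinary least squares" can be written in the slide's matrix notation. This is the standard textbook result, stated here under the assumption that X has full column rank; the hat notation is added for illustration, and σ² denotes the residual variance.

```latex
\hat{\beta}
  = \arg\min_{\beta}\,(y - X\beta)^{\top}(y - X\beta)
  = (X^{\top}X)^{-1}X^{\top}y,
\qquad
\operatorname{Var}(\hat{\beta}) = \sigma^{2}(X^{\top}X)^{-1}
```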
PROC GLIMMIX: AN OVERVIEW • The GLM underlies PROC REG and PROC GLM • Both procedures use OLS to fit the GLM to data with a continuous response variable • Same assumptions about residuals • PROC REG has advantages for continuous effects (regressors). • PROC GLM has advantages for discrete (classification) effects.
PROC GLIMMIX: AN OVERVIEW • Indicator (dummy) variables and interactions * PROC REG: must be created in data step * PROC GLM: use class & model statements • Which Procedure to use? * Interested primarily in effect of continuous variables (covariates)? * Interested primarily in effect of grouping variables?
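The difference in how the two procedures handle indicator variables can be sketched as follows. The data set `demo` and its variables (`grp`, `x`, `y`, and the hand-built indicators) are made up for illustration and are not from the talk; level C is taken as the reference level.

```sas
/* Hypothetical data: response y, covariate x, 3-level grouping variable grp */
data demo;
   input grp $ x y @@;
   /* PROC REG needs indicators and interactions built in the DATA step */
   grpA  = (grp = 'A');
   grpB  = (grp = 'B');
   xgrpA = x * grpA;
   xgrpB = x * grpB;
   datalines;
A 1 2.1  A 2 2.9  A 3 4.2  A 4 4.8
B 1 3.0  B 2 4.4  B 3 6.1  B 4 7.3
C 1 1.2  C 2 1.9  C 3 2.4  C 4 3.1
;
run;

/* PROC REG: every regressor must already exist as a numeric column */
proc reg data=demo;
   model y = x grpA grpB xgrpA xgrpB;
run;
quit;

/* PROC GLM: CLASS and MODEL build the same design matrix internally */
proc glm data=demo;
   class grp;
   model y = x grp x*grp / solution;
run;
quit;
```

Both calls should describe the same fitted model; only the bookkeeping differs.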
PROC GLIMMIX: AN OVERVIEW • The generalized linear model (GzLM) extends (or generalizes) the GLM. • Presented in 1972; expanded in 1989. • Non-normal data from exponential family • Linearity is achieved through the link function. • Implemented, for example, in PROC GENMOD • PROC GENMOD can also handle correlated residuals.
PROC GLIMMIX: AN OVERVIEW • General form of the GENMOD procedure • PROC GENMOD options ; • CLASS variables ; • MODEL response=effects / dist= link= options ; • REPEATED SUBJECT=subject-effect / options ; • RUN ;
PROC GLIMMIX: AN OVERVIEW Example of the GENMOD procedure for Poisson regression
proc genmod data=skin ;
   class city age ;
   model cases=city age / offset=log_pop dist=poi link=log ;
run ;
where log_pop = log of the population
PROC GLIMMIX: AN OVERVIEW The generalized linear model (GzLM) • Canonical link functions are the most common. • Obtained from the probability density function • Default in PROC GENMOD • For the Poisson distribution, the default link function is the log of the mean of the response variable. • log(μ) = Xβ • Inverse link function • μ = e^η, where η = Xβ
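As a worked illustration of how a canonical link is "obtained from the probability density function", the Poisson case can be written out. This is the standard exponential-family argument; the symbol θ for the natural parameter is introduced here and does not appear on the slide.

```latex
f(y;\mu) = \frac{\mu^{y} e^{-\mu}}{y!}
         = \exp\{\, y \log\mu - \mu - \log y! \,\}
\quad\Longrightarrow\quad
\theta = \log\mu
\quad\Longrightarrow\quad
g(\mu) = \log\mu = \eta = X\beta,
\qquad
\mu = g^{-1}(\eta) = e^{\eta}
```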
PROC GLIMMIX: AN OVERVIEW Logistic Regression: A special case of the generalized linear model (GzLM) • Response variable from the binomial distribution • Part of the exponential family, so the GzLM applies • Link function is the logit. • logit(p_i) = ln(p_i / (1 - p_i)) • Can be done with PROC GENMOD • Input from David Schlotzhauer of SAS Institute
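A minimal sketch of logistic regression in PROC GENMOD, using the events/trials form of the MODEL statement. The data set `dose` and its values are hypothetical and serve only to make the call runnable.

```sas
/* Hypothetical grouped data: r events out of n trials at each dose level */
data dose;
   input dose r n @@;
   datalines;
1 2 20  2 5 20  3 9 20  4 14 20  5 17 20
;
run;

/* Binomial response with the logit link: logit(p) = b0 + b1*dose */
proc genmod data=dose;
   model r/n = dose / dist=binomial link=logit;
run;
```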
PROC GLIMMIX: AN OVERVIEW FURTHER EXTENSIONS OF THE GLM • GLM and GzLM cannot handle random effects. • Fixed effects: interest only in the levels specified • Random effects: inference extends to other levels • PROC GENMOD and PROC LOGISTIC cannot handle random effects.
PROC GLIMMIX: AN OVERVIEW PROC MIXED: An extension of the GLM • Can handle random effects and correlated errors • fixed effects only model • y = Xβ + ε • mixed model • y = Xβ + Zγ + ε
PROC GLIMMIX: AN OVERVIEW Mixed models distinguish between G-side random effects and R-side random effects. • G-side random effects correspond to covariates (regressors) in the model which are random. • R-side random effects correspond to the residuals in the model.
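The G-side/R-side terminology comes from the usual names of the two covariance matrices in the linear mixed model y = Xβ + Zγ + ε. The distributional assumptions below are the standard ones for this model and are stated here as background, not quoted from the talk.

```latex
\gamma \sim N(0,\,G), \qquad
\varepsilon \sim N(0,\,R), \qquad
\operatorname{Cov}(\gamma,\varepsilon) = 0
\quad\Longrightarrow\quad
\operatorname{E}[y] = X\beta, \qquad
\operatorname{Var}(y) = Z G Z^{\top} + R
```

G-side effects shape the first term of the variance and R-side effects the second.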
PROC GLIMMIX: AN OVERVIEW Example of PROC MIXED syntax
proc mixed ;
   class id time gender ;
   model z = gender age gender*age ;
   random intercept / subject=id ;   *** G-side effects go here. ;
   repeated time / subject=id type=ar(1) ;   *** R-side effects go here. ;
run ;
PROC GLIMMIX: AN OVERVIEW PROC MIXED: a linear mixed model (LMM) • PROC MIXED allows for random intercepts for each subject. • It models the correlation among the repeated measures within each subject. • It has a rich variety of covariance structures for dealing with correlated residuals. • Unlike GzLMs, LMMs require a normally distributed response variable.
PROC GLIMMIX: AN OVERVIEW • PROC GLIMMIX - PUTTING IT ALL TOGETHER • A Generalized Linear Mixed Model (GzLMM) • Combines and extends features of GzLMs and LMMs • Enables modeling random effects and correlated errors for non-normal data
PROC GLIMMIX: AN OVERVIEW The Generalized Linear Mixed Model (GzLMM) • A linear predictor can contain random effects: η = Xβ + Z γ • The random effects are normally distributed • The conditional mean, μ|γ, relates to the linear predictor through a link function: g(μ|γ) = η • The conditional distribution (given γ) of the data belongs to the exponential family of distributions.
PROC GLIMMIX: AN OVERVIEW Other new features of PROC GLIMMIX include: • low-rank smoothing based on mixed models • new features for LS-means comparisons and display. • SAS programming statements allowed within the procedure • Fits models to multivariate data with different distributions or links
PROC GLIMMIX: AN OVERVIEW General form of the GLIMMIX procedure: • PROC GLIMMIX options ; • programming statements ; • CLASS variables ; • MODEL response=fixed-effects / DIST= LINK= options ; • RANDOM random-effects / options ; • RANDOM _RESIDUAL_ / options ; • RUN ;
PROC GLIMMIX: AN OVERVIEW Like other mixed models, PROC GLIMMIX distinguishes between G-side random effects and R-side random effects. • G-side random effects correspond to covariates in the model which are random. • R-side random effects correspond to the residuals in the model.
PROC GLIMMIX: AN OVERVIEW Example of a GzLMM using PROC GLIMMIX for Logistic Regression with Random Effects • proc glimmix data=example ; • class trt clinic ; • model y=trt / dist=binomial link=logit ; • random clinic trt*clinic ; • *** random intercept trt / subject=clinic ; • run ;
PROC GLIMMIX: AN OVERVIEW • This example cannot be handled by PROC LOGISTIC since clinic is a random effect. • For logistic regression with fixed effect only, PROC GLIMMIX or PROC LOGISTIC can be used. Which should you use? • More input from David Schlotzhauer of the SAS Institute.
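For the fixed-effects-only case, the two procedures can be run side by side. The sketch below simulates a small data set (`fixedonly`, with made-up coefficients and seed) purely so that both calls are runnable; it is not the data from the talk. With PARAM=GLM in PROC LOGISTIC, both procedures use the same parameterization of `trt`, so the maximum likelihood estimates would be expected to agree up to numerical tolerance.

```sas
/* Hypothetical binary data with a two-level treatment */
data fixedonly;
   call streaminit(101);
   do trt = 0 to 1;
      do i = 1 to 50;
         p = 1 / (1 + exp(-(-0.5 + 0.8*trt)));   /* inverse logit */
         y = rand('bernoulli', p);
         output;
      end;
   end;
run;

/* Fixed effects only: PROC LOGISTIC */
proc logistic data=fixedonly;
   class trt / param=glm;
   model y(event='1') = trt;
run;

/* The same model in PROC GLIMMIX (no RANDOM statement) */
proc glimmix data=fixedonly;
   class trt;
   model y(event='1') = trt / dist=binomial link=logit solution;
run;
```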
PROC GLIMMIX: AN OVERVIEW Parameter Estimation Methods in PROC GLIMMIX • The GLIMMIX procedure has two basic modes of parameter estimation: GLM-mode and GLMM-mode. • In GLM-mode, the data are never correlated and there can be no G-side random effects. • In GLMM-mode, there can be random effects, correlated data, or both.
PROC GLIMMIX: AN OVERVIEW Parameter Estimation for generalized linear models • Normal distribution: restricted maximum likelihood • All other known distributions: maximum likelihood • Unknown distributions: quasi-likelihood
PROC GLIMMIX: AN OVERVIEW Parameter Estimation for generalized linear models with overdispersion • Parameters are estimated using maximum likelihood • An overdispersion parameter can be estimated from the Pearson statistic
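A minimal sketch of a Poisson model with an overdispersion parameter in PROC GLIMMIX. A RANDOM _RESIDUAL_ statement with no other options requests a multiplicative overdispersion parameter. The data set `counts` is simulated here with made-up settings (a gamma-mixed Poisson, so the variance exceeds the mean) and is only an assumption for illustration.

```sas
/* Hypothetical overdispersed counts in two groups */
data counts;
   call streaminit(202);
   do grp = 1 to 2;
      do i = 1 to 30;
         /* gamma multiplier with mean 1 inflates the variance */
         lambda = exp(1 + 0.5*(grp = 2)) * rand('gamma', 2) / 2;
         y = rand('poisson', lambda);
         output;
      end;
   end;
run;

proc glimmix data=counts;
   class grp;
   model y = grp / dist=poisson link=log solution;
   random _residual_;   /* multiplicative overdispersion parameter */
run;
```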
PROC GLIMMIX: AN OVERVIEW Parameter Estimation for generalized linear mixed models • Pseudo-likelihood
PROC GLIMMIX: AN OVERVIEW Using PROC GLIMMIX for Linear Mixed Models • In this example, the response variable is normally distributed. • proc glimmix data=grass ; • class method variety ; • model yield = method / dist=normal ; • random variety method*variety ; • run ; • For this model, PROC GLIMMIX uses residual (restricted) maximum likelihood, as does PROC MIXED.
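For comparison, the same linear mixed model can be written for PROC MIXED, whose default estimation method is REML. Because the grass data are not reproduced in the slides, the sketch below simulates a stand-in data set (`grass_sim`, with made-up variance components) so that the call is runnable; substitute the real data set where available.

```sas
/* Hypothetical stand-in for the grass data: 5 varieties x 3 methods x 2 reps */
data grass_sim;
   call streaminit(303);
   do variety = 1 to 5;
      v = rand('normal', 0, 1.5);          /* variety effect */
      do method = 1 to 3;
         mv = rand('normal', 0, 0.8);      /* method*variety effect */
         do rep = 1 to 2;
            yield = 20 + 2*method + v + mv + rand('normal', 0, 1);
            output;
         end;
      end;
   end;
run;

/* Same fixed and random effects as the PROC GLIMMIX call on the slide */
proc mixed data=grass_sim;
   class method variety;
   model yield = method;
   random variety method*variety;
run;
```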
PROC GLIMMIX: AN OVERVIEW • PROC GLIMMIX can do much of what PROC LOGISTIC, PROC MIXED, PROC REG, and PROC GLM can do. • Could be viewed as a “super PROC” • Input from Jill Tao of the SAS Institute
PROC GLIMMIX: AN OVERVIEW PROC GLIMMIX versus PROC MIXED Closely related but important differences • PROC GLIMMIX is not PROC MIXED with a LINK= and a DIST= option. • PROC GLIMMIX models non-normal data. PROC MIXED does not. • PROC GLIMMIX allows programming statements. PROC MIXED does not. • PROC GLIMMIX uses the RANDOM statement to model R-side random effects. PROC MIXED uses the REPEATED statement to model R-side random effects. • PROC GLIMMIX does not support the Kronecker and heterogeneous covariance structures as supported by PROC MIXED.
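The difference in how R-side effects are specified can be sketched by writing an AR(1) residual structure both ways. The data set `longi` is simulated with made-up values, and the random intercept from the earlier PROC MIXED example is omitted to keep the comparison focused on the R-side; both choices are assumptions made for illustration.

```sas
/* Hypothetical repeated measures: 20 subjects, 4 time points each */
data longi;
   call streaminit(404);
   do id = 1 to 20;
      gender = mod(id, 2);
      age = 30 + mod(id, 7);
      b = rand('normal', 0, 1);            /* induces within-subject correlation */
      do time = 1 to 4;
         z = 5 + 0.5*gender + 0.1*age + b + rand('normal', 0, 1);
         output;
      end;
   end;
run;

/* PROC MIXED: R-side structure goes in the REPEATED statement */
proc mixed data=longi;
   class id time gender;
   model z = gender age gender*age;
   repeated time / subject=id type=ar(1);
run;

/* PROC GLIMMIX: the RANDOM statement with the RESIDUAL option does the same job */
proc glimmix data=longi;
   class id time gender;
   model z = gender age gender*age / dist=normal;
   random time / subject=id type=ar(1) residual;
run;
```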
PROC GLIMMIX: AN OVERVIEW PROC GLIMMIX versus PROC GENMOD PROC GLIMMIX • fits unit-specific models with the G-side random effects • fits population-average models without the G-side effects. (Without the G-side effects, there is no way to condition the response and make the estimates unit-specific.) • provides sandwich estimators of covariance of fixed effects through the EMPIRICAL option when the model is processed by subjects. • computes the parameter estimates by a pseudo-likelihood method.
PROC GLIMMIX: AN OVERVIEW PROC GLIMMIX versus PROC GENMOD PROC GENMOD • cannot accommodate random effects • fits only population-average models • computes the parameter estimates by a moment-based method.
PROC GLIMMIX: AN OVERVIEW Applications Using the GLIMMIX Procedure (from "Statistical Analysis with the GLIMMIX Procedure") • Poisson Regression with Random Effects • An example of Beta Regression • Repeated Measures Data with Discrete Response • Introduction to Radial Smoothing • Applications are explained in detail in the SAS course.
PROC GLIMMIX: AN OVERVIEW Fitting Models To Multivariate Data In Which Observations Do Not All Have The Same Distribution Or Link • EXAMPLE: JOINT MODELS FOR BINARY AND POISSON DATA (from a paper by Oliver Schabenberger of the SAS Institute)
PROC GLIMMIX: AN OVERVIEW
data joint;
   length dist $7;
   input d $ patient age OKstatus response @@;
   if d = 'B' then dist = 'Binary';
   else dist = 'Poisson';
   /* only the first 3 data lines are shown */
   datalines;
B 1 78 1 0 P 1 78 1 9 B 2 60 1 0 P 2 60 1 4
B 3 68 1 1 P 3 68 1 7 B 4 62 0 1 P 4 62 0 35
B 5 76 0 0 P 5 76 0 9 B 6 76 1 1 P 6 76 1 7
;
PROC GLIMMIX: AN OVERVIEW
proc glimmix data=joint;
   class patient dist;
   model response(event='1') = dist dist*age dist*OKstatus / noint s dist=byobs(dist);
   random int / subject=patient;
run;
PROC GLIMMIX: AN OVERVIEW • The previous slide showed modeling correlations through G-side random effects. It could also be done through R-side random effects. This is presented in the SAS course “Statistical Analysis with the GLIMMIX Procedure” which expands upon this example.
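One way the R-side alternative might be written is sketched below: the G-side random intercept is replaced by an unstructured within-patient residual covariance, with empirical (sandwich) standard errors. It assumes the full `joint` data set defined earlier is available, since the 12 records shown in the slides are unlikely to be enough for a stable fit. The choice of TYPE=CHOL and the EMPIRICAL option is an assumption here, not something confirmed by the slides.

```sas
/* R-side version: correlation between a patient's binary and count
   responses is modeled through the residual covariance matrix */
proc glimmix data=joint empirical;
   class patient dist;
   model response(event='1') = dist dist*age dist*OKstatus
         / noint s dist=byobs(dist);
   random _residual_ / subject=patient type=chol;
run;
```

Without G-side effects, the estimates from this form have a population-average rather than a patient-specific interpretation.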