480 likes | 645 Views
SAS ® Global Forum 2014. Got Randomness? SAS TM for Mixed and Generalized Linear Mixed Models. March 23-26 Washington, DC. David A. Dickey NC State University. TM SAS and its products are the registered trademarks of SAS Institute, Cary, NC. Data: Challenger (Binomial with
E N D
SAS® Global Forum 2014 Got Randomness? SASTM for Mixed and Generalized Linear Mixed Models March 23-26 Washington, DC David A. Dickey NC State University TM SAS and its products are the registered trademarks of SAS Institute, Cary, NC
Data: Challenger (Binomial with random effects) Data: Ships (Poisson with offset) Data: Crab mating patterns (X rated) (Poisson Regression, ZIP model, Negative Binomial) Data: Typists (Poisson with random effects) Data: Flu samples (Binomial with random effects)
“Generalized” non normal distribution Binary for probabilities: Y=0 or 1 Mean E{Y}=p Variance p(1-p) Pr{Y=j}= pj(1-p)(1-j) Link: L=ln(p/(1-p)) = “Logit” Range (over all L): 0<p<1 Poisson for counts: Y in {0,1,2,3,4, ….} Mean count l Variance l Pr{Y=j} = exp(- l )(lj)/(j!) Link: L = log(l) Range (over all L): l >0 2/3 1/3 Like Dislike Pr{ Like }=2/3 l=0.4055 2/3 .055 .27 .007 .001 0 1 2 3 4 5
Mixed (not generalized) Models: Fixed Effects and Random Effects
Fixed? or Random? Replication : Same levels Different levels Inference for:Only ObservedPopulation of LevelsLevels Levels : Picked onPicked at Purpose Random Inference on:MeansVariances Example: Only These All Doctors Drugs All Clinics Example: Only These All Farms Fertilizers All Fields
Generalized (not mixed) linear models. Use link L = g(E{Y}), e.g. ln(p/(1-p)) =ln(E{Y}/(1-E{Y}) Assume L is linear model in the inputs with fixed effects. Estimate model for L, e.g. L=g(E{Y})=bo + b1 X Use maximum likelihood Example: L = -1 + .18*dose Dose = 10, L=0.8, p=exp(0.8)/(1+exp(0.8))= “inverse link” = 0.86
Challenger was mission 24 From 23 previous launches we have: 6 O-rings per mission Y=0 no damage, Y=1 erosion or blowby p = Pr {Y=1} = f{mission, launch temperature) Features: Random mission effects Logistic link for p procglimmix data=O_ring; class mission; model fail = temp/dist=binomial s; random mission; run; • Generalized • Mixed
Estimated G matrix is not positive definite. Covariance Parameter Estimates Cov Standard Parm Estimate Error mission 2.25E-18 . Solutions for Fixed Effects Effect Estimate Error DF t Value Pr > |t| Intercept 5.0850 3.0525 21 1.67 0.1106 temp -0.1156 0.04702 115 -2.46 0.0154 We “hit the boundary”
Flu Data CDC Active Flu Virus Weekly Data % positive data FLU; input fluseasn year t week pos specimens; pct_pos=100*pos/specimens; logit=log(pct_pos/100/(1+(pct_pos/100))); label pos = "# positive specimens"; label pct_pos="% positive specimens"; label t = "Week into flu season (first = week 40)"; label week = "Actual week of year"; label fluseasn = "Year flu season started"; Empirical Logit % positive
(1) GLM all effects fixed (harmonic main effects insignificant) “Sinusoids” S(j) = sin(2pjt/52) C(j)=cos(2pjt/52) PROCGLM DATA=FLU; class fluseasn; model logit = s1 c1 fluseasn*s1 fluseasn*c1 fluseasn*s2 fluseasn*c2 fluseasn*s3 fluseasn*c3 fluseasn*s4 fluseasn*c4; output out=out1 p=p; data out1; set out1; P_hat = exp(p)/(1+exp(p)); label P_hat = "Pr{pos. sample} (est.)"; run;
(2) MIXED analysis on logits Random harmonics. Normality assumed Toeplitz(1) PROCMIXED DATA=FLU method=ml; ** reduced model; class fluseasn; model logit = s1 c1 /outp=outpoutpm=outpmddfm=kr; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); run;
(3) GLIMMIX analysis Random harmonics. Binomial assumed (overdispersed– lab effects?) PROCGLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model pos/specimens = s1 c1 ; * s2 c2 s3 c3 s4 c4; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn; ** Toep(1) - no converge; random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); random _residual_; covtestglm; output out=out2 pred(ilinkblup)=pblup pred(ilinknoblup)=overall pearson= p_resid; run;
output out=out2 pred(ilinkblup)=pblup pred(ilinknoblup)= overall pearson= p_resid; run; Pearson Residuals: Used to check fit when using default (pseudo-likelihood) Variance should be near 1 procmeansmeanvar; varp_resid; run; Without random _residual_; variance 3.63. With random _residual_; variance 0.83. --------------------------------------------------------------- Fit Statistics -2 Res Log Pseudo-Likelihood 341.46 Generalized Chi-Square 1707.29 Gener. Chi-Square / DF 4.59 ------------------------------------------------------------------------- Or… use method=quad
Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F S1 1 8 34.93 0.0004 c1 1 8 25.49 0.0010 Tests of Covariance Parameters Based on the Residual Pseudo-Likelihood Label DF -2 Res Log P-LikeChiSq Pr > ChiSqNote Independence 6 1312.34 970.88 <.0001 MI MI: P-value based on a mixture of chi-squares Output due to covtestglm; random _residual_ does not affect the fit (just standard errors)
Could try 2 parameter Beta distribution instead: PROCGLIMMIX DATA=FLU; title2 "GLIMMIX Analysis"; class fluseasn; model f = s1 c1 /dist=beta link=logit s; random intercept/subject=fluseasn; random s1 c1/subject=fluseasn type=toep(1); random s2 c2/subject=fluseasn type=toep(1); random s3 c3/subject=fluseasn type=toep(1); random s4 c4/subject=fluseasn type=toep(1); output out=out3 pred(ilinkblup)=pblup pred(ilinknoblup)=overall pearson=p_residbeta; run;
Binomial Assumption Without BLUPS and with BLUPS
Beta Assumption Without BLUPS and with BLUPS
Poisson Example: Wave induced damage incidents in 40 ships (ship groups) Variables: Factorial Effects (fixed, classificatory): Ship Type 5 levels A,B,C,D,E Year Constructed 4 levels Years of Operation 2 levels Covariate (“offset” - continuous) = Time in service (“Aggregate months”) Incidents (dependent, counts) Source” McCullough & Nelder (but I ignore cases where year constructed > period of operation)
procglimmix data=ships; Title "Ignoring Ship Variance"; class operation construct shiptype; model incidents = operation construct shiptype/ dist=poisson s offset=log_service; run; ln(service) Poisson:ln(l) – ln(service)= b0 + b1(operation) + b2(construct) + b3(ship_type) ln(l/service) • >1 • Ship variance? -2 Log Likelihood 136.56 (more fit statistics) Pearson Chi-Square 42.28 Pearson Chi-Square / DF 1.69
Without ship variance component: Type III Tests of Fixed Effects NumDen Effect DF DF F Value Pr > F Operation 1 25 10.57 0.0033 Construct 3 25 9.72 0.0002 Shiptype4 25 6.50 0.0010
PROCGLIMMIX data=ships method=quad; class operation construct shiptype ship; model incidents = operation construct shiptype/ dist=poisson s offset=log_service; covtest "no ship effect" glm; * random ship; random intercept / subject = operation*construct*shiptype; run; Covariance Parameter Estimates Standard CovParm Subject Estimate Error Intercept Operat*Constr*Shipty0.
Fit Statistics -2 Log Likelihood 136.56 Fit Statistics for Conditional Distribution -2 log L(incidents | r. effects) 136.56 Pearson Chi-Square 42.27 Pearson Chi-Square / DF 1.24 Type III Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F Operation 1 25 10.57 0.0033 no Construct 3 25 9.72 0.0002 shiptype 4 25 6.50 0.0010 changes Tests of Covariance Parameters covtest "no ship effect" glm; Based on the Likelihood Label DF -2 Log Like ChiSqPr > ChiSqNote no ship effect 1 136.56 . 1.0000MI MI: P-value based on a mixture of chi-squares.
Horseshoe Crab study (reference: SAS GLIMMIX course notes): Female nests have “satellite” males Count data – Poisson?Generalized Linear Features (predictors): Carapace Width, Weight, Color, Spine condition Random Effect: SiteMixed Model To nest Go State
Fit Statistics Gener. Chi-Square / DF 2.77 CovParm Subject Estimate Intercept site 0.1625 Effect Estimate Pr > |t| Intercept -1.1019 0.2527 weight 0.5042 0.0035 width 0.0318 0.5229 procglimmix data=crab; class site; model satellites = weight width / dist=poi solution ddfm=kr; random int / subject=site; output out=overdisppearson=pearson; run; procmeans data=overdisp n mean var; varpearson; run; procunivariate data=crab normal plot; var satellites; run; Histogram # Boxplot 15.5+* 1 0 .* 1 0 . 12.5+* 1 | .* 1 | .** 3 | 9.5+** 3 | .*** 6 | .** 4 | 6.5+******* 13 | .******** 15 +-----+ .********** 19 | | 3.5+********** 19 | | .***** 9 *--+--* .******** 16 | | 0.5+******************************* 62 +-----+ ----+----+----+----+----+----+- * may represent up to 2 counts N Mean Variance --------------------------- 173 -0.0258264 2.6737114 --------------------------- Zero Inflated ?
Zero Inflated Poisson (ZIP) Q: Can zero inflation cause overdispersion (s2>m)? Recall: in Poisson, s2=m=l
Nice job Grandpa. That proof just about put everyone to sleep
Dickey ncsu Zero Inflated Poisson - (ZIP code ) SAS Global Forum Washington DC 20745 procnlmixed data=crab; parms b0=0bwidth=0bweight=0 c0=-2 c1=0s2u1=1 s2u2=1; x=c0+c1*width+u1; p0 = exp(x)/(1+exp(x)); * width affects p0; eta= b0+bwidth*width +bweight*weight +u2; lambda=exp(eta); if satellites=0 then loglike = log(p0 +(1-p0)*exp(-lambda)); else loglike = log(1-p0)+satellites*log(lambda)-lambda-lgamma(satellites+1); expected=(1-p0)*lambda; id p0 expected lambda; model satellites~general(loglike); Random U1 U2~N([0,0],[s2u1,0,s2u2]) subject=site; predict p0+(1-p0)*exp(-lambda) out=out1; run;
Parameter Estimates Parameter Estimate t Pr>|t| Lower Upper b0 2.7897 2.55 0.0268 0.3853 5.1942 bwidth -0.0944 -1.65 0.1267-0.2202 0.0314 bweight 0.4649 2.38 0.0366 0.0347 0.8952 c0 13.3739 4.42 0.0010 6.7078 20.0401 c1 -0.5447 -4.61 0.0008 -0.8049 -0.2844 s2u1 0.5114 1.12 0.2852 -0.4905 1.5133 s2u2 0.1054 1.67 0.1239 -0.0339 0.2447 weight affects l width affects p0 Variance for p0 l
From fixed part of model, compute Pr{count=j} and plot (3D) versus Weight, Carapace width
Your talk seems better now Grandpa!
Another possibility: Negative binomial Number of failures until kth success ( p=Prob{success} )
Negative binomial: In SAS, k (scale) is our 1/k procglimmix data=crab; class site; model satellites = weight width / dist=nb solution ddfm=kr; random int / subject=site; run; Fit Statistics -2 Res Log Pseudo-Likelihood 539.06 Generalized Chi-Square 174.83 Gener. Chi-Square / DF 1.03 Covariance Parameter Estimates CovParm Subject Estimate Std. Error Intercept site 0.09527 0.07979 Scale 0.7659 0.1349 Standard Effect Estimate Error DF t Value Pr > |t| Intercept -1.2022 1.6883 168.5 -0.71 0.4774 weight 0.6759 0.3239 156.6 2.09 0.0386 width 0.01905 0.08943 166.2 0.21 0.8316
Population average model vs. Individual Specific Model 8 typists Y=Error counts (Poisson distributed) ln(li)= ln(mean of Poisson) = m+Ui for typist i so li=em+Ui conditionally (individual specific) Distributions for Y, U~N(0,1) and m=1 l=em=e1=2.7183 = mean for “typical” typist (typist with U=0)
Population average model Expectation ||||| | | of individual distributions averaged across population of all typists. Run same simulation for 8000 typists, compute mean of conditional population means, exp(m+U). The MEANS Procedure Variable N Mean Std Dev Std Error ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ lambda 80004.4280478 6.0083831 0.067175 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ Z=(4.428-2.7183)/0.06718 = 25.46 !! Population mean is notem Conditional means, m+U, are lognormal. Log(Y)~N(1,1) E{Y}=exp(m+0.5s2) = e1.5 = 4.4817
Main points: • Generalized linear models with random effects are subject specific models. • Subject specific models have fixed effects that represent an individual with random effects 0 (individual at the random effect distributional means). • Subject specific models when averaged over the subjects do not give the model fixed effects. • Models with only fixed effects do give the fixed effect part of the model when averaged over subjects and are thus called population average models.