NOMINAL RESPONSES: BASELINE-CATEGORY LOGIT MODELS (Agresti 7.1)

Kathy Fung and Lin Zhang, Statistics 6841 Project, Winter 2005

Objective • Introduce nominal responses (baseline-category logit models) • Present the concept and an example

Model Definition
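For reference, the baseline-category logit model is usually written as below. This is a sketch of the standard textbook form, assuming a response with J categories, category J as the baseline, and a predictor vector x; these symbols are notational assumptions and do not come from the slide itself. The SAS run later in the deck uses food=1 as the reference category instead, which changes only which category appears in the denominator.

```latex
\log \frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})}
  = \alpha_j + \boldsymbol{\beta}_j' \mathbf{x},
  \qquad j = 1, \dots, J-1 .
```

Each of the J - 1 logits has its own intercept and its own slope vector, so a model with p predictor terms has (J - 1)(p + 1) parameters.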

Some Notes: • With categorical predictors, the X² and G² goodness-of-fit statistics provide a model check when the data are not sparse. • When an explanatory variable is continuous or the data are sparse, these statistics are still valid for comparing nested models that differ by relatively few terms.
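As a reminder of what the two statistics measure, here is a minimal Python sketch of the Pearson X² and deviance G² formulas. The observed and fitted counts are hypothetical and chosen only to keep the arithmetic visible; they are not the alligator data.

```python
import math

def pearson_x2(observed, fitted):
    """Pearson statistic: sum of (observed - fitted)^2 / fitted over cells."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

def deviance_g2(observed, fitted):
    """Likelihood-ratio statistic: 2 * sum of observed * log(observed / fitted).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)

# Hypothetical two-cell example (not from the slides).
observed = [10, 20]
fitted = [15.0, 15.0]

x2 = pearson_x2(observed, fitted)
g2 = deviance_g2(observed, fitted)
print(f"X2 = {x2:.3f}, G2 = {g2:.3f}")
```

Both statistics are referred to a chi-squared distribution with the residual degrees of freedom, which is why sparse cells (small fitted counts) make the approximation unreliable for testing overall fit.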

Alligator Food Choice Example

SAS code for Table 7.1
* SAS for Baseline-Category Logit Models with Alligator Data in Table 7.1;
data gator;
  infile 'K:\CSU Hayward\Stat 6841\project\gator.txt';
  input lake gender size food count;
run;
* Generalized (baseline-category) logit model with food=1 as the reference;
proc logistic data=gator;
  freq count;
  class lake size / param=ref;
  model food(ref='1') = lake size / link=glogit aggregate scale=none;
run;
* Same main-effects model in CATMOD, with populations defined by lake, size, and gender;
proc catmod data=gator;
  weight count;
  population lake size gender;
  model food = lake size / pred=freq pred=prob;
run;
quit;

Output
The LOGISTIC Procedure
Model Information
  Data Set                      WORK.GATOR
  Response Variable             food
  Number of Response Levels     5
  Frequency Variable            count
  Model                         generalized logit
  Optimization Technique        Fisher's scoring
  Number of Observations Read   80
  Number of Observations Used   56
  Sum of Frequencies Read       219
  Sum of Frequencies Used       219
Response Profile
  Ordered Value   food   Total Frequency
  1               1      94
  2               2      61
  3               3      19
  4               4      13
  5               5      32
Logits modeled use food=1 as the reference category.
NOTE: 24 observations having nonpositive frequencies or weights were excluded since they do not contribute to the analysis.

Output
Class Level Information
  Class   Value   Design Variables
  lake    1       1  0  0
          2       0  1  0
          3       0  0  1
          4       0  0  0
  size    1       1
          2       0
Model Convergence Status
  Convergence criterion (GCONV=1E-8) satisfied.

Output
Deviance and Pearson Goodness-of-Fit Statistics
  Criterion   Value     DF   Value/DF   Pr > ChiSq
  Deviance    17.0798   12   1.4233     0.1466
  Pearson     15.0429   12   1.2536     0.2391
Number of unique profiles: 8
Model Fit Statistics
  Criterion   Intercept Only   Intercept and Covariates
  AIC         612.363          580.080
  SC          625.919          647.862
  -2 Log L    604.363          540.080
Testing Global Null Hypothesis: BETA=0
  Test               Chi-Square   DF   Pr > ChiSq
  Likelihood Ratio   64.2826      16   <.0001
  Score              57.2475      16   <.0001
  Wald               49.7584      16   <.0001
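The fit statistics in this output are related by simple arithmetic, which the following Python sketch reproduces. It assumes the usual definitions AIC = -2 log L + 2p and SC = -2 log L + p log(n), with n taken to be the sum of frequencies (219). The parameter counts follow from four logits, each with one intercept, three lake terms, and one size term.

```python
import math

# Values copied from the LOGISTIC output.
neg2logl_intercept_only = 604.363
neg2logl_with_covariates = 540.080
n = 219                      # sum of frequencies used

n_logits = 4                 # five food categories, one used as reference
p_intercept_only = n_logits              # 4 intercepts
p_with_covariates = n_logits * (1 + 3 + 1)  # intercept + 3 lake + 1 size per logit

# Likelihood-ratio test of BETA=0: difference in -2 log L, df = extra parameters.
lr_stat = neg2logl_intercept_only - neg2logl_with_covariates
lr_df = p_with_covariates - p_intercept_only

# Assumed definitions of the information criteria.
aic_intercept_only = neg2logl_intercept_only + 2 * p_intercept_only
aic_with_covariates = neg2logl_with_covariates + 2 * p_with_covariates
sc_intercept_only = neg2logl_intercept_only + p_intercept_only * math.log(n)
sc_with_covariates = neg2logl_with_covariates + p_with_covariates * math.log(n)

print(f"LR = {lr_stat:.3f} on {lr_df} df")
print(f"AIC: {aic_intercept_only:.3f} vs {aic_with_covariates:.3f}")
print(f"SC:  {sc_intercept_only:.3f} vs {sc_with_covariates:.3f}")
```

AIC prefers the model with covariates, while SC, which penalizes the 16 extra parameters more heavily, prefers the intercept-only model. The listing shows the same disagreement.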

Output
Type 3 Analysis of Effects
  Effect   DF   Wald Chi-Square   Pr > ChiSq
  lake     12   35.4890           0.0004
  size      4   18.7593           0.0009
Analysis of Maximum Likelihood Estimates
  Parameter   food   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  Intercept   2      1    -1.5490    0.4249           13.2890           0.0003
  Intercept   3      1    -3.3139    1.0528            9.9081           0.0016
  Intercept   4      1    -2.0931    0.6622            9.9894           0.0016
  Intercept   5      1    -1.9043    0.5258           13.1150           0.0003
  lake 1      2      1    -1.6583    0.6129            7.3216           0.0068
  lake 1      3      1     1.2422    1.1852            1.0985           0.2946
  lake 1      4      1     0.6951    0.7813            0.7916           0.3736

Output
Analysis of Maximum Likelihood Estimates
  Parameter   food   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
  lake 1      5      1     0.8262    0.5575            2.1959           0.1384
  lake 2      2      1     0.9372    0.4719            3.9443           0.0470
  lake 2      3      1     2.4583    1.1179            4.8360           0.0279
  lake 2      4      1    -0.6532    1.2021            0.2953           0.5869
  lake 2      5      1     0.00565   0.7766            0.0001           0.9942
  lake 3      2      1     1.1220    0.4905            5.2321           0.0222
  lake 3      3      1     2.9347    1.1161            6.9131           0.0086
  lake 3      4      1     1.0878    0.8417            1.6703           0.1962
  lake 3      5      1     1.5164    0.6214            5.9541           0.0147
  size 1      2      1     1.4582    0.3959           13.5634           0.0002
  size 1      3      1    -0.3513    0.5800            0.3668           0.5448
  size 1      4      1    -0.6307    0.6425            0.9635           0.3263
  size 1      5      1     0.3316    0.4483            0.5471           0.4595

Output
Odds Ratio Estimates
  Effect        food   Point Estimate   95% Wald Confidence Limits
  lake 1 vs 4   2       0.190           0.057     0.633
  lake 1 vs 4   3       3.463           0.339    35.343
  lake 1 vs 4   4       2.004           0.433     9.266
  lake 1 vs 4   5       2.285           0.766     6.814
  lake 2 vs 4   2       2.553           1.012     6.437
  lake 2 vs 4   3      11.685           1.306   104.508
  lake 2 vs 4   4       0.520           0.049     5.490
  lake 2 vs 4   5       1.006           0.219     4.608
  lake 3 vs 4   2       3.071           1.174     8.032
  lake 3 vs 4   3      18.815           2.111   167.717
  lake 3 vs 4   4       2.968           0.570    15.447
  lake 3 vs 4   5       4.556           1.348    15.400
  size 1 vs 2   2       4.298           1.978     9.339
  size 1 vs 2   3       0.704           0.226     2.194
  size 1 vs 2   4       0.532           0.151     1.875
  size 1 vs 2   5       1.393           0.579     3.354

Output
The CATMOD Procedure
Data Summary
  Response          food     Response Levels    5
  Weight Variable   count    Populations        16
  Data Set          GATOR    Total Frequency    219
  Frequency Missing 0        Observations       56
Population Profiles
  Sample   lake   size   gender   Sample Size
  -----------------------------------------------
   1       1      1      1        13
   2       1      1      2        26
   3       1      2      1         7
   4       1      2      2         9
   5       2      1      1         5
   6       2      1      2        15
   7       2      2      1        26
   8       2      2      2         2
   9       3      1      1        12
  10       3      1      2        12
  11       3      2      1        28
  12       3      2      2         1
  13       4      1      1        27
  14       4      1      2        14
  15       4      2      1        12
  16       4      2      2        10

Output
Response Profiles
  Response   food
  ----------------
  1          1
  2          2
  3          3
  4          4
  5          5
Maximum Likelihood Analysis
  Maximum likelihood computations converged.
Maximum Likelihood Analysis of Variance
  Source             DF   Chi-Square   Pr > ChiSq
  --------------------------------------------------
  Intercept           4   70.39        <.0001
  lake               12   35.49        0.0004
  size                4   18.76        0.0009
  Likelihood Ratio   44   52.48        0.1784

Output
Analysis of Maximum Likelihood Estimates
  Parameter   Level   Function Number   Estimate   Standard Error   Chi-Square   Pr > ChiSq
  ----------------------------------------------------------------------------
  Intercept           1                  1.1514    0.2343           24.14        <.0001
                      2                  0.4317    0.2737            2.49        0.1147
                      3                 -0.6795    0.3818            3.17        0.0751
                      4                 -0.9745    0.4049            5.79        0.0161
  lake        1       1                 -0.2391    0.3458            0.48        0.4892
              1       2                 -1.9977    0.4946           16.31        <.0001
              1       3                 -0.6556    0.6071            1.17        0.2802
              1       4                  0.1736    0.5654            0.09        0.7589
              2       1                  0.5814    0.5061            1.32        0.2506
              2       2                  1.4184    0.5250            7.30        0.0069
              2       3                  1.3810    0.6279            4.84        0.0278
              2       4                 -0.3542    0.9153            0.15        0.6988
              3       1                 -0.9293    0.3836            5.87        0.0154
              3       2                  0.0925    0.3910            0.06        0.8131
              3       3                  0.3467    0.5130            0.46        0.4991
              3       4                 -0.1240    0.5830            0.05        0.8316
  size        1       1                 -0.1658    0.2241            0.55        0.4595
              1       2                  0.5633    0.2525            4.98        0.0257
              1       3                 -0.3414    0.3257            1.10        0.2945
              1       4                 -0.4811    0.3564            1.82        0.1770
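The CATMOD estimates look different from the LOGISTIC estimates even though both procedures fit the same main-effects model. The Python sketch below checks one way to reconcile them for the size effect. It assumes that CATMOD function k is log(π_k/π_5), with the last food level as baseline, and that size is effect-coded (+1 for size 1, -1 for size 2). Both are assumptions about CATMOD's defaults, not something the slides state.

```python
# LOGISTIC size-1 estimates: log-odds of food j versus food 1, size 1 vs size 2.
logistic_size = {2: 1.4582, 3: -0.3513, 4: -0.6307, 5: 0.3316}

# CATMOD size-1 estimates by function number (assumed: function k = log(pi_k / pi_5)).
catmod_size = {1: -0.1658, 2: 0.5633, 3: -0.3414, 4: -0.4811}

def size_contrast_from_catmod(k):
    """Size 1 vs size 2 effect on log(pi_k / pi_5) under assumed +1/-1 effect coding."""
    return 2.0 * catmod_size[k]

def size_contrast_from_logistic(k):
    """Same contrast rebuilt from LOGISTIC: log(pi_k/pi_1) - log(pi_5/pi_1)."""
    effect_k = logistic_size.get(k, 0.0)   # food 1 is the LOGISTIC reference, effect 0
    return effect_k - logistic_size[5]

for k in (1, 2, 3, 4):
    a = size_contrast_from_catmod(k)
    b = size_contrast_from_logistic(k)
    print(f"food {k} vs 5: CATMOD {a:+.4f}, LOGISTIC {b:+.4f}")
```

Under these assumptions the two sets of numbers agree to rounding error, which suggests the procedures differ only in baseline category and coding, not in the fitted model. The Wald chi-squares for lake (35.49) and size (18.76) are also identical across the two listings.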

Table 7.2

Some Test Results for Table 7.2 • The data are sparse: 219 observations are scattered among 80 cells. Thus, G² is more reliable for comparing models than for testing fit. • The statistics G²[( ) | (G)] = 2.1 and G²[(L + S) | (G + L + S)] = 2.2, each based on df = 4, suggest simplifying by collapsing the table over gender. (Other analyses, not presented here, show that adding interaction terms that include G does not improve the fit significantly.) • The G² and X² values for the collapsed table indicate that both L and S have effects.
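To see why G² differences of 2.1 and 2.2 on 4 degrees of freedom support dropping gender, the p-values can be computed directly. The sketch below uses the closed-form chi-squared upper tail for even degrees of freedom, so it needs only the Python standard library. The helper name `chi2_sf_even_df` is introduced here for illustration.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# G2 differences reported on the slide, each with df = 4.
p_gender_alone = chi2_sf_even_df(2.1, 4)     # ( ) versus (G)
p_gender_given_ls = chi2_sf_even_df(2.2, 4)  # (L + S) versus (G + L + S)

print(f"p = {p_gender_alone:.3f} and p = {p_gender_given_ls:.3f}")
```

Both p-values are about 0.7, so there is no evidence of a gender effect, either alone or after adjusting for lake and size.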

Table 7.3

Table 7.4

Prediction Equation for Log Odds of Selecting Invertebrates Instead of Fish • log(π̂_I / π̂_F) = -1.55 + 1.46s - 1.66zH + 0.94zO + 1.12zT (coefficients rounded from the LOGISTIC estimates for food=2 versus food=1), • where s=1 for size ≤ 2.3 meters and 0 otherwise, • zH is a dummy variable for Lake Hancock (zH=1 for alligators in that lake and 0 otherwise), and • zO and zT are dummy variables for Lakes Oklawaha and Trafford. • Size of alligators has a noticeable effect. For a given lake, the estimated odds that the primary food choice was invertebrates instead of fish are exp(1.46) = 4.3 times higher for small alligators than for large alligators; • the Wald 95% confidence interval is exp[1.46 ± 1.96(0.396)] = (2.0, 9.3). • The lake effects indicate that the estimated odds that the primary food choice was invertebrates instead of fish are relatively higher at Lakes Trafford and Oklawaha and relatively lower at Lake Hancock than they are at Lake George.
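The odds ratio and Wald interval for the size effect can be reproduced from the unrounded LOGISTIC estimate (1.4582) and its standard error (0.3959). This is a minimal Python sketch; the function name is illustrative, and z = 1.96 is the usual normal quantile for a 95% interval.

```python
import math

def odds_ratio_with_wald_ci(estimate, std_error, z=1.96):
    """Exponentiate a log-odds estimate and its Wald limits (estimate ± z * SE)."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper

# size 1 vs 2 for food=2 (invertebrates) versus food=1 (fish), from the LOGISTIC output.
or_size, ci_low, ci_high = odds_ratio_with_wald_ci(1.4582, 0.3959)
print(f"odds ratio = {or_size:.3f}, 95% Wald CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The result agrees with the "size 1 vs 2, food 2" row of the Odds Ratio Estimates output (4.298, with limits 1.978 and 9.339). Because the interval excludes 1, the size effect on the invertebrate-versus-fish logit is significant at the 5% level.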

Further Estimate Calculation

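Estimating Response Probabilities (Model) The equation that expresses multinomial logit models directly in terms of response probabilities is π_j(x) = exp(α_j + β_j′x) / [1 + Σ_{h=1}^{J-1} exp(α_h + β_h′x)], where α_J = 0 and β_J = 0 for the baseline category J, so that the baseline numerator equals 1 and the probabilities sum to 1.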

Estimating Response Probabilities (Results) • From Table 7.4 the estimated probability that a large alligator in Lake Hancock has invertebrates as the primary food choice is exp(-1.55 - 1.66) / [1 + exp(-1.55 - 1.66) + exp(-3.31 + 1.24) + exp(-2.09 + 0.70) + exp(-1.90 + 0.83)] = 0.023. • The estimated probabilities for reptile, bird, other, and fish are 0.072, 0.141, 0.194, and 0.570.
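The five estimated probabilities can be recomputed from the LOGISTIC estimates shown earlier. The Python sketch below assumes the food codes are 1 = fish (the reference), 2 = invertebrate, 3 = reptile, 4 = bird, and 5 = other, and that lake 1 is Lake Hancock. That mapping is inferred from the numbers on the slides rather than stated in the output. Only the Hancock lake term is included, so the function covers Lakes Hancock and George (the reference lake) only.

```python
import math

# (intercept, lake-1 effect, size-1 effect) for each non-reference food category,
# copied from the LOGISTIC "Analysis of Maximum Likelihood Estimates" output.
# Category labels are an assumed mapping of food codes 2..5.
COEF = {
    "invertebrate": (-1.5490, -1.6583, 1.4582),
    "reptile":      (-3.3139,  1.2422, -0.3513),
    "bird":         (-2.0931,  0.6951, -0.6307),
    "other":        (-1.9043,  0.8262, 0.3316),
}

def response_probabilities(hancock, small):
    """Estimated food-choice probabilities for Lake Hancock (1) or George (0).

    small is 1 for alligators of size 1 and 0 for size 2.
    Fish is the baseline, so its linear predictor is 0 and its numerator is 1.
    """
    numerators = {"fish": 1.0}
    for food, (intercept, b_hancock, b_small) in COEF.items():
        numerators[food] = math.exp(intercept + b_hancock * hancock + b_small * small)
    total = sum(numerators.values())
    return {food: value / total for food, value in numerators.items()}

# Large alligator (small=0) in Lake Hancock (hancock=1).
probs = response_probabilities(hancock=1, small=0)
for food, p in probs.items():
    print(f"{food:12s} {p:.3f}")
```

Dividing every numerator by the same total is what guarantees that the five probabilities sum to 1, whichever category serves as the baseline.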

Quality vs. Quantity

Summary and Conclusion
