9.4 Modeling ordinal associations
• The loglinear models presented so far have a serious limitation: they treat all classifications as nominal.
• If the order of a variable's categories changes in any way, the fit is the same.
• For ordinal classifications, these models ignore important information.
Example
• Subjects were asked their opinion about a man and woman having sexual relations before marriage (always wrong, almost always wrong, wrong only sometimes, not wrong at all).
• They were also asked whether methods of birth control should be available to teenagers between the ages of 14 and 16 (strongly disagree, disagree, agree, strongly agree).
Independence model

data table9_3;
  input premar x1-x4;
  /* one output row per cell: premar = row, birth = column */
  birth=1; count=x1; output;
  birth=2; count=x2; output;
  birth=3; count=x3; output;
  birth=4; count=x4; output;
  drop x1-x4;
  cards;
1 81 68 60 38
2 24 26 29 14
3 18 41 74 42
4 36 57 161 157
;
proc genmod data=table9_3;
  class premar birth;
  model count = premar birth / dist=poi link=log lrci type3 obstats;
  ods output obstats=obstats Modelfit=Modelfit;
run;
The residuals from the independence model indicate lack of fit in the form of a positive trend: subjects who are more willing to make birth control available to teenagers also tend to feel more tolerant about premarital sex.
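9.4.1 Linear-by-Linear Association in Two-Way Tables
• With ordered row scores $u_1 \le \dots \le u_I$ and column scores $v_1 \le \dots \le v_J$, the linear-by-linear (L×L) association model is $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j$.
• It requires only one parameter, $\beta$, to describe the association, whereas the saturated model requires (I-1)*(J-1).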
The direction and strength of the association depend on the association parameter $\beta$.
• When $\beta > 0$, Y tends to increase as X increases. Fitted frequencies are larger than under independence in cells where X and Y are both high or both low.
• When $\beta < 0$, Y tends to decrease as X increases.
When the data display a positive or negative trend, the L×L model usually fits much better than the independence model.
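Odds ratio for rows a and c with columns b and d
• $\log \dfrac{\mu_{ab}\,\mu_{cd}}{\mu_{ad}\,\mu_{cb}} = \beta (u_c - u_a)(v_d - v_b)$
• It depends on the distances between the row scores u and between the column scores v.
• When the scores are unit-spaced ($u_i = i$, $v_j = j$), the local odds ratios for adjacent rows and adjacent columns have the common value $\exp(\beta)$.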
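The dependence of the odds ratio on the scores can be derived in two lines. This is a sketch that assumes the L×L model is written with fitted means $\mu_{ij}$, main effects $\lambda_i^X$, $\lambda_j^Y$, and fixed scores $u_i$, $v_j$ (the usual loglinear notation, assumed here because the slide's own formula did not survive):

```latex
\begin{align*}
\log \mu_{ij} &= \lambda + \lambda_i^X + \lambda_j^Y + \beta u_i v_j \\
\log \frac{\mu_{ab}\,\mu_{cd}}{\mu_{ad}\,\mu_{cb}}
  &= \beta\,(u_a v_b + u_c v_d - u_a v_d - u_c v_b) \\
  &= \beta\,(u_c - u_a)(v_d - v_b)
\end{align*}
```

The intercept and main-effect terms cancel because each row and each column appears once in the numerator and once in the denominator. For adjacent rows and columns with unit-spaced scores, both differences equal 1, so the log odds ratio reduces to $\beta$.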
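9.4.3 Likelihood Equations and Model Fitting
• The Poisson log-likelihood kernel is $L(\mu) = \sum_i \sum_j n_{ij} \log \mu_{ij} - \sum_i \sum_j \mu_{ij}$.
• Differentiating with respect to the parameters and setting the derivatives to 0 yields the likelihood equations: $\hat\mu_{i+} = n_{i+}$, $\hat\mu_{+j} = n_{+j}$, and $\sum_i \sum_j u_i v_j \hat\mu_{ij} = \sum_i \sum_j u_i v_j n_{ij}$.
• The equations can be solved by Newton-Raphson iteration.
• Let $p_{ij} = n_{ij}/n$ and $\hat\pi_{ij} = \hat\mu_{ij}/n$. The third likelihood equation is then $\sum_i \sum_j u_i v_j \hat\pi_{ij} = \sum_i \sum_j u_i v_j p_{ij}$.
• Because the fitted and observed marginal distributions are also identical, the third equation implies that the correlation between the scores for X and Y is the same for both distributions. The fitted counts display the same positive or negative trend as the data.
• DF = IJ-[1+(I-1)+(J-1)+1] = IJ-I-J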
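As a quick check of the degrees-of-freedom formula, here is the arithmetic for a 4×4 table such as the one in the example (I = J = 4), next to the independence model for comparison:

```latex
\begin{align*}
\text{L}\times\text{L model:} \quad & IJ - [\,1 + (I-1) + (J-1) + 1\,] = 16 - 8 = 8 = IJ - I - J \\
\text{independence model:} \quad & (I-1)(J-1) = 3 \cdot 3 = 9
\end{align*}
```

The single association parameter costs exactly one degree of freedom relative to independence.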
9.4.4 Sex Opinions Example
• Using scores {1, 2, 3, 4} for rows and columns:

data table9_3;
  set table9_3;
  linlin = birth*premar;   /* product of row and column scores */
run;
proc genmod data=table9_3;
  class premar birth;
  model count = premar birth linlin / dist=poi link=log lrci type3 obstats;
  estimate 'OR for four corner cells' linlin 9;   /* (4-1)*(4-1) */
run;
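Odds ratios
• The ML estimate $\hat\beta$ is positive, indicating that subjects having more favorable attitudes about teen birth control also tend to have more tolerant attitudes about premarital sex.
• The estimated local odds ratio is $\exp(\hat\beta)$. A 95% Wald confidence interval is $\exp(\hat\beta \pm 1.96\,SE)$.
• At the local level, the strength of association seems weak.
• However, nonlocal odds ratios are stronger. The estimated odds ratio for the four corner cells equals $\exp[\hat\beta(4-1)(4-1)] = \exp(9\hat\beta)$.
• The same value results from the corner fitted values, $(\hat\mu_{11}\hat\mu_{44})/(\hat\mu_{14}\hat\mu_{41})$.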
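The odds ratios and the Wald interval can be read off the fitted model rather than computed by hand. The following is a minimal sketch, assuming `table9_3` already contains the `linlin` variable created earlier; the data set names `pe_lxl` and `or_lxl` are hypothetical, while `ParameterEstimates`, `Estimate`, and `StdErr` are the ODS table and column names PROC GENMOD produces.

```sas
/* Refit the L x L model and capture the parameter estimates table. */
proc genmod data=table9_3;
  class premar birth;
  model count = premar birth linlin / dist=poi link=log;
  ods output ParameterEstimates=pe_lxl;   /* pe_lxl is a hypothetical name */
run;

/* Exponentiate the linlin row: local OR, Wald limits, corner-cell OR. */
data or_lxl;
  set pe_lxl;
  where upcase(Parameter) = 'LINLIN';
  local_or  = exp(Estimate);
  wald_lcl  = exp(Estimate - 1.96*StdErr);
  wald_ucl  = exp(Estimate + 1.96*StdErr);
  corner_or = exp(9*Estimate);            /* (4-1)*(4-1) = 9 for scores 1..4 */
  keep Parameter Estimate StdErr local_or wald_lcl wald_ucl corner_or;
run;

proc print data=or_lxl noobs;
run;
```

The multiplier 9 applies only to scores {1, 2, 3, 4}; with other scores it would be the product of the extreme row-score and column-score differences.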
Choice of scores
• Any other set of equally spaced scores yields the same fit but an appropriately rescaled $\hat\beta$.
• If the two middle categories are regarded as farther apart than the outer pairs, scores such as {1, 2, 4, 5} for rows and columns recognize this. The L×L model then has G2=8.8 (df=8) and $\hat\beta$=0.146 (SE=0.014).
• One need not regard the scores as approximations for distances between categories or as reasonable scalings of ordinal variables in order for the models to be valid.
• Equally spaced row and column scores give a uniform local odds ratio.
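9.4.5 Directed Ordinal Test of Independence
• For the linear-by-linear association model, H0: independence is H0: $\beta = 0$.
• The likelihood-ratio test statistic equals the difference in deviances, $G^2(I \mid L \times L) = G^2(I) - G^2(L \times L)$, with df = 1.
• For this example, p<0.0001.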
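The directed test can be assembled from the deviances of the two fits. This is a sketch under stated assumptions: it refits both models on `table9_3` (with `linlin` already present), and the data set names `fit_indep`, `fit_lxl`, and `lr_test` are hypothetical. `Modelfit` is the ODS table PROC GENMOD writes, with the deviance stored in the row where `Criterion = 'Deviance'`.

```sas
/* Independence model: keep the goodness-of-fit table. */
proc genmod data=table9_3;
  class premar birth;
  model count = premar birth / dist=poi link=log;
  ods output Modelfit=fit_indep;
run;

/* Linear-by-linear association model. */
proc genmod data=table9_3;
  class premar birth;
  model count = premar birth linlin / dist=poi link=log;
  ods output Modelfit=fit_lxl;
run;

/* Difference in deviances, referred to chi-squared with df = difference in df. */
data lr_test;
  merge fit_indep(where=(Criterion='Deviance') rename=(Value=g2_indep DF=df_indep))
        fit_lxl  (where=(Criterion='Deviance') rename=(Value=g2_lxl   DF=df_lxl));
  lr_stat = g2_indep - g2_lxl;
  lr_df   = df_indep - df_lxl;            /* 1 for the L x L model */
  p_value = 1 - probchi(lr_stat, lr_df);
  keep g2_indep g2_lxl lr_stat lr_df p_value;
run;

proc print data=lr_test noobs;
run;
```

Because the test spends a single degree of freedom on the trend, it is more powerful than the general (I-1)(J-1) df test when the association really is monotone.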
9.5 ASSOCIATION MODELS
• Generalizations of the linear-by-linear association model either apply to multiway tables or treat the scores as parameters rather than as fixed.
• The models are called association models because they focus on the association structure.
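9.5.1 Row and Column Effects Models
• Treat X as nominal and Y as ordinal with scores $\{v_j\}$. The row effects model is $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \mu_i v_j$.
• Identifiability requires a constraint on the $\{\mu_i\}$, such as $\mu_I = 0$ or $\sum_i \mu_i = 0$.
• The $\{\mu_i\}$ are called row effects.
• The model has I-1 more parameters than the independence model.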
9.5.2 Logit Model for Adjacent Responses
• With unit-spaced scores, $v_{j+1} - v_j = 1$, the row effects model has the adjacent-categories logit form $\log \dfrac{P(Y = j+1 \mid X = i)}{P(Y = j \mid X = i)} = \alpha_j + \mu_i$.
• The effect in row i is identical for each pair of adjacent responses. The model is also called the parallel odds model.
• Differences among {µi} compare rows with respect to their conditional distributions on Y.
• When µi=µh, rows h and i have identical conditional distributions.
• If µi>µh, Y is stochastically higher in row i than in row h.
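The adjacent-categories form follows from differencing the row effects model across neighboring columns. A short derivation, assuming the model is written as $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \mu_i v_j$ and defining $\alpha_j = \lambda_{j+1}^Y - \lambda_j^Y$ (a label introduced here for convenience):

```latex
\begin{align*}
\log \frac{P(Y = j+1 \mid X = i)}{P(Y = j \mid X = i)}
  &= \log \mu_{i,j+1} - \log \mu_{ij} \\
  &= (\lambda_{j+1}^Y - \lambda_j^Y) + \mu_i\,(v_{j+1} - v_j) \\
  &= \alpha_j + \mu_i \qquad \text{when } v_{j+1} - v_j = 1 \\[4pt]
\text{rows } i \text{ vs. } h: \quad
  (\alpha_j + \mu_i) - (\alpha_j + \mu_h) &= \mu_i - \mu_h
\end{align*}
```

So $\exp(\mu_i - \mu_h)$ is the odds ratio comparing rows i and h for every pair of adjacent response categories, which is why only differences among the row effects are interpretable.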
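Likelihood equations
• $\hat\mu_{i+} = n_{i+}$, $\hat\mu_{+j} = n_{+j}$, and $\sum_j v_j \hat\mu_{ij} = \sum_j v_j n_{ij}$ for each row i.
• The model can be fitted by iteration.
• Let $\hat\pi_{j|i} = \hat\mu_{ij}/\hat\mu_{i+}$ and $p_{j|i} = n_{ij}/n_{i+}$.
• Then $\sum_j v_j \hat\pi_{j|i} = \sum_j v_j p_{j|i}$ for each row i.
• For the conditional distribution within each row, the mean column score is the same for the fitted and sample distributions.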
9.5.3 Political Ideology Example
• The data describe the relationship between political ideology and political party affiliation for a sample of voters in a presidential primary in Wisconsin.
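SAS

data table9_5;
  /* declare lengths up front; otherwise Ideology takes length 8 from its
     first assignment and the third label is truncated */
  length Party $ 11 Ideology $ 12;
  input Party $ x1-x3;
  Ideology='Liberal';      I=1; count=x1; output;
  Ideology='Moderate';     I=2; count=x2; output;
  Ideology='Conservative'; I=3; count=x3; output;
  cards;
Democrat 143 156 100
Independent 119 210 141
Republican 15 72 127
;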
Fit in SAS

/*independence model*/
proc genmod data=table9_5;
  class Party Ideology;
  model count = Party Ideology / dist=poi link=log lrci type3 obstats;
  ods output obstats=obstats1 Modelfit=Modelfit1;
run;

/*row effect model: I is the numeric ideology score, Party*I gives one slope per party*/
proc genmod data=table9_5;
  class Party Ideology;
  model count = Party Ideology Party*I / dist=poi link=log lrci type3 obstats;
  ods output obstats=obstats2 Modelfit=Modelfit2;
run;
• In this sample the Republicans are much more conservative than the other two groups, and the Democrats (row 1) are the most liberal.
• The estimated row effects for Republicans and Democrats differ by 1.213, so the estimated odds that Republicans were conservative instead of moderate, or moderate instead of liberal, were exp(1.213)=3.36 times the corresponding estimated odds for Democrats.
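The Republican versus Democrat comparison can be requested directly from the row effects fit. A minimal sketch, assuming the `table9_5` data set and model shown earlier and the default alphabetical ordering of the `Party` levels (Democrat, Independent, Republican); the label text is arbitrary.

```sas
proc genmod data=table9_5;
  class Party Ideology;
  model count = Party Ideology Party*I / dist=poi link=log;
  /* Coefficients follow the sorted Party levels, so (-1 0 1) is the
     Republican row effect minus the Democrat row effect. */
  estimate 'Republican vs Democrat' Party*I -1 0 1 / exp;
run;
```

The `exp` option adds the exponentiated contrast to the output, which should correspond to the 1.213 and 3.36 quoted above if the data and scores match.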
9.5.4 Ordinal Variables in Models for Multiway Tables
• Multidimensional tables with ordinal responses can use generalizations of association models.
• In three dimensions, the rich collection of models includes
1. association models that are more parsimonious than the nominal model (XY, XZ, YZ): they replace association λ terms by structured terms that account for ordinality.
2. models permitting heterogeneous association that, unlike model (XYZ), are unsaturated.
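If X and Y are both ordinal, the alternatives to the general association term $\lambda_{ij}^{XY}$ are
• A linear-by-linear term, $\beta u_i v_j$
• A row effects term, $\mu_i v_j$
• A column effects term, $\nu_j u_i$
These provide a stochastic ordering of conditional distributions within rows and within columns, or just within rows, or just within columns.
• With the linear-by-linear term, the model is $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$.
• The conditional local odds ratios satisfy $\log \theta_{ij(k)} = \beta (u_{i+1} - u_i)(v_{j+1} - v_j)$.
• The association is the same in different partial tables: this is homogeneous linear-by-linear XY association.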
When the association is heterogeneous, structured terms for ordinal variables make effects simpler to interpret than in the saturated model.
• For instance, the heterogeneous linear-by-linear XY association model, $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \beta_k u_i v_j + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$, allows the XY association to change across levels of Z.
• With unit-spaced scores, $\log \theta_{ij(k)} = \beta_k$.
• It has uniform association within each level of Z, but heterogeneity among levels of Z in the strength of association.
9.5.5 Air Pollution and Breathing Examples
• The data are used to study the associations among smoking status (S), breathing test results (B), and age (A) for workers in certain industrial plants in Houston, Texas.
SAS code

data table9_7;
  input A $ S x1-x3;
  /* equally spaced scores 1,2,3 for B; linlin is the S-by-B score product */
  B=1; count=x1; linlin=B*S; output;
  B=2; count=x2; linlin=B*S; output;
  B=3; count=x3; linlin=B*S; output;
  drop x1-x3;
  cards;
<40 1 577 27 7
<40 2 192 20 3
<40 3 682 46 11
40-59 1 164 4 0
40-59 2 145 15 7
40-59 3 245 47 27
;
/*homogeneous association model*/
proc genmod data=table9_7;
  class S B A;
  model count = A B S A*B A*S B*S / dist=poi link=log lrci type3 obstats;
  ods output obstats=obstats1 Modelfit=Modelfit1;
run;

/*homogeneous linear-by-linear association model*/
proc genmod data=table9_7;
  class S B A;
  model count = A B S A*B A*S linlin / dist=poi link=log lrci type3 obstats;
  ods output obstats=obstats2 Modelfit=Modelfit2;
run;

/*heterogeneous linear-by-linear SB association model: one linlin slope per age group*/
proc genmod data=table9_7;
  class S B A;
  model count = A B S A*B A*S A*linlin / dist=poi link=log lrci type3 obstats;
  ods output obstats=obstats3 Modelfit=Modelfit3;
run;
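Whether the SB association really differs by age can be checked by comparing the homogeneous and heterogeneous linear-by-linear fits. A sketch, assuming the `Modelfit2` and `Modelfit3` data sets written by the two runs above are available; `lr_hetero` and the new variable names are hypothetical.

```sas
/* Deviance difference: homogeneous vs heterogeneous L x L association. */
data lr_hetero;
  merge Modelfit2(where=(Criterion='Deviance') rename=(Value=g2_homog  DF=df_homog))
        Modelfit3(where=(Criterion='Deviance') rename=(Value=g2_hetero DF=df_hetero));
  lr_stat = g2_homog - g2_hetero;
  lr_df   = df_homog - df_hetero;         /* one extra slope for two age groups */
  p_value = 1 - probchi(lr_stat, lr_df);
  keep g2_homog g2_hetero lr_stat lr_df p_value;
run;

proc print data=lr_hetero noobs;
run;
```

A small p-value here would support letting the strength of the smoking effect vary with age.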
Age groups: <40 and 40-59
• The effect of smoking seems much stronger for the older group.
Another example
• Responses: breathlessness (B) and wheeze (W); explanatory variable: age (A)
/*homogeneous association model*/
proc genmod data=table9_8;
  class A B W;
  model count = A B W B*W A*B A*W / dist=poi link=log type3 obstats;
  ods output obstats=obstats1 Modelfit=Modelfit1;
run;

/*lay out the standardized Pearson residuals by age level*/
proc transpose data=obstats1 out=obstats1a;
  by A;
  var Streschi;
run;
proc print data=obstats1a;
run;
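The standardized residuals suggest a model that adds a linear trend over age in the BW association, that is, an extra term $\delta k$ in the cell where B and W are both 'Yes' at age level k:

data table9_8a;
  set table9_8;
  /* ABW carries the age score only in the (Yes, Yes) cell */
  if B='Yes' and W='Yes' then ABW=A;
  else ABW=0;
run;
proc genmod data=table9_8a;
  class A B W;
  model count = A B W B*W A*B A*W ABW / dist=poi link=log type3 obstats;
  ods output obstats=obstats2 Modelfit=Modelfit2;
run;
proc transpose data=obstats2 out=obstats2a;
  by A;
  var Streschi;
run;
proc print data=obstats2a;
run;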
The estimated BW log odds ratio at level k of age is 3.6762-0.131k, decreasing from 3.55 to 2.50.
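The two endpoints can be checked by substituting into the fitted expression. This worked calculation assumes the age levels are scored k = 1, ..., 9, which is what the stated endpoints imply; the odds-ratio values are simply the exponentials of the log odds ratios.

```latex
\begin{align*}
k = 1: \quad & 3.6762 - 0.131(1) = 3.5452 \approx 3.55, \qquad e^{3.5452} \approx 34.6 \\
k = 9: \quad & 3.6762 - 0.131(9) = 2.4972 \approx 2.50, \qquad e^{2.4972} \approx 12.1
\end{align*}
```

On the odds-ratio scale the breathlessness-wheeze association remains strong at every age, but it is roughly three times as strong in the youngest group as in the oldest.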
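9.6 ASSOCIATION MODELS, CORRELATION MODELS, AND CORRESPONDENCE ANALYSIS*
9.6.1 Multiplicative Row and Column Effects Model
• The model treats the row and column scores as parameters: $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \beta \mu_i \nu_j$.
• Parameters: 1+(I-1)+(J-1)+1+(I-2)+(J-2) = 2I+2J-4
• Residual DF = IJ-(2I+2J-4) = (I-2)*(J-2)
• The counts I-2 and J-2 reflect location and scale constraints on $\{\mu_i\}$ and $\{\nu_j\}$.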