330 likes | 493 Views
Conditional Test Statistics. Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1. That is the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to adequately fit the data.
E N D
Suppose that we are considering two Log-linear models and that Model 2 is a special case of Model 1. • That is the parameters of Model 2 are a subset of the parameters of Model 1. • Also assume that Model 1 has been shown to adequately fit the data.
In this case one is interested in testing if the differences in the expected frequencies between Model 1 and Model 2 is simply due to random variation] The likelihood ratio chi-square statistic that achieves this goal is:
Goodness of Fit test for the all k-factor models Conditional tests for zero k-factor interactions
Conclusions • The four factor interaction is not significant G2(3|4) = 0.7 (p = 0.705) • The all three factor model provides a significant fit G2(3) = 0.7 (p = 0.705) • All the three factor interactions are not significantly different from 0, G2(2|3) = 9.2 (p = 0.239). • The all two factor model provides a significant fit G2(2) = 9.9 (p = 0.359) • There are significant 2 factor interactions G2(1|2) = 33.0 (p = 0.00083. Conclude that the model should contain main effects and some two-factor interactions
There also may be a natural sequence of progressively complicated models that one might want to identify. In the laundry detergent example the variables are: • Softness of Laundry Used • Previous use of Brand M • Temperature of laundry water used • Preference of brand X over brand M
The all-Main effects model Independence amongst all four variables Since previous use of brand M may be highly related to preference for brand M, add first the 2-4 interaction Brand M is recommended for hot water add 2nd the 3-4 interaction brand M is also recommended for Soft laundry add 3rd the 1-3 interaction Add finally some possible 3-factor interactions • [1][2][3][4] • [1][3][24] • [1][34][24] • [13][34][24] • [13][234] • [134][234] A natural order for increasingly complex models which should be considered might be:
Stepwise selection procedures Forward Selection Backward Elimination
Forward Selection: Starting with a model that under fits the data, log-linear parameters that are not in the model are added step by step until a model that does fit is achieved. At each step the log-linear parameter that is most significant is added to the model: To determine the significance of a parameter added we use the statistic: G2(2|1) = G2(2) – G2(1) Model 1 contains the parameter. Model 2 does not contain the parameter
Backward Selection: Starting with a model that over fits the data, log-linear parameters that are in the model are deleted step by step until a model that continues to fit the model and has the smallest number of significant parameters is achieved. At each step the log-linear parameter that is least significant is deleted from the model: To determine the significance of a parameter deleted we use the statistic: G2(2|1) = G2(2) – G2(1) Model 1 contains the parameter. Model 2 does not contain the parameter
K = knowledge N = Newspaper R = Radio S = Reading L = Lectures
The best model was found a the previous step • [LN][KLS][KR][KN][LR][NR][NS]
Logit Models To date we have not worried whether any of the variables were dependent of independent variables. The logit model is used when we have a single binary dependent variable.
The variables • Type of seedling (T) • Longleaf seedling • Slash seedling • Depth of planting (D) • Too low. • Too high • Mortality (M) (the dependent variable) • Dead • Alive
The Log-linear Model Note: mij1 = # dead when T = i and D = j. mij2 = # alive when T = i and D = j. = mortality ratio when T = i and D = j.
Hence since
The logit model: where
Thus corresponding to a loglinear model there is logit model predicting log ratio of expected frequencies of the two categories of the independent variable. Also k +1 factor interactions with the dependent variable in the loglinear model determine k factor interactions in the logit model k + 1 = 1 constant term in logit model k + 1 = 2, main effects in logit model
The best model was found by forward selection was [LN][KLS][KR][KN][LR][NR][NS] To fit a logit model to predict K (Knowledge) we need to fit a loglinear model with important interactions with K (knowledge), namely [LNRS][KLS][KR][KN] The logit model will contain Main effects for L (Lectures), N (Newspapers), R (Radio), and S (Reading) Two factor interaction effect for L and S
The Logit Parameters for the Model : LNSR, KLS, KR, KN ( Multiplicative effects are given in brackets, Logit Parameters = 2 Loglinear parameters) The Constant term: -0.226 (0.798) The Main effects on Knowledge: Lectures Lect 0.268 (1.307) None -0.268 (0.765) Newspaper News 0.324 (1.383) None -0.324 (0.723) Reading Solid 0.340 (1.405) Not -0.340 (0.712) Radio Radio 0.150 (1.162) None -0.150 (0.861) The Two-factor interaction Effect of Reading and Lectures on Knowledge