Multilevel Models for Binary and Ordered Response Data
Generalised multilevel models
• So far: the response at level 1 has been a continuous variable, and the associated level-1 random term has been assumed to have a Normal distribution
• Now: a range of other data types for the response
• All can be handled routinely by MLwiN
• Achieved through two aspects: (1) a non-linear link between response and predictors; (2) a non-Gaussian level-1 distribution
Modelling Proportions
• E.g. yij is the proportion using contraception in area i of region j
• The denominator nij could be the number of women of reproductive age in area i of region j
• After defining nij, we assume yij has a binomial distribution with mean πij (the underlying probability) and variance πij(1 − πij)/nij
Focus on modelling proportions
• Proportions (e.g. death rate, employment rate, obesity rate) can be conceived as underlying probabilities: the probability of dying, of being employed, of being obese
• Four important attributes of a proportion MUST be taken into account in modelling
• (1) Closed range: bounded between 0 and 1
• (2) Anticipated non-linearity between response and predictors: as the predicted response approaches the bounds, a greater and greater change in x is required to achieve the same change in outcome (examination analogy)
• (3) Consists of two numbers: a numerator which is a subset of the denominator
• (4) Heterogeneity: the variance is not homoscedastic, in two respects
  • (a) the variance depends on the mean: as the proportion approaches the bounds of 0 and 1 there is less room to vary, i.e. the variance is a function of the predicted probability
  • (b) the variance depends on the denominator: small denominators result in highly variable proportions
Modelling Proportions
• Linear probability model: use a standard regression model with a linear relationship and a Gaussian random term
• But there are three problems
• (1) Nonsensical predictions: predicted proportions are unbounded and can fall outside the range 0 to 1
• (2) Anticipated non-linearity as the proportion approaches the bounds
• (3) Heterogeneity: inherently unequal variance, dependent on the mean and on the denominator
• A logit model with a Binomial random term resolves all three problems (a probit or complementary log-log link could be used instead)
The logistic model: resolves problems 1 & 2
• The relationship between the probability and the predictor(s) can be represented by a logistic function, which resembles an S-shaped curve
• The model is fitted not to the proportion itself but to a non-linear transformation of it (solves problems 1 and 2)
The Logit transformation
• L = log_e(p / (1 − p))
• L = logit = the log of the odds
• p = proportion having an attribute
• 1 − p = proportion not having the attribute
• p / (1 − p) = the odds of having the attribute compared to not having it
• As p goes from 0 to 1, L goes from minus infinity to plus infinity, so if we model L we cannot get predicted proportions that lie outside 0 and 1 (i.e. this solves problem 1)
• It is easy to move between proportions, odds and logits
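The conversions between proportions, odds and logits can be sketched in a few lines of Python. This is only an illustration of the definitions on this slide; the function names `odds`, `logit` and `antilogit` are chosen here and are not taken from MLwiN.

```python
import math

def odds(p):
    """Odds of having the attribute: p / (1 - p), for 0 < p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds (natural logarithm)."""
    return math.log(odds(p))

def antilogit(L):
    """Inverse of the logit: maps any real L back to a proportion."""
    return 1.0 / (1.0 + math.exp(-L))

# Round trip for a few illustrative proportions
for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    L = logit(p)
    print(f"p={p:.2f}  odds={odds(p):.3f}  logit={L:+.3f}  back={antilogit(L):.2f}")
```

However large or small the logit, the back-transformed value stays within the 0 to 1 range, which is the property that removes problem 1.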
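The logistic model
• The underlying probability or proportion is non-linearly related to the predictor:
  $$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
  where e is the base of the natural logarithm
• The relationship is linearized by the logit transformation (log = natural logarithm):
  $$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$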
The logistic model: key characteristics
• The logit transformation produces a linear function of the parameters
• The predicted probabilities are bounded between 0 and 1
• Problems 1 and 2 are thereby solved
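Solving problem 3: Assume Binomial variation
• The variance of the response in logistic models is presumed to be binomial, i.e. it depends on the underlying proportion and on the denominator
• In practice this is achieved by replacing the constant variable at level 1 by a binomial weight, z, and constraining the level-1 variance to 1 for exact binomial variation
• With πij the underlying proportion and nij the denominator, the random (level-1) component can be written as
  $$y_{ij} = \pi_{ij} + e_{ij} z_{ij}, \qquad z_{ij} = \sqrt{\frac{\pi_{ij}(1 - \pi_{ij})}{n_{ij}}}, \qquad \operatorname{var}(e_{ij}) = 1$$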
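A small simulation can illustrate why the binomial weight takes this form. The sketch below assumes the weight is z = sqrt(π(1 − π)/n), draws binomial proportions with a fixed underlying probability, and compares their empirical variance with z². The values π = 0.3 and n = 50, and all function names, are illustrative choices and not part of MLwiN.

```python
import math
import random

def binomial_weight(pi, n):
    """Assumed binomial weight z: standard deviation of a proportion
    with underlying probability pi and denominator n."""
    return math.sqrt(pi * (1.0 - pi) / n)

def simulate_proportions(pi, n, reps, rng):
    """Draw `reps` observed proportions, each from n Bernoulli(pi) trials."""
    return [sum(rng.random() < pi for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(1)          # fixed seed so the run is repeatable
pi, n = 0.3, 50                 # illustrative values only
props = simulate_proportions(pi, n, 10000, rng)

mean = sum(props) / len(props)
emp_var = sum((y - mean) ** 2 for y in props) / (len(props) - 1)

print("theoretical variance z^2:", binomial_weight(pi, n) ** 2)
print("empirical variance      :", emp_var)
```

Changing `n` shows the dependence on the denominator, and moving `pi` towards 0 or 1 shows the dependence on the mean.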
Multilevel Logistic Model
• Assume the observed response comes from a Binomial distribution with a denominator for each cell and an underlying probability/proportion
• The underlying proportions/probabilities, in turn, are related to a set of individual and neighborhood predictors by the logit link function
• The linear predictor consists of the fixed part and the higher-level random part
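To make the structure concrete, here is a minimal sketch that generates data from a two-level random-intercept logistic model with binary responses (denominator 1). It assumes a single individual-level predictor x and a Normally distributed neighborhood effect u_j; the parameter values and the function name `simulate_two_level` are hypothetical, chosen only for illustration.

```python
import math
import random

def antilogit(L):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-L))

def simulate_two_level(n_groups, n_per_group, beta0, beta1, sigma_u, rng):
    """Return rows (group j, predictor x, probability pi, response y) where
    logit(pi_ij) = beta0 + beta1 * x_ij + u_j and u_j ~ N(0, sigma_u^2)."""
    rows = []
    for j in range(n_groups):
        u_j = rng.gauss(0.0, sigma_u)               # higher-level random part
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)                 # individual predictor
            pi = antilogit(beta0 + beta1 * x + u_j) # underlying probability
            y = 1 if rng.random() < pi else 0       # Bernoulli response
            rows.append((j, x, pi, y))
    return rows

# Hypothetical parameter values
rows = simulate_two_level(20, 30, beta0=-1.0, beta1=0.8, sigma_u=0.7,
                          rng=random.Random(42))
print("units:", len(rows), " overall proportion:", sum(r[3] for r in rows) / len(rows))
```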
Obtaining Population-Averaged (PA) Effects from a Cluster-Specific (CS) Model: Predicted Probabilities
Predicted Probabilities by Simulation (now implemented in MLwiN)
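The idea behind prediction by simulation can be sketched as follows, under the assumption of a random-intercept logistic model with Normal higher-level effects: draw many random effects, convert each simulated logit to a probability, and average the probabilities. This is a generic illustration and not a reproduction of the MLwiN procedure; the fixed-part logit of 2.0 and the standard deviation of 1.5 are hypothetical values.

```python
import math
import random

def antilogit(L):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-L))

def population_averaged_probability(fixed_logit, sigma_u, draws, rng):
    """Average the antilogit over simulated random effects u ~ N(0, sigma_u^2)."""
    total = 0.0
    for _ in range(draws):
        total += antilogit(fixed_logit + rng.gauss(0.0, sigma_u))
    return total / draws

rng = random.Random(7)
fixed_logit, sigma_u = 2.0, 1.5     # hypothetical values

cs = antilogit(fixed_logit)         # cluster-specific: probability at u = 0
pa = population_averaged_probability(fixed_logit, sigma_u, 20000, rng)

print("cluster-specific probability   :", round(cs, 3))
print("population-averaged probability:", round(pa, 3))
```

Because the antilogit is non-linear, averaging over the random effects pulls the population-averaged probability towards 0.5 relative to the cluster-specific value.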
Estimation in MLwiN
• Quasi-likelihood methods (marginal quasi-likelihood, MQL, and penalised quasi-likelihood, PQL, each 1st or 2nd order) are approximate procedures, good for model screening
• MQL1 is the crudest approximation: estimates may be biased downwards (especially if clusters are small), but it is stable
• PQL2 is the best approximation, but it may not converge
• Tip: start with MQL1 to get starting values for PQL
• MCMC is better (use it to check results for the final model)
Ordered Response Models
• When we have several categories and there is an underlying ordering to the categories, a convenient parameterisation is to work with cumulative probabilities, i.e. the probabilities that an individual crosses each threshold between categories. For example with obesity ranges:
• With an ordered multinomial model we work with the set of cumulative probabilities γki, where γki is the probability that individual i is in category k or below. Since γ5i = 1, we need only model four cumulative probabilities.
• This modelling collapses to the binary logistic modelling we have already considered when there are only 2 categories.
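As a small illustration of the cumulative parameterisation, the sketch below turns a set of category probabilities into cumulative probabilities. The five probabilities are invented for the example and do not come from any obesity data set.

```python
from itertools import accumulate

# Hypothetical probabilities for five ordered categories (they sum to 1)
category_probs = [0.05, 0.40, 0.35, 0.15, 0.05]

# gamma_k = probability of being in category k or below
cumulative = list(accumulate(category_probs))

for k, g in enumerate(cumulative, start=1):
    print(f"gamma_{k} = {g:.2f}")

# The last cumulative probability is always 1, so only the first four need modelling
modelled = cumulative[:-1]
```

With only two categories the list `modelled` would hold a single probability, which is the binary logistic case.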
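A model with no explanatory variables
• We will again use the logit link, so we have one intercept per threshold: logit(γ1i) = β0i, logit(γ2i) = β1i, logit(γ3i) = β2i, logit(γ4i) = β3i
• The threshold probabilities are given by the anti-logit of the corresponding intercept, e.g. γ1i = anti-logit(β0i)
• Because γ1i ≤ γ2i ≤ γ3i ≤ γ4i it follows that β0i ≤ β1i ≤ β2i ≤ β3i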
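Going in the other direction, a set of ordered threshold intercepts can be converted to cumulative probabilities by the anti-logit and then differenced to recover the probability of each category. The four intercept values below are hypothetical and chosen only so that they are increasing.

```python
import math

def antilogit(L):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-L))

def category_probabilities(thresholds):
    """Convert increasing threshold intercepts into category probabilities.
    Cumulative probabilities are antilogit(threshold); the final one is 1."""
    cum = [antilogit(b) for b in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

thresholds = [-3.0, -0.5, 1.0, 2.5]   # hypothetical intercepts, in increasing order
probs = category_probabilities(thresholds)

for k, p in enumerate(probs, start=1):
    print(f"P(category {k}) = {p:.3f}")
```

If the intercepts were not in increasing order, at least one differenced probability would be negative, which is why the ordering constraint is needed.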
Adding covariates to the model
• Here predictors are added in common across the response threshold categories. This is referred to as proportional odds modelling.
• It means that the log odds ratio (and odds ratio) comparing any two thresholds is the same whatever the values of the predictor variables; equivalently, each predictor has the same odds ratio at every threshold.
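The proportional odds property can be checked numerically. The sketch below assumes the common linear predictor is added to every threshold intercept, logit(γk) = βk + βx · x, and confirms that moving x from 0 to 1 multiplies the cumulative odds by the same factor at each threshold. The intercepts and the coefficient `beta_x` are hypothetical.

```python
import math

def antilogit(L):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-L))

def cumulative_probs(thresholds, h):
    """Cumulative probabilities when the common term h is added to each threshold."""
    return [antilogit(b + h) for b in thresholds]

def odds(p):
    return p / (1.0 - p)

thresholds = [-3.0, -0.5, 1.0, 2.5]   # hypothetical threshold intercepts
beta_x = 0.7                          # hypothetical common coefficient

g0 = cumulative_probs(thresholds, beta_x * 0.0)   # x = 0
g1 = cumulative_probs(thresholds, beta_x * 1.0)   # x = 1

# Odds ratio for a one-unit change in x, computed separately at each threshold
odds_ratios = [odds(a) / odds(b) for a, b in zip(g1, g0)]
print("odds ratios by threshold:", [round(r, 4) for r in odds_ratios])
print("exp(beta_x)             :", round(math.exp(beta_x), 4))
```

Relaxing the assumption would amount to letting the coefficient differ by threshold, in which case the four odds ratios would no longer coincide.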
Multilevel Modelling etc.
• Ordered multinomial models can be extended to multilevel settings by adding random effects to the equation for hi, as in the binary case.
• The proportional odds assumption can be tested and relaxed.
• These features need a more advanced workshop, but see the material at the multilevel modelling centre website for details.