Latent normal models for missing data
Harvey Goldstein
Centre for Multilevel Modelling, University of Bristol
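The (multilevel) binary probit model
Suppose that we have a variance components 2-level model for an underlying continuous variable written as
$$y_{ij} = (X\beta)_{ij} + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, 1),$$
and suppose a positive value is observed for a variable $Z$ (that is, $Z_{ij} = 1$) when $y_{ij} > 0$. We then have
$$\Pr(Z_{ij} = 1) = \Pr\big(e_{ij} > -[(X\beta)_{ij} + u_j]\big) = \Phi\big((X\beta)_{ij} + u_j\big),$$
which is in fact just the probit link function.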
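The link between the latent variable and the probit function can be checked numerically. The following is a minimal sketch, assuming a level-1 residual with unit variance and using made-up values for the fixed part and the level-2 effect; the function names are illustrative and do not come from the slides.

```python
import random
from statistics import NormalDist


def probit_prob(linear_predictor, u_j):
    """P(Z = 1 | u_j) = Phi(fixed part + u_j), assuming e_ij ~ N(0, 1)."""
    return NormalDist().cdf(linear_predictor + u_j)


def simulate_binary(linear_predictor, u_j, n_draws, rng):
    """Draw latent y = fixed part + u_j + e and record Z = 1 whenever y > 0."""
    hits = 0
    for _ in range(n_draws):
        y = linear_predictor + u_j + rng.gauss(0.0, 1.0)
        hits += y > 0
    return hits / n_draws


rng = random.Random(1)
xb, u = 0.3, -0.1  # hypothetical fixed part and level-2 random effect
analytic = probit_prob(xb, u)
empirical = simulate_binary(xb, u, 200_000, rng)
print(f"analytic={analytic:.4f}  simulated={empirical:.4f}")
```

The simulated proportion of positive latent values should agree with the probit probability up to Monte Carlo error.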
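Ordered and unordered latent normal models
• Ordered categories, 1…p: Define
$$Z_{ij} = \begin{cases} 1 & \text{if } y_{ij} < \alpha_1 \\ k & \text{if } \alpha_{k-1} \le y_{ij} < \alpha_k, \quad k = 2, \dots, p-1 \\ p & \text{if } y_{ij} \ge \alpha_{p-1} \end{cases}$$
where we additionally need to estimate the thresholds $\alpha_1 < \dots < \alpha_{p-1}$.
• For unordered categories we can map to an underlying (p-1)-variate normal distribution with covariance matrix $I_{p-1}$.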
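The ordered case can be sketched in a few lines. This assumes the usual convention that category k is observed when the latent value lies at or above the (k-1)-th threshold and below the k-th, with a unit-variance latent normal. The threshold values and function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist


def ordered_category(y, thresholds):
    """Map a latent value y to a category in 1..p, given p-1 increasing thresholds."""
    # bisect_right counts the thresholds that are <= y.
    return bisect_right(thresholds, y) + 1


def category_probs(mean, thresholds):
    """P(Z = k), k = 1..p, when the latent variable is N(mean, 1)."""
    std_normal = NormalDist()
    cumulative = [std_normal.cdf(a - mean) for a in thresholds] + [1.0]
    previous = [0.0] + cumulative[:-1]
    return [c - p for p, c in zip(previous, cumulative)]


thresholds = [-0.5, 0.8]  # hypothetical thresholds for p = 3 categories
print([ordered_category(y, thresholds) for y in (-1.0, 0.0, 2.0)])
print(category_probs(0.2, thresholds))
```

Going the other way, a missing ordered response is imputed by drawing the latent value and then applying `ordered_category` to it.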
Multivariate (mixed-type) responses
• Consider a bivariate (multilevel) response consisting of one normal and one binary response.
• We can map the pair onto a latent bivariate normal distribution.
• We can use ML or MCMC for parameter estimation.
• MCMC provides a chain of random draws from the latent normal distribution for the binary response. Each draw is conditioned on the observed (correlated) normal response.
• Where a response is (randomly) missing, we draw an imputed value from its (estimated) conditional distribution, which is easy for a multivariate normal (MVN), and this is done at every MCMC iteration. For the binary response the draw can be mapped back onto (0,1).
• Typically we use uninformative priors.
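The conditional draws for the bivariate case can be sketched as follows. This is an illustrative single step only, not a full MCMC sampler: the parameter values are made up and treated as known, the latent variance of the binary response is fixed at 1, and all names are hypothetical.

```python
import random
from statistics import NormalDist


def conditional_normal(target_mean, target_sd, given_mean, given_sd, rho, given_value):
    """Distribution of one component of a bivariate normal given the other."""
    mean = target_mean + rho * target_sd / given_sd * (given_value - given_mean)
    sd = target_sd * (1.0 - rho ** 2) ** 0.5
    return NormalDist(mean, sd)


def impute_binary(y_normal, params, rng):
    """Impute a missing binary response given the observed normal response."""
    cond = conditional_normal(params["m_bin"], 1.0, params["m_norm"],
                              params["sd_norm"], params["rho"], y_normal)
    latent = rng.gauss(cond.mean, cond.stdev)
    return int(latent > 0), latent  # map the latent draw back onto (0, 1)


def draw_latent_given_binary(z, y_normal, params, rng):
    """Draw the latent value for an observed binary response (truncated at zero)."""
    cond = conditional_normal(params["m_bin"], 1.0, params["m_norm"],
                              params["sd_norm"], params["rho"], y_normal)
    p_below = cond.cdf(0.0)  # conditional probability that the latent value is <= 0
    low, high = (p_below, 1.0) if z == 1 else (0.0, p_below)
    u = low + (high - low) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return cond.inv_cdf(u)


# Hypothetical current parameter values at one iteration.
params = {"m_norm": 0.0, "sd_norm": 2.0, "m_bin": 0.3, "rho": 0.5}
rng = random.Random(0)
print(impute_binary(1.2, params, rng))
print(draw_latent_given_binary(1, 1.2, params, rng))
```

A missing normal response would be handled symmetrically, by calling `conditional_normal` with the roles of the two components swapped and drawing from the result.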
Multiple imputations
• Every n-th iteration (say n ≈ 500, so that autocorrelation between retained draws is negligible) we can take a 'completed' bivariate dataset. Repeating this p times yields p imputed datasets, which are combined using Rubin's rules.
• We can extend this to include ordered, unordered, Poisson etc. responses, all mapped onto a latent MVN, with missing data mapped back to the original scales.
• We can also include responses defined at higher levels or classifications; these are correlated with the higher-level random effects for lower-level responses.
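Thinning the chain and combining the results can be sketched as below. The combining step uses the standard form of Rubin's rules for a scalar parameter (pooled estimate, within-imputation and between-imputation variance). The burn-in length, the numbers, and the function names are illustrative assumptions.

```python
from statistics import mean, variance


def thin_chain(draws, burn_in, every):
    """Keep every `every`-th draw, starting at index `burn_in`."""
    return draws[burn_in::every]


def rubin_combine(estimates, variances):
    """Pool one scalar estimate across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical estimates and squared standard errors from three imputed datasets.
pooled, total_var = rubin_combine([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled, total_var ** 0.5)

kept = thin_chain(list(range(5000)), burn_in=500, every=500)
print(len(kept))
```

In practice each retained iteration supplies a whole completed dataset rather than a single number; the model of interest is fitted to each one and the resulting estimates are passed to `rubin_combine`.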
MI for multilevel GLMs
• Every variable is treated as a response, possibly with fully observed variables as covariates in the MVN model. For multilevel models this may include variables measured at higher levels.
• Imputation is carried out for the MVN model and mapped back to the original scales; the model of interest (MOI) is then fitted to the multiple datasets and the results combined.
• Note that for non-normal continuous variables we may be able to use e.g. a Box-Cox transform within the same model framework.
• Note that for general discrete distributions we may be able to approximate by a set of ordered categories.
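The Box-Cox step amounts to transforming a positive continuous variable towards normality before imputation and inverting the transform afterwards. The sketch below assumes the transformation parameter is known and fixed, which is a simplification; the function names are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map a value on the transformed scale back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# Round trip for a hypothetical observed value and a few choices of lam.
for lam in (0.5, 0.0, -1.0):
    y = box_cox(3.7, lam)
    print(lam, y, inverse_box_cox(y, lam))
```

An imputed value drawn on the transformed (approximately normal) scale is returned to the original scale with `inverse_box_cox`.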
A simulation
• ~30% of records with randomly missing data
• Response is exam score at age 16
See Goldstein, Carpenter, Kenward and Levin (2009). Statistical Modelling, 9, 173-197.