Latent normal models for missing data

Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol.


  1. Latent normal models for missing data. Harvey Goldstein, Centre for Multilevel Modelling, University of Bristol.

  2. The (multilevel) binary probit model. Suppose that we have a variance components 2-level model for an underlying continuous variable, written as [equation not preserved], and suppose a positive value is observed for a variable Z when [condition not preserved]. We then have [equation not preserved], which is in fact just the probit link function.
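The formulas on this slide did not survive in the transcript. One standard way to write the model the text describes is sketched below. All notation here is assumed rather than taken from the slide: y* for the latent variable, u_j for the level-2 random effect, e_ij for the level-1 residual, and the residual variance fixed at 1 for identification.

```latex
\begin{align*}
y^{*}_{ij} &= (X\beta)_{ij} + u_j + e_{ij}, \qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, 1), \\
z_{ij} &= 1 \iff y^{*}_{ij} > 0, \\
\Pr(z_{ij} = 1 \mid u_j) &= \Pr\bigl(e_{ij} > -[(X\beta)_{ij} + u_j]\bigr) = \Phi\bigl((X\beta)_{ij} + u_j\bigr).
\end{align*}
```

The last equality uses the symmetry of the standard normal distribution, and Φ is its cumulative distribution function, so the inverse of Φ plays the role of the probit link.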

  3. Ordered and unordered latent normal models
     • Ordered categories, 1…p: define [equation not preserved], where we additionally need to estimate the thresholds [symbols not preserved].
     • For unordered categories we can map to an underlying (p-1)-variate normal distribution with covariance matrix [matrix not preserved].
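The mapping from latent normal values to observed categories can be illustrated with a minimal sketch. The function names are hypothetical, the threshold values are made up for illustration, and the rule used for unordered categories (largest latent value wins, with the last category as reference when no latent value is positive) is one common convention assumed here, not necessarily the one on the slide.

```python
from bisect import bisect_left


def ordered_category(latent, thresholds):
    """Map a latent normal value to an ordered category 1..p.

    `thresholds` holds p-1 increasing cut points (assumed notation).
    Category k is returned when the latent value lies above the
    (k-1)-th threshold and at or below the k-th threshold.
    """
    return bisect_left(thresholds, latent) + 1


def unordered_category(latents):
    """Map p-1 latent normal values to an unordered category 1..p.

    Assumed convention: pick the category with the largest latent
    value if that value is positive; otherwise return the reference
    category p.
    """
    best = max(range(len(latents)), key=lambda k: latents[k])
    return best + 1 if latents[best] > 0 else len(latents) + 1


# Illustrative values only: three ordered categories need two thresholds.
print(ordered_category(0.0, [-1.0, 0.5]))   # latent between the two cut points
print(unordered_category([-0.3, -1.2]))     # no positive latent value
```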

  4. Multivariate (mixed type) responses
     • Consider a bivariate (multilevel) response consisting of a normal response and a binary response.
     • We can map these onto a latent bivariate normal distribution.
     • We can use ML or MCMC for parameter estimation.
     • MCMC provides a chain of random draws from the latent normal distribution for the binary response. Each draw is conditioned on the observed (correlated) normal response.
     • Where a response is (randomly) missing we draw an imputed value from its (estimated) conditional distribution (this is easy for an MVN), and this is done at every MCMC iteration. For the binary response the draw can be mapped back onto {0, 1}.
     • Typically we use uninformative priors.
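The imputation step described on this slide can be sketched for a single record. This is a simplified illustration under stated assumptions, not the author's implementation: it ignores the multilevel structure (no random effects), treats the model parameters as known rather than updating them within the chain, fixes the latent variance at 1, and uses hypothetical names throughout (`conditional`, `draw_truncated`, `impute_record`, and the parameter dictionary keys).

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for cdf and inverse cdf


def conditional(mean_a, mean_b, var_a, var_b, cov_ab, b_value):
    """Mean and sd of A given B = b_value for a bivariate normal (A, B)."""
    mean = mean_a + cov_ab / var_b * (b_value - mean_b)
    var = var_a - cov_ab ** 2 / var_b
    return mean, var ** 0.5


def draw_truncated(mean, sd, positive, rng):
    """Inverse-cdf draw from N(mean, sd^2) restricted to above or below zero."""
    p0 = STD.cdf((0.0 - mean) / sd)  # probability mass at or below zero
    u = rng.random()
    p = p0 + u * (1.0 - p0) if positive else u * p0
    # Keep p inside the open interval required by inv_cdf. In extreme
    # tails this clamp makes the truncation only approximate.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return mean + sd * STD.inv_cdf(p)


def impute_record(y, z, p, rng):
    """Complete one record: y is the normal response, z the binary one.

    Either value may be None (missing). Returns (y, z, latent).
    """
    # Latent distribution, conditioned on y when y is observed.
    if y is None:
        m, s = p["mean_latent"], 1.0
    else:
        m, s = conditional(p["mean_latent"], p["mean_y"], 1.0,
                           p["var_y"], p["cov"], y)
    if z is None:
        # Binary response missing: draw freely, map back by sign.
        latent = rng.gauss(m, s)
        z = 1 if latent > 0 else 0
    else:
        # Binary response observed: latent draw must agree with it.
        latent = draw_truncated(m, s, z == 1, rng)
    if y is None:
        # Normal response missing: draw it given the latent value.
        my, sy = conditional(p["mean_y"], p["mean_latent"], p["var_y"],
                             1.0, p["cov"], latent)
        y = rng.gauss(my, sy)
    return y, z, latent


# Illustrative parameter values only.
params = {"mean_y": 0.0, "mean_latent": 0.2, "var_y": 4.0, "cov": 1.0}
print(impute_record(None, 1, params, random.Random(0)))
```

In a full sampler this step would sit inside each MCMC iteration, alternating with updates of the parameters themselves.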

  5. Multiple imputations
     • At every n-th iteration (say n ≈ 500, to ensure approximately zero autocorrelation) we can take a 'completed' bivariate dataset. This yields p imputed datasets, which are combined using Rubin's rules.
     • We can extend this to include ordered, unordered, Poisson etc. responses, all mapped onto a latent MVN, with missing data mapped back to the original scales.
     • We can also include responses defined at higher levels or classifications, correlated with the higher-level random effects for lower-level responses.
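The thinning and combining steps can be sketched as follows. This is a minimal illustration with hypothetical function names: `thin_chain` keeps every n-th stored draw after a burn-in, and `rubin_combine` applies the standard form of Rubin's rules to one scalar parameter, given its estimate and squared standard error from each imputed dataset.

```python
from statistics import mean, variance


def thin_chain(draws, burn_in, every):
    """Discard the first `burn_in` draws, then keep every `every`-th one."""
    return draws[burn_in::every]


def rubin_combine(estimates, variances):
    """Combine per-dataset results for one parameter using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m-1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Illustrative numbers only, not results from the presentation.
pooled, total = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```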

  6. MI for multilevel GLMs
     • Every variable is treated as a response, possibly with fully observed variables as covariates in the MVN model. For multilevel models this may include variables measured at higher levels.
     • Imputation is carried out for the MVN model and mapped back to the original scales; the model of interest (MOI) is then fitted to the multiple datasets and the results combined.
     • Note that for non-normal continuous variables we may be able to use, e.g., a Box-Cox transform within the same model framework.
     • Note that for general discrete distributions we may be able to approximate by a set of ordered categories.
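As an illustration of the Box-Cox idea mentioned above, the sketch below transforms a positive value towards normality and maps it back to the original scale. The function names are hypothetical and the transform parameter `lam` is treated as given; in practice it would have to be chosen or estimated, which this sketch does not attempt.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Map a transformed value back to the original scale.

    For lam != 0 this requires lam * y + 1 > 0.
    """
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)


# Round trip on an illustrative value.
transformed = box_cox(3.0, 0.5)
print(transformed, inv_box_cox(transformed, 0.5))
```

Imputed values drawn on the transformed scale would be passed through the inverse transform before the model of interest is fitted.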

  7. A simulation. Approximately 30% of records have randomly missing data. The response is the exam score at age 16. See Goldstein, Carpenter, Kenward and Levin (2009), Statist. Modelling, 9, 173-197.
