Partially missing at random and ignorable inferences for parameter subsets with missing data

Roderick Little

Outline
• Survey Bayesics in three slides
• Inference with missing data: Rubin’s (1976) paper on conditions for ignoring the missing-data mechanism
• Rubin’s standard conditions are sufficient but not necessary: example
• Propose definitions of MAR and ignorability for likelihood (and Bayes) inference for subsets of parameters
• Examples
• Joint work with Sahar Zangeneh
Graybill Conference: Partially Missing at Random

Calibrated Bayes
• Frequentists should be Bayesian
  • Bayes is optimal under assumed model
• Bayesians should be frequentist
  • We never know the model (and all models are wrong)
  • Inferences should have good repeated sampling characteristics
• Calibrated Bayes (e.g. Box 1980, Rubin 1984, Little 2012)
  • Inference based on a Bayesian model
  • Model chosen to yield inferences that are well-calibrated in a frequentist sense
  • Aim for posterior probability intervals that have (approximately) nominal frequentist coverage

Calibrated Bayes models for surveys should incorporate sample design features
• All models are wrong, some models are useful
• Design-assisted: make the estimator more robust
• Calibrated Bayes: make the model more robust; many models yield design-consistent estimates
• Models that ignore features like survey weights are vulnerable to misspecification
• But models can be successfully applied in survey settings, with attention to design features
  • Weighting, stratification, clustering
  • Capture design weights as covariates in the prediction model (e.g. Gelman 2007)

Benefits of Bayes
• Unified approach to all problems
• Avoids current approach -- “inferential schizophrenia”
• Not asymptotic
• Propagates errors in estimating parameters
• Avoids frequentist pitfalls:
  • Conditions on ancillaries
  • Obeys likelihood principle

There are those who predict… and those who weight

Rubin (1976 Biometrika)
• Landmark paper (3700+ citations, after being rejected by many journals!)
• RL wrote his first (11-page) referee report, and an obscure discussion
• Modeled the missing-data mechanism by treating missingness indicators as random variables and assigning them a distribution
• Gave sufficient conditions under which the missing-data mechanism can be ignored for likelihood and frequentist inference about parameters
• Focus here on likelihood, Bayes

Ignoring the mechanism
• Full likelihood:
• Likelihood ignoring mechanism:
• Missing data mechanism can be ignored for likelihood inference when
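
In notation commonly used for Rubin's (1976) framework, the three items on this slide can be sketched as follows. The symbols are assumptions introduced here for illustration: D = (D_obs, D_mis) is the complete data split into observed and missing parts, M is the array of missingness indicators, θ indexes the data model and ψ indexes the missing-data mechanism. The third line is the usual likelihood-ratio statement of ignorability for likelihood inference.

```latex
\begin{align*}
L_{\text{full}}(\theta, \psi \mid D_{\text{obs}}, M)
  &\propto \int f(D_{\text{obs}}, D_{\text{mis}} \mid \theta)\,
           f(M \mid D_{\text{obs}}, D_{\text{mis}}, \psi)\, dD_{\text{mis}}, \\
L_{\text{ign}}(\theta \mid D_{\text{obs}})
  &\propto \int f(D_{\text{obs}}, D_{\text{mis}} \mid \theta)\, dD_{\text{mis}}, \\
\frac{L_{\text{full}}(\theta, \psi \mid D_{\text{obs}}, M)}
     {L_{\text{full}}(\theta^{*}, \psi \mid D_{\text{obs}}, M)}
  &= \frac{L_{\text{ign}}(\theta \mid D_{\text{obs}})}
          {L_{\text{ign}}(\theta^{*} \mid D_{\text{obs}})}
  \quad \text{for all } \theta,\ \theta^{*},\ \psi .
\end{align*}
```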

Rubin’s sufficient conditions for ignoring the mechanism
• Missing data mechanism can be ignored for likelihood inference when
  • (a) the missing data are missing at random (MAR)
  • (b) the parameters of the data model and the missing-data mechanism are distinct
• MAR is the key condition: without (b), inferences are valid but not fully efficient
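
Conditions (a) and (b) can be written out as below, again as a sketch in assumed notation rather than the slide's own: D_obs and D_mis are the observed and missing parts of the data, M the missingness indicators, θ the data-model parameters, ψ the mechanism parameters, and Ω the corresponding parameter spaces. The last line shows why the conditions suffice: under MAR the mechanism term does not involve D_mis and comes out of the integral, and under distinctness it carries no information about θ.

```latex
\begin{align*}
\text{(a) MAR:}\quad
  & f(M \mid D_{\text{obs}}, D_{\text{mis}}, \psi)
    = f(M \mid D_{\text{obs}}, D_{\text{mis}}^{*}, \psi)
    \quad \text{for all } D_{\text{mis}},\ D_{\text{mis}}^{*},\ \psi, \\
\text{(b) Distinctness:}\quad
  & \Omega_{\theta,\psi} = \Omega_{\theta} \times \Omega_{\psi}, \\
\text{(a) implies}\quad
  & \int f(D_{\text{obs}}, D_{\text{mis}} \mid \theta)\,
         f(M \mid D_{\text{obs}}, D_{\text{mis}}, \psi)\, dD_{\text{mis}}
    = f(M \mid D_{\text{obs}}, \psi)
      \int f(D_{\text{obs}}, D_{\text{mis}} \mid \theta)\, dD_{\text{mis}} .
\end{align*}
```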

“Sufficient for ignorable” is not the same as “ignorable”
• These definitions have come to define ignorability (e.g. Little and Rubin 2002)
• However, Rubin (1976) described (a) and (b) as the "weakest simple and general conditions under which it is always appropriate to ignore the process that causes missing data".
• These conditions are not necessary for ignoring the mechanism in all situations.

Example 1: Nonresponse with auxiliary data
[Data-pattern diagram: indicator column of 0s and 1s, missing values marked “?”; labels: “Or whole population N”, “Not linked”]

MAR, ignorability for parameter subsets
• MAR and ignorability are defined in terms of the complete set of parameters in the data model for D
• It would be useful to have a definition of MAR that applies to subsets of parameters, including parameters of substantive interest.
• A trivial example: it seems plausible that a nonignorable mechanism would be MAR for the parameters of distributions of variables that are not missing.
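
The trivial example can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: a bivariate normal (Y1, Y2) with both means equal to 0, Y1 fully observed, and a hypothetical logistic mechanism in which Y2 is more likely to be missing when Y2 is large (so the mechanism is nonignorable for the Y2 parameters). Ignoring the mechanism is harmless for the mean of Y1 but not for the mean of Y2.

```python
import numpy as np


def simulate_trivial_example(n=200_000, rho=0.6, seed=0):
    """Y1 fully observed; Y2 missing with probability depending on Y2 itself."""
    rng = np.random.default_rng(seed)
    y1 = rng.normal(size=n)
    y2 = rho * y1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)

    # Hypothetical nonignorable mechanism: larger Y2 is more likely to be missing.
    p_missing = 1.0 / (1.0 + np.exp(-2.0 * y2))
    y2_observed = rng.random(n) >= p_missing

    return {
        # Uses every unit; the mechanism for Y2 plays no role here.
        "mean_y1_all_units": y1.mean(),
        # Ignores the mechanism and uses only units with Y2 observed.
        "mean_y2_complete_cases": y2[y2_observed].mean(),
        "response_rate": y2_observed.mean(),
    }


result = simulate_trivial_example()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these settings the Y1 mean should sit close to its true value of 0, while the complete-case Y2 mean is pulled well below 0.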

MAR, ignorability for parameter subsets

Partial MAR given a function of mechanism

Example 1: Auxiliary Survey Data
[Data-pattern diagram: indicator column of 0s and 1s, missing values marked “?”; label: “Not linked”]

Ex. 2: MNAR Monotone Bivariate Data
• Paper presents a more interesting case with Y1, Y2 blocks of variables and missing data in each block
[Data-pattern diagram: indicator column of 0s and 1s, missing values marked “?”]

More generally…

Ex. 3: Complete Case Analysis in Regression
[Data-pattern diagram: indicator column of 0s and 1s, missing values marked “?”]
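
A well-known property of complete-case (CC) regression, assumed here to be the point of this example, is that the regression of Y on X estimated from complete cases stays consistent when missingness depends on the covariates (even on the values that are missing) but not on Y given the covariates. The sketch below checks this numerically. The linear model, the two logistic mechanisms and all names are hypothetical choices made for illustration.

```python
import numpy as np


def cc_slope(x, y, keep):
    """OLS slope of y on x using only the complete cases flagged by `keep`."""
    slope, _intercept = np.polyfit(x[keep], y[keep], 1)
    return slope


rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)  # true slope is 2

# Mechanism A (hypothetical): a case is complete with probability depending
# on x only, including x values that end up missing.
keep_a = rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * x))

# Mechanism B (hypothetical): a case is complete with probability depending on y.
keep_b = rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * (y - 1.0)))

slope_a = cc_slope(x, y, keep_a)
slope_b = cc_slope(x, y, keep_b)
print(f"CC slope, missingness depends on x: {slope_a:.3f}")
print(f"CC slope, missingness depends on y: {slope_b:.3f}")
```

Under mechanism A the CC slope should be close to 2; under mechanism B selection on the outcome attenuates it.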

Ex. 4: A normal pattern-mixture model
[Data-pattern diagram: indicator column of 0s and 1s, missing values marked “?”]

Ex. 5: Subsample ignorable likelihood (Little and Zhang 2011)
[Data-pattern diagram: columns could be vectors; √ = fully observed, ? = observed or missing]
• Interest concerns parameters of the regression of Y on (Z, X, W)
• Z complete, W and (X, Y) incomplete. W complete in P1.
• Division of covariates into W, X is based on the following MNAR assumptions about the missing-data mechanism:
  • Pr(W complete) = fn(W, X, Z) (not Y)
  • (X, Y) MAR in the subsample with W fully observed (that is, P1)

Ex. 6: Auxiliary data, survey nonresponse
[Data-pattern diagram: units indexed 1, …, r, …, n, …, N; missing values marked “?”; label: “Not linked”]

Simulation Study

Simulation Study: methods
• CC: Complete-case estimates based on the responding units
• M1: ML based on a logistic regression with interaction for Y3
• M2: ML based on an additive logistic regression for Y3
• NR: Weighting-class estimates, with nonresponse weights based on Y1
• PS: Post-stratification weighted estimates based on Y2
• NRPS: Adjust weights using both Y1 and Y2. For categorical variables, this method is equivalent to linear calibration (regression) or generalized raking estimates
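
The two single-variable weighting methods (NR and PS) can be sketched as below. This is a minimal illustration, not the code used in the simulation study: the function names, the toy data and the population counts are hypothetical. Weighting classes give each respondent in class c the weight n_c / r_c (sampled over responding units), while post-stratification replaces the sample shares with known population shares N_h / N.

```python
import numpy as np


def weighting_class_mean(y, responded, classes):
    """Weighting-class estimate: sum over classes of (n_c / n) * respondent mean."""
    y = np.asarray(y, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    classes = np.asarray(classes)
    total = 0.0
    for c in np.unique(classes):
        in_class = classes == c
        resp_in_class = in_class & responded
        if not resp_in_class.any():
            raise ValueError(f"no respondents in class {c!r}")
        total += in_class.sum() * y[resp_in_class].mean()
    return total / len(y)


def post_stratified_mean(y, responded, strata, pop_counts):
    """Post-stratified estimate: sum over strata of (N_h / N) * respondent mean."""
    y = np.asarray(y, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    strata = np.asarray(strata)
    total = 0.0
    for h, n_pop in pop_counts.items():
        resp_in_stratum = (strata == h) & responded
        if not resp_in_stratum.any():
            raise ValueError(f"no respondents in stratum {h!r}")
        total += n_pop * y[resp_in_stratum].mean()
    return total / sum(pop_counts.values())


# Toy data (hypothetical): y3 is the outcome, NaN marks nonresponse.
y3 = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
responded = ~np.isnan(y3)
y1 = np.array([0, 0, 0, 1, 1, 1])  # weighting classes
y2 = np.array([0, 1, 0, 1, 0, 1])  # post-strata

nr_estimate = weighting_class_mean(y3, responded, y1)
ps_estimate = post_stratified_mean(y3, responded, y2, {0: 400, 1: 600})
print(nr_estimate, ps_estimate)
```

Both functions raise an error when a class or stratum has no respondents, since the adjustment is undefined there; in practice such cells would be collapsed with neighbouring ones.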

Simulation: summary findings
• When response depends on the Y1*Y2 interaction, all methods do poorly
• When data are MCAR, all methods do similarly well
• Model-based methods remove almost all the bias and perform better when response doesn’t depend on the Y1*Y2 interaction
• Qualitative patterns hold for different sample sizes

Frequentist inference
• Rubin’s (1976) sufficient conditions for ignorability for frequentist inference were even stronger (essentially MCAR)
• These can be weakened too: for example, asymptotic frequentist inference based on ML and the observed information matrix works under the conditions given here
• Small sample inference seems more problematic

Summary
• Proposed definitions of partial MAR, ignorability for subsets of parameters
• Expands range of situations where missing data mechanism can be ignored
• Though, in some cases, MAR analysis entails a loss of information
  • How much is lost is an interesting question that varies by context

References
Harel, O. and Schafer, J. L. (2009). Partial and latent ignorability in missing data problems. Biometrika, 2009, 1-14.
Little, R. J. A. (1993). Pattern-mixture models for multivariate incomplete data. JASA, 88, 125-134.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Wiley.
Little, R. J. and Zangeneh, S. Z. (2013). Missing at random and ignorability for inferences about subsets of parameters with missing data. University of Michigan Biostatistics Working Paper Series.
Little, R. J. and Zhang, N. (2011). Subsample ignorable likelihood for regression analysis with missing data. JRSSC, 60(4), 591-605.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.
