Partial (donor) imputation with adjustments
Jeroen Pannekoek and Li-Chun Zhang
Work Session on Statistical Data Editing, Ljubljana, Slovenia, 9-11 May 2011
Contents
• The problem of inconsistent micro-data
• Simple solutions and their limitations
• More general approaches
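Simple solutions (for response pattern I)
• Prorating
  Edit 1: Turnover = Profit + Total costs. The observed Turnover (950) differs from the imputed Profit + Total costs (330 + 700 = 1030), so multiply both imputed values by 950/(330 + 700) ≈ 0.92.
  Edit 2: Total costs = Wages + Other costs. The adjusted Total costs (0.92 × 700) no longer equal Wages + Other costs (500 + 200), so multiply the right-hand side by 0.92 as well.
• Ratio adjustment (ratio imputation): multiply all imputed values by R = Turnover total (observed) / Turnover total (donor). In this case the results are the same as for prorating, except that Employees, which does not appear in any edit rule, is also adjusted.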
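The prorating arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using only the numbers quoted above; the helper name `prorate` is introduced here for illustration and is not from the presentation.

```python
def prorate(parts, target):
    """Scale all `parts` by one common factor so that they sum to `target`."""
    factor = target / sum(parts)
    return [p * factor for p in parts], factor

turnover = 950.0  # observed value, kept fixed

# Edit 1: Turnover = Profit + Total costs (donor values 330 and 700).
(profit, total_costs), f1 = prorate([330.0, 700.0], turnover)

# Edit 2: Total costs = Wages + Other costs (donor values 500 and 200),
# fitted to the Total costs that resulted from Edit 1.
(wages, other_costs), f2 = prorate([500.0, 200.0], total_costs)

print(round(f1, 2), round(f2, 2))
print(profit, total_costs, wages, other_costs)
```

Both steps use the same factor here because the donor record itself satisfies Edit 2, so scaling Total costs by 950/1030 requires scaling its components by the same amount.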
Problems with single constraint adjustments
Consider response pattern II.
Edit violations:
E1: Turnover ≠ Profit + Total costs
E2: Total costs ≠ Wages + Other costs
Option:
1. Adjust Profit and Total costs to fit E1.
2. For the resulting value of Total costs, adjust Other costs to fit E2.
Problems:
• The order matters: doing the adjustments the other way around gives a different solution.
• Information on Wages is not used in adjusting Total costs.
• Infeasible solutions can occur: if the adjusted Total costs are smaller than Wages, Other costs would have to be negative.
Edit constraints as a system of equations
For the vector of values $x$, the constraints are $Ex = 0$, where $E$ is the matrix of constraint coefficients. Each row of $E$ is a constraint and the columns correspond to the variables. Constraints E1 and E2 are linked because they have variable $x_5$ (Turnover total) in common. E2 and E3 are also linked, indirectly, through E1.
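The matrix itself is not reproduced in this transcript, so the following is only a sketch of what such a system could look like. The column order, the exact form of E2 and E3, and the values for Employees, Turnover main and Turnover other are assumptions made for illustration; only $x_5$ = Turnover total, the edit Turnover = Profit + Total costs, and the values 330, 950, 500, 200 and 700 come from the slides.

```python
import numpy as np

# Hypothetical column order; only x5 = Turnover total is taken from the slide.
names = ["Profit", "Employees", "Turnover main", "Turnover other",
         "Turnover total", "Wages", "Other costs", "Total costs"]

# Each row is one edit written as "coefficients @ x = 0" (assumed layout).
E = np.array([
    [1, 0, 0, 0, -1, 0, 0, 1],   # E1: Profit - Turnover total + Total costs = 0
    [0, 0, 1, 1, -1, 0, 0, 0],   # E2: Turnover main + Turnover other - Turnover total = 0
    [0, 0, 0, 0, 0, 1, 1, -1],   # E3: Wages + Other costs - Total costs = 0
])

# Record after imputation: Turnover total (950) is observed, the rest is donor data.
# Employees = 20, Turnover main = 1000 and Turnover other = 30 are made-up values.
x = np.array([330, 20, 1000, 30, 950, 500, 200, 700], dtype=float)

residuals = E @ x  # a nonzero entry means the edit is violated
print(dict(zip(["E1", "E2", "E3"], residuals)))

# Number of variables shared by each pair of edits: a nonzero off-diagonal
# entry means the two edits are directly linked.
pattern = (E != 0).astype(int)
shared = pattern @ pattern.T
print(shared)
```

Under this assumed layout, E1 shares a variable with both E2 and E3, while E2 and E3 share none, which matches the slide's remark that E2 and E3 are linked only through E1.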
An optimization approach
Change the values of the imputed variables such that:
• Edit rules are satisfied
• Change is as small as possible
Formally, find an adjusted data vector $x^A$ such that:
$x^A = \arg\min_{x^A} D(x^A, x)$ s.t. $E x^A \le 0$.
Writing $E x^A \le 0$ means that we consider both equalities and inequalities.
Distance functions
Least Squares (LS): $\sum_i (x_i - x_i^A)^2$
Weighted Least Squares (WLS): $\sum_i w_i (x_i - x_i^A)^2$
Kullback-Leibler Divergence (KL): $\sum_i x_i (\ln x_i - \ln x_i^A)$
Adjustment models 1/2
• Least squares (LS): $D = \sum_i (x_i - x_i^A)^2$ gives $x_i^A = x_i + \sum_k e_{ki} \alpha_k$, where $e_{ki}$ is the coefficient of variable $i$ in constraint $k$ (row $k$ of $E$).
  Additive adjustments: the total adjustment for a variable is the sum of its adjustments to each of the constraints. The same adjustment parameter $\alpha_k$ applies to all variables in constraint $k$.
• Weighted least squares (WLS): $D = \sum_i w_i (x_i - x_i^A)^2$ gives $x_i^A = x_i + (1/w_i) \sum_k e_{ki} \alpha_k$.
  Additive adjustments, but the amount of adjustment varies according to the weights.
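For equality constraints only, the additive form above leads to a small linear system for the $\alpha_k$. The sketch below assumes a reduced five-variable record built from the values on the earlier slides (Turnover observed, the rest imputed); the variable order, the function name `adjust_wls` and the restriction to equalities are choices made here, not part of the presentation.

```python
import numpy as np

# Assumed order: [Profit, Turnover, Total costs, Wages, Other costs]
E = np.array([[1.0, -1.0,  1.0, 0.0, 0.0],   # Profit - Turnover + Total costs = 0
              [0.0,  0.0, -1.0, 1.0, 1.0]])  # Wages + Other costs - Total costs = 0
x = np.array([330.0, 950.0, 700.0, 500.0, 200.0])
imputed = np.array([True, False, True, True, True])  # Turnover is observed

def adjust_wls(E, x, imputed, w=None):
    """Adjust only the imputed entries: x_i^A = x_i + (1/w_i) * sum_k e_ki * alpha_k."""
    A = E[:, imputed]                      # coefficients of the adjustable variables
    z = x[imputed]
    w = np.ones_like(z) if w is None else np.asarray(w, dtype=float)
    residual = E @ x                       # current violation of each edit
    # Requiring E @ x_adj = 0 gives (A W^-1 A^T) alpha = -residual.
    alpha = np.linalg.solve((A / w) @ A.T, -residual)
    x_adj = x.copy()
    x_adj[imputed] = z + (A.T @ alpha) / w
    return x_adj, alpha

x_ls, alpha_ls = adjust_wls(E, x, imputed)                       # LS: all weights 1
x_wls, alpha_wls = adjust_wls(E, x, imputed, w=1.0 / x[imputed])  # WLS with w_i = 1/x_i
print(x_ls, alpha_ls)
print(x_wls)
```

With unit weights every variable in a constraint moves by the same amount, whereas with weights $1/x_i$ larger values absorb proportionally more of the adjustment.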
Adjustment models 2/2
• Kullback-Leibler Divergence (KL): $D = \sum_i x_i (\ln x_i - \ln x_i^A)$ gives $x_i^A = x_i \times \prod_k \exp(e_{ki} \alpha_k)$. With $\beta_k = \exp(\alpha_k)$, the factor for constraint $k$ can be written as $\beta_k$ if $e_{ki} = 1$ and $1/\beta_k$ if $e_{ki} = -1$.
  Multiplicative adjustments: the total adjustment to a variable is the product of its adjustments to each constraint. The same multiplicative adjustment parameter $\beta_k$ applies to all variables in constraint $k$.
  It can be shown that for weights $w_i = 1/x_i$, KL ≈ WLS.
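A multiplicative adjustment of this form can be sketched by visiting the constraints in turn and solving for one factor $\beta_k$ at a time. The code below is an illustration under assumptions: a reduced five-variable record with Turnover observed, coefficients restricted to +1, -1 and 0, equality constraints only, and at least one +1 coefficient per constraint. The function name `multiplicative_fit` is hypothetical.

```python
import numpy as np

# Assumed order: [Profit, Turnover, Total costs, Wages, Other costs]
E = np.array([[1.0, -1.0,  1.0, 0.0, 0.0],   # Profit - Turnover + Total costs = 0
              [0.0,  0.0, -1.0, 1.0, 1.0]])  # Wages + Other costs - Total costs = 0
x = np.array([330.0, 950.0, 700.0, 500.0, 200.0])
imputed = np.array([True, False, True, True, True])  # Turnover is observed

def multiplicative_fit(E, x, imputed, sweeps=200):
    A = E[:, imputed]                       # coefficients of adjustable variables
    b = -E[:, ~imputed] @ x[~imputed]       # observed values moved to the right-hand side
    z = x[imputed].copy()
    for _ in range(sweeps):
        for a_k, b_k in zip(A, b):
            plus, minus = a_k == 1, a_k == -1
            P, N = z[plus].sum(), z[minus].sum()
            # Positive root of P*beta - N/beta = b_k.
            beta = (b_k + np.sqrt(b_k**2 + 4.0 * P * N)) / (2.0 * P)
            z[plus] *= beta                 # factor beta_k where e_ki = +1
            z[minus] /= beta                # factor 1/beta_k where e_ki = -1
    x_adj = x.copy()
    x_adj[imputed] = z
    return x_adj

x_mult = multiplicative_fit(E, x, imputed)
print(x_mult, E @ x_mult)
```

Because every step multiplies by a positive factor, positive imputed values stay positive, which an additive adjustment does not guarantee.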
Algorithm
Simple iterative procedures exist to estimate the adjustments for general convex distances. Adjust the x-vector to each constraint one by one; each of these single-constraint adjustments is easy to perform. After all constraints have been visited, one iteration is complete. Repeat until convergence.
• For sum-to-total constraints and KL divergence, this is equivalent to repeated prorating and Iterative Proportional Fitting.
• But it also handles more general constraints: differences, linear inequalities, interval constraints.
• And more general distances and confidence weights.
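For the least-squares distance and equality constraints, the constraint-by-constraint procedure can be sketched as a cyclic projection. The example assumes the same reduced five-variable record as before (Turnover observed); the function name `cyclic_ls_fit` and the fixed number of sweeps are illustrative choices, and inequality or interval constraints are not handled by this sketch.

```python
import numpy as np

# Assumed order: [Profit, Turnover, Total costs, Wages, Other costs]
E = np.array([[1.0, -1.0,  1.0, 0.0, 0.0],   # Profit - Turnover + Total costs = 0
              [0.0,  0.0, -1.0, 1.0, 1.0]])  # Wages + Other costs - Total costs = 0
x = np.array([330.0, 950.0, 700.0, 500.0, 200.0])
imputed = np.array([True, False, True, True, True])  # Turnover is observed

def cyclic_ls_fit(E, x, imputed, sweeps=100):
    A = E[:, imputed]                     # coefficients of adjustable variables
    b = -E[:, ~imputed] @ x[~imputed]     # observed values moved to the right-hand side
    z = x[imputed].copy()
    for _ in range(sweeps):               # one sweep = one visit to every constraint
        for a_k, b_k in zip(A, b):
            # Smallest additive change that makes constraint k hold exactly.
            step = (b_k - a_k @ z) / (a_k @ a_k)
            z += step * a_k
    x_adj = x.copy()
    x_adj[imputed] = z
    return x_adj

x_iter = cyclic_ls_fit(E, x, imputed)
print(x_iter)
```

Each inner step restores one edit and may disturb the other; repeating the sweeps makes the disturbances shrink until both edits hold simultaneously.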
The generalized ratio approach 1/2
• The methods so far adjust only variables that appear in edit constraints. The aim is only to satisfy “hard” edits.
• Inconsistencies between imputed and observed values indicate a difference between the donor record and the receptor record. Therefore: adjust all donor values to better fit the receptor record.
• For response pattern I, with only Turnover total observed, all donor values were multiplied by the ratio Observed/Donor Turnover, thus rescaling with a measure of “size”.
The generalized ratio approach 2/2
As a generalisation we propose the following component-wise multiplicative adjustments: $x_i^A = x_i \delta_i$. The $\delta_i$ are determined by minimizing their variance, subject to the resulting adjusted record satisfying the edit constraints.
• Adjustments are as uniform as possible, as with ratio imputation.
• But all kinds of constraints can be satisfied.
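With equality constraints only, minimizing the variance of the $\delta_i$ is a small quadratic problem that can be solved through its optimality (KKT) conditions. The sketch below is one possible way to do that and not necessarily the authors' procedure. It assumes a six-variable record with Turnover observed; the variable order, the function name `generalized_ratio` and the Employees value of 20 are hypothetical.

```python
import numpy as np

# Assumed order: [Profit, Employees, Turnover, Total costs, Wages, Other costs]
E = np.array([[1.0, 0.0, -1.0,  1.0, 0.0, 0.0],   # Profit - Turnover + Total costs = 0
              [0.0, 0.0,  0.0, -1.0, 1.0, 1.0]])  # Wages + Other costs - Total costs = 0
x = np.array([330.0, 20.0, 950.0, 700.0, 500.0, 200.0])  # Employees = 20 is made up
imputed = np.array([True, True, False, True, True, True])  # Turnover is observed

def generalized_ratio(E, x, imputed):
    z = x[imputed]
    n = z.size
    G = E[:, imputed] * z                  # edits in terms of delta: G @ delta = b
    b = -E[:, ~imputed] @ x[~imputed]
    m = G.shape[0]
    # delta @ C @ delta equals n times the variance of delta.
    C = np.eye(n) - np.ones((n, n)) / n
    # KKT system: 2*C*delta + G^T*lam = 0 and G*delta = b.
    K = np.block([[2.0 * C, G.T], [G, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    delta = np.linalg.solve(K, rhs)[:n]
    x_adj = x.copy()
    x_adj[imputed] = z * delta
    return x_adj, delta

x_gr, delta = generalized_ratio(E, x, imputed)
print(delta)
print(x_gr)
```

For this record a single common factor already satisfies both edits, so all $\delta_i$ come out equal to 950/1030 and the result coincides with ratio adjustment, including the rescaling of Employees even though it appears in no edit.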
Concluding remarks
Optimization approach to solving inconsistency problems:
• Simultaneous adjustment to all constraints
• Generalizes prorating and ratio adjustment for single constraints
• Minimum distance approach: aims at consistency with minimum (optimal) adjustments
• Generalized ratio approach: aims to better preserve the structure of the imputed record, as in ratio imputation