Imputation of Economic Data Subject to Linear Restrictions Using a Sequential Regression Approach
Caren Tempelman, Statistics Netherlands
UNECE 2006, Bonn
Outline
• Linear restrictions
• Problems with imputing data subject to linear restrictions
• Sequential regression
• Conclusions and future research
Linear restrictions
• Economic data need to satisfy different types of linear restrictions, such as:
  - Balance restrictions, e.g. profit = turnover - expenses
  - Inequality restrictions, e.g. non-negativity constraints, or the requirement that the number of employees ≥ the number of employees in FTE (full-time equivalents)
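Both kinds of restriction can be written as a linear form in the variables compared with a constant: equal to it for a balance restriction, at least it for an inequality. The minimal sketch below checks one record against such restrictions. The variable names (`turnover`, `expenses`, `profit`, `employees`, `employees_fte`), the `(label, coefficients, constant)` representation and the function names are assumptions chosen for illustration.

```python
# Hypothetical variable names taken from the slide's examples.
# Each restriction is (label, coefficients a, constant b) and means
#   sum(a[v] * x[v]) == b   (balance)   or   sum(a[v] * x[v]) >= b   (inequality).
BALANCE = [
    ("profit = turnover - expenses",
     {"profit": 1.0, "turnover": -1.0, "expenses": 1.0}, 0.0),
]
INEQUALITY = [
    ("turnover >= 0", {"turnover": 1.0}, 0.0),
    ("expenses >= 0", {"expenses": 1.0}, 0.0),
    ("employees >= employees_fte", {"employees": 1.0, "employees_fte": -1.0}, 0.0),
]


def linear_form(coefs, record):
    """Evaluate sum(a[v] * x[v]) for one record (a dict variable -> value)."""
    return sum(a * record[v] for v, a in coefs.items())


def violated_restrictions(record, tol=1e-9):
    """Return the labels of all restrictions the record violates."""
    failed = [label for label, coefs, b in BALANCE
              if abs(linear_form(coefs, record) - b) > tol]
    failed += [label for label, coefs, b in INEQUALITY
               if linear_form(coefs, record) < b - tol]
    return failed


ok = {"turnover": 100.0, "expenses": 60.0, "profit": 40.0,
      "employees": 10, "employees_fte": 8.5}
bad = dict(ok, profit=45.0, employees_fte=12.0)
print(violated_restrictions(ok))   # []
print(violated_restrictions(bad))  # ['profit = turnover - expenses', 'employees >= employees_fte']
```

Profit is deliberately left without a non-negativity constraint, since a firm can make a loss.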
Imputation of missing data
• Standard imputation techniques do not take linear restrictions on the data into account, and are therefore highly likely to produce imputations that violate these restrictions.
• Hence an imputation model is needed that can incorporate linear restrictions.
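A toy illustration of the problem, using made-up numbers: imputing a missing profit by the mean of the observed profits looks only at the profit column, so the imputed record no longer satisfies profit = turnover - expenses. Mean imputation is just one standard technique, picked here for simplicity; the function name `mean_impute` is hypothetical.

```python
from statistics import mean

# Made-up toy records; profit is missing in the last one.
records = [
    {"turnover": 100.0, "expenses": 60.0, "profit": 40.0},
    {"turnover": 200.0, "expenses": 150.0, "profit": 50.0},
    {"turnover": 50.0, "expenses": 20.0, "profit": 30.0},
    {"turnover": 120.0, "expenses": 70.0, "profit": None},
]


def mean_impute(records, var):
    """Replace missing values of `var` by the mean of its observed values,
    ignoring every other variable (and hence every restriction)."""
    fill = mean(r[var] for r in records if r[var] is not None)
    return [dict(r, **{var: fill}) if r[var] is None else dict(r) for r in records]


imputed = mean_impute(records, "profit")[-1]
print(imputed["profit"])                          # 40.0
print(imputed["turnover"] - imputed["expenses"])  # 50.0, so the balance restriction fails
```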
Imputation model
• We are looking for a model for , where and .
• It is difficult to find a joint model:
  - the data consist of several distributional forms
  - it is unclear how to incorporate the restrictions
• Use conditional distributions instead.
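Writing the variables as Y_1, …, Y_p (notation assumed here for illustration), the switch from one joint model to univariate conditional models can be summarised as follows; each conditional model is then a regression of one variable on all the others.

```latex
\underbrace{f(Y_1, \dots, Y_p)}_{\text{joint model: hard to specify}}
\quad \longrightarrow \quad
f\left(Y_j \mid Y_1, \dots, Y_{j-1}, Y_{j+1}, \dots, Y_p\right),
\qquad j = 1, \dots, p
```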
Sequential regression imputation (1)
• Inspired by Markov chain Monte Carlo (MCMC) methods
• Use univariate conditional regressions to model each variable separately
• Iterate this process so that the final imputed values converge to draws from the joint (multivariate) distribution
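A minimal sketch of the sequential regression loop using NumPy. It is a simplification, not the method in this presentation: every variable is modelled by an ordinary normal linear regression on all other variables, coefficients are fixed at their least-squares estimates (a fuller implementation would also draw them from their posterior), and no restrictions are enforced. The function name and its parameters are hypothetical.

```python
import numpy as np


def sequential_regression_impute(X, n_rounds=10, seed=0):
    """Impute np.nan entries of a numeric matrix X (records x variables)."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)          # work on a copy
    missing = np.isnan(X)
    n, p = X.shape

    # Round 0: start from the column means of the observed values.
    X = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_rounds):             # rounds t = 1, 2, ...
        for j in range(p):                # one univariate model per variable
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs_j = ~miss_j
            # Predictors: intercept plus the current values of all other
            # variables (including those already updated in this round).
            Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(Z[obs_j], X[obs_j, j], rcond=None)
            resid = X[obs_j, j] - Z[obs_j] @ beta
            dof = max(obs_j.sum() - Z.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Draw the missing items from the fitted normal regression model.
            X[miss_j, j] = Z[miss_j] @ beta + rng.normal(0.0, sigma, miss_j.sum())
    return X


# Usage on simulated data: y depends linearly on x, 20 values of y are missing.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=200)
y = 2.0 * x + data_rng.normal(scale=0.1, size=200)
data = np.column_stack([x, y])
data[:20, 1] = np.nan
completed = sequential_regression_impute(data, n_rounds=5)
```

Because each variable gets its own model, swapping the normal regression for another model type only changes the body of the inner loop.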
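Sequential regression imputation (2)
• In round t+1, the missing items of each variable are drawn from that variable's conditional distribution given the current values of the other variables, which is specified by a regression model:
  - Continuous bounded variables: truncated regression model
  - Semi-continuous bounded variables: logistic and truncated regression model (a logistic model for whether the value is zero, a truncated regression model for the non-zero values)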
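The two model types can be sketched with Python's standard library, under the assumption of normal errors. `draw_truncated_normal` samples a normal distribution restricted to [lower, upper] by the inverse-CDF method, where `mu` would be the regression prediction and `sigma` the residual standard deviation. `draw_semicontinuous` first decides zero versus positive with a logistic model (linear predictor `eta`) and then draws the positive part from the truncated model. Both function names and parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def draw_truncated_normal(mu, sigma, lower=-math.inf, upper=math.inf, rng=random):
    """Draw from N(mu, sigma^2) restricted to [lower, upper] (inverse-CDF method)."""
    dist = NormalDist(mu, sigma)
    lo = 0.0 if lower == -math.inf else dist.cdf(lower)
    hi = 1.0 if upper == math.inf else dist.cdf(upper)
    u = rng.uniform(lo, hi)
    # inv_cdf needs 0 < u < 1: clamp, then clip the result to guard against rounding.
    x = dist.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12))
    return min(max(x, lower), upper)


def draw_semicontinuous(eta, mu, sigma, upper=math.inf, rng=random):
    """Semi-continuous variable with a point mass at zero.

    eta:       linear predictor of a logistic model for P(value > 0)
    mu, sigma: truncated regression model for the positive part, bounded by upper
    """
    p_positive = 1.0 / (1.0 + math.exp(-eta))
    if rng.random() >= p_positive:
        return 0.0
    return draw_truncated_normal(mu, sigma, lower=0.0, upper=upper, rng=rng)


r = random.Random(42)
draws = [draw_truncated_normal(10.0, 5.0, lower=0.0, upper=12.0, rng=r)
         for _ in range(1000)]
```

The inverse-CDF method is simple but loses accuracy when the bounds lie far in the tails of the distribution; a dedicated truncated-normal sampler would be preferable there.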
Additional issues
• Advantages
  - Extremely flexible: each variable (type) can be modelled separately
  - Can easily cope with large datasets
• Disadvantages
  - Possible incompatibility: the univariate conditional models need not be consistent with any joint distribution
  - Balance restrictions cannot be taken into account straightforwardly
Incorporating balance restrictions
• If a variable appears in a balance restriction, its value can be derived with certainty from the other variables in that restriction.
  - Eliminate one missing variable from each balance restriction: impute the other missing variables and derive the eliminated one from the restriction.
  - Choose this variable at random, to spread the loss of quality across variables.
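A minimal sketch of the elimination idea for a single balance restriction of the form sum(a_v * x_v) = b. The callback `impute_one` is hypothetical and stands for whatever model imputes the remaining missing variables (for example a regression draw); handling several overlapping balance restrictions would need more bookkeeping than shown here.

```python
import random


def impute_with_balance(record, coefs, b, impute_one, rng=random):
    """Impute the missing variables of one balance restriction sum(a_v * x_v) = b.

    record:     dict variable -> value, None meaning missing
    coefs, b:   the restriction's coefficients a_v and constant b
    impute_one: hypothetical callback (variable, record) -> imputed value
    """
    out = dict(record)
    missing = [v for v in coefs if out[v] is None]
    if not missing:
        return out
    # Choose the variable to eliminate at random, spreading the loss of quality.
    derived = rng.choice(missing)
    for v in missing:
        if v != derived:
            out[v] = impute_one(v, out)
    # The eliminated variable then follows with certainty from the restriction.
    rest = sum(a * out[v] for v, a in coefs.items() if v != derived)
    out[derived] = (b - rest) / coefs[derived]
    return out


balance = {"profit": 1.0, "turnover": -1.0, "expenses": 1.0}  # profit - turnover + expenses = 0
rec = {"turnover": None, "expenses": 70.0, "profit": None}
result = impute_with_balance(rec, balance, 0.0, lambda v, r: 100.0, rng=random.Random(3))
```

Whichever variable is drawn for elimination, the completed record satisfies the balance restriction exactly.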
Conclusions and future research
• Flexible imputation method
• Good preliminary results
• Simulation study to compare this method with other methods
• More research into (in)compatibility and convergence issues