Imputation in UNECE Statistical Databases: Principles and Practices. Steven Vale and Heinrich Brüngger, UNECE Statistical Division. Contents. The ECOSOC view of statistical imputation Current practices Basic principles Step-by-step implementation Conclusions and open questions.
Imputation in UNECE Statistical Databases: Principles and Practices • Steven Vale and Heinrich Brüngger, UNECE Statistical Division
Contents • The ECOSOC view of statistical imputation • Current practices • Basic principles • Step-by-step implementation • Conclusions and open questions
ECOSOC views • Resolution 2006/6 on strengthening statistical capacity • Sets limits for the use of imputation • ... but also implicitly endorses it as a statistical technique • Statistical agencies need to review their practices to ensure compliance
Defining imputation • “A procedure for entering a value for a specific data item where the response is missing or unusable” • Boundary issues: • Imputing and editing • Imputing and forecasting
Current practice in UNECE • Very limited ad-hoc imputation • Four cases: • Account identities • Regional aggregates • Poor quality national data with little impact on region totals • Re-classification • Using imputations from others • Sufficient transparency in source metadata?
Basic principles (1) • Imputed national data are not published • Avoids the need for consultation • Only official sources used for imputation • Preference for data from same country • Clear distinction between “real” and imputed data • Transparency – imputed data clearly flagged, and methods documented
Basic principles (2) • Aggregates must contain > 90% “real” data, covering > 50% of countries • Imputed data are re-calculated periodically to adjust for revisions • Method used defined at the level of the variable and stored as an attribute • Decisions on the use of imputation to be taken with regard to the quality framework
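The aggregate thresholds above can be sketched as a simple check. This is only an illustration under stated assumptions: the slide does not say how the 90% share is measured, so the sketch assumes it is the share of the aggregate's total value, and the function name `can_publish_aggregate` and its parameters are hypothetical.

```python
from typing import Mapping


def can_publish_aggregate(
    real_values: Mapping[str, float],
    imputed_values: Mapping[str, float],
    min_real_share: float = 0.90,     # "> 90% real data" (assumed: share of total value)
    min_country_share: float = 0.50,  # "> 50% of countries"
) -> bool:
    """Check both thresholds for one aggregate.

    Assumes each country appears in exactly one of the two mappings
    and that values are non-negative levels (hypothetical data layout).
    """
    real_total = sum(real_values.values())
    total = real_total + sum(imputed_values.values())
    n_countries = len(real_values) + len(imputed_values)
    if n_countries == 0 or total <= 0:
        return False  # nothing to aggregate, or shares are undefined

    real_share = real_total / total
    country_share = len(real_values) / n_countries
    # Both limits are strict inequalities, as stated on the slide.
    return real_share > min_real_share and country_share > min_country_share
```

For example, `can_publish_aggregate({"A": 60, "B": 35}, {"C": 5})` passes both tests, while an aggregate where a single large country supplies 95% of the value but two of three countries are imputed fails the country-coverage test.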
Step-by-step application • Automatic imputation routines to extend imputation towards the boundaries set by the ECOSOC Resolution • One step at a time, with pause and review to consider quality and cost / benefit • “Dashboard” to allow statisticians to choose the most appropriate method • Implemented in the context of re-engineering of statistical database system
First step • Use a linear trend to impute missing values • Requirements: • Sufficient time series observations (at least 3 out of previous 5 periods) • Closeness of fit of linear trend (R² close to 1) • Constraints • Validity of R² for few observations • Forward imputation only
[Figure: grid of data availability by period (Y = Yes, N = No), with markers showing whether imputation is applied (Yes / No)]
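A minimal sketch of the first step's linear-trend rule, assuming ordinary least squares over the previous five periods. The slides only say R² should be "close to 1", so the cut-off `min_r2 = 0.95` is an assumption, as are the function name `impute_linear_trend` and the use of `None` for a missing observation.

```python
from typing import Optional, Sequence


def impute_linear_trend(
    history: Sequence[Optional[float]],
    min_obs: int = 3,      # at least 3 observations ...
    window: int = 5,       # ... out of the previous 5 periods
    min_r2: float = 0.95,  # assumed threshold for "R² close to 1"
) -> Optional[float]:
    """Impute the period right after `history` (forward imputation only).

    Returns the fitted trend value, or None when the requirements
    are not met and no imputation should be made.
    """
    recent = list(history)[-window:]
    points = [(x, y) for x, y in enumerate(recent) if y is not None]
    if len(points) < max(min_obs, 2):
        return None  # too few observations to fit a trend

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R² equals the squared correlation.
    # A constant series is fitted exactly by a flat line, so treat it as R² = 1.
    r2 = 1.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    if r2 < min_r2:
        return None  # trend does not fit closely enough

    return intercept + slope * len(recent)  # extrapolate one period ahead
```

With `impute_linear_trend([10, 12, None, 16, 18])` the four observed points lie on one line, so the next period is extrapolated from that trend; a series with only two observations in the window, or a poorly fitting one, returns `None`. As the slide notes, R² computed from three to five points is a weak test, so the threshold should be treated with caution.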
Next steps • More flexibility: • Longer time series • Imputing values at start and in middle of time series • Non-linear trends? • Cross-country imputation in strictly limited cases?
Conclusions • Strong links between imputation and quality • Trade-off between accessibility and accuracy • Step-by-step, pause and review approach seems appropriate • Transparency is essential • Standardization of practices between international organizations would help
Open questions • Are other organizations interested in defining a common policy on the use of imputation, in response to the ECOSOC Resolution? • Could we go further and consider harmonization of methods and tools? • How should this be done? Is a specific forum needed, or can this be dealt with in combination with work on data quality? • Have other organizations modified their policies on imputation in the light of the ECOSOC Resolution, and if so, how?