This presentation covers the key principles and step-by-step implementation of statistical imputation in UNECE databases: current practices, basic principles, the ECOSOC view on imputation, and the importance of transparency, quality, and data accuracy in imputation techniques.
Imputation in UNECE Statistical Databases: Principles and Practices
Steven Vale and Heinrich Brüngger, UNECE Statistical Division
Contents
• The ECOSOC view of statistical imputation
• Current practices
• Basic principles
• Step-by-step implementation
• Conclusions and open questions
ECOSOC views
• Resolution 2006/6 on strengthening statistical capacity
• Sets limits for the use of imputation
• ... but also implicitly endorses it as a statistical technique
• Statistical agencies need to review their practices to ensure compliance
Defining imputation
• “A procedure for entering a value for a specific data item where the response is missing or unusable”
• Boundary issues:
  • Imputing and editing
  • Imputing and forecasting
Current practice in UNECE
• Very limited ad-hoc imputation
• Four cases:
  • Account identities (see the sketch below)
  • Regional aggregates
  • Poor-quality national data with little impact on regional totals
  • Re-classification
• Using imputations from others
• Sufficient transparency in source metadata?
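The first of the four cases, imputation via account identities, can be illustrated with a minimal sketch. The expenditure identity GDP = C + G + I + net exports is used here purely as an illustration; the slides do not specify which identities UNECE applies, and the function name and figures are hypothetical.

```python
# Minimal sketch: derive one missing component of an accounting identity.
# Illustrative identity only: GDP = C + G + I + net exports

def impute_from_identity(components: dict, total: float) -> dict:
    """Return a copy of `components` with the single missing value (None)
    filled in so that the components sum to `total`."""
    missing = [k for k, v in components.items() if v is None]
    if len(missing) != 1:
        raise ValueError("Exactly one component must be missing")
    known_sum = sum(v for v in components.values() if v is not None)
    filled = dict(components)
    filled[missing[0]] = total - known_sum
    return filled

# Usage: gross capital formation (I) is missing, GDP and the rest are reported.
parts = {"C": 610.0, "G": 190.0, "I": None, "net_exports": 15.0}
print(impute_from_identity(parts, total=1000.0))  # ... 'I': 185.0 ...
```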
Basic principles (1)
• Imputed national data are not published
  • Avoids the need for consultation
• Only official sources used for imputation
  • Preference for data from the same country
• Clear distinction between “real” and imputed data
• Transparency – imputed data clearly flagged and methods documented
Basic principles (2)
• Aggregates must contain > 90% “real” data, covering > 50% of countries
• Imputed data are re-calculated periodically to adjust for revisions
• Method used is defined at the level of the variable and stored as an attribute
• Decisions on the use of imputation to be taken with regard to the quality framework
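A minimal sketch of how the first rule above might be checked before a regional aggregate is released. Interpreting the 90% threshold as a share of the aggregate's value, and the class and function names, are assumptions made for illustration; the slides only state the thresholds.

```python
# Sketch of the publication rule for regional aggregates:
# > 90% of the aggregate value must come from "real" (non-imputed) data
# and real data must cover > 50% of the countries in the region.
# The value-share reading of the 90% rule is an assumption, not from the slides.

from dataclasses import dataclass

@dataclass
class Observation:
    country: str
    value: float
    imputed: bool  # transparency: every imputed value carries a flag

def aggregate_is_publishable(obs: list) -> bool:
    total = sum(o.value for o in obs)
    real = [o for o in obs if not o.imputed]
    real_value_share = sum(o.value for o in real) / total if total else 0.0
    real_country_share = len(real) / len(obs) if obs else 0.0
    return real_value_share > 0.90 and real_country_share > 0.50
```

An aggregate failing either test would not be published under this rule.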
Step-by-step application
• Automatic imputation routines to extend imputation towards the boundaries set by the ECOSOC Resolution
• One step at a time, with pause and review to consider quality and cost/benefit
• “Dashboard” to allow statisticians to choose the most appropriate method
• Implemented in the context of re-engineering the statistical database system
First step
• Use a linear trend to impute missing values
• Requirements:
  • Sufficient time-series observations (at least 3 of the previous 5 periods)
  • Close fit of the linear trend (R² close to 1)
• Constraints:
  • R² may be unreliable with only a few observations
  • Forward imputation only
[Example grid: data availability (Y = yes, N = no) and whether each missing value is imputed]
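A minimal sketch of this first automatic step: forward imputation from a linear trend fitted to the previous five periods, applied only when at least 3 of those periods are observed and the fit is close. The 0.9 cut-off for R² is an illustrative assumption; the slides only say “close to 1”.

```python
# Sketch of forward imputation from a linear trend.
# Conditions from the slides: at least 3 of the previous 5 periods observed,
# and a close linear fit (R² near 1). The 0.9 cut-off is an assumption.

def linear_trend_impute(series, r2_threshold=0.9):
    """series: previous periods as floats, with None for missing values.
    Returns (imputed_value, r2) for the next period, or (None, r2)
    when the conditions are not met (forward imputation only)."""
    window = series[-5:]                          # previous 5 periods
    points = [(i, v) for i, v in enumerate(window) if v is not None]
    if len(points) < 3:                           # need at least 3 observations
        return None, None
    xs, ys = zip(*points)
    n = len(points)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r2 = 1.0 if ss_tot == 0 else 1 - ss_res / ss_tot
    if r2 < r2_threshold:                         # poor fit: do not impute
        return None, r2
    next_x = len(window)                          # one period ahead only
    return slope * next_x + intercept, r2

# Usage: 4 of the previous 5 periods observed, close linear trend.
values = [100.0, None, 104.0, 106.0, 108.0]
print(linear_trend_impute(values))                # -> (110.0, 1.0)
```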
Next steps
• More flexibility:
  • Longer time series
  • Imputing values at the start and in the middle of a time series
  • Non-linear trends?
• Cross-country imputation in strictly limited cases?
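One way the “middle of a time series” extension could work is simple linear interpolation between the nearest observed neighbours. This is an illustrative sketch of the idea, not a method the slides prescribe.

```python
# Illustrative sketch: fill internal gaps by linear interpolation between
# the nearest observed neighbours (one possible "next step").

def interpolate_gaps(series):
    """series: list of floats with None for internal gaps.
    Leading and trailing gaps are left untouched in this sketch."""
    out = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)  # flag as imputed downstream
    return out

print(interpolate_gaps([100.0, None, None, 109.0]))  # [100.0, 103.0, 106.0, 109.0]
```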
Conclusions
• Strong links between imputation and quality
• Trade-off between accessibility and accuracy
• Step-by-step, pause-and-review approach seems appropriate
• Transparency is essential
• Standardization of practices between international organizations would help
Open questions
• Are other organizations interested in defining a common policy on the use of imputation, in response to the ECOSOC Resolution?
• Could we go further and consider harmonization of methods and tools?
• How should this be done? Is a specific forum needed, or can this be dealt with in combination with work on data quality?
• Have other organizations modified their policies on imputation in the light of the ECOSOC Resolution, and if so, how?