Imputation in UNECE Statistical Databases: Principles and Practices

Steven Vale and Heinrich Brüngger, UNECE Statistical Division. Contents: the ECOSOC view of statistical imputation; current practices; basic principles; step-by-step implementation; conclusions and open questions.


Presentation Transcript


  1. Imputation in UNECE Statistical Databases: Principles and Practices • Steven Vale and Heinrich Brüngger, UNECE Statistical Division

  2. Contents • The ECOSOC view of statistical imputation • Current practices • Basic principles • Step-by-step implementation • Conclusions and open questions

  3. ECOSOC views • Resolution 2006/6 on strengthening statistical capacity • Sets limits for the use of imputation • ... but also implicitly endorses it as a statistical technique • Statistical agencies need to review their practices to ensure compliance

  4. Defining imputation • “A procedure for entering a value for a specific data item where the response is missing or unusable” • Boundary issues: imputing and editing; imputing and forecasting

  5. Current practice in UNECE • Very limited ad-hoc imputation • Four cases: (1) account identities; (2) regional aggregates; (3) poor-quality national data with little impact on region totals; (4) re-classification • Using imputations from others: sufficient transparency in source metadata?

  6. Basic principles (1) • Imputed national data are not published • Avoids the need for consultation • Only official sources used for imputation • Preference for data from same country • Clear distinction between “real” and imputed data • Transparency – imputed data clearly flagged, and methods documented
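The flagging and non-publication principles on this slide can be illustrated with a small record type. This is only a sketch: the `Observation` class, its field names, and the `publishable_national_data` helper are hypothetical and do not describe the actual UNECE database schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One data point; all field names are illustrative assumptions."""
    country: str
    period: int
    value: float
    imputed: bool = False          # clear distinction between "real" and imputed data
    method: Optional[str] = None   # documented imputation method, e.g. "linear_trend"


def publishable_national_data(observations: List[Observation]) -> List[Observation]:
    """Keep only "real" national values; imputed national data are not published."""
    return [o for o in observations if not o.imputed]


data = [
    Observation("AA", 2005, 10.0),
    Observation("AA", 2006, 11.0, imputed=True, method="linear_trend"),
]
published = publishable_national_data(data)
```

Here `published` holds only the 2005 value; the flagged 2006 value stays available internally, for example for regional aggregates.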

  7. Basic principles (2) • Aggregates must contain > 90% “real” data, covering > 50% of countries • Imputed data are re-calculated periodically to adjust for revisions • Method used defined at the level of the variable and stored as an attribute • Decisions on the use of imputation to be taken with regard to the quality framework
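The aggregate thresholds on this slide could be checked as in the sketch below. It assumes one possible reading of the rule: more than 90% of the aggregate's value must come from "real" data, and more than 50% of the countries must have "real" data. The function name and parameters are hypothetical.

```python
from typing import Dict, Set


def aggregate_is_publishable(
    values: Dict[str, float],
    imputed: Set[str],
    min_real_share: float = 0.90,
    min_country_share: float = 0.50,
) -> bool:
    """values maps country -> value; imputed is the set of countries with imputed values."""
    if not values:
        return False
    total = sum(values.values())
    if total == 0:
        return False  # the share of "real" data is undefined for a zero aggregate
    real_total = sum(v for c, v in values.items() if c not in imputed)
    real_countries = sum(1 for c in values if c not in imputed)
    return (
        real_total / total > min_real_share
        and real_countries / len(values) > min_country_share
    )


# A small imputed contribution passes; a large one does not.
ok = aggregate_is_publishable({"A": 60.0, "B": 35.0, "C": 5.0}, imputed={"C"})
too_much = aggregate_is_publishable({"A": 60.0, "B": 35.0, "C": 5.0}, imputed={"B"})
```

Both conditions are strict inequalities, following the ">" signs on the slide.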

  8. Step-by-step application • Automatic imputation routines to extend imputation towards the boundaries set by the ECOSOC Resolution • One step at a time, with pause and review to consider quality and cost / benefit • “Dashboard” to allow statisticians to choose the most appropriate method • Implemented in the context of re-engineering of statistical database system

  9. First step • Use a linear trend to impute missing values • Requirements: sufficient time series observations (at least 3 out of the previous 5 periods); closeness of fit of the linear trend (R² close to 1) • Constraints: validity of R² for few observations; forward imputation only
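A minimal sketch of this first step follows, using an ordinary least-squares line fitted to the previous five periods. The function name, the `min_r2=0.95` cut-off (the slide says only "close to 1"), and the treatment of a constant series as a perfect fit are assumptions for illustration, not the UNECE implementation.

```python
from typing import List, Optional


def impute_linear_trend(
    series: List[Optional[float]],
    window: int = 5,
    min_obs: int = 3,
    min_r2: float = 0.95,  # assumed threshold for "R² close to 1"
) -> Optional[float]:
    """Forward-impute the period after `series` ends; None marks a missing value.

    Returns None when the requirements are not met.
    """
    recent = series[-window:]
    start = len(series) - len(recent)
    points = [(start + i, y) for i, y in enumerate(recent) if y is not None]
    if len(points) < min_obs:
        return None  # fewer than 3 of the previous 5 periods observed
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    if sxx == 0:
        return None  # a single distinct period cannot define a trend
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² of a simple linear regression; a constant series is treated as a perfect fit.
    r2 = 1.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    if r2 < min_r2:
        return None
    return intercept + slope * len(series)  # forward imputation only


next_value = impute_linear_trend([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the perfectly linear series above, `next_value` continues the trend to 6.0. With only three to five points, R² is a weak guide to fit quality, which is the constraint the slide itself notes.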

  10. [Figure: data availability by period (Y = yes, N = no) and whether imputation is applied (yes / no)]

  11. Next steps • More flexibility: • Longer time series • Imputing values at start and in middle of time series • Non-linear trends? • Cross-country imputation in strictly limited cases?

  12. Conclusions • Strong links between imputation and quality • Trade-off between accessibility and accuracy • Step-by-step, pause and review approach seems appropriate • Transparency is essential • Standardization of practices between international organizations would help

  13. Open questions • Are other organizations interested in defining a common policy on the use of imputation, in response to the ECOSOC Resolution? • Could we go further and consider harmonization of methods and tools? • How should this be done? Is a specific forum needed, or can this be dealt with in combination with work on data quality? • Have other organizations modified their policies on imputation in the light of the ECOSOC Resolution, and if so, how?
