
Deliverable 2.6: Selective Editing




  1. Deliverable 2.6: Selective Editing Hannah Finselbach (ONS, UK) and Orietta Luzi (ISTAT, Italy)

  2. Overview • Introduction • Related projects • Combining data sources • Selective editing – data sources and tools • Selective editing in SDWH Framework • Proposed case studies • Deliverable outcomes and recommendations

  3. Introduction • Selective editing options for a Statistical Data Warehouse – including options for weighting the importance of different outputs • UK and Italy • Review or quality assure – Sweden (SELEKT) • Q1: Would you like to review and give comments? (Yes/No)

  4. Statistical Data Warehouse (SDWH) • Benefits: • Decreased cost of data access and analysis • Common data model • Common tools • Drive increased use of administrative data • Faster and more automated data management and dissemination

  5. Statistical Data Warehouse (SDWH) • Drawbacks: • Can have high cost – maintaining and implementing changes • Tools may need to be developed for statistical processes • Methodological issues of SDWH framework – covered by WP2 • Phase 1 (SGA-1) → “Work in progress” for most NSIs

  6. Combining data sources • Many NSIs using admin data or registers to produce statistics • Advantages include: • Reduction in data collection and statistical production costs; large amount of data available; re-use data to reduce respondent burden. • Drawbacks include: • Different unit types (statistical and legal); timeliness; variable definition discrepancies. • Mixed source usually required

  7. Editing • UNECE Glossary of terms on Statistical Data Editing: • “an activity that involves assessing and understanding data, and the three phases of detection, resolving, and treating anomalies…” • Large amount of literature on: • Editing business surveys • Editing administrative data

  8. Aims and related projects • This deliverable aims to add value by investigating how to edit (selective editing) when combining sources • Mapping with other projects: • EssNet on Data Integration • EssNet on Administrative Data • MEMOBUST • EDIMBUS Project (2007) • EUREDIT Project (2000-2003) • BLUE-ETS • Q2: Do you know of any other relevant projects? (Yes/No)

  9. Editing combined data sources • SDWH will combine survey, register and admin data sources • Editing required for: • maintaining the business register and its quality; • a specific output and its integrated sources; • improving the statistical system. • Part of quality control in SDWH • Split processes for data sources? (e.g. France)

  10. Combined sources - Questions… • Q3: Do you currently combine data sources? • A. Yes; B. No; C. Unsure. • Q4: Do you have separate editing processes for each data source? • A. Only survey data edited (do not edit admin data); • B. Data sources edited separately; • C. Data sources edited separately, but units/variables in both sources edited for coherence; • D. Other.

  11. Selective editing • Editing – traditionally time consuming and expensive • Selective / significance editing: • Prioritises records based on a score function that expresses the impact of their potential errors on estimates (see the sketch below) • Score should consist of risk (suspicion) and influence (potential impact) components • Divide anomalies into a critical and a non-critical stream for possible clerical or manual resolution (possibly including follow-up) • More efficient editing process
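A minimal sketch of such a score, assuming a simple risk × influence formulation; the field names, the scaling by the domain total, and the threshold rule are illustrative assumptions, not taken from the deliverable:

```python
# Illustrative selective editing score: risk (suspicion) x influence (potential impact).
# Field names and the threshold rule are assumptions, not taken from the deliverable.

def local_score(reported, predicted, weight, domain_total):
    """Score one reported value against its predicted (expected) value."""
    risk = abs(reported - predicted) / max(abs(predicted), 1.0)    # suspicion component
    influence = weight * abs(reported - predicted) / domain_total  # impact on the domain estimate
    return risk * influence

def split_streams(records, threshold):
    """Divide records into a critical stream (manual review) and a non-critical stream."""
    critical, non_critical = [], []
    for rec in records:
        score = local_score(rec["reported"], rec["predicted"], rec["weight"], rec["domain_total"])
        (critical if score >= threshold else non_critical).append(rec)
    return critical, non_critical
```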

  12. Selective editing – Survey and Admin data • Use admin data as auxiliary data in the selective editing score function for survey data (e.g. UK, Italy) • Use a score of the differences between data sources to determine which records need manual intervention (e.g. France; a sketch follows below) • Use scores based on historical data • Apply selective editing to admin data, same score function as survey data, but weights=1 (e.g. France SBS system)
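A hedged sketch of the difference-based score mentioned above, with illustrative parameter names; setting weight=1 corresponds to treating each admin unit as representing only itself:

```python
# Hypothetical discrepancy score between a survey value and the matching admin value
# (e.g. reported turnover vs. VAT turnover), scaled by the domain estimate.

def discrepancy_score(survey_value, admin_value, weight, domain_estimate):
    """Large scores flag influential differences for possible manual intervention."""
    return weight * abs(survey_value - admin_value) / domain_estimate

# Example: treat the unit as representing only itself (weight = 1),
# in the spirit of the French SBS approach referred to above.
score = discrepancy_score(survey_value=120_000, admin_value=95_000, weight=1, domain_estimate=2_500_000)
```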

  13. Selective editing – question • Q5: Is selective editing used in the processing of admin/register data at your organisation? • A. No; • B. No, but admin data used as auxiliary for selective editing of survey data; • C. No, but a score function is used to compare data sources; • D. Yes, selective editing is applied to admin data; • E. Not sure.

  14. Selective editing – tools • SELEMIX – ISTAT • SELEKT – Statistics Sweden • Significance Editing Engine (SEE) – ABS • SLICE – Statistics Netherlands • Q6: Are you aware of any other selective editing tools? • A. Yes, I can provide documentation; • B. Yes; • C. No.

  15. Selective editing in SDWH • Methodological issues: • Survey weight not meaningful in SDWH • Weight=1? • Several sets of weights tailored for different uses? • Selective editing data “without purpose” • Importance weight for all potential uses? • Alternative editing approach? • Scores to compare data sources • Should score functions be used, or all discrepancies be followed up, or automatically corrected? • Selective editing of admin data – manual intervention? • Is selective editing appropriate if manual intervention is not possible? • Should automatic correction be applied to admin data identified as suspicious?

  16. Any solutions? … • Survey weights used in selective editing score not meaningful • Q7: What do you think would be the best option? • A. Everything in SDWH represents itself and therefore weights=1 • B. Calculate several survey weights for all known uses of a unit data item and incorporate into one global score • C. Calculate separate scores for all outputs, and combine (max, average, sum) – see the sketch below • D. Other – discuss!
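One possible reading of option C, sketched with an assumed list of per-output scores; the choice of combination rule (max, average, sum) is exactly the design question being asked:

```python
# Illustrative combination of per-output selective editing scores into one global score.

def global_score(output_scores, method="max"):
    """output_scores: one local score per known output/publication domain."""
    if not output_scores:
        raise ValueError("at least one per-output score is required")
    if method == "max":
        return max(output_scores)
    if method == "average":
        return sum(output_scores) / len(output_scores)
    if method == "sum":
        return sum(output_scores)
    raise ValueError(f"unknown combination method: {method}")
```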

  17. Any solutions? … • Selective editing data “without purpose” • Q8: Is selective editing appropriate if the data will be used multiple times, with unknown purpose at collection? • A. No; • B. No, another editing approach would be better; • C. Yes, we would use key known/likely outputs to calculate the score; • D. Yes, I can suggest/recommend a solution; • E. Not sure.

  18. Any solutions? … • Scores to compare data sources • Q9: Should score functions be used to compare sources, or all discrepancies be followed up, or automatically corrected? • A. All discrepancies need to be investigated by a data expert; • B. All discrepancies need to be flagged, and can then be corrected automatically; • C. Scores should be used to flag only significant/influential discrepancies, which should be investigated by a data expert; • D. Scores should be used to flag only significant/influential discrepancies, which can then be corrected automatically; • E. Other – discuss? • F. Not sure.

  19. Any solutions? … • Selective editing of admin data • Q10: Is selective editing appropriate if manual intervention is not possible? • A. No, only correct for fatal errors, systematic errors (e.g. unit errors), and suspicious reporting patterns; • B. No, identify all errors/suspicious values and automatically correct/impute; • C. Yes, identify only influential errors to avoid over editing/imputing admin source; • D. Yes, as well as fatal errors, systematic errors and suspicious reporting patterns – to also identify influential errors; • E. Other; • F. Not sure.

  20. Experimental studies • ISTAT: Prototype DWH for SBS • Use SELEMIX • Combine statistical and admin data sources at micro level to estimate variables on economic accounts, known domains • Evaluate the quality of model-based selective editing and automatic correction • Re-use available data for other outputs • ONS: Combined sources for STS • Use SELEKT • Monthly business survey and VAT Turnover data • Compare selective editing with traditional editing of admin data (followed by automatic correction), known domains • Re-use available data for other outputs

  21. Deliverable outcome - recommendations • Draft report put on CROS-portal – will include input from this workshop • Provide recommendations for methodological issues of using selective editing in SDWH • Using best practice from NSIs, and • Outcome from experimental studies. • Metadata checklist

  22. Metadata requirements • Input to editing: • Quality indicators (e.g. of data source) • Threshold for selective editing score • Potential publication domains • Question number • Predictor/Expected value for score (e.g. historical data, register data) • Domain total and/or standard error estimate for score • Edit identification • … • Output from editing: • Raw and edited value • Selective editing score • Error number/description/type • Flag if suspicious • Flag if changed • … • A possible record layout is sketched below
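One possible way of representing the checklist above as record layouts; the field names and types are illustrative assumptions only, and the elided items are left out:

```python
# Hypothetical record layouts for the editing metadata checklist above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class EditingInput:
    quality_indicator: float          # e.g. quality of the data source
    score_threshold: float            # threshold for the selective editing score
    publication_domain: str           # potential publication domain
    question_number: Optional[str]    # question number
    predictor_value: Optional[float]  # predictor/expected value (historical or register data)
    domain_total: Optional[float]     # domain total and/or standard error used in the score
    edit_id: str                      # edit identification

@dataclass
class EditingOutput:
    raw_value: float
    edited_value: float
    selective_editing_score: float
    error_type: Optional[str]         # error number/description/type
    suspicious: bool                  # flag if suspicious
    changed: bool                     # flag if changed
```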

  23. Thank you!
